Making sense out of optimizer trace files

Sometimes the optimizer makes really odd decisions. We can use optimizer (“event 10053”) tracing to obtain clues as to why such decisions were taken, but it’s not simple. First, such trace files tend to be big. Second, they don’t always contain the necessary information. There is a simple trick that can be useful in such cases.

First, I use “optimizer_features_enable” hint to try and find Oracle version where the logic changes. Of course, I don’t need to gather optimizer trace files for that purpose: it suffices to look at explain plan like this:

explain plan for
select /*+ optimizer_features_enable('x.x.x.x') */ ...

select * from table(dbms_xplan.display)

changing the version one by one.

If that odd behavior is present in all versions, we’re out of luck and the method won’t work. But if it the behaviour changes, then I take two optimizer trace files for the versions immediately before and after the change, and run a diff (e.g. using WinMerge or a similar tool) for the sections that list parameter values:

Then I go through the list of parameters that have different values, and plug them into the query text with opt_param hint one by one to see if any of them controls the behavior in question:

explain plan for
select /*+ opt_param('_optimizer_transitivity_retain', 'false') */ ...

It doesn’t work 100% of the time, but in general it has a decent shot at working because weird optimizer behavior is often coming from some advanced features introduced into it at some point, and when such features are added, they normally come with a parameter (as a rule, a hidden one) that allows to switch it off.

But if even that doesn’t work, there something else you can do: look further down in the optimizer trace file, at the list of bugfixes, and go through them one by one, searching in MOS if bug symptoms are similar to your case. But since the list of bugfixes is quite long, I am too lazy to ever do that myself.

Hope that helps someone.