Join predicate pushdown

When a query contains a regular or inline view, there are 3 basic strategies for the optimizer to choose from:

1) merge the view (no “VIEW” operation in the plan)
2) instantiate the view as the whole and join it to the rest of the query (the plan shows a VIEW “operation”)
3) push join predicates inside the view (the plan shows “VIEW PUSHED PREDICATE”).

The last strategy in that list is similar to a correlated subquery in sense that the view definition gets instantiated once per row in the driving row source. Often, this strategy gives the ideal balance between the two extremes, as instantiating the view can result in many unnecessary rows being produced, while merging the view gives the optimizer too many options, making it easier for it to make an error in figuring out optimal join order. As with any query transformation, join predicate pushdown has certain limitations, which sometimes can be very obscure and counter-intuitive, as was the case with a query that I recently had to tune, which had the following structure:

select ...
from (select t1col1,
             decode(sum(decode(t1.flag1, 'Y', decode(t2.flag2, 'N', 1, 0), 0)), 0, 'N', 'Y') AS flag
	  from t1
	  group by t1col1
	  ) i,
     t2,
	 ...
	 tN
where i.t1col1 = t2.t2col1
and i.flag = 'Y'
and ... /*+ other predicates */

It was very easy to establish that the performance problem had to do with the inline view “i”, because it was responsible for most of the query’s execution time and producing over a million of rows, and only a tiny fraction of them remained after the next step. Join predicate pushdown looked like the ideal solution, however, I just couldn’t get the optimizer to do it: push_pred hint wasn’t making any difference.

So I ran a 10053 trace file on the query. Ordinarily, looking for an answer in a 10053 trace file is like looking for a needle in a haystack. However, in this particular case I was more optimistic, because I was interested in a specific query transformation, join predicate pushdown (abbreviated to JPPD as you can see in the “legend” section of the file), so searching the file on “JPPD” yielded results very quickly, and that’s what I found:

 JPPD:     JPPD bypassed: View has non-standard group by.

Now that’s interesting. So the optimizer doesn’t like something about the GROUP BY, but what exactly was it that the optimizer considered as “non-standard”? The answer can be found by looking at the internally rewritten query text (“UNPARSED QUERY IS”) where we can see that the query’s predicate i.flag = ‘Y’ has been pushed down inside the inline view (where it appeared in a HAVING clause).

I ran a couple of quick tests, and indeed, it seems that any HAVING clause, even as simple as HAVING 1=1 is viewed by the optimizer as “non-standard” and the join predicate pushdown is not even attempted. Which is kind of ironic — VIEW PREDICATE PUSHDOWN is prevented by… a predicate pushdown!

I would be interesting to find other similar cases, but there isn’t much information about it on the internet. One interesting example was posted by Mikhail Velikikh:

select x from t group by x

Somehow the optimizer doesn’t like the lack of an aggregation function here, even though it’s perfectly legitimate, and is incapable of pushing a join predicate inside such view.

Getting back to our case — what can be done to obtain the desired plan? One obvious solution would be to change the i.flag = ‘Y’ predicate to prevent its pushdown, e.g. changing it to something like

i.flag || t3.somecolumn = 'Y' || t3.somecolumn

where t3 is any table that is not joined to the inline view, and somecolumn is any column. That way the predicate cannot be evaluated inside the inline view, and therefore it cannot be pushed down, allowing JPPD transformation to take place. Of course you need to be very careful so that such rewrite wouldn’t change the meaning of the query.

In this particular case, however, there was another solution, a much better one from the point of view of SQL’s readability. The DECODE expression is simply checking for existence of rows with flag1 = ‘Y’ and flag2 = ‘N’, so if it’s rewritten as:

and exists (select null from t1 where col1 = t2.t2col1 and flag1 = 'Y' and flag2 = 'N')

makes the query both faster and much easier to understand.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s