Oracle cost-based optimizer (CBO) is great, but sometimes it’s making wrong choices even when correct inputs are fed to it. In such cases, you need a tool to override CBOs choices, and one of the most popular tools is optimizer hints. The main reason they’re so popular is that they allow “quick-and-dirty” kind of fixes for performance issues (provided that query text can be altered). Other ways may be more reliable, but generally require more work, and who wants to do work that can be avoided? Unfortunately, there’s a well known downside to the hints — it’s very easy to run into problems if you only fix a part of the plan.
When this happens, hints can lead to terrible execution plans. For a long time, I’ve been looking for a good example to illustrate this problem, and finally this week I encountered a case which appears to be suitable for this purpose.
The incident was a fairly standard plan flip that occured on a Monday, the next day after the weekly stats gathering job. The query had the following structure:
SELECT /*+ ORDERED INDEX(b i$b$idversion_fk) */... FROM A, A A1, B WHERE (((A.id = A1.id) AND (A.version = A1.version)) OR ((A.id = A1.parentid) AND (A.version = A1.parentversion)) ... AND NOT EXISTS (SELECT /*+ INDEX(b1 i$b$idversion_fk) */ NULL FROM B B1 WHERE B1.id = A1.b_id and B1.latestversionflag = 1)
Can you spot the problem? Or more accurately, two problems. The bigger one is a dangerous combination of hints, in presence of a correlated subquery, which makes it even more dangerous. The optimizer would normally attempt to tranform NOT EXISTS into an antijoin, which means that the ORDERED hint can no longer unambiguously define the join order for the query, because it dictates that A1 should be accessed after A, and B after A1, but what about B1?!
So the new plan was driving the query from B1, which was bad enough. But to make things worse, the optimizer also had to take into account the INDEX hint, which told it which index to use for accessing the table, without defining the exact index access method. As a result, the driving operation here was an INDEX SKIP SCAN with TABLE ACCESS BY ROWID, and the selectivity of the skip scan was truly terrible. So essentially, the entire table was read via single-block reads, leading to two or even three orders of magnitudes degradation in terms of performance.
Another problem with the query was the OR condition in the join, which lead to a terribly inaccurate cardinality estimate — were it not for this OR operator, the optimizer might have been able to grope the right join order somehow.
If we want to make the optimizer access tables in a specific order, it would be a good idea to transform NOT EXISTS into an antijoin explicitly, eliminating the ambiguity in join order. This should significantly lower the probability of various nasty surprises.
Of course, an even better idea would be using a safer tool, like a SQL profile or an SPM baseline, to lock in the desired plan.