Join cardinality

In my previous post I showed an example of how a query’s performance can be improved using the waste minimization technique. My focus was primarily on identifying and enforcing the correct plan, but I received some questions regarding the root cause of the problem: why the optimizer came up with a wrong join order? It’s a very interesting question, and it deserves a separate post so that it could be explored in detail.

Let’s take another look at the simplified query and its plan:

ON     A.A_COL10 = B.B_COL10
       AND A.A_COL11 = B.B_COL11
       AND A.A_COL1 = B.B_COL3
ON     A.A_COL1 = V.V_COL1
       AND A.A_COL2 = V.V_COL2
       AND A.A_COL3 = V.V_COL3
 AND A.A_COL4 = :B1
 AND A.A_COL12 IN ('X', 'Y')
 ((A.A_COL6 IS NULL AND A.A_COL7 = 'Z') 
 (B.B_COL2 = 'W' AND A.A_COL8 is null AND A.A_COL9 is not null)) 

| Id  | Operation                                     | Name                         | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |                                                                                                                                                       
|  16 |         NESTED LOOPS                          |                              |      1 |      1 |   408K (11)|   4260 |00:01:19.19 |      13M|                                                                                                                                                       
|* 17 |          HASH JOIN                            |                              |      1 |    221K|   180K (21)|   4256K|00:00:29.67 |     401K|                                                                                                                                                       
|* 18 |           HASH JOIN                           |                              |      1 |    964K| 25795  (17)|   1006K|00:00:03.65 |   42094 |                                                                                                                                                       
|  19 |            TABLE ACCESS FULL                  | V1                           |      1 |    587K|  7756  (22)|    588K|00:00:00.31 |   26918 |                                                                                                                                                       
|* 20 |            HASH JOIN                          |                              |      1 |    450K|  9447  (17)|    450K|00:00:01.75 |   15176 |                                                                                                                                                       
|  21 |             TABLE ACCESS FULL                 | V2                           |      1 |    450K|  1587  (27)|    450K|00:00:00.12 |    4705 |                                                                                                                                                       
|* 22 |             HASH JOIN                         |                              |      1 |    274K|  3160  (18)|    275K|00:00:00.42 |   10471 |                                                                                                                                                       
|  23 |              TABLE ACCESS FULL                | V4                           |      1 |     37 |     2   (0)|     37 |00:00:00.01 |       2 |                                                                                                                                                       
|  24 |              TABLE ACCESS FULL                | V3                           |      1 |    274K|  3055  (15)|    275K|00:00:00.14 |   10469 |                                                                                                                                                       
|* 25 |           TABLE ACCESS FULL                   | A                            |      1 |   1012K|   119K (26)|   4383K|00:00:11.32 |     359K|                                                                                                                                                       
|* 26 |          TABLE ACCESS BY INDEX ROWID          | B                            |   4256K|      1 |     1   (0)|   4260 |00:00:47.34 |      12M|                                                                                                                                                       
|* 27 |           INDEX UNIQUE SCAN                   | UQ$B                         |   4256K|      1 |     0   (0)|    214K|00:00:42.68 |      12M|                                                                                                                                                                                                                                                                                                          
17 - access(A.A_COL1=V2.V2_COL1 AND A.A_COL2=V2.V2_COL2 AND                                                                                                                                                                      
18 - access(A.A_COL1=V1.V1_COL1)                                                                                                                                                                                                                                            
20 - access(V2.V2_COL3=A.A_COL1)                                                                                          
22 - access(V3.V3_COL1=V4.V4_COL1)                                                                                
25 - filter((A.A_COL4=:B1 AND INTERNAL_FUNCTION(C.CONFIRMTYPE)))                                                                   
26 - filter((B.B_COL1=:B1 AND ((A.A_COL6 IS NULL AND A.A_COL7='Z') OR (B.B_COL2='W'         
            AND A.A_COL8 IS NULL AND A.A_COL9 IS NOT NULL)) AND A.A_COL1=B.B_COL3))                     
27 - access(A.A_COL10=B.B_COL10 AND A.A_COL11=B.B_COL11)  

So we have a view (successfully merged by the CBO) V consisting of 4 tables, V1, V2, V3 and V4, and two tables, A and B. The optimizer thinks that the join to B is incredibly selective (step 17 is expected to produce 221K rows, but after joining to B the optimizer expects that only 1 row would remain)! Actually, in a way the optimizer is correct: indeed, joining to B does reduce the dataset considerably, but less than the CBO suggests (2 orders of magnitude instead of 5).

This prompts two questions:

1) Why is the optimizer wrong by two orders of magnitude?
2) If it thinks that joining to B reduces the data set this much, why doesn’t it try to do this earlier, making the intermediate data set smaller?

The first question is easier. If we look at the predicates to operation 17, we can see 3 equality conditions. By default, the optimizer assumes independent distributions for columns values, so it simply multiplies the three individual join predicate selectivities. As it turns out, this assumption is not accurate in this case, as there is a strong correlation between COL3 and the other two columns (I spotted it by noticing that removing a join on COL3 doesn’t change the rowcount significantly, despite the fact that it has a relatively high NDV).

The second question is more interesting. If the optimizer thinks that joining to B reduces the datasets so strongly, why doesn’t it do it at an earlier stage? First of all, according to the optimizer’s logic, it’s not all that important, because most of the costs fall on accessing (and joining) the tables A and B rather than the tables V1-V4 that make up the view V. But still, even though it’s a small portion of cost, still, wouldn’t it be beneficial to reduce it? Perhaps, but according to the optimizer, joining A to V before joining to B is a better strategy, because it allows to reduce eliminate 80% of rows before performing the expensive join to B. Of course eliminating only 80% of rows is much less impressive 99.9995% elimination, but it’s more beneficial because it’s reduces the cost of a much more expensive operation.

In reality, however, joining A to V doesn’t change the reduce the number of rows much. How come the optimizer is wrong about that? The answer can be found in the predicate section: the hash join on line 17 references 3 distinct tables: A, V1 and V2. I.e. instead of a simple two-table join we have an entangled join, where A is joined to the result of (V1,V2) join via columns from both V1 and V2. In such situations, the join cardinality is much harder to estimate accurately.

The reason for that is that Oracle doesn’t have any special treatment for multi-table joins: it simply joins one table at a time. But the intermediate result set that is being joined to a table, doesn’t have any optimizer statistics, such us column NDVs. So instead of properly adjusting the two-table formula to the multi-table case, Oracle uses it in a naive way: instead of operating on values from the intermediate data set on one side, and the table being joined on the other, it simply takes NDVs and other stats from the tables referenced in the join predicates, and plugs them into the standard formula.

In our case this means that the cardinality of joining A to V would be calculated as: (cardinality of outer row source) x (cardinality of inner row source) x (join selectivity), where (join selectivity) is 1/greater(NDV(A.A_COL1) x NDV(V2.V2_COL1)) x 1/greater(NDV(A.A_COL2) x NDV (V2.V2_COL2)) x 1/greater(NDV(A.A_COL3) x NDV(V1.V1_COL3)).

Why does this formula give a wrong result? Because there’s no fundamental reason why it should give the correct one. To be fair, even the two-table formula isn’t much more than an educated guess; but in this case at least hope that it would give a more or less realistic answer often enough, because it’s built upon some solid foundation. If we take a rather typical case of a PK-FK join in a master-detail kind of a situation, the 2-table join formula would give the accurate answer.

But pushing this to the 3-table “entabled” join case doesn’t really have such solid foundations. In fact, as Jonathan Lewis noticed in “Cost Based Fundamentals”, if you use transitive closure to legally re-arrange multi-table join predicates, you often would be getting different results! I checked it in this case as well — and indeed, I was able to change query’s cardinality estimates by a couple of orders of magnitude without changing its semantics.

There are cases when transitive closure can be used to “disentangle” a join, e.g.:

A.a1 = B.b1 and C.c1 = A.a1 and C.c2 = B.b2 
A.a1 = B.b1 and C.c1 = B.b1 and C.c2 = B.b2

and when this is possible, there is a decent chance that the optimizer would be able to get a more accurate estimate of the join cardinality after the rewrite. In our case, however, the problem is that some of the joins are done inside the view, so in order to “disentable” the joins we’d need to change the view’s DDL.

It is because of the difficulties of this nature that I am not a fan of the cardinality-centric approach to SQL tuning problems. Or to be more accurate, I reserve this approach for root cause analysis after a solution has been offered if time allows.

So to sum up:

1) Getting join cardinalities right for multi-column joins is a tricky business
2) When dealing with “entangled” multi-table joins it’s even trickier
3) Overall, join cardinality formula works well for simple cases like joins with lookup or dimension tables; anything more than that leaves the accuracy of the join cardinality to the chance
4) Sometimes transitive closure can allow to “disentangle” join predicates — if such transformation is possible, it can improve the accuracy of the join cardinality estimate.

2 thoughts on “Join cardinality

  1. Hello!

    Your blog content is extremely interesting and well written.
    But:) please tune the page layout. The left and right panels combined occupy more than 60% of the horizontal space of the page.
    Please make much more room for the content. It’s difficult to read the execution plans and your words. It’s a constant scrolling up-down left-right.

    I guess that your other readers would also be happy with such change.

    Thanks and keep up the good work.

    1. Hi Rob,

      thanks for your feedback. I agree that there is a lot of room for improvement in the readability of the blog. Unfortunately, most of the settings cannot be easily changed, at least not without some additional investments both in terms of money (switching to the premium WordPress plan) and time. I cannot afford this investment at the moment, but I’ll keep your comment in mind, and if an opportunity presents itself, I’ll work on improving the blog’s design.

      Best regards,

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s