Method R and parallelism (another real-life example)

In my previous post I mentioned method R as probably the most efficient approach to SQL optimization. However, it is important to focus on correct metrics for it to work correctly.
Consider this example (once again, the query is still running, so the only reliable diagnostic tool at our disposal is SQL real-time monitor):

SQL Plan Monitoring Details (Plan Hash Value=776230426)
=========================================================================================================================================================================================================================================
| Id   |              Operation              |           Name            |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read  | Read  | Write | Write | Mem  | Temp | Activity |        Activity Detail         | Progress |
|      |                                     |                           | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs  | Bytes | Reqs  | Bytes |      |      |   (%)    |          (# samples)           |          |
=========================================================================================================================================================================================================================================
|    0 | SELECT STATEMENT                    |                           |         |       |      7334 |   +474 |     1 |     263K |       |       |       |       |      |      |     0.06 | Cpu (12)                       |          |
|    1 |   FILTER                            |                           |         |       |      7334 |   +474 |     1 |     263K |       |       |       |       |      |      |     0.13 | Cpu (28)                       |          |
|    2 |    PX COORDINATOR                   |                           |         |       |      7807 |     +1 |    65 |       2M |       |       |       |       |      |      |     0.07 | os thread startup (8)          |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | Cpu (6)                        |          |
| -> 3 |     PX SEND QC (RANDOM)             | :TQ10003                  |     19M |  988K |      7337 |   +472 |    32 |       2M |       |       |       |       |      |      |     0.05 | Cpu (10)                       |          |
| -> 4 |      HASH JOIN RIGHT OUTER BUFFERED |                           |     19M |  988K |      7799 |    +10 |    32 |       2M |  1888 | 457MB | 12716 |   3GB | 891M |   3G |     0.92 | Cpu (120)                      |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | direct path write temp (73)    |          |
|    5 |       PX RECEIVE                    |                           |    249K |    36 |         1 |    +11 |    32 |       8M |       |       |       |       |      |      |          |                                |          |
|    6 |        PX SEND BROADCAST            | :TQ10000                  |    249K |    36 |         2 |    +10 |    32 |       8M |       |       |       |       |      |      |     0.00 | Cpu (1)                        |          |
|    7 |         PX BLOCK ITERATOR           |                           |    249K |    36 |         1 |    +11 |    32 |     249K |       |       |       |       |      |      |          |                                |          |
|    8 |          TABLE ACCESS FULL          | COUNTERPARTY              |    249K |    36 |         1 |    +11 |   390 |     249K |       |       |       |       |      |      |          |                                |          |
|    9 |       NESTED LOOPS OUTER            |                           |     19M |  987K |       466 |    +31 |    32 |      16M |       |       |       |       |      |      |          |                                |          |
|   10 |        HASH JOIN RIGHT OUTER        |                           |     19M |  987K |       486 |    +11 |    32 |      16M | 12580 |   3GB | 12548 |   3GB |      |      |     3.31 | Cpu (364)                      |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | direct path read temp (96)     |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | direct path write temp (235)   |          |
|   11 |         PX RECEIVE                  |                           |    124M |  5534 |        15 |    +13 |    32 |     127M |       |       |       |       |      |      |     0.06 | Cpu (13)                       |          |
|   12 |          PX SEND HASH               | :TQ10001                  |    124M |  5534 |        20 |    +11 |    32 |     127M |       |       |       |       |      |      |     0.09 | Cpu (18)                       |          |
|   13 |           VIEW                      |                           |    124M |  5534 |        20 |    +11 |    32 |     127M |       |       |       |       |      |      |          |                                |          |
|   14 |            UNION-ALL                |                           |         |       |        20 |    +11 |    32 |     127M |       |       |       |       |      |      |     0.01 | Cpu (3)                        |          |
|   15 |             PX BLOCK ITERATOR       |                           |    105M |  4000 |        13 |    +11 |    32 |     107M |       |       |       |       |      |      |          |                                |          |
|   16 |              TABLE ACCESS FULL      | TRADEREPORTINGDETAIL      |    105M |  4000 |        13 |    +11 |   654 |     107M | 33374 |   4GB |       |       |      |      |     0.39 | Cpu (30)                       |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | direct path read (51)          |          |
|   17 |             PX BLOCK ITERATOR       |                           |     20M |  1534 |         8 |    +23 |    32 |      20M |       |       |       |       |      |      |          |                                |          |
|   18 |              TABLE ACCESS FULL      | TRADEREPORTINGDETAIL      |     20M |  1534 |         9 |    +22 |   413 |      20M | 12907 |   2GB |       |       |      |      |     0.18 | Cpu (6)                        |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | direct path read (31)          |          |
|   19 |         PX RECEIVE                  |                           |     19M |  969K |       428 |    +31 |    32 |      16M |       |       |       |       |      |      |     0.13 | Cpu (28)                       |          |
|   20 |          PX SEND HASH               | :TQ10002                  |     19M |  969K |       431 |    +26 |    32 |      16M |       |       |       |       |      |      |     0.21 | Cpu (44)                       |          |
|   21 |           VIEW                      | ARCHIVETRADE              |     19M |  969K |       430 |    +27 |    32 |      16M |       |       |       |       |      |      |     0.03 | Cpu (6)                        |          |
|   22 |            UNION-ALL                |                           |         |       |       430 |    +27 |    32 |      16M |       |       |       |       |      |      |     0.08 | Cpu (16)                       |          |
|   23 |             PX BLOCK ITERATOR       |                           |      3M | 86377 |        15 |    +27 |    32 |       2M |       |       |       |       |      |      |          |                                |          |
|   24 |              TABLE ACCESS FULL      | TRADE                     |      3M | 86377 |        16 |    +26 |   611 |       2M |  396K |  48GB |       |       |      |      |     1.31 | Cpu (185)                      |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | direct path read (90)          |          |
|   25 |             PX BLOCK ITERATOR       |                           |     17M |  882K |       418 |    +39 |    32 |      13M |       |       |       |       |      |      |          |                                |          |
|   26 |              TABLE ACCESS FULL      | TRADE                     |     17M |  882K |       418 |    +39 |   887 |      13M |    7M | 882GB |       |       |      |      |    58.15 | Cpu (4146)                     |     100% |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | direct path read (8055)        |          |
|   27 |        TABLE ACCESS BY INDEX ROWID  | COUNTERPARTYMAP           |       1 |       |       467 |    +30 |   16M |      16M |     4 | 32768 |       |       |      |      |     0.55 | Cpu (116)                      |          |
|   28 |         INDEX RANGE SCAN            | COUNTERPARTYMAPSSSSCPID_I |       1 |       |       467 |    +30 |   16M |      16M |  1569 |  12MB |       |       |      |      |     0.96 | Cpu (196)                      |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | db file sequential read (6)    |          |
|   29 |    UNION-ALL                        |                           |         |       |      7334 |   +474 |    2M |     799K |       |       |       |       |      |      |     0.03 | Cpu (6)                        |          |
|   30 |     TABLE ACCESS BY INDEX ROWID     | TRADEREPORTINGDETAIL      |       1 |     4 |      7322 |   +474 |    2M |     414K |  405K |   3GB |       |       |      |      |     6.78 | Cpu (36)                       |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | db file sequential read (1386) |          |
|   31 |      INDEX UNIQUE SCAN              | TRADEREPORTINGDETAIL_PK   |       1 |     3 |      7332 |   +474 |    2M |     841K |  660K |   5GB |       |       |      |      |    14.15 | Cpu (69)                       |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | db file sequential read (2898) |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | read by other session (2)      |          |
|   32 |     TABLE ACCESS BY INDEX ROWID     | TRADEREPORTINGDETAIL      |       1 |     4 |      3277 |  +4531 |    1M |     385K |  116K | 910MB |       |       |      |      |     3.29 | Cpu (17)                       |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | db file sequential read (674)  |          |
|   33 |      INDEX UNIQUE SCAN              | TRADEREPORTINGDETAIL_PK   |       1 |     3 |      7327 |   +481 |    1M |     741K |  357K |   3GB |       |       |      |      |     8.97 | Cpu (51)                       |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | db file sequential read (1831) |          |
|   34 |    TABLE ACCESS BY INDEX ROWID      | BOOK                      |       1 |     2 |      7334 |   +474 | 29067 |     4446 |       |       |       |       |      |      |          |                                |          |
|   35 |     INDEX UNIQUE SCAN               | BOOK_UI                   |       1 |     1 |      7334 |   +474 | 29067 |    29067 |    51 | 408KB |       |       |      |      |          |                                |          |
|   36 |    INLIST ITERATOR                  |                           |         |       |      7332 |   +476 |  250K |    15444 |       |       |       |       |      |      |          |                                |          |
|   37 |     INDEX RANGE SCAN                | COUNTERPARTYMAPSSSSCPID_I |       1 |     2 |      7332 |   +476 |  495K |    15444 |    44 | 352KB |       |       |      |      |     0.05 | Cpu (10)                       |          |
|   38 |    INLIST ITERATOR                  |                           |         |       |      7309 |   +499 |  231K |     9139 |       |       |       |       |      |      |     0.00 | Cpu (1)                        |          |
|   39 |     INDEX RANGE SCAN                | COUNTERPARTYMAPSSSSCPID_I |       1 |     2 |      7311 |   +497 |  460K |     9139 |    94 | 752KB |       |       |      |      |     0.03 | Cpu (6)                        |          |
|      |                                     |                           |         |       |           |        |       |          |       |       |       |       |      |      |          | db file sequential read (1)    |          |
========================================================================================================================================================================================================================================="

A little background: this is a one-off data extract running on a reporting database, so it doesn’t really have to have an absolutely optimal plan, but it does need to complete within reasonable time. However, it has been running for over 2 hours and there is no end in sight.

Where does one start? According to method R, we should focus on the operation where most time is spent. The full table scan at line 26 looks like a good candidate (58.15%, according to the “activity” column). And at the first sight it all makes sense: a full table scan of a nearly 1 TB table, of course this would take a lot of time! And there are probably better access paths here — if you check the table cardinality from DBA_TABLES, you’ll find that it contains nearly a billion rows, while the real-time monitor clearly shows that only 13M of them (i.e. just slightly more than 1%!) are kept. So if we spend some time here, we’d probably find some indexes which would allow to read the data more efficiently. And we’d improve the query execution time, for sure. By several minutes at most.

If we look more carefully, we’d see several clues that this huge and inefficient full table scan is far from being the biggest problem.

Clue 1: there is a bunch of operations with “PX” in them, i.e. it’s a parallel plan. Unlike some other operations, full scans work very nicely with parallelism.

Clue 2: the full table scan is already completed by now (progress = 100%)

Clue 3: given clues 1 and 2, it makes sense to look at the “time active (s)” (which I normally don’t pay attention to because it can be ambiguous for child operations). It shows that the full table scan took less than 7 minutes.

So the problem is something else. And if we keep looking, we find a few more clues that point to it:

Clue 4: operations 29-33 (i.e. operation 29 and its children) have combined # of samples that corresponds to 31% of total activity, i.e. it’s the second biggest branch of the plan in terms of DB time. But “time active (s)” columns indicates that the actual wall clock time for this operation is much much bigger (according to ASH, approximately x20 bigger, which roughly agrees with the figure in this column) because unlike the full table scan, these operations are executed serially.

Clue 5: finally, if we look at the parent operation of 29, we’ll see that it’s a FILTER operation 1. So apparently there is a subquery which optimizer couldn’t transform into a join, and this is where our biggest problem is.

Of course, there are other ways to arrive at the same conclusion — for example, by quickly scanning the plan itself before looking at activity percentages. A FILTER operation, especially when located high, is an indicator of a potential problem (a subquery couldn’t get unnested — so we’re executing lots of index lookups and produce lots of rows which we only filter out at the very end of query execution).

One more thing: I mentioned that this was a one-off data extract that needed to complete by a certain deadline for a reason. If this were a frequently executed query, then the optimization logic could have been different. Imagine, for instance, that it ran every night and the actual execution time didn’t matter. But since the query is running frequently, the fact that the inefficient parallel operation is wasting database resources may become more significant, which would have resulted in shifting the focus of the investigation back to the full table scan.

Conclusion

Activity % column in SQL real-time monitor report can be very confusing for parallel queries. If what matters is the wall clock time, and not DB time (i.e. the goal is to get the results quickly, not minimize resource usage), then one needs to pay attention to other information in the report, like “time active (s)”, and look out for slow serial operations which could be small in terms of DB time (% activity) but big in terms of wall clock time.

9 thoughts on “Method R and parallelism (another real-life example)”

  1. I’m not sure your depiction of Method-R is correct: it’s about response time, not activity, so it wouldn’t jump to line 26 it would be looking at the active time in exactly the way you did.

    1. Hi Jonathan,

      you’re absolutely right. I updated the blog, hopefully now it would be more clear that it’s not the method R but wrong choice of the metric that was the problem here.

  2. As a frustrated end user .. what would you do in this scenario? Kill the query and start again? or say let it go? I dont think there is anyone of knowing if you are 5 mins or 5 hours from completion. Is there?

    1. Line 4 is a right outer join with an input from line 9 of 16M rows, so the join has to produce at least 16M rows while line 2 tells us that we’ve processed 2M so far corresponding to which line 29 tells us that we’ve executed the filter 2M times – so I think we might guess that we’re about 1/8th of the way through the total execution time.

      1. I noticed something curious in the plan. Operations 29-33 are reflecting the “exists (select … from A union all select … from B)” subquery. So one could expect both operands of the UNION ALL (the A and the B) to have the equal number of starts (“execs”) here? How come one has 2M (ops 30 and 31) and the other one 1M (ops 32 and 33)? Does this mean that the optimizer is smart enough to figure out that it may suffice to evaluate only the first operand of UNION ALL because this is an EXIST subquery (i.e. if there are rows returned from A and evaluating B is redundant)? And if yes, does this query transform have a name?

      2. Your assumption is correct – the optimizer is smart enough to know that it doesn’t have to query B if it has found a row in A.

        I’ve only every thought of it as the natural consequence of a semi-join being a “stop on first success”, though I did test it carefully the first time I thought of trying the construction in case it didn’t work; I doubt if it has any particular name.

        Historically it used to be worth getting exists and not-exists subqueries in the right order to do this kind of short-circuiting.

    2. Hi Daryl,

      thanks for your comment. As Jonathan showed here, it’s possible to estimate the number of iterations of non-driving children of the FILTER operation by estimating the cardinality of the driving child operation. But we didn’t actually bother to do any actual calculations because we found that the EXISTS clause was logically redundant (developer’s mistake). After that, killing and re-running the query was an easy call because we could see that without that subquery the longest part was the full tablescan which only took several minutes.

  3. Ah – so the SQL was poorly expressed with a logically redundant EXISTS clause – I refer the honourable gentleman to my comment on the previous post that you have to approach a problem from all angles.

    There is little point focusing on indexes, plan stability etc if your SQL is badly written in the first place.

    1. Dominic,

      but isn’t almost all code we’re dealing with poorly written? Well written code doesn’t often require attention, so if a query is referred to a performance analyst, then chances are, there’s more than just one thing wrong with it. An average SQL statement I need to tune is several screens long and has at least a dozen various problems. So finding _a_ problem is easy, the hard part is to find the one problem that is affecting the execution time the most.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s