Database query tuning is mostly about getting better plans. Mostly, but not always. Sometimes, the problem has nothing to do with the plan, and you might need to get a bit creative to find a solution. In this recent case a query was showing a decent performance when running from SQL Developer, but it took about 5 times longer to complete when running from R. The plan was the same, so I knew that it was irrelevant. The R session wasn’t showing as active most of the time, so it was fairly clear that the problem was fetching data — i.e. it was fetching too few rows at a time which lead to a large number of roundtrips, and consequently, high waits on “idle” event “SQL*Net message from client”.
On one of the databases I’m looking after (184.108.40.206, Solaris, non-RAC), several different INSERT statements (all into tablespaces with manually managed segments) suffer from occasional hiccups. The symptoms are always the same: in one of the sessions, the INSERT gets stuck doing lots of single-block I/O against one of the indexes on the inserted table, and if other sessions are running similar INSERTs, they hang on enq: TX – index contention. The situation can last just a few seconds, but sometimes it’s much longer than this (several minutes), in which case the impact on the application is quite serious.
Imagine the following situation: you are supporting an application with many different components and a busy release cycle. One a Monday morning you find that quite a few processes in the database now run slower. Very soon, you find out that the slowdown is due to increased CPU time, but where to move from there? There is no evidence that CPU is too stressed, causing CPU queuing. You cannot isolate the problem to any specific PL/SQL procedure or SQL id, and there seems to be no relationship between affected SQL statements. You also check the changes that went in the last weekend — there are quite a few of them, but none seems to be particularly relevant. So what do you do?
When a query contains a regular or inline view, there are 3 basic strategies for the optimizer to choose from:
1) merge the view (no “VIEW” operation in the plan)
2) instantiate the view as the whole and join it to the rest of the query (the plan shows a VIEW “operation”)
3) push join predicates inside the view (the plan shows “VIEW PUSHED PREDICATE”).
In my previous post I showed an example of how a query’s performance can be improved using the waste minimization technique. My focus was primarily on identifying and enforcing the correct plan, but I received some questions regarding the root cause of the problem: why the optimizer came up with a wrong join order? It’s a very interesting question, and it deserves a separate post so that it could be explored in detail.
Today I’d like to share another tuning example from a recent case at work, which in my opinion is good for illustrating typical steps involved in SQL optimization process.
I was handed a poorly performing query with a relatively verbose text, so I will only give the general structure here (it will also prevent me from accidentally disclosing some sensitive information from that application):
A few weeks ago, I received a request to review an AWR report for a database suffering from instance-level performance issues. Here are the the key parts of that report (with some masking):
WORKLOAD REPOSITORY report for DB Name DB Id Instance Inst Num Release RAC Host ------------ ----------- ------------ -------- ----------- --- ------------ XXXX XXXXX XXXXX 1 10.2.0.5.0 NO XXXX Snap Id Snap Time Sessions Curs/Sess --------- ------------------- -------- --------- Begin Snap: 65115 03-May-16 11:00:09 152 17.8 End Snap: 65116 03-May-16 12:00:18 152 17.7 Elapsed: 60.16 (mins) DB Time: 2,712.41 (mins) Cache Sizes ~~~~~~~~~~~ Begin End ---------- ---------- Buffer Cache: 5,856M 5,856M Std Block Size: 8K Shared Pool Size: 2,048M 2,048M Log Buffer: 14,340K Load Profile ~~~~~~~~~~~~ Per Second Per Transaction --------------- --------------- Redo size: 2,455,599.10 14,087.84 Logical reads: 613,415.60 3,519.18 Block changes: 12,238.64 70.21 Physical reads: 12,233.70 70.19 Physical writes: 1,517.54 8.71 User calls: 1,159.19 6.65 Parses: 39,080.15 224.20 Hard parses: 32.45 0.19 Sorts: 708.22 4.06 Logons: 0.31 0.00 Executes: 39,393.06 226.00 Transactions: 174.31 ...
Oracle cost-based optimizer (CBO) is great, but sometimes it’s making wrong choices even when correct inputs are fed to it. In such cases, you need a tool to override CBOs choices, and one of the most popular tools is optimizer hints. The main reason they’re so popular is that they allow “quick-and-dirty” kind of fixes for performance issues (provided that query text can be altered). Other ways may be more reliable, but generally require more work, and who wants to do work that can be avoided? Unfortunately, there’s a well known downside to the hints — it’s very easy to run into problems if you only fix a part of the plan.
When this happens, hints can lead to terrible execution plans. For a long time, I’ve been looking for a good example to illustrate this problem, and finally this week I encountered a case which appears to be suitable for this purpose.
It’s been forever since I last shared any of my performance troubleshooting experiences at work. This week, I got a case that I think is worth publishing, and I decided to write about it in my blog. So, here we go…
A few days ago, I received a complaint about unstable performance of one of frequently running SQL reports on a 11gR2 database. Most of the time it completed within a couple of minutes, however, on certain occasions it took much longer than that, and once it even took over 20 minutes.
I have a special stored report in SQL developer for conveniently displaying key statistics from DBA_HIST_SQLSTAT which is very helpful as a first step when analyzing unstable SQL performance:
Continue reading “Workarounds”
If you work with I/O benchmarking of Oracle databases, you are almost certainly familiar with SLOB. SLOB is more than just an I/O benchmark — it’s become a de-facto industry standard. It’s simple, powerful and efficient, and it captures a plethora of metrics, both from the OS (output of iostat, mpstat etc.) and the database itself (in the form of an AWR report).
One thing that is missing though is visualization. It’s fairly easy to fix using an external plotting tool (like gnuplot or R), but what data would you plot? AWR only gives you average event times and histograms with ridiculously poor resolution. And if you want to see a high-resolution picture of your I/O (and you do — I’ll discuss the importance of that later on), it’s not enough.