On one of the databases I’m looking after (188.8.131.52, Solaris, non-RAC), several different INSERT statements (all into tablespaces with manually managed segments) suffer from occasional hiccups. The symptoms are always the same: in one of the sessions, the INSERT gets stuck doing lots of single-block I/O against one of the indexes on the inserted table, and if other sessions are running similar INSERTs, they hang on enq: TX – index contention. The situation can last just a few seconds, but sometimes it’s much longer than this (several minutes), in which case the impact on the application is quite serious.
Today I’d like to share another tuning example from a recent case at work, which in my opinion is good for illustrating typical steps involved in SQL optimization process.
I was handed a poorly performing query with a relatively verbose text, so I will only give the general structure here (it will also prevent me from accidentally disclosing some sensitive information from that application):
Nested loop join appears like the simplest thing there could be — you go through one table, and as you go, per each row found you probe the second table to see if you find any matching rows. But thanks to a number of optimizations introduced in recent Oracle releases, it has become much more complex than that. Randolf Geist has written a great series of posts about this join mechanism (part 1, part 2 and part 3) where he explores in a great detail how numerous nested loop optimization interact with various logical I/O optimizations for unique and non-unique indexes. Unfortunately, it doesn’t cover the physical I/O aspects, and that seems to me like the most interesting part — after all, that was the primary motivation behind introducing all those additional nested loop join mechanism on the top of the basic classical nested loop. So I conducted a study on my own, and I’m presenting my results in the mini-series that I’m opening with this post.
Chasing cost efficiency, business often cuts back on money spent on UAT boxes used for performance testing. More often than not, this is a bad-decision, because the only thing worse than not having a UAT environment is having a UAT environment that is nothing like production. It gives a false sense of security while exposing your application to all sorts of nasty surprises. In this post I tried to summarize a few typical configuration differences between UAT and production which can affect performance test results in a major way.
Continue reading “Lies, damned lies and non production-like performance testing”
SQL trace file provide the highest level of detail possible about SQL execution. The problem with that information is converting it to a convenient format for further analysis. One very good solution is parsetrc tool by Kyle Hailey written in Perl. It gives high-resolution histograms, I/O transfer rates as a function of time, and other very useful info. Unfortunately, I myself am not a Perl expert, so it’s a bit difficult for me to customize this tool when I need something slightly different from defaults (e.g. change histogram resolution, look at events not hardcoded into the script etc.). Another limitation is that since the tool is external to the database, you can’t join the data anything else (like ASH queries). So I found another solution for raw trace file analysis: external tables + regexp queries.
Nowadays, data in databases is wrapped in may layers of cache: result cache, buffer cache, OS page cache, storage hardware cache… They greatly improve performance, but they also make it less stable and harder to predict. And when I/O performance takes a turn for worse, one has to go through multiple layers of cache trying to understand what went wrong and why. I had such a case not too long ago.
The incident took place on an Oracle 184.108.40.206 database running on a Solaris 10 server. The first symptom was decreased throughput of one of batch processes in the database by about 30 percent. Since the process was spending more than half of its time doing I/O (as ASH indicated), it was easy to establish that the slowdown was linked to increased time of db file sequential reads by plotting average wait times from DBA_HIST_SYSTEM_EVENT (the SQL code for this and other queries in this blog post can be found below, in the Appendix):
As most of database developers who have to work with database performance analysis and tuning, I have my own collection of favorite sqlplus scripts for various occasions. Over years, I have turned many of them into PL/SQL code which allows greater flexibility. Imagine, for examples, trying to call a sqlplus script from another sqlplus script with both in and out parameters, or simulating autonomous transactions in sqlplus — obviously, such things are much easier in PL/SQL. Of course, PL/SQL has certain drawbacks of its own, one of them being the rather peculiar way Oracle handles user security. More specifically, you cannot use privileges received via roles inside PL/SQL code. This can be really annoying, especially for performance benchmarking code, which heavily relies on access to data dictionary views, granted via SELECT_CATALOG_ROLE or similar, e.g.: Continue reading “Accessing data dictionary views from PL/SQL”