Very often, significant performance benefits can be obtained by using some very basic knowledge of the application, its data and business rules. Sometimes even less than that: even if you are not familiar with the application logic at all, you can still use common sense to make some reasonable guesses that would get you a long way in improving query’s performance. Here is an example (based on an actual query that I had to tune today).
Today I’d like to share another tuning example from a recent case at work, which in my opinion is good for illustrating typical steps involved in SQL optimization process.
I was handed a poorly performing query with a relatively verbose text, so I will only give the general structure here (it will also prevent me from accidentally disclosing some sensitive information from that application):
Nested loop join appears like the simplest thing there could be — you go through one table, and as you go, per each row found you probe the second table to see if you find any matching rows. But thanks to a number of optimizations introduced in recent Oracle releases, it has become much more complex than that. Randolf Geist has written a great series of posts about this join mechanism (part 1, part 2 and part 3) where he explores in a great detail how numerous nested loop optimization interact with various logical I/O optimizations for unique and non-unique indexes. Unfortunately, it doesn’t cover the physical I/O aspects, and that seems to me like the most interesting part — after all, that was the primary motivation behind introducing all those additional nested loop join mechanism on the top of the basic classical nested loop. So I conducted a study on my own, and I’m presenting my results in the mini-series that I’m opening with this post.
Log buffer space is a simple, yet frequently misunderstood wait event. The main reason for that is probably its name. It sounds as if it points immediately to the answer: if space in the log buffer is the issue, then increasing it surely should resolve it. Well, unfortunately even though log buffer space is simple, it’s not that simple.
Continue reading “Log buffer space”
Chasing cost efficiency, business often cuts back on money spent on UAT boxes used for performance testing. More often than not, this is a bad-decision, because the only thing worse than not having a UAT environment is having a UAT environment that is nothing like production. It gives a false sense of security while exposing your application to all sorts of nasty surprises. In this post I tried to summarize a few typical configuration differences between UAT and production which can affect performance test results in a major way.
Continue reading “Lies, damned lies and non production-like performance testing”
In my recent post I showed how log file sync (LFS) and log file parallel write (LFPW) look for normal systems. I think it would also be interesting to compare that to the situation when LGWR does not have enough CPU.
I happen to have collected LGWR and database-level trace files for a 18.104.22.168 database on a Solaris 10 server which was under serious pressure (50 threads mostly inserting and committing data, only 32 CPUs). The AWR showed significant OS_CPU_WAIT_TIME (comparable to BUSY_TIME and much larger than IDLE_TIME) so I know for sure that CPU was an issue. And here is what LFS and LFPW histograms plotted from the trace file (as described here) looked like:
Not every commit results in a redo write. This is because there are multiple optimizations (some controlled by the user e.g. with COMMIT_LOGGING parameter, some automatic) that aim at reducing the number of redo writes caused by commits by grouping redo records together. Such group or “piggyback” commits are important for understanding log file sync waits and various statistics around it. In particular, “piggyback” commits play a key role when many sessions commit concurrently at a high rate, as described in my previous post. I made myself a little demo to actually see this mechanism in work with my own eyes. I think it could be of interest for others, so I’m sharing it here. Since the demo involves stopping and resuming background process, I wouldn’t recommend running it on anything other than a designated private sandbox environment.