Working with database performance troubleshooting means dealing with challenging and exciting problems. But it also means doing certain things over and over. And over. Like typing the same query or clicking the same button zillion times a day. And when you’re doing something very frequently, anything that can be done to simplify or automate such tasks becomes very important. So I decided to share some of the tricks I use to optimize my working experience in a short mini-series. This post will be on GUI, and the part II will be on command line.
This week I participated (online option) in HotSOS conference — the only conference that is focused entirely around Oracle database performance and internals. Like a year ago, it was a great conference, I really enjoyed it. Predictably, the number one topic was the in-memory option. Probably every third talk was about it. I learned a lot of stuff I didn’t know before (and I strongly suspect that this stuff wouldn’t be easy to find anywhere else).
Occasionally one might want to know what a segment is made of in terms of block types. For example, you notice that the number of blocks in an index segment is somewhat larger than the number of branch and leaf blocks, and wonder what kind of blocks accounts for the difference. The only way to do this is by dumping index blocks (e.g. as described in Richard Foote’s blog here). Dumping blocks is easy, but analyzing them — not so much. Sure, there exists a plethora of tools that allow to parse text from the OS side (awk, perl, sed and whatnot), but this leads to usual problems: OS access, scripting skills, certain platforms may not have the scripting tool you’re most comfortable with, and even more importantly: scripts cannot do cool stuff that Oracle can (like joining data to other data) . Fortunately, those difficulties can be circumvented by using regexp + external files as I already posted in my blog here. This time, I’d like to show how this technique can be adjusted for index block dumps.
Last year, I spent some time researching redo log related performance problems, which resulted in a mini-series, including one post devoted specifically to one previously unknown scenario of excessive log file sync waits. I am happy to announce that a service request opened on the back of this research resulted in the MOS note on “log file parallel write” wait event (Doc ID 34583.1) having been updated with a general description of this scenario and factors that may contribute to it. Unfortunately, more specific information regarding this issue has been put to an internal Oracle note because of the limitations that concern underscore parameters. So if you are dealing with log file sync (or log buffer space) issues, then I strongly recommend to log in to MOS and familiarize yourself with the updated version of the note as soon as possible.
Except a few special cases, optimizing SQL is about minimizing I/O. And by “I/O” we normally mean “physical I/O”, because everybody knows that logical I/O (LIO) is much, much faster. But how much faster exactly? For a long time, this question has been bothering me. It looks like there has been little research in this area. Basically the only thorough investigation I managed to find on the subject was one by Cary Millsap and co-authors. So I conducted some research on my own. Continue reading
Last week I posted a blog on high CPU utilization during I/O. The blog title contained a question mark because it seemed like there were more questions than answers. Were the CPU utilization numbers real or just an instrumentation artifact? Were they distorted by double-counting? If real, where was it coming from? About a dozen top experts participated in that discussion, and everybody had a different theory. I’ve spent some time looking at the problem on different test environments from the OS angle, so now I have some answers I’d like to share.
Yesterday, I posted a trace file that seemed odd to me Twitter. This resulted in a very lively discussion, and several very interesting theories were put forward. Even though the issue hasn’t been solved yet, I would like to post a summary of this discussion (plus my findings) so that more people would be able to give their inputs. Also, some of the theories put forward to explain the observed anomaly are very interesting on their own.
The issue is very simple (which I think is one of the reasons why it resulted in such a heated debate). It has to do with unreasonably high CPU utilization with low values of the db_file_multiblock_read_count (from that point on, MBRC) parameter. Here is the setup I used in my tests:
Disjunction (logical OR) is known to cause various performance problems, from extreme parse times (e.g. here) to sub-optimal plans. A common solution to such problems is getting rid of OR’s by “OR expansion” (i.e. rewrite via UNION ALL), although it doesn’t work in 100% cases. In this post, I will consider an example of an OR problem that can be solved differently.
In my previous post I mentioned method R as probably the most efficient approach to SQL optimization. However, it is important to focus on correct metrics for it to work correctly.
Consider this example (once again, the query is still running, so the only reliable diagnostic tool at our disposal is SQL real-time monitor):
First principles, Clarice. Simplicity. Read Marcus Aurelius. Of each particular thing ask: what is it in itself? What is its nature?
What do you do when you get a call about a query that keeps running without completing, where the query text itself looks very simple:
select col1, col2, ... from someview where colA = '12345-A' and colB in ('B') order by colN, colM
and the SQL plan monitor report looks like this: