Reading SQL plans

SQL tuning is the key to database performance tuning, and yet it seems to be a “blind spot” for many database specialists. I think that’s probably because it falls right on the boundary between the DBA’s and the developer’s responsibilities: the DBA expects database developers to take care of performance while they write the code, while developers often neglect it, relying on a “develop first, let the DBA tune it later” approach. Also, until recently, there were surprisingly few good (and newbie-accessible) descriptions of how to read a SQL plan.

Now that we have Christian Antognini’s great book “Troubleshooting Oracle Performance”, the situation has improved dramatically. But I still think a blog post on the subject won’t hurt: after all, it’s free, and it’s written by someone who still remembers the difficulty of analyzing his first SQL plans. :)

Continue reading “Reading SQL plans”
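By the way, for anyone who wants a plan to practice reading on, a minimal sketch using dbms_xplan (the table, column and sql_id below are placeholders):

  EXPLAIN PLAN FOR
  SELECT *
  FROM   some_table              -- placeholder table
  WHERE  some_column = 42;       -- placeholder predicate

  SELECT * FROM TABLE(dbms_xplan.display);

  -- for a statement already in the shared pool (sql_id is a placeholder):
  SELECT * FROM TABLE(dbms_xplan.display_cursor('&sql_id', NULL, 'TYPICAL'));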

The case of a stuck query (mutexes in 10g)

In my earlier post, I mentioned an interesting case which, among other symptoms, featured high CPU usage. In this post I would like to tell you more about that case.

I was investigating poor performance on a 10.2.0.5 Oracle database; I was asked to look at it after yet another outage (the database hadn’t been accepting any connections for over two hours). The AWR report showed massive mutex and latch contention:

Continue reading “The case of a stuck query (mutexes in 10g)”
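As a rough sketch of how such contention can be narrowed down from the ASH data in the AWR repository (the snapshot range is a placeholder):

  SELECT event, COUNT(*) AS samples
  FROM   dba_hist_active_sess_history
  WHERE  snap_id BETWEEN :begin_snap AND :end_snap
  AND    (event LIKE 'cursor:%'            -- cursor mutex waits, e.g. cursor: pin S wait on X
          OR event LIKE 'library cache%'   -- library cache mutex/lock waits
          OR event LIKE 'latch%')          -- latch waits
  GROUP  BY event
  ORDER  BY samples DESC;

For the cursor mutex waits, P1 of the wait event holds the hash value of the library cache object the mutex protects, so it can usually be matched against V$SQL.HASH_VALUE to find the SQL involved.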

AWR reports: interpreting CPU usage

Introduction

Let’s start with some basic concepts. AWR reports deal with several kinds of time. The simplest kind is the elapsed time: it’s just the interval of time between the start and end snapshots. Another important quantity is DB time, which is defined as the time spent in user calls during that period. It can be (and for a busy system typically is) greater than the elapsed time. However, the reason for that is not the number of CPUs, as some experts incorrectly state (apparently, they confuse it with the CPU time that we’ll discuss below; see e.g. here); it’s that DB time is a sum over all active user processes, whether they are using CPU or waiting for something. Note that it only counts time spent in user calls, i.e. background processes are not included.

Continue reading “AWR reports: interpreting CPU usage”
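For example, ten sessions that stay continuously active in user calls over a 60-minute snapshot interval contribute roughly 600 minutes of DB time. As a minimal sketch, both DB time and DB CPU for a snapshot interval can be read from the AWR repository like this (the values are stored in microseconds; the snapshot IDs are placeholders):

  SELECT e.stat_name,
         ROUND((e.value - b.value) / 1e6 / 60, 1) AS minutes
  FROM   dba_hist_sys_time_model b
  JOIN   dba_hist_sys_time_model e
         ON  e.stat_id         = b.stat_id
         AND e.dbid            = b.dbid
         AND e.instance_number = b.instance_number
  WHERE  b.snap_id   = :begin_snap
  AND    e.snap_id   = :end_snap
  AND    e.stat_name IN ('DB time', 'DB CPU')
  ORDER  BY e.stat_name;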

A problem with dbms_stats.copy_table_stats

Last week, when I was tuning a SQL statement on a 10.2.0.4 database, I came across a rather interesting issue.

The SQL was performing poorly because of an inefficient plan (wrong join order), and it was relatively straightforward to determine the reason: a table partition appeared empty in the plan, while in fact it contained a few hundred thousand records. The table was range-partitioned on a timestamp column (1 partition = 1 day).

As the development team explained to me, they were relying on a rather sophisticated maintenance procedure that looked over a previous period to choose a suitable partition as the source, and then copied its stats into each newly created partition before it was filled with data, to prevent it from being treated as empty.

Continue reading “A problem with dbms_stats.copy_table_stats”
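The maintenance procedure itself is not shown here, but at its core is a call along these lines (the schema, table and partition names are placeholders):

  BEGIN
    dbms_stats.copy_table_stats(
      ownname     => 'APP_OWNER',   -- placeholder schema
      tabname     => 'EVENT_DATA',  -- placeholder partitioned table
      srcpartname => 'P_PREV',      -- a recent, representative partition
      dstpartname => 'P_NEW'        -- the freshly created, still empty partition
    );
  END;
  /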

SQL tuning: real-life example

An example of tuning — nothing special, but it does illustrate several aspects of tuning work, so I thought I’d make a blog post out of it.

Yesterday I was contacted by a development team working with a 10.2.0.4 database. Their complaint was the traditional “the system is slow”. Despite the very unspecific symptoms, they were very specific about the time when it all started. I didn’t have access to AWR on that database, so instead I looked at the top wait events in DBA_HIST_SYSTEM_EVENT for the recent snapshots: nothing interesting. The top wait event was db file sequential read (with db file scattered read a very distant second). Since that particular group was working with just one user in the database, I also looked at ASH data for this user — same thing, just a bunch of db file sequential reads.

Continue reading “SQL tuning: real-life example”
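A per-user ASH check of that kind boils down to something like this (the username and the time window are placeholders):

  SELECT ash.sql_id, ash.event, COUNT(*) AS samples
  FROM   dba_hist_active_sess_history ash
  JOIN   dba_users u ON u.user_id = ash.user_id
  WHERE  u.username = 'APP_USER'          -- placeholder application user
  AND    ash.sample_time > SYSDATE - 1    -- placeholder time window
  GROUP  BY ash.sql_id, ash.event         -- a NULL event means the sample was on CPU
  ORDER  BY samples DESC;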

DTrace LIO with new features is released

A new version of a nifty tool by Alexander Anokhin and Andrey Nikolaev for digging into the internals of logical I/O

Alexander Anokhin

What’s new in DTrace LIO?
Jump to introduction: Dynamic tracing of Oracle logical I/O

1. The list of supported functions performing logical I/O has been extended:
Note: some function names below are my suggestions.

Consistent gets:

  • kcbgtcr – Kernel Cache Buffer Get Consistent Read. This is the general entry point for consistent reads.
  • kcbldrget – Kernel Cache Buffer Load Direct-Read Get. The function performing direct-path reads. Interesting detail: in 10.2 the function kcbldrget is called just after kcbgtcr, while in 11.2 it is called by (from) kcbgtcr.

Current gets (db block gets):

  • kcbgcur – Kernel Cache Buffer Get Current Read
  • kcbget – Kernel Cache Buffer Get Buffer. This is an analogue of the kcbgcur function; as I observed, it is called for index branch and leaf blocks.
  • kcbnew – Kernel Cache Buffer New Buffer
  • kcblnb (kcblnb_dscn in 11.2) – Kernel Cache Buffer Load New Buffer. The function performing direct-path loads. Decoding of block coordinates is not supported…

View original post 480 more words

Reading AWR reports: case study

In my previous post I described some sections that are typically useful when interpreting AWR data. However, sometimes the answer comes from an unexpected source. For example, the workload profile section of the report contains key information for understanding what the database looks like, but it seldom gives a direct answer to the problem (except for maybe excessive parsing and excessive commits). But recently I came across a case when this section was enough to identify the root cause of a non-trivial issue: Continue reading “Reading AWR reports: case study”