Bad plan or something else?
The first step in any tuning activity is determining the scope of an issue — so if a query is not performing satisfactory, then the first question is whether it’s query’s execution plan or something else (e.g. a global database issue or even a problem external to the database). Full diagnostics may be time-consuming or even inaccessible, so it’s desirable to resolve this question by just looking at the query and its basic statistics. There is a method of doing that using SQL efficiency, and in this post I’m going to describe it.
The method is not my invention. I read about it in Christian Antognini’s book “Troubleshooting Oracle Performance”, but in fact I was using it in a slightly different form long before. Continue reading “SQL efficiency”
In my earlier post, I mentioned an interesting case which among other symptoms, featured high CPU usage. In this post I would like to tell more about that case.
I was investigating poor performance on a 10.2.0.5 Oracle database, and I was asked to look at it after yet another outage (the database wasn’t accepting any connections for over two hours). The AWR report showed massive mutex and latch contention:
Continue reading “The case of a stuck query (mutexes in 10g)”
Let’s start with some basic concepts. AWR reports deal with several kinds of time. The simplest kind is the elapsed time , it’s just the interval of time between the start and end snapshots. Another important quantity is DB time, which is defined as time in user calls during that period. It can be (and for a busy system typically is) greater than the elapsed time. However, the reason for that is not the number of CPUs as some experts incorrectly state (apparently, they confuse it with CPU time that we’ll discuss below, e.g. here), it’s that this time is a sum over all active user processes which are using CPU or waiting for something. Note that it only counts time spent in user calls, i.e. background processes are not included in that. Continue reading “AWR reports: interpreting CPU usage”
Last week, when I was tuning a SQL statement at a 10.2.0.4 database, I came across a rather interesting issue.
The SQL was performing poorly because of an inefficient plan (wrong join order), and it was relatively straighforward to determine the reason for that: a table partition appeared empty on the plan, while in fact it contained a few hundred thousand records. The table was range-partitioned on a timestamp column (1 partition = 1 day).
As the development team explained to me, they were relying on a rather sophisticated maintenance procedure that was looking over a previous period to chose a suitable partition as the source, and then copied its stats into a newly created partition before it was filled with data to prevent it from being analyzed as empty. Continue reading “A problem with dbms_stats.copy_table_stats”