Peeling through layers of cache

Nowadays, data in databases is wrapped in many layers of cache: result cache, buffer cache, OS page cache, storage hardware cache… They greatly improve performance, but they also make it less stable and harder to predict. And when I/O performance takes a turn for the worse, one has to go through multiple layers of cache trying to understand what went wrong and why. I had such a case not too long ago.

The incident took place on an Oracle 11.2.0.3 database running on a Solaris 10 server. The first symptom was a roughly 30 percent drop in the throughput of one of the batch processes in the database. Since the process was spending more than half of its time doing I/O (as ASH indicated), it was easy to establish that the slowdown was linked to increased db file sequential read times by plotting average wait times from DBA_HIST_SYSTEM_EVENT (the SQL code for this and other queries in this blog post can be found in the Appendix).
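The plot itself is in the full post, but as a rough sketch of the kind of query involved (my own approximation, not the author’s Appendix code), per-snapshot average wait times can be pulled from DBA_HIST_SYSTEM_EVENT along these lines:

-- Sketch only: average wait time per snapshot for db file sequential read.
-- The view stores cumulative totals, so deltas are taken with LAG; a complete
-- version would also partition by dbid and instance_number.
select snap_id,
       round((time_waited_micro - lag(time_waited_micro) over (order by snap_id))
           / nullif(total_waits - lag(total_waits) over (order by snap_id), 0)
           / 1000, 2) avg_wait_ms
from dba_hist_system_event
where event_name = 'db file sequential read'
order by snap_id;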

Continue reading “Peeling through layers of cache”

Accessing data dictionary views from PL/SQL

Like most database developers who work on database performance analysis and tuning, I have my own collection of favorite sqlplus scripts for various occasions. Over the years, I have turned many of them into PL/SQL code, which allows greater flexibility. Imagine, for example, trying to call a sqlplus script from another sqlplus script with both in and out parameters, or simulating autonomous transactions in sqlplus; obviously, such things are much easier in PL/SQL. Of course, PL/SQL has certain drawbacks of its own, one of them being the rather peculiar way Oracle handles user security. More specifically, you cannot use privileges received via roles inside PL/SQL code. This can be really annoying, especially for performance benchmarking code, which heavily relies on access to data dictionary views granted via SELECT_CATALOG_ROLE or similar. Continue reading “Accessing data dictionary views from PL/SQL”
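As a minimal illustration of the role problem (hypothetical names, not the post’s own example): a dictionary view readable in sqlplus thanks to SELECT_CATALOG_ROLE becomes invisible inside definer’s-rights PL/SQL, because roles are disabled there.

-- Works interactively, because SELECT_CATALOG_ROLE is enabled in the session:
select count(*) from v$session;

-- Fails to compile with ORA-00942, because roles are disabled
-- inside definer's-rights PL/SQL:
create or replace procedure count_sessions is
  l_cnt number;
begin
  select count(*) into l_cnt from v$session;
  dbms_output.put_line('sessions: ' || l_cnt);
end;
/

-- A direct grant (not via a role) fixes it, e.g. as SYS:
-- grant select on v_$session to app_user;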

A wrapper sqlplus script for dbms_xplan.display_cursor

When producing SQL execution plans with rowsource statistics, one has a choice whether or not to specify the sql_id. It’s simpler to run dbms_xplan.display_cursor without specifying the sql_id, but it can be tricky, because then the plan for the last statement executed by the session is displayed, and it may not be the statement you expected, especially when running the statement from a GUI like Toad. But even with sqlplus you can still get unexpected results, e.g. a plan for a hidden call to a dbms_output routine instead of the last statement you ran.

Below I describe a method to work around this difficulty.

Method

Copy your SQL into a file (let’s call it slow_query.sql). Then in the same directory open a sqlplus session, and in it run the wrapping script as shown below:


set termout off
set serverout off
set linesize 200
set pagesize 9999
alter session set statistics_level = all;
@slow_query.sql;
spool slow_query_plan.lst
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
spool off

You can save it as a script and use it repeatedly with different queries (all that needs to be changed is the name of the script with the query on line 6).
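To avoid editing the script every time, a parameterized variant is also possible (a sketch under my own naming, not from the post): save the wrapper as, say, xplan_wrap.sql and pass the name of the file with your query as a sqlplus argument.

-- xplan_wrap.sql (hypothetical name); run as: @xplan_wrap.sql slow_query.sql
set termout off
set serverout off
set linesize 200
set pagesize 9999
alter session set statistics_level = all;
@&1
spool &1._plan.lst
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
spool off
set termout on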

The script produces a text file with a nicely formatted plan with rowsource statistics, ready for analysis or posting. Note that since the output is suppressed (for the reasons explained in the next section), you won’t see a SQL error if one occurs; if that happens, just temporarily re-enable the output (set termout on, set serverout on) to find out the cause of the error.

Explanation

If you’re not interested in how it works, then you can stop reading: just copy-paste the script above, and you’re all set.

For those who are more curious, I’ll describe the purpose of the individual instructions in the script:

set termout off — suppresses the output of query rows on the screen. This is necessary if the query produces many rows of output: displaying them would slow down the script considerably

set serverout off — with serveroutput enabled, sqlplus silently runs calls to dbms_output routines (such as dbms_output.get_lines) after each statement; if you don’t add this line, you may end up with the plan for one of those hidden calls instead of your statement

set linesize 200 — this makes sure that lines aren’t wrapped

set pagesize 9999 — prevents the plan’s column headers from being repeated on every page of output (by default, every 14 lines)

alter session set statistics_level = all — this is the key to obtaining rowsource statistics (i.e. how many rows were produced at each step and how long the step took); without it, the execution plan is almost useless for tuning

The rest of the script should be fairly self-explanatory.

SQL efficiency

Bad plan or something else?

The first step in any tuning activity is determining the scope of the issue: if a query is not performing satisfactorily, the first question is whether the problem lies in the query’s execution plan or somewhere else (e.g. a global database issue, or even a problem external to the database). Full diagnostics may be time-consuming or even inaccessible, so it’s desirable to resolve this question by just looking at the query and its basic statistics. There is a method of doing that using SQL efficiency, and in this post I’m going to describe it.
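The excerpt doesn’t show the metric itself, but one example of such “basic statistics” (my own illustration, not necessarily the author’s formula) is logical reads per row returned, available straight from V$SQL:

-- A rough first-pass gauge (illustration only): statements that burn many
-- buffer gets per row returned are often doing more work than they should.
select sql_id,
       executions,
       buffer_gets,
       rows_processed,
       round(buffer_gets / nullif(rows_processed, 0)) gets_per_row
from v$sql
where sql_id = :sql_id;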

The method is not my invention. I read about it in Christian Antognini’s book “Troubleshooting Oracle Performance”, but in fact I was using it in a slightly different form long before. Continue reading “SQL efficiency”

The case of a stuck query (mutexes in 10g)

In my earlier post, I mentioned an interesting case which, among other symptoms, featured high CPU usage. In this post I would like to say more about that case.

I was investigating poor performance on a 10.2.0.5 Oracle database; I was asked to look at it after yet another outage (the database hadn’t been accepting any connections for over two hours). The AWR report showed massive mutex and latch contention.
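The AWR excerpt is in the full post; for a live view of the same kind of problem, a query along these lines (a sketch of mine, not from the post) shows where mutex sleeps are concentrated:

-- Top mutex sleep locations since instance startup (sketch only).
select mutex_type, location, sum(sleeps) sleeps
from v$mutex_sleep
group by mutex_type, location
order by sleeps desc;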

Continue reading “The case of a stuck query (mutexes in 10g)”

AWR reports: interpreting CPU usage

Introduction

Let’s start with some basic concepts. AWR reports deal with several kinds of time. The simplest kind is the elapsed time: it’s just the interval of time between the start and end snapshots. Another important quantity is DB time, which is defined as time spent in user calls during that period. It can be (and for a busy system typically is) greater than the elapsed time. However, the reason for that is not the number of CPUs, as some experts incorrectly state (apparently confusing it with CPU time, which we’ll discuss below); it’s that DB time is a sum over all active user processes, which are either using CPU or waiting for something. Note that it only counts time spent in user calls, i.e. background processes are not included. Continue reading “AWR reports: interpreting CPU usage”
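As a quick way to see these quantities for an AWR interval (my own sketch, not from the post), DB time and DB CPU can be read from DBA_HIST_SYS_TIME_MODEL. For example, 300 minutes of DB time over a 60-minute interval simply means about 5 sessions were active on average, regardless of the CPU count.

-- DB time vs DB CPU between two AWR snapshots (values are cumulative
-- microseconds, so take the delta between the snapshot boundaries).
-- A complete version would also filter on dbid and instance_number.
select stat_name,
       round((max(value) - min(value)) / 1e6 / 60, 1) minutes
from dba_hist_sys_time_model
where stat_name in ('DB time', 'DB CPU')
  and snap_id in (:begin_snap, :end_snap)
group by stat_name;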

A problem with dbms_stats.copy_table_stats

Last week, when I was tuning a SQL statement on a 10.2.0.4 database, I came across a rather interesting issue.

The SQL was performing poorly because of an inefficient plan (wrong join order), and it was relatively straightforward to determine the reason for that: a table partition appeared empty in the plan, while in fact it contained a few hundred thousand records. The table was range-partitioned on a timestamp column (1 partition = 1 day).

As the development team explained to me, they were relying on a rather sophisticated maintenance procedure that looked over a previous period to choose a suitable partition as the source, and then copied its stats into a newly created partition before it was filled with data, to prevent the new partition from being treated as empty. Continue reading “A problem with dbms_stats.copy_table_stats”
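For reference, a minimal sketch of the kind of call such a procedure would make (owner, table, and partition names are hypothetical):

-- Copy partition-level statistics from an older, representative partition
-- into a freshly created one, before it receives any data.
begin
  dbms_stats.copy_table_stats(
    ownname     => 'APP',
    tabname     => 'EVENTS',
    srcpartname => 'P_20100110',
    dstpartname => 'P_20100117');
end;
/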