AWR report case study: stating the obvious

It’s been a while since I came across an interesting and complex case. However, yesterday on an OTN forum thread I saw a case that was interesting in its simplicity. Even though it’s almost trivial on the technical level, it’s very useful for highlighting typical tuning “rookie mistakes” (I can remember quite a few cases from not so long ago when I made similar mistakes, too).

The author posts a question about “library cache: mutex X” events that are ruining performance of his 2-node RAC cluster. The original post doesn’t contain any specifics except for CPU utilization percentage on both nodes.

Within a few hours, a few replies appear, most of them either trying to shed light on this particular wait event or sharing similar experiences. I asked the original poster to provide key sections of an AWR report (workload profile, top events, database/host CPU, top SQL), which he soon did:

Continue reading “AWR report case study: stating the obvious” →

Troubleshooting log file sync waits — a simple case study

Yesterday I was asked to give my opinion on the AWR below:

DB Name  DB Id  Instance  Inst num  Release     RAC  Host
*****    *****  *****     1         10.2.0.4.0  NO   ******

             Snap Id  Snap Time        Sessions  Cursors/Session
Begin Snap:  66340    6/20/2012 14:30  117       9.3
End Snap:    66341    6/20/2012 15:00  115       9.1
Elapsed:              30.16 (mins)
DB Time:              35.57 (mins)

Report Summary

Cache Sizes

                   Begin   End
Buffer Cache:      1,888M  1,888M  Std Block Size:  8K
Shared Pool Size:  1,024M  1,024M  Log Buffer:      3,168K

Load Profile

                  Per Second   Per Transaction
Redo size:        504,437.43   3,150.59
Logical reads:    23,547.86    147.07
Block changes:    1,900.01     11.87
Physical reads:   1,931.69     12.06
Physical writes:  50.85        0.32
User calls:       478.35       2.99
Parses:           20.39        0.13
Hard parses:      0.05         0
Sorts:            8.67         0.05
Logons:           0.1          0
Executes:         250.29       1.56
Transactions:     160.11

% Blocks changed per Read:   8.07  Recursive Call %:  18.31
Rollback per transaction %:  0     Rows per Sort:     178.25

...

Top 5 Timed Events

Event                    Waits    Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class
CPU time                          1,882                  88.2
log file sync            293,178  1,217    4             57                 Commit
log file parallel write  290,903  961      3             45                 System I/O
db file scattered read   301,788  45       0             2.1                User I/O
db file parallel write   11,839   31       3             1.5                System I/O

Continue reading “Troubleshooting log file sync waits — a simple case study” →

Nullability, indexes and joins

A few days ago, I happened to participate in an interesting thread on an OTN forum. We weren’t able to answer all of the questions raised by the original poster, but this discussion stirred some thoughts that eventually led me to write this post.

One of the key things that make a database a database and not a data dump is integrity constraints. They make it possible to implement restrictions on data in a very efficient fashion, independently of the current application interface. But constraints also play a very important role in providing additional information about data to the query optimizer. Failing to provide that information, or providing it incorrectly, may lead to serious performance issues.

Let’s consider an example.

Continue reading “Nullability, indexes and joins” →

SQL efficiency

Bad plan or something else?

The first step in any tuning activity is determining the scope of the issue: if a query is not performing satisfactorily, then the first question is whether the problem is the query’s execution plan or something else (e.g. a global database issue or even a problem external to the database). Full diagnostics may be time-consuming or even unavailable, so it’s desirable to resolve this question by just looking at the query and its basic statistics. There is a method of doing that using SQL efficiency, and in this post I’m going to describe it.

The method is not my invention. I read about it in Christian Antognini’s book “Troubleshooting Oracle Performance”, but in fact I was using it in a slightly different form long before.

Continue reading “SQL efficiency” →

LVOUG Harmony 2012 highlights

Last week, I attended the Latvian Oracle User Group conference Harmony 2012, held in Riga, Latvia on June 1. The event featured several “stars” of the Oracle world, including Tom Kyte, Tanel Poder, Graham Wood, Daniel Morgan and others. Events like this are a great opportunity to catch up with recent trends in Oracle development and tuning, learn new tricks of the trade, hear the best tuning specialists tell how they tackled their most difficult performance cases, and more!

Continue reading “LVOUG Harmony 2012 highlights” →

Where AWR can’t help: identifying top-SQL in absence of cursor sharing

Many database performance problems can be resolved via a top-down approach: you look at database performance globally (using a statspack or AWR report) and then you work your way down to the specific SQL consuming most of the resources. This approach breaks down when the application is not using bind variables. The reason is that performance reports depend on sql_id to identify a statement, and in the absence of cursor sharing a single SQL statement has as many sql_ids as there are combinations of literal parameters. So if your heaviest query is something like


SELECT * FROM T1 WHERE X=:B1,

then you’ll see it in AWR top-SQL lists, but if it’s

SELECT * FROM T1 WHERE X='some literal value',

then at best you’ll see a bunch of similar statements in the lists, each responsible for a small fraction of the workload. There is also a good chance that your heaviest SQL won’t show up in those lists at all, if it’s overshadowed by frequently run statements with unchanging parameters (or using binds).
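
One commonly used way to spot such statements, offered here only as a sketch and not necessarily the technique described in the rest of the post, is to group cursors in V$SQL by FORCE_MATCHING_SIGNATURE, a value that is the same for statements that differ only in their literals. The threshold of 10 and the column aliases below are arbitrary choices made for illustration, and the query only sees cursors that are still in the shared pool.

```sql
-- Aggregate cursors that are identical except for literal values.
-- The HAVING threshold (10) is an arbitrary, illustrative cut-off.
SELECT force_matching_signature,
       COUNT(*)                           AS cursor_count,
       SUM(executions)                    AS total_executions,
       ROUND(SUM(elapsed_time) / 1000000) AS elapsed_seconds,  -- elapsed_time is in microseconds
       MIN(sql_id)                        AS sample_sql_id     -- one representative statement
FROM   v$sql
WHERE  force_matching_signature <> 0
GROUP  BY force_matching_signature
HAVING COUNT(*) > 10
ORDER  BY SUM(elapsed_time) DESC;
```

The sample_sql_id column gives one representative cursor per group, which can then be looked up to see the actual statement text.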

Continue reading “Where AWR can’t help: identifying top-SQL in absence of cursor sharing” →

Histograms for strongly skewed columns

On a recent OTN thread, I learned a nice trick by J. Lewis that makes it possible to circumvent certain problems with histograms.

Histograms were designed to solve the problem of estimating cardinality for skewed columns (i.e. where some values occur much more frequently than others). For columns with a low number of distinct values (NDV) Oracle collects a frequency histogram, which can be thought of as a set of two one-dimensional arrays: one containing all possible values, the other containing their frequency (i.e. how many rows have this value). However, if the sample size is small, then Oracle can miss rare values, and they won’t be reflected in the histogram. As a result, the cardinality estimates for those values will be wrong (depending on the version, Oracle will set the estimate either to 1 or to half of the frequency of the rarest value found). A detailed explanation of the issues with examples can be found in blog posts by J. Lewis and R. Geist.
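
To make the “two arrays” picture concrete, here is a minimal sketch of how a frequency histogram can be inspected in the data dictionary. The table name T1 and column name X are placeholders chosen for illustration, not objects from the original thread. In a frequency histogram ENDPOINT_NUMBER is cumulative, so the per-value frequency is the difference between consecutive rows.

```sql
-- T1 and X are hypothetical names; substitute your own table and column.
SELECT endpoint_value AS column_value,
       endpoint_number
         - LAG(endpoint_number, 1, 0)
             OVER (ORDER BY endpoint_number) AS frequency  -- rows in the sample with this value
FROM   user_tab_histograms
WHERE  table_name  = 'T1'
AND    column_name = 'X'
ORDER  BY endpoint_number;
```

For numeric columns ENDPOINT_VALUE is the value itself; for character columns it is a numeric encoding, so the output is harder to read. A value that exists in the table but is missing from this output is one that the sample did not catch.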

Continue reading “Histograms for strongly skewed columns” →

DBMS_XPLAN.DISPLAY_CURSOR

In this post, I continue on the topic of examining SQL plans. I will talk about one DBMS_XPLAN function, DISPLAY_CURSOR (because it’s probably the most useful one when troubleshooting ongoing performance issues, and also because other functions have a lot of similarity to it). I will discuss frequently used options and some common problems.

Preparation

As already mentioned in my previous post on the subject, using DBMS_XPLAN to display rowsource stats requires a bit of preparation. Namely, one needs to either set the STATISTICS_LEVEL parameter to ALL (can be done on the session level), or use the gather_plan_statistics hint in the query, and then run the query.
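
The two alternatives can be sketched as follows. The table T1, the column X and the literal 42 are placeholders for illustration only.

```sql
-- Option 1: collect rowsource statistics for every statement in this session.
ALTER SESSION SET statistics_level = ALL;

-- Option 2: collect them for a single statement only, via a hint.
-- (t1 and x are hypothetical names.)
SELECT /*+ gather_plan_statistics */ COUNT(*)
FROM   t1
WHERE  x = 42;
```

Either option is enough on its own; the hint has the advantage of limiting the collection overhead to the statement being examined.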

Usage DBMS_XPLAN.display_cursor

Once the statement is executed, the plan with row source statistics can be obtained in a convenient format using DBMS_XPLAN.display_cursor.
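
A minimal sketch of a typical call, assuming the statement of interest was the last one executed in the current session: leaving the first two arguments NULL makes the function report on that last statement, and the 'ALLSTATS LAST' format asks for the rowsource statistics of the most recent execution.

```sql
-- Show the plan, with rowsource statistics, of the last statement
-- executed in this session.
SELECT *
FROM   TABLE(DBMS_XPLAN.display_cursor(NULL, NULL, 'ALLSTATS LAST'));
```

If the output shows a different statement than expected, the client may have run something else in between, in which case passing the sql_id explicitly is the safer choice.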

DBMS_XPLAN.display_cursor takes three parameters, all of which are optional:

Continue reading “DBMS_XPLAN.DISPLAY_CURSOR” →

Library cache locks: a case study

Recently I was asked to help with a case involving library cache locks, and even though it was really simple, I think it’s good for illustrating a few points in performance analysis.

As often happens, it all started with customers complaining about “the database being slow”. I asked for more specific details, and I found that it wasn’t the entire database, but rather a reporting subsystem. Since I didn’t have a specific session sid or sql_id at hand, I started by analyzing a 30-min AWR report, and since there wasn’t much time to look at everything, I just focused on top timed events:

Event                   Waits          Time (s)  Avg Wait(ms) %Total  Wait Class

CPU time                               16,188               46.2
db file sequential read 18,230,956     11,787     1         33.6     User I/O
db file parallel write  278,480        4,509      16        12.9     System I/O
library cache lock      1,131          3,297      2,915     9.4      Concurrency
log file sync           120,356        1,686      14        4.8      Commit

Continue reading “Library cache locks: a case study” →

Reading SQL plans

SQL tuning is the key to database performance tuning, and yet this seems to be a “blind spot” for many database specialists. I think it’s probably because it falls right on the boundary between the DBA’s and the developer’s responsibilities: while a DBA expects database developers to take care of performance while developing code, developers often neglect that, relying on a “develop first, let the DBA tune it later” approach. Also, until recently, there were surprisingly few good (and accessible to a newbie) descriptions of how to read a SQL plan.

Now that we have Christian Antognini’s great book “Troubleshooting Oracle Performance”, the situation has improved dramatically. But still, I think that a blog post on that subject won’t hurt: after all, it’s free and it’s written by someone who still remembers how difficult it was to analyze his first SQL plans. :)

Continue reading “Reading SQL plans” →