A rather odd performance issue happened at work a few days ago. All of a sudden, one of the databases started to work very slowly, and a quick look in ASH data showed that it was spending over 70% of its time waiting on “row cache lock” (not to be confused with “latch: row cache lock”).
It was a test database (188.8.131.52) on an overloaded server with non-standard configuratioin, so my initial reaction was to bounce the database and hope that the symptoms would go away, but it didn’t work. So there was no other choice but to investigate the root cause. The next simplest thing to “turning it off and then back on” was finding the blocking session in ASH (the V$ACTIVE_SESSION_HISTORY view has columns BLOCKING_SESSION and BLOCKING_SESSION_SERIAL#). Most of the entries for this wait event had NULLs in these columns, but among those NULLs there were sid’s and serial#’s of two sessions. Unfortunately, both sessions were from background processes, MMON and SMON, so killing them obviously wasn’t a tempting choice. MMON is responsible for collecting AWR data and such, so the instance could probably survive if it gets killed, but SMON must always be running or the instace would crash.
For many Oracle database patches there is an option to apply them “online”, i.e. without stopping the database and related services (listener, ASM etc.). This is very convenient when downtime is impossible or difficult to arrange. However, this convenience comes at a certain price, including some performance penalties. Such side effects of hot patching are not well understood, even by Oracle support engineers.
Performance tuning is all about time. You measure the time it takes for a certain process to complete, and then you search for ways to reduce this time to improve end-users experience and/or increase the application productivity. But minimizing time is not enough — it’s important to minimize the correct time metric. A typical mistake in database performance optimization to optimize DB time instead of the duration experienced by the end user. If this happens, this can easily result in a situation when DB time is reduced significantly, but the process is still taking almost as long as before, the SLA is still breached, bosses and users are increasingly frustrated.
In this post, I will give two real-life examples of cases when the difference between DB time and elapsed time was the key to understanding the problem.
SQL performance can degrade for many reasons, some of most common are:
– plan changes
– data skewness
– low caching efficiency
– data growth
All these factors are relatively well known. A somewhat less common, although not exceptionally rare scenario, is read consistency overhead due to concurrent DML against queried tables. Because of being less common, this scenario is often overlooked, which leads to false diagnoses (and eventually to “fixes” that can do more harm than good).
In this blog, I mostly write about performance issues (and tools to resolve them). I’d like to make a post about performance non-issues for a change. I think it might be interesting for two reasons. One, if not identified early, non-issues can consume lots of resources in terms of developers’ efforts, may introduce instability to the code and even cause the very performance issues they were supposed to prevent. Two, such non-issues usually reveal actual issues, even though those actual issues are mostly related to non-technical aspects of work, such as communication. I’ll start with a rather specific category of non-issues: when performance doesn’t match expectations because expectations don’t match reality.
Even the great ones do misspeak from time to time. And when they do, this can lead to a terrible mess in people’s heads.
Here is one example:
Are you at 100% utilization? If not, you haven’t accomplished your job yet. You cannot
put CPU in the bank and save it for later. So, if you are running with idle cycles you
should be looking for more ways to use it.
This statement is, at best, very confusing (and you can see some examples of this confusion in this recent OTN thread).
Let’s apply it to a specific example:
Active session history (ASH), available since 10g, is one of the greatest performance troubleshooting tools available for Oracle databases. And one of most underutilized, too. In this article I will give some basic guidelines regarding ASH usage, and will list some of my favorite ASH queries.
ASH is a view (V$ACTIVE_SESSION_HISTORY). It can give most of the information found in trace files, but unlike tracing, it’s always on!
Using ASH is extremely simple, because you don’t have enable any events or go to database server filesystem to retrieve the results: you just query the view. However, there are a few basic rules that allow to use ASH with maximum efficiency, and minize the risk of misinterpreting its output.