Row cache lock

A rather odd performance issue happened at work a few days ago. All of a sudden, one of the databases started to work very slowly, and a quick look in ASH data showed that it was spending over 70% of its time waiting on “row cache lock” (not to be confused with “latch: row cache lock”).

It was a test database (11.2.0.3) on an overloaded server with non-standard configuratioin, so my initial reaction was to bounce the database and hope that the symptoms would go away, but it didn’t work. So there was no other choice but to investigate the root cause. The next simplest thing to “turning it off and then back on” was finding the blocking session in ASH (the V$ACTIVE_SESSION_HISTORY view has columns BLOCKING_SESSION and BLOCKING_SESSION_SERIAL#). Most of the entries for this wait event had NULLs in these columns, but among those NULLs there were sid’s and serial#’s of two sessions. Unfortunately, both sessions were from background processes, MMON and SMON, so killing them obviously wasn’t a tempting choice. MMON is responsible for collecting AWR data and such, so the instance could probably survive if it gets killed, but SMON must always be running or the instace would crash.

Continue reading “Row cache lock”

Caution: hot patches

For many Oracle database patches there is an option to apply them “online”, i.e. without stopping the database and related services (listener, ASM etc.).  This is very convenient when downtime is impossible or difficult to arrange. However, this convenience comes at a certain price, including some performance penalties. Such side effects of hot patching are not well understood, even by Oracle support engineers.

Continue reading “Caution: hot patches”

DB time vs duration

Performance tuning is all about time. You measure the time it takes for a certain process to complete, and then you search for ways to reduce this time to improve end-users experience and/or increase the application productivity. But minimizing time is not enough — it’s important to minimize the correct time metric. A typical mistake in database performance optimization to optimize DB time instead of the duration experienced by the end user. If this happens, this can easily result in a situation when DB time is reduced significantly, but the process is still taking almost as long as before, the SLA is still breached, bosses and users are increasingly frustrated.

In this post, I will give two real-life examples of cases when the difference between DB time and elapsed time was the key to understanding the problem.

Continue reading “DB time vs duration”

Read consistency overhead

SQL performance can degrade for many reasons, some of most common are:
– plan changes
– data skewness
– low caching efficiency
– data growth
– contention.

All these factors are relatively well known. A somewhat less common, although not exceptionally rare scenario, is read consistency overhead due to concurrent DML against queried tables. Because of being less common, this scenario is often overlooked, which leads to false diagnoses (and eventually to “fixes” that can do more harm than good).

Continue reading “Read consistency overhead”

Managing expectations

In this blog, I mostly write about performance issues (and tools to resolve them). I’d like to make a post about performance non-issues for a change. I think it might be interesting for two reasons. One, if not identified early, non-issues can consume lots of resources in terms of developers’ efforts, may introduce instability to the code and even cause the very performance issues they were supposed to prevent.  Two, such non-issues usually reveal actual issues, even though those actual issues are mostly related to non-technical aspects of work, such as communication. I’ll start with a rather specific category of non-issues: when performance doesn’t match expectations because expectations don’t match reality.

Continue reading “Managing expectations”

Is CPU usage 100% really okay?

Even the great ones do misspeak from time to time. And when they do, this can lead to a terrible mess in people’s heads.

Here is one example:

Are you at 100% utilization? If not, you haven’t accomplished your job yet. You cannot
put CPU in the bank and save it for later. So, if you are running with idle cycles you
should be looking for more ways to use it.

This statement is, at best, very confusing (and you can see some examples of this confusion in this recent OTN thread).

Let’s apply it to a specific example:

Continue reading “Is CPU usage 100% really okay?”

ASH basics

Introduction

Active session history (ASH), available since 10g, is one of the greatest performance troubleshooting tools available for Oracle databases. And one of most underutilized, too. In this article I will give some basic guidelines regarding ASH usage, and will list some of my favorite ASH queries.

ASH is a view (V$ACTIVE_SESSION_HISTORY). It can give most of the information found in trace files, but unlike tracing, it’s always on!

Using ASH is extremely simple, because you don’t have enable any events or go to database server filesystem to retrieve the results: you just query the view. However, there are a few basic rules that allow to use ASH with maximum efficiency, and minize the risk of misinterpreting its output.

Continue reading “ASH basics”

Global Hints

Occasionally I encounter a situation when I need to affect a part of the plan that corresponds to a view, e.g.:

select *
from
(
   select v.x, x.y
   from v
) q
where q.x = 1

Such situations are resolved using global hints. Oracle offers two ways to specify a global hint: via a query block identifier (system generated or user defined) or via view aliases. System-generated query block identifiers can be obtained via dbms_xplan.display with ALL or ALIAS option (they have the form SEL$n, where n appears to be same as the depth, e.g. in our case 1 corresponds to the main query, 2 to the inline view, 3 to the view V inside that inline view) or defined by the user via qb_name hint.

Continue reading “Global Hints”

Tuning Analytic Functions

In general, tuning analytic functions (and more generally, all sort operations) is rather difficult. While for most poorly performing queries it’s relatively straightforward to gain some improvements  by applying “eliminate early” principle one way or another, for slow sort operations it’s rarely applicable. Usually options are limiting to rewriting a query without analytics (e.g. using self-joins or correlated subqueries to achieve the same goal) or manually resizing the workarea to reduce/eliminate the use of disk. Recently, however, I had a case where I managed to obtain an excellent performance gain using a different technique that I would like to share in this post.

The original query was selecting about 100 columns using the LAG function on one of the columns in the WHERE clause, but in my test case I’ll both simplify and generalize the situation. Let’s create a table with a sequential id, three filtering columns x, y and z, and 20 sufficiently lengthy columns.

Continue reading “Tuning Analytic Functions”

“Snapshot too old” as a performance issue in disguise

There exists a fair amount of confusion around one of the most famous Oracle errors — ORA-01555 (“snapshot too old”). The difficulty is probably coming from the fact that ORA-01555 is related to internal workings of undo generation and usage in an Oracle database, which a rather complex. Because of that, when this error on their system, inexperienced DBAs and developers often either start looking at UNDO-related parameters, rollback segment sizes, etc., or start looking for any suspicious activity, which they later link to excessive undo “consumption” or “generation”.

Continue reading ““Snapshot too old” as a performance issue in disguise”