Performance tuning is all about time. You measure the time it takes for a certain process to complete, and then you search for ways to reduce this time to improve end-users experience and/or increase the application productivity. But minimizing time is not enough — it’s important to minimize the correct time metric. A typical mistake in database performance optimization to optimize DB time instead of the duration experienced by the end user. If this happens, this can easily result in a situation when DB time is reduced significantly, but the process is still taking almost as long as before, the SLA is still breached, bosses and users are increasingly frustrated.
In this post, I will give two real-life examples of cases when the difference between DB time and elapsed time was the key to understanding the problem.
SQL performance can degrade for many reasons, some of most common are:
- plan changes
- data skewness
- low caching efficiency
- data growth
All these factors are relatively well known. A somewhat less common, although not exceptionally rare scenario, is read consistency overhead due to concurrent DML against queried tables. Because of being less common, this scenario is often overlooked, which leads to false diagnoses (and eventually to “fixes” that can do more harm than good).
In this blog, I mostly write about performance issues (and tools to resolve them). I’d like to make a post about performance non-issues for a change. I think it might be interesting for two reasons. One, if not identified early, non-issues can consume lots of resources in terms of developers’ efforts, may introduce instability to the code and even cause the very performance issues they were supposed to prevent. Two, such non-issues usually reveal actual issues, even though those actual issues are mostly related to non-technical aspects of work, such as communication. I’ll start with a rather specific category of non-issues: when performance doesn’t match expectations because expectations don’t match reality.
Even the great ones do misspeak from time to time. And when they do, this can lead to a terrible mess in people’s heads.
Here is one example:
Are you at 100% utilization? If not, you haven’t accomplished your job yet. You cannot
put CPU in the bank and save it for later. So, if you are running with idle cycles you
should be looking for more ways to use it.
This statement is, at best, very confusing (and you can see some examples of this confusion in this recent OTN thread).
Let’s apply it to a specific example:
Active session history (ASH), available since 10g, is one of the greatest performance troubleshooting tools available for Oracle databases. And one of most underutilized, too. In this article I will give some basic guidelines regarding ASH usage, and will list some of my favorite ASH queries.
ASH is a view (V$ACTIVE_SESSION_HISTORY). It can give most of the information found in trace files, but unlike tracing, it’s always on!
Using ASH is extremely simple, because you don’t have enable any events or go to database server filesystem to retrieve the results: you just query the view. However, there are a few basic rules that allow to use ASH with maximum efficiency, and minize the risk of misinterpreting its output.
Occasionally I encounter a situation when I need to affect a part of the plan that corresponds to a view, e.g.:
select v.x, x.y
where q.x = 1
Such situations are resolved using global hints. Oracle offers two ways to specify a global hint: via a query block identifier (system generated or user defined) or via view aliases. System-generated query block identifiers can be obtained via dbms_xplan.display with ALL or ALIAS option (they have the form SEL$n, where n appears to be same as the depth, e.g. in our case 1 corresponds to the main query, 2 to the inline view, 3 to the view V inside that inline view) or defined by the user via qb_name hint.
In general, tuning analytic functions (and more generally, all sort operations) is rather difficult. While for most poorly performing queries it’s relatively straightforward to gain some improvements by applying “eliminate early” principle one way or another, for slow sort operations it’s rarely applicable. Usually options are limiting to rewriting a query without analytics (e.g. using self-joins or correlated subqueries to achieve the same goal) or manually resizing the workarea to reduce/eliminate the use of disk. Recently, however, I had a case where I managed to obtain an excellent performance gain using a different technique that I would like to share in this post.
The original query was selecting about 100 columns using the LAG function on one of the columns in the WHERE clause, but in my test case I’ll both simplify and generalize the situation. Let’s create a table with a sequential id, three filtering columns x, y and z, and 20 sufficiently lengthy columns.