In this blog, I mostly write about performance issues (and tools to resolve them). For a change, I'd like to make a post about performance non-issues. I think it might be interesting for two reasons. One, if not identified early, a non-issue can consume a lot of developer effort, introduce instability into the code, and even cause the very performance issues it was supposed to prevent. Two, such non-issues usually reveal actual issues, even though those actual issues are mostly related to non-technical aspects of work, such as communication. I'll start with a rather specific category of non-issues: when performance doesn't match expectations because expectations don't match reality.
Is CPU usage 100% really okay?
Even the great ones do misspeak from time to time. And when they do, this can lead to a terrible mess in people’s heads.
Here is one example:
Are you at 100% utilization? If not, you haven’t accomplished your job yet. You cannot put CPU in the bank and save it for later. So, if you are running with idle cycles you should be looking for more ways to use it.
This statement is, at best, very confusing (and you can see some examples of this confusion in this recent OTN thread).
Let’s apply it to a specific example:
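Here’s a back-of-the-envelope illustration (a standard queueing-theory sketch, not a claim about any particular system): in a simple M/M/1 model, the average response time is R = S / (1 - U), where S is the service time and U is the utilization. At 50% utilization a request takes 2S on average, at 90% it takes 10S, and as U approaches 100% the response time grows without bound. A CPU pinned at 100% isn’t “fully used” in any helpful sense; it’s saturated, and every new request queues behind the others.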
“Snapshot too old” as a performance issue in disguise
There exists a fair amount of confusion around one of the most famous Oracle errors, ORA-01555 (“snapshot too old”). The difficulty probably comes from the fact that ORA-01555 is related to the internal workings of undo generation and usage in an Oracle database, which are rather complex. Because of that, when this error appears on their system, inexperienced DBAs and developers often either start looking at UNDO-related parameters, rollback segment sizes, etc., or start looking for any suspicious activity, which they later link to excessive undo “consumption” or “generation”.
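Before reaching for those knobs, one quick sanity check (a sketch; v$undostat.maxquerylen records the longest query seen in each interval) is to compare the configured undo retention with the longest-running query on the instance:

select p.value            undo_retention_sec,
       max(u.maxquerylen) longest_query_sec
from   v$parameter p,
       v$undostat  u
where  p.name = 'undo_retention'
group  by p.value;

If queries routinely run longer than the retention window, ORA-01555 is really a query performance problem in disguise, and tuning those queries beats adding undo space.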
Continue reading ““Snapshot too old” as a performance issue in disguise”
Positive feedback loops
Ever since I heard Tanel Poder talk about application servers causing positive feedback loops (performance gets worse -> open more sessions -> make performance even worse) at the 2012 LVOUG conference in Riga, I had been curious to see one myself. Recently I got a chance to observe a very interesting variety of such behavior.
The plot above shows database CPU usage (red line) versus time, plotted together with the number of sessions running the same SQL statement (literally the same statement, not even different binds; there are no binds in this statement). There is a clear correlation between the two. Apparently, when database performance degrades past a certain point, the application server stops getting its answer fast enough and launches the query again in another session, degrading performance even further, and so on.
This behavior is very dangerous and may easily cause prolonged outages (see the plateau towards the right side of the plot — the database server was maxed out in terms of CPU for over 24 hours!).
Moral of the story: don’t spawn more sessions in response to poor performance, unless you want poor performance to quickly become terrible!
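If you want to catch this pattern in the act, a query along these lines (a sketch, assuming active sessions are what you care about; adjust the filters as needed) makes such loops easy to spot as they develop:

select   sql_id,
         count(*) active_sessions
from     v$session
where    status = 'ACTIVE'
and      sql_id is not null
group by sql_id
having   count(*) > 1
order by count(*) desc;

If the same sql_id keeps climbing in this list while CPU usage rises, you are probably looking at a feedback loop rather than a genuine spike in demand.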
Troubleshooting stuck queries
Introduction
In an earlier post, I described a method of troubleshooting slow queries using dbms_xplan. While this method is all you need in most cases, it has one serious limitation: it requires that the problem query complete before any diagnostics are taken. What if the query is so slow that it cannot finish within a reasonable time frame? In this post, I'll present several alternatives.
SQL Monitor
If your Oracle version is 11g or higher, then the most convenient tool for dealing with “stuck” queries is SQL Monitor. It's very simple to use:
declare
  report clob;
begin
  report := DBMS_SQLTUNE.REPORT_SQL_MONITOR();
  dbms_output.put_line(report);
end;
/
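With no arguments, REPORT_SQL_MONITOR returns the report for the last monitored statement. To target a specific query (a sketch; the sql_id below is just a placeholder), you can pass the sql_id and report type explicitly:

select DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id => '0gufhqw2svy4a',   -- placeholder: put your own sql_id here
         type   => 'TEXT'
       ) as report
from dual;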
Viewing VPD predicates with DBMS_XPLAN
Oracle Virtual Private Database (VPD), also known as Row Level Security (RLS), provides a very high level of flexibility in exposing data to users. It is also a very convenient tool for forcing hard parsing of a SQL statement either on every execution or depending on some criteria (e.g. see here). VPD works by appending invisible predicates to SQL statements (invisible in the sense that, for example, the query text in V$SQL doesn't contain them). This invisibility is mostly a good thing, but in some cases it can also be a nuisance (e.g. when troubleshooting SQL). There are a couple of blogs describing how to view these predicates (here and here), both suggesting DBMS_XPLAN as one of the ways.
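For reference, here is a minimal sketch of that basic approach (table, function and policy names are made up for illustration): create a trivial RLS policy, run a query against the protected table, and pull the cursor's plan, where the appended predicate shows up in the predicate section.

create table t_demo (owner varchar2(30), data varchar2(100));

-- hypothetical policy function: each user sees only their own rows
create or replace function f_vpd_demo(
  p_schema in varchar2,
  p_object in varchar2
) return varchar2
is
begin
  return 'owner = sys_context(''userenv'', ''session_user'')';
end;
/

begin
  dbms_rls.add_policy(
    object_schema   => user,
    object_name     => 'T_DEMO',
    policy_name     => 'P_DEMO',
    policy_function => 'F_VPD_DEMO'
  );
end;
/

select count(*) from t_demo;

-- with serveroutput off, this reports the previous statement;
-- the VPD predicate appears under "Predicate Information"
select * from table(dbms_xplan.display_cursor());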
However, it doesn’t always work as expected, and sometimes it doesn’t work at all. Consider an example: Continue reading “Viewing VPD predicates with DBMS_XPLAN”
Nullability, indexes and joins
A few days ago, I happened to participate in an interesting thread on an OTN forum. We weren't able to answer all of the questions raised by the original poster, but this discussion stirred some thoughts that eventually led me to write this post.
One of the key things that make a database a database, and not a data dump, is integrity constraints. They make it possible to enforce restrictions on data in a very efficient fashion, independently of the current application interface. But constraints also play a very important role in providing additional information about the data to the query optimizer. Failure to provide that information, or providing it incorrectly, may lead to serious performance issues.
Let’s consider an example.
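Here is one classic case (a minimal sketch, not necessarily the exact example from the full post; table and index names are made up): a B-tree index contains no entry for a row whose indexed columns are all NULL, so the optimizer can answer COUNT(*) from an index only if it knows the column cannot be NULL.

create table t (x number);   -- x is nullable
create index t_x on t(x);

-- With x nullable, rows where x is null may be absent from the index,
-- so this query cannot be answered from the index alone
select count(*) from t;

alter table t modify (x not null);

-- Now every row is present in the index, and the optimizer can choose
-- an index (fast) full scan instead of a full table scan
select count(*) from t;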
LVOUG Harmony 2012 highlights
Last week, I attended the Latvian Oracle User Group conference Harmony 2012, held in Riga, Latvia on June 1. The event featured several “stars” of the Oracle world, including Tom Kyte, Tanel Poder, Graham Wood, Daniel Morgan and others. Events like this are a great opportunity to catch up with recent trends in Oracle development and tuning, learn new tricks of the trade, hear the best tuning specialists tell how they tackled their most difficult performance cases, and more!
Where AWR can’t help: identifying top-SQL in absence of cursor sharing
Many database performance problems can be resolved via a top-down approach: you look at database performance globally (using a Statspack or AWR report) and then work your way down to the specific SQL consuming most of the resources. This approach breaks down when the application is not using bind variables. The reason is that performance reports rely on sql_id to identify a statement, and in the absence of cursor sharing a single SQL statement has as many sql_id's as there are combinations of literal parameters. So if your heaviest query is something like
SELECT * FROM T1 WHERE X=:B1,
then you’ll see it in AWR top-SQL lists, but if it's
SELECT * FROM T1 WHERE X="some literal value",
then at best you’ll see a bunch of similar statements in the lists, each responsible for a small fraction of the workload. There is also a good chance that your heaviest SQL won’t show up in those lists at all, if it’s overshadowed by frequently run statements with unchanging parameters (or using binds).
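A common way around this (a sketch of the idea; force_matching_signature appeared in 10g) is to aggregate v$sql by FORCE_MATCHING_SIGNATURE, which is computed as if all literals were replaced by binds, so statements differing only in literal values share the same signature:

select *
from   (select force_matching_signature,
               count(*)         cursors,
               sum(executions)  execs,
               sum(buffer_gets) buffer_gets,
               min(sql_id)      sample_sql_id
        from   v$sql
        where  force_matching_signature <> 0
        group  by force_matching_signature
        order  by sum(buffer_gets) desc)
where  rownum <= 10;

The sample_sql_id column gives you one representative cursor from which to pull the full text and plan.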
Continue reading “Where AWR can’t help: identifying top-SQL in absence of cursor sharing”
Histograms for strongly skewed columns
On a recent OTN thread, I learned a nice trick from J. Lewis that allows you to circumvent certain problems with histograms.
Histograms were designed to solve the problem of estimating cardinality for skewed columns (i.e. where some values occur much more frequently than others). For columns with a low number of distinct values (NDV), Oracle collects a frequency histogram, which can be thought of as a pair of one-dimensional arrays: one containing all possible values, the other containing their frequencies (i.e. how many rows have each value). However, if the sample size is small, Oracle can miss rare values, and they won't be reflected in the histogram. As a result, the cardinality estimates for those values will be wrong (depending on the version, Oracle will set the estimate either to 1 or to half the frequency of the rarest value found). A detailed explanation of the issues, with examples, can be found in blog posts by J. Lewis and R. Geist.
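To make the problem concrete, here is a small sketch (made-up table and numbers): with a 0.1% sample of a million rows, a value occurring in only 0.1% of the rows has a good chance of not being sampled at all.

create table t_skew as
select case when rownum <= 999000 then 1 else 2 end x   -- 999,000 ones, 1,000 twos
from   dual
connect by level <= 1000000;

begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'T_SKEW',
    method_opt       => 'for columns x size 254',
    estimate_percent => 0.1    -- small sample: the rare value 2 may be missed
  );
end;
/

-- If 2 didn't make it into the histogram, the cardinality estimate here
-- will be wrong (1, or half the frequency of the rarest captured value,
-- depending on version)
explain plan for select * from t_skew where x = 2;
select * from table(dbms_xplan.display);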
