In this blog, I mostly write about performance issues (and tools to resolve them). For a change, I'd like to make a post about performance non-issues. I think it might be interesting for two reasons. One, if not identified early, a non-issue can consume a lot of developer effort, introduce instability into the code, and even cause the very performance issues it was supposed to prevent. Two, such non-issues usually reveal actual issues, even though those actual issues are mostly related to non-technical aspects of work, such as communication. I'll start with a rather specific category of non-issues: when performance doesn't match expectations because expectations don't match reality.
Is CPU usage 100% really okay?
Even the great ones do misspeak from time to time. And when they do, this can lead to a terrible mess in people’s heads.
Here is one example:
Are you at 100% utilization? If not, you haven’t accomplished your job yet. You cannot put CPU in the bank and save it for later. So, if you are running with idle cycles you should be looking for more ways to use it.
This statement is, at best, very confusing (and you can see some examples of this confusion in this recent OTN thread).
Let’s apply it to a specific example:
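Here’s a back-of-the-envelope illustration (a standard queueing-theory sketch, not a claim about any particular system): in a simple M/M/1 model, the average response time is R = S / (1 - U), where S is the service time and U is the utilization. At 50% utilization a request takes 2S on average, at 90% it takes 10S, and as U approaches 100% the response time grows without bound. A CPU pinned at 100% isn’t “fully used” in any helpful sense; it’s saturated, and every new request queues behind the others.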
“Snapshot too old” as a performance issue in disguise
There exists a fair amount of confusion around one of the most famous Oracle errors, ORA-01555 (“snapshot too old”). The difficulty probably comes from the fact that ORA-01555 is related to the internal workings of undo generation and usage in an Oracle database, which are rather complex. Because of that, when this error appears on their system, inexperienced DBAs and developers often either start looking at UNDO-related parameters, rollback segment sizes, etc., or start looking for any suspicious activity, which they later link to excessive undo “consumption” or “generation”.
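Before reaching for those knobs, one quick sanity check (a sketch; v$undostat.maxquerylen records the longest query seen in each interval) is to compare the configured undo retention with the longest-running query on the instance:

select p.value            undo_retention_sec,
       max(u.maxquerylen) longest_query_sec
from   v$parameter p,
       v$undostat  u
where  p.name = 'undo_retention'
group  by p.value;

If queries routinely run longer than the retention window, ORA-01555 is really a query performance problem in disguise, and tuning those queries beats adding undo space.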
Continue reading ““Snapshot too old” as a performance issue in disguise”
Positive feedback loops
Ever since I heard Tanel Poder talk about application servers causing positive feedback loops (performance gets worse -> open more sessions -> make performance even worse) at the 2012 LVOUG conference in Riga, I had been curious to see one myself. Recently I got a chance to observe a very interesting variety of such behavior.
The plot above shows database CPU usage (red line) versus time, plotted together with the number of sessions running the same SQL statement (literally the same statement, not even different binds; there are no binds in this statement). There is a clear correlation between the two. Apparently, when database performance degrades past a certain point, the application server stops getting its answer fast enough and launches the query again in another session, degrading performance even further, and so on.
This behavior is very dangerous and may easily cause prolonged outages (see the plateau towards the right side of the plot — the database server was maxed out in terms of CPU for over 24 hours!).
Moral of the story: don’t spawn more sessions in response to poor performance, unless you want poor performance to quickly become terrible!
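If you want to catch this pattern in the act, a query along these lines (a sketch, assuming active sessions are what you care about; adjust the filters as needed) makes such loops easy to spot as they develop:

select   sql_id,
         count(*) active_sessions
from     v$session
where    status = 'ACTIVE'
and      sql_id is not null
group by sql_id
having   count(*) > 1
order by count(*) desc;

If the same sql_id keeps climbing in this list while CPU usage rises, you are probably looking at a feedback loop rather than a genuine spike in demand.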
Troubleshooting stuck queries
Introduction
In an earlier post, I described a method of troubleshooting slow queries using dbms_xplan. While this method is all you need in most cases, it has one serious limitation: it requires that the problem query complete before any diagnostics are taken. What if the query is so slow that it cannot finish within a reasonable time frame? In this post, I'll present several alternatives.
SQL Monitor
If your Oracle version is 11g or higher, then the most convenient tool for dealing with “stuck” queries is SQL Monitor. It's very simple to use:
declare
  report clob;
begin
  report := DBMS_SQLTUNE.REPORT_SQL_MONITOR();
  dbms_output.put_line(report);
end;
/
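With no arguments, REPORT_SQL_MONITOR returns the report for the last monitored statement. To target a specific query (a sketch; the sql_id below is just a placeholder), you can pass the sql_id and report type explicitly:

select DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id => '0gufhqw2svy4a',   -- placeholder: put your own sql_id here
         type   => 'TEXT'
       ) as report
from dual;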
Viewing VPD predicates with DBMS_XPLAN
Oracle Virtual Private Database (VPD), also known as Row Level Security (RLS), provides a very high level of flexibility in exposing data to users. It is also a very convenient tool for forcing hard parsing of a SQL statement either on every execution or depending on some criteria (e.g. see here). VPD works by appending invisible predicates to SQL statements (invisible in the sense that, for example, the query text in V$SQL doesn't contain them). This invisibility is mostly a good thing, but in some cases it can also be a nuisance (e.g. when troubleshooting SQL). There are a couple of blogs describing how to view these predicates (here and here), both suggesting DBMS_XPLAN as one of the ways.
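For reference, here is a minimal sketch of that basic approach (table, function and policy names are made up for illustration): create a trivial RLS policy, run a query against the protected table, and pull the cursor's plan, where the appended predicate shows up in the predicate section.

create table t_demo (owner varchar2(30), data varchar2(100));

-- hypothetical policy function: each user sees only their own rows
create or replace function f_vpd_demo(
  p_schema in varchar2,
  p_object in varchar2
) return varchar2
is
begin
  return 'owner = sys_context(''userenv'', ''session_user'')';
end;
/

begin
  dbms_rls.add_policy(
    object_schema   => user,
    object_name     => 'T_DEMO',
    policy_name     => 'P_DEMO',
    policy_function => 'F_VPD_DEMO'
  );
end;
/

select count(*) from t_demo;

-- with serveroutput off, this reports the previous statement;
-- the VPD predicate appears under "Predicate Information"
select * from table(dbms_xplan.display_cursor());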
However, it doesn’t always work as expected, and sometimes it doesn’t work at all. Consider an example: Continue reading “Viewing VPD predicates with DBMS_XPLAN”
Nullability, indexes and joins
A few days ago, I happened to participate in an interesting thread on an OTN forum. We weren't able to answer all of the questions raised by the original poster, but this discussion stirred some thoughts that eventually led me to write this post.
One of the key things that make a database a database, and not a data dump, is integrity constraints. They make it possible to enforce restrictions on data in a very efficient fashion, independently of the current application interface. But constraints also play a very important role in providing additional information about the data to the query optimizer. Failure to provide that information, or providing it incorrectly, may lead to serious performance issues.
Let’s consider an example.
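Here is one classic case (a minimal sketch, not necessarily the exact example from the full post; table and index names are made up): a B-tree index contains no entry for a row whose indexed columns are all NULL, so the optimizer can answer COUNT(*) from an index only if it knows the column cannot be NULL.

create table t (x number);   -- x is nullable
create index t_x on t(x);

-- With x nullable, rows where x is null may be absent from the index,
-- so this query cannot be answered from the index alone
select count(*) from t;

alter table t modify (x not null);

-- Now every row is present in the index, and the optimizer can choose
-- an index (fast) full scan instead of a full table scan
select count(*) from t;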
LVOUG Harmony 2012 highlights
Last week, I attended the Latvian Oracle User Group conference Harmony 2012, held in Riga, Latvia on June 1. The event featured several “stars” of the Oracle world, including Tom Kyte, Tanel Poder, Graham Wood, Daniel Morgan and others. Events like this are a great opportunity to catch up with recent trends in Oracle development and tuning, learn new tricks of the trade, hear the best tuning specialists tell how they tackled their most difficult performance cases, and more!
Where AWR can’t help: identifying top-SQL in absence of cursor sharing
Many database performance problems can be resolved via a top-down approach: you look at database performance globally (using a Statspack or AWR report) and then work your way down to the specific SQL consuming most of the resources. This approach breaks down when the application is not using bind variables. The reason is that performance reports rely on sql_id to identify a statement, and in the absence of cursor sharing a single SQL statement has as many sql_id's as there are combinations of literal parameters. So if your heaviest query is something like
SELECT * FROM T1 WHERE X=:B1,
then you’ll see it in AWR top-SQL lists, but if it's
SELECT * FROM T1 WHERE X="some literal value",
then at best you’ll see a bunch of similar statements in the lists, each responsible for a small fraction of the workload. There is also a good chance that your heaviest SQL won’t show up in those lists at all, if it’s overshadowed by frequently run statements with unchanging parameters (or using binds).
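A common way around this (a sketch of the idea; force_matching_signature appeared in 10g) is to aggregate v$sql by FORCE_MATCHING_SIGNATURE, which is computed as if all literals were replaced by binds, so statements differing only in literal values share the same signature:

select *
from   (select force_matching_signature,
               count(*)         cursors,
               sum(executions)  execs,
               sum(buffer_gets) buffer_gets,
               min(sql_id)      sample_sql_id
        from   v$sql
        where  force_matching_signature <> 0
        group  by force_matching_signature
        order  by sum(buffer_gets) desc)
where  rownum <= 10;

The sample_sql_id column gives you one representative cursor from which to pull the full text and plan.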
Continue reading “Where AWR can’t help: identifying top-SQL in absence of cursor sharing”
Histograms for strongly skewed columns
On a recent OTN thread, I learned a nice trick from J. Lewis that allows you to circumvent certain problems with histograms.
Histograms were designed to solve the problem of estimating cardinality for skewed columns (i.e. where some values occur much more frequently than others). For columns with a low number of distinct values (NDV), Oracle collects a frequency histogram, which can be thought of as a pair of one-dimensional arrays: one containing all possible values, the other containing their frequencies (i.e. how many rows have each value). However, if the sample size is small, Oracle can miss rare values, and they won't be reflected in the histogram. As a result, the cardinality estimates for those values will be wrong (depending on the version, Oracle will set the estimate either to 1 or to half the frequency of the rarest value found). A detailed explanation of the issues, with examples, can be found in blog posts by J. Lewis and R. Geist.
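To make the problem concrete, here is a small sketch (made-up table and numbers): with a 0.1% sample of a million rows, a value occurring in only 0.1% of the rows has a good chance of not being sampled at all.

create table t_skew as
select case when rownum <= 999000 then 1 else 2 end x   -- 999,000 ones, 1,000 twos
from   dual
connect by level <= 1000000;

begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'T_SKEW',
    method_opt       => 'for columns x size 254',
    estimate_percent => 0.1    -- small sample: the rare value 2 may be missed
  );
end;
/

-- If 2 didn't make it into the histogram, the cardinality estimate here
-- will be wrong (1, or half the frequency of the rarest captured value,
-- depending on version)
explain plan for select * from t_skew where x = 2;
select * from table(dbms_xplan.display);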
