In this blog, I mostly write about performance issues (and tools to resolve them). I’d like to make a post about performance non-issues for a change. I think it might be interesting for two reasons. One, if not identified early, non-issues can consume lots of resources in terms of developers’ efforts, may introduce instability to the code and even cause the very performance issues they were supposed to prevent. Two, such non-issues usually reveal actual issues, even though those actual issues are mostly related to non-technical aspects of work, such as communication. I’ll start with a rather specific category of non-issues: when performance doesn’t match expectations because expectations don’t match reality.
First, let’s define “reality”.
It’s not possible to calculate exact execution time for a query, but it’s possible to make an estimate of the lower bound
in terms of I/O and time.
OLTP-style (i.e. selective) queries cannot do better than several single-block reads per row accessed. Based on this, expected performance:
If you get lucky with cache, you can get one order of magnitude (or even two in extreme cases) better performance, but if you don’t, there’s not much you can do about it.
DW-style (i.e. non-selective) queries cannot do better than hardware transfer speeds, so for traditional (i.e. non-Exadata) servers you can expect:
DW-style queries often cannot benefit much from cache, but in many cases they can be parallelized with minimum overhead (so numbers given are database time, not wall clock time).
Of course the numbers above aren’t exact, so it’s better to obtain your own numbers for your own system: look at average values for db file sequential/scattered read events, look at system stats or benchmark a few simple queries (or all of the above).
Now let’s consider some typical situations when reality and expectations don’t agree with each other.
Missing the big picture
One of the reasons why people may be not satisfied with perfectly working code is that it could be a small detail of some bigger picture. For example, you’re asked to improve performance of a simple query which returns a few rows within a few milliseconds. This should be enough to raise suspicions: no human would care if his reports completes within 10, 20 or 30 milliseconds. So most likely, the query is executed a lot — maybe it’s driven from another query in a PL/SQL block (in which case there is a good chance that the entire loop can be rewritten as a single SQL statement with huge potential performance benefits), maybe it’s called repeatedly from Java code (then you could perhaps benefit from server-side result caching), but in any case you have much more flexibility if you know the actual big problem that your application needs to fix.
Another possibility is that the only reason the query was considered to be underperforming was that it took enough DB time to make it to the top-SQL list of some automated report. If that’s the case, then there’s a good chance that nothing needs fixing at all.
Of course, there could also be cases when there is a valid reason for caring about milliseconds (low-latency applications) — but unless you know that this is the case, asking questions when required to make faster things that are already fast may save a lot of time and effort.
Not knowing the data
There are also cases when SQL may appear to be poorly performing because of variations in data. For example, a user might be running a report for two different financial years, one with many more data than the other. Of course the more data the report has to process, the longer it would take — but the user may be unaware of that.
It’s important not to confuse such “data issues” with real design issues. They may look similar (the real cost of a query depends on which data the user wants to look at, i.e. on the parameter inputs), but they’re different in sense that the useful amount of data (i.e. the amount of data that will be returned to the user or used in calculations) will be the same, but what changes is the amount of data
traversed. For example, a few years ago I faced a situation when a “time-snapshot” kind of query on a timeseries table (partitioned on the time column) was showing linear performance degradation depending on how far in the past the date was. That eventually revealed a fundamental data design flaw in the application which was limiting its scalability.
Biting off more data than being able to chew
Apart from being aware of variations and skews in data, it is useful to make a mental note when a query returns an unusually large amount of data. Of course it doesn’t necessarily mean a problem, but it won’t hurt to find out about its purpose. E.g. it’s not that rare to see reports that return hundreds of thousands of rows. That’s about as many pages as Encyclopedia Britinnica. Does anyone read it? It could well be that it’s just dumped into some folder where nobody every touches it. Or maybe there is some processing (e.g. in Excel) applied to it before it’s read — if so, then it would be much more efficient to do this processing on the database side.
Making unfair comparisons
Sometimes things (reports, processes etc.) are considered broken because they run slower on another database. This means that something is different between the two databases. But the difference can be very subtle. E.g. many test databases show inferior I/O performance compared to production because of smaller (or less efficient) low-level cache (see details here). In such cases there’s nothing to fix on the SQL level, it simply needs that existing testing infrastructure is not very good for performance testing.
Confusing databases, data stores and data dumps
The most “chemically pure” case of “expectation management issues” is when the expectations are based on some fundamentally wrong ideas about what a database is and what it can or cannot do. For example, once I witness Oracle being criticized on the basis that such an expensive and advanced technology only provided reading speed of a few kilobytes per second which they measured by running an OLTP-style query from one of the views.
I think the point they raised is deeper and more interesting that it may appear at the first sight, because it goes deep into the very nature of relational databases. A relational database disassembles data about a single real-life entity and puts it in many different tables (“normalizes” it in terms of relational theory). Doing that helps with data duplication, update anomalies and other problems, but it does cost performance. For example, when we make an OLTP-style query, we need to visit several tables instead of just one, we need to do index probes to find data selectively, we need to read index root and branch blocks to navigate to the row of interest. Finally, our granularity of reading data is limited by the block size — so even if we only need one byte from it, we’ll read the entire block — typically 8192 bytes — and throw away most of it. Eventually, per each byte returned to the end user, thousands of bytes have to be read.
It’s not some kind of a random flaw that Oracle allowed to crawl into the design — its the fundamental feature of a relational database: reporting data from relational databases involves joins, and join cost resources. Because of this, when applied to performance Tom Kyte’s famous “it’s a database, not a data dump” takes on an entirely different meaning: alas, “data dumps” could be much faster than databases (at least as far as read performance is concerned)!
So it may eventually turn out that a relational database is not the proper design for this particular business need, and an alternative solution (a Hadoop cluster, a plain text file, or maybe a good old logbook and a pen — it all depends on the specific situation). Clearing up this misunderstanding at an early stage can save big costs.