Use of packet capture and other advanced tools in network issues troubleshooting

Introduction

In this post I’m sharing my recent experience troubleshooting a rather complex performance issue. The issue showed up during performance testing of a new platform (Exadata Cloud at Customer or ExaCC) where an application is migrating from the current one (on-premises Exadata).

Continue reading →

Business rules, common sense and query performance

Very often, significant performance benefits can be obtained by using some very basic knowledge of the application, its data and business rules. Sometimes even less than that: even if you are not familiar with the application logic at all, you can still use common sense to make some reasonable guesses that would get you a long way in improving query’s performance. Here is an example (based on an actual query that I had to tune today).

Continue reading “Business rules, common sense and query performance” →

Reducing fetch time

Database query tuning is mostly about getting better plans. Mostly, but not always. Sometimes, the problem has nothing to do with the plan, and you might need to get a bit creative to find a solution. In this recent case a query was showing a decent performance when running from SQL Developer, but it took about 5 times longer to complete when running from R. The plan was the same, so I knew that it was irrelevant. The R session wasn’t showing as active most of the time, so it was fairly clear that the problem was fetching data — i.e. it was fetching too few rows at a time which lead to a large number of roundtrips, and consequently, high waits on “idle” event “SQL*Net message from client”.

Continue reading “Reducing fetch time” →

Long-running INSERT

On one of the databases I’m looking after (11.2.0.4, Solaris, non-RAC), several different INSERT statements (all into tablespaces with manually managed segments) suffer from occasional hiccups. The symptoms are always the same: in one of the sessions, the INSERT gets stuck doing lots of single-block I/O against one of the indexes on the inserted table, and if other sessions are running similar INSERTs, they hang on enq: TX – index contention. The situation can last just a few seconds, but sometimes it’s much longer than this (several minutes), in which case the impact on the application is quite serious.

Continue reading “Long-running INSERT” →

Query tuning by waste minimization: a real-life example

Today I’d like to share another tuning example from a recent case at work, which in my opinion is good for illustrating typical steps involved in SQL optimization process.

I was handed a poorly performing query with a relatively verbose text, so I will only give the general structure here (it will also prevent me from accidentally disclosing some sensitive information from that application):

Continue reading “Query tuning by waste minimization: a real-life example” →

Nested loop internals

Nested loop join appears like the simplest thing there could be — you go through one table, and as you go, per each row found you probe the second table to see if you find any matching rows. But thanks to a number of optimizations introduced in recent Oracle releases, it has become much more complex than that. Randolf Geist has written a great series of posts about this join mechanism (part 1, part 2 and part 3) where he explores in a great detail how numerous nested loop optimization interact with various logical I/O optimizations for unique and non-unique indexes. Unfortunately, it doesn’t cover the physical I/O aspects, and that seems to me like the most interesting part — after all, that was the primary motivation behind introducing all those additional nested loop join mechanism on the top of the basic classical nested loop. So I conducted a study on my own, and I’m presenting my results in the mini-series that I’m opening with this post.

Continue reading “Nested loop internals” →

Lies, damned lies and non production-like performance testing

Chasing cost efficiency, business often cuts back on money spent on UAT boxes used for performance testing. More often than not, this is a bad-decision, because the only thing worse than not having a UAT environment is having a UAT environment that is nothing like production. It gives a false sense of security while exposing your application to all sorts of nasty surprises. In this post I tried to summarize a few typical configuration differences between UAT and production which can affect performance test results in a major way.
Continue reading “Lies, damned lies and non production-like performance testing” →

Querying trace files

SQL trace file provide the highest level of detail possible about SQL execution. The problem with that information is converting it to a convenient format for further analysis. One very good solution is parsetrc tool by Kyle Hailey written in Perl. It gives high-resolution histograms, I/O transfer rates as a function of time, and other very useful info. Unfortunately, I myself am not a Perl expert, so it’s a bit difficult for me to customize this tool when I need something slightly different from defaults (e.g. change histogram resolution, look at events not hardcoded into the script etc.). Another limitation is that since the tool is external to the database, you can’t join the data anything else (like ASH queries). So I found another solution for raw trace file analysis: external tables + regexp queries.

Continue reading “Querying trace files” →

Peeling through layers of cache

Nowadays, data in databases is wrapped in may layers of cache: result cache, buffer cache, OS page cache, storage hardware cache… They greatly improve performance, but they also make it less stable and harder to predict. And when I/O performance takes a turn for worse, one has to go through multiple layers of cache trying to understand what went wrong and why. I had such a case not too long ago.

The incident took place on an Oracle 11.2.0.3 database running on a Solaris 10 server. The first symptom was decreased throughput of one of batch processes in the database by about 30 percent. Since the process was spending more than half of its time doing I/O (as ASH indicated), it was easy to establish that the slowdown was linked to increased time of db file sequential reads by plotting average wait times from DBA_HIST_SYSTEM_EVENT (the SQL code for this and other queries in this blog post can be found below, in the Appendix):

Continue reading “Peeling through layers of cache” →

Accessing data dictionary views from PL/SQL

As most of database developers who have to work with database performance analysis and tuning, I have my own collection of favorite sqlplus scripts for various occasions. Over years, I have turned many of them into PL/SQL code which allows greater flexibility. Imagine, for examples, trying to call a sqlplus script from another sqlplus script with both in and out parameters, or simulating autonomous transactions in sqlplus — obviously, such things are much easier in PL/SQL. Of course, PL/SQL has certain drawbacks of its own, one of them being the rather peculiar way Oracle handles user security. More specifically, you cannot use privileges received via roles inside PL/SQL code. This can be really annoying, especially for performance benchmarking code, which heavily relies on access to data dictionary views, granted via SELECT_CATALOG_ROLE or similar, e.g.: Continue reading “Accessing data dictionary views from PL/SQL” →