Use of packet capture and other advanced tools in network issues troubleshooting

Introduction 

In this post I’m sharing my recent experience troubleshooting a rather complex performance issue. The issue showed up during performance testing of a new platform (Exadata Cloud at Customer or ExaCC) where an application is migrating from the current one (on-premises Exadata).

Continue reading “Use of packet capture and other advanced tools in network issues troubleshooting”

Business rules, common sense and query performance

Very often, significant performance benefits can be obtained by using some very basic knowledge of the application, its data and business rules. Sometimes even less than that: even if you are not familiar with the application logic at all, you can still use common sense to make some reasonable guesses that would get you a long way in improving query’s performance. Here is an example (based on an actual query that I had to tune today).

Continue reading “Business rules, common sense and query performance”

Analyzing segment content by block type

Occasionally one might want to know what a segment is made of in terms of block types. For example, you notice that the number of blocks in an index segment is somewhat larger than the number of branch and leaf blocks, and wonder what kind of blocks accounts for the difference. The only way to do this is by dumping index blocks (e.g. as described in Richard Foote’s blog here). Dumping blocks is easy, but analyzing them — not so much. Sure, there exists a plethora of tools that allow to parse text from the OS side (awk, perl, sed and whatnot), but this leads to usual problems: OS access, scripting skills, certain platforms may not have the scripting tool you’re most comfortable with, and even more importantly: scripts cannot do cool stuff that Oracle can (like joining data to other data) . Fortunately, those difficulties can be circumvented by using regexp + external files as I already posted in my blog here. This time, I’d like to show how this technique can be adjusted for index block dumps.

Continue reading “Analyzing segment content by block type”

CPU-starved LGWR

In my recent post I showed how log file sync (LFS) and log file parallel write (LFPW) look for normal systems. I think it would also be interesting to compare that to the situation when LGWR does not have enough CPU.

I happen to have collected LGWR and database-level trace files for a 11.2.0.3 database on a Solaris 10 server which was under serious pressure (50 threads mostly inserting and committing data, only 32 CPUs). The AWR showed significant OS_CPU_WAIT_TIME (comparable to BUSY_TIME and much larger than IDLE_TIME) so I know for sure that CPU was an issue. And here is what LFS and LFPW histograms plotted from the trace file (as described here) looked like:

 

CPU_starved_LGWR_bilog Continue reading “CPU-starved LGWR”

Querying trace files

SQL trace file provide the highest level of detail possible about SQL execution. The problem with that information is converting it to a convenient format for further analysis. One very good solution is parsetrc tool by Kyle Hailey written in Perl. It gives high-resolution histograms, I/O transfer rates as a function of time, and other very useful info. Unfortunately, I myself am not a Perl expert, so it’s a bit difficult for me to customize this tool when I need something slightly different from defaults (e.g. change histogram resolution, look at events not hardcoded into the script etc.). Another limitation is that since the tool is external to the database, you can’t join the data anything else (like ASH queries). So I found another solution for raw trace file analysis: external tables + regexp queries.

Continue reading “Querying trace files”

Global Hints

Occasionally I encounter a situation when I need to affect a part of the plan that corresponds to a view, e.g.:

select *
from
(
   select v.x, x.y
   from v
) q
where q.x = 1

Such situations are resolved using global hints. Oracle offers two ways to specify a global hint: via a query block identifier (system generated or user defined) or via view aliases. System-generated query block identifiers can be obtained via dbms_xplan.display with ALL or ALIAS option (they have the form SEL$n, where n appears to be same as the depth, e.g. in our case 1 corresponds to the main query, 2 to the inline view, 3 to the view V inside that inline view) or defined by the user via qb_name hint.

Continue reading “Global Hints”

Tuning Analytic Functions

In general, tuning analytic functions (and more generally, all sort operations) is rather difficult. While for most poorly performing queries it’s relatively straightforward to gain some improvements  by applying “eliminate early” principle one way or another, for slow sort operations it’s rarely applicable. Usually options are limiting to rewriting a query without analytics (e.g. using self-joins or correlated subqueries to achieve the same goal) or manually resizing the workarea to reduce/eliminate the use of disk. Recently, however, I had a case where I managed to obtain an excellent performance gain using a different technique that I would like to share in this post.

The original query was selecting about 100 columns using the LAG function on one of the columns in the WHERE clause, but in my test case I’ll both simplify and generalize the situation. Let’s create a table with a sequential id, three filtering columns x, y and z, and 20 sufficiently lengthy columns.

Continue reading “Tuning Analytic Functions”

Efficiency-based SQL tuning

Introduction

In order to tune a query, you need to know two things:
– can it be tuned, and if yes, then by how much
– which part of the query (which operation, or which data object) is most promising from the tuning point of view.

Currently existing tuning methods don’t really answer these questions. You either focus on the most expensive operation(s), and hope that you can eliminate them (or transform them into something less cosly), or you focus on the ones where you see a large discrepancy between actual rowcounts and optimizer predictions (cardinality feedback tuning). Either way, you can’t be sure that you’ve set your priorities right. It could well be the case that the cost of the most expensive operation cannot be reduced by much, but you can win back enough performance elsewhere. With the cardinality feedback tuning, you also don’t have any guarantee that improving accuracy of optimizer estimates would eventually transform into acceptable level of performance.

Of course, if the plan only contains a few operations, this is not a big issue, and after a few trials you will usually get to the bottom of the problem. However, when dealing with very complex plans, hundreds operations long, this is not really an option. When dealing with such plans a few months back, I developed for myself a simple tuning method that allows to evaluate with high accuracy potential tuning benefits of plan operations, using rowsource stats and optimizer as input. In this post, I’m sharing this method, as well as a script that implements it.

Continue reading “Efficiency-based SQL tuning”

CPU starvation disguised as an I/O issue (yet another AWR case study)

In AWR analysis, what appears to be the root cause of the issue, can easily turn out to be just a symptom. Last week, Rajat sent me an AWR report which is a perfect illustration of this (thanks Rajat), I posted the key sections from this report below (sorry for less than perfect formatting — I had to manually re-format the HTML version of the report into text).


WORKLOAD REPOSITORY report for
DB Name      DB Id           Instance       Inst num        Release          RAC           Host
DSS          37220993      dss              1               10.2.0.4.0       NO            dssdbnz

                  Snap Id      Snap Time             Sessions      Cursors/Session
Begin Snap:       18471      12-Oct-12 08:30:28      131              1.5
End Snap:         18477      12-Oct-12 14:30:24      108              1.8
Elapsed:          359.93 (mins)
DB Time:          25,730.14 (mins)

Load Profile
                              Per Second      Per Transaction
Redo size:                    325,282.85      103,923.02
Logical reads:                33,390.52       10,667.77
Block changes:                1,307.95        417.87
Physical reads:               1,927.33        615.75
Physical writes:              244.65          78.16
User calls:                   391.34          125.03
Parses:                       68.14           21.77
Hard parses:                  3.33            1.06
Sorts:                        47.86           15.29
Logons:                       3.15            1.01
Executes:                     234.32          74.86
Transactions:                 3.13
% Blocks changed per Read:       3.92       Recursive Call %:      61.11
Rollback per transaction %:      24.71      Rows per Sort:         3325.52

Top 5 Timed Events
Event                               Waits          Time(s)      Avg Wait(ms)      % Total Call Time      Wait Class
free buffer waits              10,726,838      344,377     32      22.3      Configuration
db file sequential read        6,122,262      335,366      55      21.7      User I/O
db file scattered read         3,597,607      305,576      85      19.8      User I/O
CPU time                                      161,491              10.5
read by other session          2,572,875      156,821     61       10.2      User I/O

Operating System Statistics
Statistic                                 Total
AVG_BUSY_TIME                             2,093,109
AVG_IDLE_TIME                             63,212
AVG_IOWAIT_TIME                           18,463
AVG_SYS_TIME                              87,749
AVG_USER_TIME                             2,004,722
BUSY_TIME                                 16,749,988
IDLE_TIME                                 510,692
IOWAIT_TIME                               152,594
SYS_TIME                                  707,137
USER_TIME                                 16,042,851
LOAD                                      4
OS_CPU_WAIT_TIME                          ###############
RSRC_MGR_CPU_WAIT_TIME                    0
VM_IN_BYTES                               5,503,492,096
VM_OUT_BYTES                              2,054,414,336
PHYSICAL_MEMORY_BYTES                     34,288,209,920
NUM_CPUS                                  8
NUM_CPU_SOCKETS                           8

Continue reading “CPU starvation disguised as an I/O issue (yet another AWR case study)”