Positive feedback loops

Ever since I heard Tanel Poder talk about application servers causing positive feedback loops (performance gets worse -> open more sessions -> make performance even worse) in Riga 2012 LVOUG conference, I was curios to see one myself. And recently I got a chance to see a very interesting variety of such behavior.

positive_feedback_loop_example

On the plot above you can see database CPU usage (red line) versus time, plotted together with the number of sessions running same SQL statement (identically the same, i.e. not even different binds — there are no binds in this statement). There is a clear correlation between the two.   Apparently, when database performance degrades to a certain point, the application server is not getting the answer fast enough, and launches the query again in another session, thus degrading performance even more, etc.

This behavior  is very dangerous and may easily cause prolonged outages (see the plateau towards the right side of the plot — the database server was maxed out in terms of CPU for over 24 hours!).

Moral of the story: don’t spawn more sessions in response to poor performance, unless you want poor performance to quickly become terrible!

Tuning very complex SQL

Troubleshooting relatively short plans is simple, once one gets a little practice, but the mere sight of a plan that has several tens or even hundred operations can be very initimidating. In this post I am describing a few tricks that make this task easier.

First of all, use V$SQL_PLAN/V$SQL_PLAN_STATISTICS instead of (or in addition to) the formatted plan produced by DBMS_XPLAN

For example, in DBMS_XPLAN output for a long plan it is hard to figure out relationships between operations, but it is very straightforward to figure them out by querying V$SQL_PLAN, e.g.:


select *
from v$sql_plan p
where sql_id = :sql_id
and id = :op_id
and P.PARENT_ID = :operation_id;

If the statement is a join of several views, and the optimizer doesn’t merge the views, then it is convenient to check which views are contributing to the performance problem the most, e.g.:

select  object_name, s.last_cr_buffer_gets
from v$sql_plan p,
     v$sql_plan_statistics s
where p.sql_id = s.sql_id
and p.child_number = s.child_number
and P.ID = s.operation_id
and p.sql_id = :sql_id
and p.child_number = :child_no
and p.operation = 'VIEW'
order by S.LAST_CR_BUFFER_GETS desc;

Continue reading “Tuning very complex SQL”

Efficiency-based SQL tuning

Introduction

In order to tune a query, you need to know two things:
– can it be tuned, and if yes, then by how much
– which part of the query (which operation, or which data object) is most promising from the tuning point of view.

Currently existing tuning methods don’t really answer these questions. You either focus on the most expensive operation(s), and hope that you can eliminate them (or transform them into something less cosly), or you focus on the ones where you see a large discrepancy between actual rowcounts and optimizer predictions (cardinality feedback tuning). Either way, you can’t be sure that you’ve set your priorities right. It could well be the case that the cost of the most expensive operation cannot be reduced by much, but you can win back enough performance elsewhere. With the cardinality feedback tuning, you also don’t have any guarantee that improving accuracy of optimizer estimates would eventually transform into acceptable level of performance.

Of course, if the plan only contains a few operations, this is not a big issue, and after a few trials you will usually get to the bottom of the problem. However, when dealing with very complex plans, hundreds operations long, this is not really an option. When dealing with such plans a few months back, I developed for myself a simple tuning method that allows to evaluate with high accuracy potential tuning benefits of plan operations, using rowsource stats and optimizer as input. In this post, I’m sharing this method, as well as a script that implements it.

Continue reading “Efficiency-based SQL tuning”

Accessing data dictionary views from PL/SQL

As most of database developers who have to work with database performance analysis and tuning, I have my own collection of favorite sqlplus scripts for various occasions. Over years, I have turned many of them into PL/SQL code which allows greater flexibility. Imagine, for examples, trying to call a sqlplus script from another sqlplus script with both in and out parameters, or simulating autonomous transactions in sqlplus — obviously, such things are much easier in PL/SQL.  Of course, PL/SQL has certain drawbacks of its own, one of them being the rather peculiar way Oracle handles user security. More specifically, you cannot use privileges received via roles  inside PL/SQL code. This can be really annoying, especially for performance benchmarking code, which heavily relies on access to data dictionary views, granted via SELECT_CATALOG_ROLE or similar, e.g.: Continue reading “Accessing data dictionary views from PL/SQL”

Troubleshooting stuck queries

Introduction

In my earlier post, I described a method of troubleshooting slow queries using dbms_xplan. While this method is all you need in most cases, it does have one serious problem: it requires the problem query be completed before any diagnostics are taken. What if the query is so slow that it cannot finish within reasonable time frame? I’ll present several alternatives in this post.

SQL Monitor

If your Oracle version is 11g or higher, then the most convenient tool to deal with “stuck” queries is SQL monitor. It’s very simple in use:

declare
  report clob;
begin
  report := DBMS_SQLTUNE.REPORT_SQL_MONITOR();
  dbms_output.put_line(report);
end;
/

Continue reading “Troubleshooting stuck queries”

CPU starvation disguised as an I/O issue (yet another AWR case study)

In AWR analysis, what appears to be the root cause of the issue, can easily turn out to be just a symptom. Last week, Rajat sent me an AWR report which is a perfect illustration of this (thanks Rajat), I posted the key sections from this report below (sorry for less than perfect formatting — I had to manually re-format the HTML version of the report into text).


WORKLOAD REPOSITORY report for
DB Name      DB Id           Instance       Inst num        Release          RAC           Host
DSS          37220993      dss              1               10.2.0.4.0       NO            dssdbnz

                  Snap Id      Snap Time             Sessions      Cursors/Session
Begin Snap:       18471      12-Oct-12 08:30:28      131              1.5
End Snap:         18477      12-Oct-12 14:30:24      108              1.8
Elapsed:          359.93 (mins)
DB Time:          25,730.14 (mins)

Load Profile
                              Per Second      Per Transaction
Redo size:                    325,282.85      103,923.02
Logical reads:                33,390.52       10,667.77
Block changes:                1,307.95        417.87
Physical reads:               1,927.33        615.75
Physical writes:              244.65          78.16
User calls:                   391.34          125.03
Parses:                       68.14           21.77
Hard parses:                  3.33            1.06
Sorts:                        47.86           15.29
Logons:                       3.15            1.01
Executes:                     234.32          74.86
Transactions:                 3.13
% Blocks changed per Read:       3.92       Recursive Call %:      61.11
Rollback per transaction %:      24.71      Rows per Sort:         3325.52

Top 5 Timed Events
Event                               Waits          Time(s)      Avg Wait(ms)      % Total Call Time      Wait Class
free buffer waits              10,726,838      344,377     32      22.3      Configuration
db file sequential read        6,122,262      335,366      55      21.7      User I/O
db file scattered read         3,597,607      305,576      85      19.8      User I/O
CPU time                                      161,491              10.5
read by other session          2,572,875      156,821     61       10.2      User I/O

Operating System Statistics
Statistic                                 Total
AVG_BUSY_TIME                             2,093,109
AVG_IDLE_TIME                             63,212
AVG_IOWAIT_TIME                           18,463
AVG_SYS_TIME                              87,749
AVG_USER_TIME                             2,004,722
BUSY_TIME                                 16,749,988
IDLE_TIME                                 510,692
IOWAIT_TIME                               152,594
SYS_TIME                                  707,137
USER_TIME                                 16,042,851
LOAD                                      4
OS_CPU_WAIT_TIME                          ###############
RSRC_MGR_CPU_WAIT_TIME                    0
VM_IN_BYTES                               5,503,492,096
VM_OUT_BYTES                              2,054,414,336
PHYSICAL_MEMORY_BYTES                     34,288,209,920
NUM_CPUS                                  8
NUM_CPU_SOCKETS                           8

Continue reading “CPU starvation disguised as an I/O issue (yet another AWR case study)”

Viewing VPD predicates with DBMS_XPLAN

Oracle Virtual Private Database (VPD), also known as Row Level Security (RLS), provides a very high level of flexibility in exposing data to users. It is also a very convenient tool for forcing hard parsing of a SQL statement either on every execution or depending on some criteria (e.g. see here). VPD works by appending invisible (e.g. query text in V$SQL doesn’t contain them) predicates to SQL statements. This invisibility is mostly a good thing, but in some cases it can also be a nuisance (e.g. when troubleshooting SQL).  There are a couple of blogs describing how to view these predicates (here and here), both suggesting DBMS_XPLAN as one of the ways.

However, it doesn’t always work as expected, and sometimes it doesn’t work at all. Consider an example: Continue reading “Viewing VPD predicates with DBMS_XPLAN”

A wrapper sqlplus script for dbms_xplan.display_cursor

When producing SQL execution plans with rowsource statistics, one has a choice whether or not specify the sql_id. It’s simpler to run dbms_stats.display_cursor without specifying sql_id, but it can be tricky, because then the plan for the last statement executed by this session will be displayed, and it can be not the statement you expected, especially when running the statement from a GUI like Toad. But even with sqlplus you can still get unexpected results, e.g. a plan for a hidden call to a dbms_output routine instead of the last statement you ran.

Below I describe a method to bypass this difficulty.

Method

Copy your SQL into a file (let’s call it slow_query.sql). Then in the same directory open a sqlplus session, and in it run the wrapping script as shown below:


set termout off
set serverout off
set linesize 200
set pagesize 9999
alter session set statistics_level = all;
@slow_query.sql;
spool slow_query_plan.lst
select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
spool off

You can save it to a script and used it repeatedly with different queries (all that needs to be changed is the name of the script with the query on line 6).

The script produces a text file with nicely formatted plan with rowsource statistics, ready for analysis or posting. Note that since the output is suppressed

(for the reasons explained in the next section), in case of a SQL error you won’t see one — if that happens, just temporarily re-enable the output (set termout on,

set serverout on) to find out the cause of the error.

Explanation

If you’re not interested in how it’s working, then you can stop reading — just copy-paste the script below, and you’re all set.

For those who are more curios, I’ll describe the purpose of individual instructions in the script:

set termout off — suppress output of query rows on the screen. This is necessary, if the query produces many rows of output: displaying them would slow down the script considerably

set serverout off — I’m not entirely sure how, but this line takes care of wrapped sqlplus calls to dbms_output.disable or dbms_output.get_lines. If you don’t add this line, then you may end up having the plan for one of such call instead of your statement

set linesize 200 — this makes sure that lines aren’t wrapped

set pagesize 9999 — prevents headers of plan columns from appearing on each screen

alter session set statistics_level = all — this is the key to obtaining rowsource statistics (i.e. how many rows were produced at each step and how long the step took), without it the execution plan is almost useless for tuning

The rest of the script should be fairly self-explanatory.

A map to AWR report

 

Introduction

An average 11g AWR report spans 40 screens broken into approximately 50 sections. That’s a lot, especially for someone who’s not very well familiar with AWR reports, so I decided to make a some sort of a map.  The purpose is to show that this report has a certain structure (which may not be obvious at first sight), and knowing this structure can help extract the most essential information in the fastest way possible.

Types of sections

For simplicity, I break AWR report sections into following categories:

1) basic (key information)

2) detalization (provides details on a specific topic briefly covered in the basic section, such as latches, enqueues etc.)

3) advisories (helps find optimal values of parameters)

4) advanced (stuff that is not generally needed, but can be useful on certain occasions — basically, everything not covered in 1-3).

Continue reading “A map to AWR report”

AWR report: load profile

Introduction

“Load profile” section of the AWR report contains some extremely useful information, and yet it is very often overlooked (often in favor of instance efficiency percentages, which is easier to read but much more likely to mislead).  I decided to make some sort of a short guide for it, describing how different statistics in it can be used to better understand performance of a database.

Continue reading “AWR report: load profile”