Querying trace files

SQL trace file provide the highest level of detail possible about SQL execution. The problem with that information is converting it to a convenient format for further analysis. One very good solution is parsetrc tool by Kyle Hailey written in Perl. It gives high-resolution histograms, I/O transfer rates as a function of time, and other very useful info. Unfortunately, I myself am not a Perl expert, so it’s a bit difficult for me to customize this tool when I need something slightly different from defaults (e.g. change histogram resolution, look at events not hardcoded into the script etc.). Another limitation is that since the tool is external to the database, you can’t join the data anything else (like ASH queries). So I found another solution for raw trace file analysis: external tables + regexp queries.

Using external tables for accessing traces files is described in many blogs (e.g. here) so I don’t need to spend much time on it. I let most of the options default, but I needed to change RECORDS LIMITED BY because otherwise it didn’t work correctly on my Windows 7 machine, so the DDL was something like:

CREATE DIRECTORY TRACEDIR AS 'C:\path_to_trace_dir\';

CREATE TABLE RAWTRACEFILE (	TEXT VARCHAR2(4000)) 
  ORGANIZATION EXTERNAL ( 
    DEFAULT DIRECTORY "TRACEDIR" 
    ACCESS PARAMETERS ( 
          RECORDS DELIMITED BY '\n' characterset 'utf8'
    )
    LOCATION ( 'tracefilename.trc')
  );

Now that the trace file is accessible from the database, I define a simple view on the top of it:

create or replace view V$MYTRACE
AS
with patterns as
(
select 'WAIT #\d*: nam=''(.*)'' ela= (\S*) ([^=]+)=(\S*) ([^=]+)=(\S*) ([^=]+)=(\S*) obj#=(\S*) tim=(\S*)(.*)' waits,
'PARSING IN CURSOR #(\d*)(.*) sqlid=''(.*)''' cursors
from dual
),
cursors as
(
select regexp_replace(text, patterns.cursors, '\1') cursorno,
regexp_replace(text, patterns.cursors, '\3') sqlid
from rawtracefile,
patterns
where text like 'PARSING IN CURSOR%'
),
waits as (
select regexp_replace(text, 'WAIT #(\d*): (.*)', '\1') cursorno,
regexp_replace(text, patterns.waits, '\1') event,
to_number(regexp_replace(text, patterns.waits, '\2')) elapsed_us,
regexp_replace(text, patterns.waits, '\3') p1text,
regexp_replace(text, patterns.waits, '\4') p1,
regexp_replace(text, patterns.waits, '\5') p2text,
regexp_replace(text, patterns.waits, '\6') p2,
regexp_replace(text, patterns.waits, '\7') p3text,
regexp_replace(text, patterns.waits, '\8') p3,
regexp_replace(text, patterns.waits, '\9') obj#,
regexp_replace(text, '(.*)tim=(\S*)(.*)', '\2') tim
from rawtracefile t,
patterns
where t.text like 'WAIT%'
)
select c.sqlid, 
       w."CURSORNO",
       w."EVENT",
       w."ELAPSED_US",
       w."P1TEXT",
       w."P1",
       w."P2TEXT",
       w."P2",
       w."P3TEXT",
       w."P3",
       w."OBJ#",
       w."TIM"
from waits w,
cursors c
where w.cursorno = c.cursorno (+);

That’s it — now I can query data from a raw trace file same way I would query ASH views! For example, if I want a high-resolution histogram on wait times, all I need to do is this:

SELECT  100*WIDTH_BUCKET(ELAPSED_US, 0, 10000, 100) ELAPSED_TIME, 
        COUNT(*) HEIGHT
FROM V$MYTRACE 
WHERE EVENT = 'log file sync'
GROUP BY WIDTH_BUCKET(ELAPSED_US, 0, 10000, 100)
ORDER BY 1;

It’s just as easy to write a query to do anything else you can do with parsetrc (e.g. I/O transfer rate vs. time), and much more! For instance, you can look at correlation between event time and an event parameter (e.g. number of blocks written by LGWR in ‘log file parallel write’), or match background wait events to foreground waits, etc.

A few recommendations:

1) When analyzed trace files are big, it could be beneficial for performance to materialize the view (i.e. create a table using CTAS) rather than parsing the entire file with every query
2) You can switch from one trace file to another using ALTER TABLE <tablename> LOCATION(‘new_file_name.trc’)
3) When analyzing multiple trace files, you can either merge them (e.g. using trcsess service=SYS$USERS) or you can list multiple file names in the LOCATION clause
4) When using the tool to produce high-resolution histograms, pay extra attention to bucket size: smaller bucket sizes give better level of detail, but less statistical accuracy (if the data set is not big enough); thus you may want to use variable bucket sizes — in this case, you’ll need to divide the histogram height (number of counts in the bucket) by the bin width.

Happy tracing!

Upd. Fixed a minor bug in regexp as pointed out by Matthias Rogel. Thanks Matthias!

9 thoughts on “Querying trace files”

Pingback: How not to troubleshoot log file sync waits | Oracle Diagnostician

Hi Nikolay,

great idea !

...select regexp_replace(text, 'WAIT #(\d)*: (.*)', '\1') cursorno,...

doesn’t work for me – is it because I am on linux on not on windows or because it actually should be

...select regexp_replace(text, 'WAIT #(\d.*): (.*)', '\1') cursorno,...

?
However, the latter also doesn’t work for me ( uses CPU forever and does seem to want to return something -> ?) I am on 11.2.0.2

so I replaced this regexp by substr and use

...substr(text, instr(text, '#') + 1, instr(text, ':') - instr(text, '#') - 1) cursorno...

instead.

Regards
Matthias

savvinov says:

September 10, 2014 at 1:47 pm

Hi Matthias,

my regexp worked fine for me on both Windows and Solaris. I haven’t tested them on Linux, but I don’t expect any radical differences with Solaris. I’ve sent you my email on twitter — please send me exact error message and I’ll try to sort it out.

Also, as I suggested in the blog, if facing poor performance, materialize the external table before processing it (and if you then experience performance issues parsing it, materialize once more).

Reply