ASH for OS processes

There are many different tools for analyzing OS process states that can be helpful in resolving non-trivial performance issues. One limitation of such tools is that they are mostly active ones: you have to do some extra work to collect the desired diagnostic information. This is inconvenient when the problem you're facing is intermittent and manifests itself on an irregular, unpredictable schedule.

However, there are some great passive monitoring tools which can provide you with information about issues that happened in the recent past at no additional cost. This is especially true for RAC setups, where you normally have ExaWatcher (or OSWatcher for non-Exadata RACs) constantly running in the background, gathering OS-level information at high frequency.

In particular, you get ps snapshots which look like this (caveat: this is what they look like on Linux; I haven't checked other OS's!):

############################################################
# Starting Time:     03/27/2019 16:19:23
# Sample Interval(s): 5
# Archive Count:     720
# Collection Module: PsExaWatcher
# Collection Command: /opt/oracle.ExaWatcher/FlexIntervalMode.sh '/bin/ps -eo flags,s,ruser,pid,ppid,c,psr,pri,ni,addr,rss,sz,wchan=WIDE-WCHAN-COLUMN,stime,tty,time,cmd'  5  720 196
############################################################
zzz  Count:720
[INFO] Flexible sampling model enabled. The sampling interval and count is subject to change due to system workload.
   zzz  subcount: 0
F S RUSER       PID   PPID  C PSR PRI  NI ADDR   RSS    SZ WIDE-WCHAN-COLUMN STIME TT           TIME CMD
4 S root          1      0  0  16  19   0    -  2020  4855 poll_schedule_tim  2018 ?        17:59:06 /sbin/init
1 S root          2      0  0   4  19   0    -     0     0 kthreadd           2018 ?        00:00:05 [kthreadd]
1 S root          3      2  0   0  19   0    -     0     0 smpboot_thread_fn  2018 ?        01:30:34 [ksoftirqd/0]
...

As you can see, there is some useful information in there, first of all WCHAN, the wait channel, i.e. the kernel function on which the process is currently sleeping. If your database processes get stuck because of some weird kernel locks, then chances are the database instrumentation is not going to tell you much, and this information in the ps output can be of tremendous value (for example, it's quite easy to see how it could have been used to diagnose the NUMA issue I described in my previous post).

In a way, WCHAN is like the wait event for the OS process, so ps output is pretty much like ASH, with samples of process states taken every 5 seconds! Of course, unlike ASH, this is just regular text which requires a bit of processing before it can be filtered and aggregated, but it’s not too difficult.

For example, I used this simple R script below to do the processing:

# libraries used further down
library(stringr)    # str_match
library(dplyr)      # mutate, filter, group_by, summarize
library(tidyr)      # spread, gather
library(lubridate)  # mdy_hms
library(ggplot2)
library(plotly)     # ggplotly

dir <- "C:\\mydata"
fname <- "2019_03_27_16_19_23_PsExaWatcher_xxxxxxxx.xxx.xxx.com.dat.bz2"
setwd(dir)
con <- bzfile(fname)
lines <- readLines(con)
 
# identify lines that contain timestamps
# (assumed format: "   zzz <MM/DD/YYYY HH:MI:SS> subcount: n")
tstamp.idx <- grep('^\\s+zzz .*subcount: \\d+\\s*$', lines)
# identify lines that contain data
data.idx <- grep('^(\\S*\\s){16}.*$', lines)
# match data entries to timestamp entries
i <- findInterval(data.idx, tstamp.idx)
# parse data entries
s <- str_match(lines[data.idx], pattern="^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)$")
# drop the first column, which holds the full regex match
s <- s[,-1]
# convert to a data frame
df <- as.data.frame(s, stringsAsFactors=F)
# get the matching timestamp lines and extract the timestamp with a regex
t <- lines[tstamp.idx[i]]
# assumption: the zzz lines carry the timestamp in angle brackets,
# e.g. "   zzz <03/27/2019 16:19:23> subcount: 0"
tstamp <- gsub(pattern='^\\s*zzz <(.*)> subcount: \\d+\\s*$', x=t, replacement='\\1')
# give meaningful names to the columns
colnames(df) <- c('F', 'S', 'RUSER', 'PID', 'PPID', 'C', 'PSR', 'PRI', 'NI', 'ADDR', 'RSS', 'SZ', 'WCHAN', 'STIME', 'TT', 'TIME', 'CMD')
# add the timestamp column
df <- cbind(df, tstamp)
# drop the repeated header rows, which also match the data regex
df <- df[df$RUSER != 'RUSER',]
df <- df %>% mutate(tstamp=mdy_hms(tstamp))
 
# keep Oracle processes and drop those sleeping on sk_wait_data
d <- df %>% filter(grepl(CMD, pattern="racle") & !grepl(WCHAN, pattern='sk_wait_data')) %>%
  group_by(WCHAN, tstamp) %>% summarize(x=n())
 
# fill the gaps with zeros to avoid ugly holes in the area graph
d2 <- d %>%
  tidyr::spread(key = WCHAN, value = x, fill = 0) %>%
  tidyr::gather(key = WCHAN, value = x, -tstamp) %>%
  arrange(tstamp, WCHAN)
 
# produce the plot and call ggplotly to enrich it with interactive features
p <- d2 %>% ggplot(aes(x=tstamp, y=x, fill=WCHAN))
ggplotly(p + geom_area())

and then you can slice and dice the data any way you like, and even produce nice OEM-style graphs to visualize the wait channels for the subset of processes of interest:
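For instance, here is a minimal sketch of the kind of slicing this enables, using a small toy data frame in place of the parsed `df` (the column names match the script above; the CMD and WCHAN values are made up for illustration):

```r
library(dplyr)

# toy stand-in for the parsed ps data frame built by the script above
df <- data.frame(
  WCHAN = c('sk_wait_data', 'poll_schedule_tim', 'sk_wait_data', '-'),
  CMD   = c('oracleDB1 (LOCAL=NO)', 'ora_dbw0_DB1', 'oracleDB1 (LOCAL=NO)', 'ora_lgwr_DB1'),
  stringsAsFactors = FALSE
)

# top wait channels across all samples ('-' usually means the process is runnable)
top_wchan <- df %>%
  filter(WCHAN != '-') %>%
  count(WCHAN, sort = TRUE)

print(top_wchan)
```

The same pattern works for any other grouping, e.g. by RUSER or by PID, once the data is in a data frame.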

[figure: interactive OEM-style area graph of wait channels over time]

It is also possible to parse these files with external tables, using the trace file analysis techniques I described earlier on this blog, which would let you query the contents with plain SQL the same way you would any other table.

Of course, one of the big reasons to do trace file analysis on the database side is being able to join with other data. In this particular case that won't always be useful, because it is difficult to link processes to sessions: e.g. ASH doesn't give you server PIDs (which is a shame). But there are still things you can do, at least in some cases. For example, for batch jobs or background processes, the OS process can be identified using the CMD column.
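As a small base-R sketch of that kind of matching (the CMD values below are hypothetical; real command lines will of course differ):

```r
# hypothetical CMD values as they might appear in the parsed ps data
cmds <- c('ora_lgwr_DB1',                 # LGWR background process
          'oracleDB1 (LOCAL=NO)',         # anonymous dedicated server
          'perl run_nightly_batch.pl',    # a batch job started from a known script
          'ora_dbw0_DB1')

# pick out the LGWR background process and the batch job by their command lines
lgwr_rows  <- grepl('^ora_lgwr_', cmds)
batch_rows <- grepl('run_nightly_batch', cmds)

print(cmds[lgwr_rows])
print(cmds[batch_rows])
```

Dedicated server processes spawned for arbitrary user sessions are the hard case, since their CMD is generic.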

So I think this approach can be a useful addition to the active OS-side tools used in database performance analysis (like psnapper, recently released by Tanel Poder). I would imagine it being even more useful for less thoroughly instrumented database management systems (like MySQL), but I don't do much work in that area myself, so I haven't tried it yet.
