Last year, I spent some time researching redo log related performance problems, which resulted in a mini-series, including one post devoted specifically to one previously unknown scenario of excessive log file sync waits. I am happy to announce that a service request opened on the back of this research resulted in the MOS note on “log file parallel write” wait event (Doc ID 34583.1) having been updated with a general description of this scenario and factors that may contribute to it. Unfortunately, more specific information regarding this issue has been put to an internal Oracle note because of the limitations that concern underscore parameters. So if you are dealing with log file sync (or log buffer space) issues, then I strongly recommend to log in to MOS and familiarize yourself with the updated version of the note as soon as possible.
In my recent post I showed how log file sync (LFS) and log file parallel write (LFPW) look for normal systems. I think it would also be interesting to compare that to the situation when LGWR does not have enough CPU.
I happen to have collected LGWR and database-level trace files for a 188.8.131.52 database on a Solaris 10 server which was under serious pressure (50 threads mostly inserting and committing data, only 32 CPUs). The AWR showed significant OS_CPU_WAIT_TIME (comparable to BUSY_TIME and much larger than IDLE_TIME) so I know for sure that CPU was an issue. And here is what LFS and LFPW histograms plotted from the trace file (as described here) looked like:
Not every commit results in a redo write. This is because there are multiple optimizations (some controlled by the user e.g. with COMMIT_LOGGING parameter, some automatic) that aim at reducing the number of redo writes caused by commits by grouping redo records together. Such group or “piggyback” commits are important for understanding log file sync waits and various statistics around it. In particular, “piggyback” commits play a key role when many sessions commit concurrently at a high rate, as described in my previous post. I made myself a little demo to actually see this mechanism in work with my own eyes. I think it could be of interest for others, so I’m sharing it here. Since the demo involves stopping and resuming background process, I wouldn’t recommend running it on anything other than a designated private sandbox environment.