Log buffer space is a simple, yet frequently misunderstood wait event. The main reason for that is probably its name: it sounds as if it points straight to the answer. If space in the log buffer is the issue, then surely increasing it should resolve the problem. Unfortunately, even though log buffer space is simple, it's not that simple.
The log buffer is an area in memory (in the SGA, to be more specific) where redo is temporarily stored before it can be written to disk. Oracle only uses redo for recovery, i.e. it's not needed for normal database operation. This "write and forget" approach (as Jonathan Lewis puts it) is what makes it so simple. It's much like the classic middle-school problem with a pool, a pipe that pours water into the pool, and a drain through which the water leaves. If water arrives via the pipe at a higher rate than it leaves via the drain, the pool will inevitably fill up, no matter how big it is.
So most log buffer space issues are caused either by too much redo being produced ("too much water poured in") or by a slow log writer ("drain clogged"), and not by the size of the log buffer itself (which has basically been taken out of the DBA's control since 10gR2 anyway).
Ways to decrease redo generation or improve redo throughput are well known (I will list some of them below), but instead of blindly trying them all, it's more efficient to understand which of the two scenarios is taking place, and why. The following quantities are useful for that:
– redo generation rate
– redo filesystem latency (actual and expected)
– expected redo filesystem throughput
– redo write rate (number of redo write operations per unit time).
Many of the quantities listed above can be found in AWR: redo per second appears in the "Load Profile" section, and the "log file parallel write" event gives you both the redo filesystem latency (the average wait) and the number of times redo was written (the wait count). I also use a report that gives me a summary of many redo-related metrics from AWR and ASH (see Appendix).
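Outside of AWR, the same numbers can also be pulled in real time from the dynamic performance views. A minimal sketch (these are cumulative instance-level counters, so take two samples some seconds apart and work with the deltas):

```sql
-- Redo generation and write counters since instance startup;
-- sample twice and divide the deltas by the elapsed seconds.
select name, value
from v$sysstat
where name in ('redo size', 'redo writes', 'redo blocks written');

-- Average 'log file parallel write' latency in ms (also cumulative).
select event, total_waits,
       round(time_waited_micro/total_waits/1000, 2) avg_ms
from v$system_event
where event = 'log file parallel write';
```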
To find what kind of I/O your redo filesystem can handle, you may:
– run some I/O benchmarks (SLOB is probably the best choice if you’re on Linux, there are also some workarounds for using it on other platforms)
– read your hardware specifications and published benchmark results
– talk to your SA and/or storage specialist
– all of the above.
Some useful information is also available in OEM (convenient for real-time monitoring): go to Performance Home, scroll down past the "Average Active Sessions" graph, then click on the "I/O" tab. It will give you database I/O stats for all processes; click on "LGWR" to choose only those related to redo:
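If OEM is not at hand, a similar per-function breakdown is exposed (on 11g and later) in V$IOSTAT_FUNCTION; a quick sketch for the LGWR row (again, cumulative counters, so sample twice for rates):

```sql
-- LGWR write requests and megabytes written since instance startup.
select function_name,
       small_write_reqs + large_write_reqs write_reqs,
       small_write_megabytes + large_write_megabytes write_mb
from v$iostat_function
where function_name = 'LGWR';
```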
The natural first step would be analyzing redo generation rate: is it consistent with the usual values? Is it consistent with what you know about the user activity? The size of the redo generated by a DML statement is typically similar to the amount of new information introduced by it. In case of an INSERT, it’s the size of the inserted row(s). In case of an UPDATE, it’s the combined size of updated column values. In case of a DELETE it would be minimal, as the only thing needed to repeat a DELETE is the rowid of the deleted row. So if you know what kind of DML is being run, you can understand whether the actual redo per second makes sense.
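A quick way to sanity-check such estimates is to measure the redo your own session generates for a representative statement: read the session-level "redo size" statistic before and after the DML and subtract. The table and column below (t, pad) are made up for illustration:

```sql
-- Redo generated by the current session so far; run this query
-- before and after the test DML and subtract the two values.
select s.value redo_bytes
from v$mystat s
join v$statname n on n.statistic# = s.statistic#
where n.name = 'redo size';

update t set pad = rpad('x', 100) where rownum <= 1000;  -- sample DML

select s.value redo_bytes
from v$mystat s
join v$statname n on n.statistic# = s.statistic#
where n.name = 'redo size';
```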
If the redo generation rate is higher than expected, it could be due to a hot backup running (which causes the full block image to be written to redo, not just the part that was changed). Another common cause of excessive redo generation is row-by-row processing. On tablespaces with small (smaller than 8k) block sizes, bulk DML statements can be internally processed on a row-by-row basis (see here). If that's the case, then a very significant reduction in redo generation can be achieved by simply moving objects to a tablespace with a larger (8k and up) block size. If some processes generate massive amounts of redo, but the tables they write to don't need to be recoverable, then redo generation can be reduced by redefining those tables as NOLOGGING and using direct path inserts (conventional path inserts generate redo in any case). If some tables are used as "scratchpads" or "staging areas", they should be defined as global temporary tables (GTTs); this also helps reduce the redo generation rate.
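For illustration, a sketch of the last two options (the table names staging_tab, source_tab and scratch_tab are made up; keep in mind that data loaded via a NOLOGGING direct-path insert is unrecoverable until the next backup):

```sql
-- Direct-path insert into a NOLOGGING table: minimal redo generated.
alter table staging_tab nologging;
insert /*+ APPEND */ into staging_tab select * from source_tab;
commit;  -- required before the session can query the table again

-- A "scratchpad" as a GTT: the undo still generates some redo,
-- but the table data itself does not.
create global temporary table scratch_tab (id number, payload varchar2(100))
on commit delete rows;
```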
If redo generation is fine, then redo filesystem latency and I/O may give some answers:
– what % of time is LGWR idle?
– what % of time is it busy with things other than I/O?
– what is the average log file parallel write time?
– how does it compare to expected I/O latency (e.g. from I/O benchmarks or based on hardware specifications)?
– how many redo writes per second is Oracle doing?
– how many IOPS (I/O operations per second) is this kind of storage supposed to be able to handle?
– how many MB/s of redo is Oracle writing?
– how does that compare to expected redo bandwidth (from I/O benchmarks or based on hardware specifications)?
When going through that list, it's important not to forget the impact of a write cache (if any), synchronous storage-level replication latency (if any), and anything else that could affect either latency or throughput.
If LGWR is spending a considerable amount of time (5-10% and higher) on things other than log file parallel write (e.g. CPU or control file parallel write), then this should be looked into. CPU issues are capable of causing log buffer space problems (as well as log file sync), although this is not very common. Other waits (control file parallel write, log file single write) can be related to excessive log file switching itself, or to contention incurred during such switches (enq: CF – contention waits).
If you don't find any such suspicious waits, check whether you are approaching I/O saturation. If you find that you are nowhere near saturating the I/O, then you're probably missing something: some unknown I/O activity against the redo filesystem (reads by LogMiner, database replication software, or the log archiver process?), or some bottleneck on the OS side (like serialized log parallelism).
A log buffer space problem typically results from either too much redo being generated, or from LGWR writing redo too slowly. The most efficient way to address it is quantitative analysis based on the following quantities:
– how much redo you can reasonably expect based on user activity
– how much redo is actually being generated
– how much redo can be handled by the redo storage in principle.
This approach allows you to quickly narrow down the list of potential suspects and get to the bottom of the issue in minimum time. To use it, you need to be familiar with your storage (at least at a basic level) and know your workload.
Appendix: Redo stats AWR report
with lfs as (
  select e.snap_id,
         e.total_waits - lag(e.total_waits)
           over (partition by e.event_name order by e.snap_id) waits_delta,
         e.time_waited_micro - lag(e.time_waited_micro)
           over (partition by e.event_name order by e.snap_id) time_delta
  from dba_hist_system_event e
  where e.event_name = 'log file sync'
),
lfpw as (
  select e.snap_id,
         e.total_waits - lag(e.total_waits)
           over (partition by e.event_name order by e.snap_id) waits_delta,
         e.time_waited_micro - lag(e.time_waited_micro)
           over (partition by e.event_name order by e.snap_id) time_delta
  from dba_hist_system_event e
  where e.event_name = 'log file parallel write'
),
redo as (
  select snap_id,
         (value - lag(value) over (partition by stat_name order by snap_id))/1024/1024 redo_size
  from dba_hist_sysstat
  where stat_name = 'redo size'
  order by snap_id desc
),
snap as (
  select snap_id,
         trunc(begin_interval_time, 'mi') begin_interval_time,
         end_interval_time - begin_interval_time interval_duration
  from dba_hist_snapshot
),
sn as (
  select snap_id,
         begin_interval_time,
         extract(hour from interval_duration)*3600
           + extract(minute from interval_duration)*60
           + extract(second from interval_duration) seconds_in_snap
  from snap
),
ash as (
  select snap_id, max(active_sess) max_concurrency
  from (
    select snap_id, sample_time, count(*) active_sess
    from dba_hist_active_sess_history ash
    where event = 'log file sync'
    group by snap_id, sample_time
  )
  group by snap_id
),
requests as (
  select snap_id, avg(p3) avg_lfpw_requests
  from dba_hist_active_sess_history ash
  where event = 'log file parallel write'
  group by snap_id
)
select begin_interval_time,
       round(redo.redo_size/seconds_in_snap, 2) redo_gen_mb_per_sec,
       round(100*lfpw.time_delta/1e6/seconds_in_snap) lgwr_pct_busy,
       round(avg_lfpw_requests, 2) avg_requests_per_log_write,
       round(1e6*redo.redo_size/lfpw.time_delta, 2) redo_write_speed,
       round(redo.redo_size/lfpw.waits_delta, 2) avg_redo_write_size,
       round(lfs.time_delta/lfs.waits_delta/1000, 2) avg_lfs,
       round(lfpw.time_delta/lfpw.waits_delta/1000, 2) avg_lfpw,
       lfs.waits_delta lfs_num_waits,
       lfpw.waits_delta lfpw_num_waits,
       max_concurrency
from lfs, lfpw, sn, redo, ash, requests
where lfs.snap_id (+) = sn.snap_id
  and lfpw.snap_id (+) = sn.snap_id
  and redo.snap_id (+) = sn.snap_id
  and ash.snap_id = sn.snap_id
  and requests.snap_id = sn.snap_id
order by begin_interval_time desc
Note that the report contains some similar, but different quantities: "redo_gen_mb_per_sec" is redo written divided by the number of seconds in the snapshot, while "redo_write_speed" is redo written divided by the number of seconds the redo writing took (both in MB/s). "avg_requests_per_log_write" is calculated from the "requests" parameter (p3) of the "log file parallel write" event in ASH, and tells me how many OS I/O requests (on average) were submitted per event. "LFS" and "LFPW" stand for "log file sync" and "log file parallel write"; "max_concurrency" is the maximum number of sessions within a snapshot concurrently waiting on log file sync.