Log buffer space

Introduction

Log buffer space is a simple, yet frequently misunderstood wait event. The main reason for that is probably its name. It sounds as if it points immediately to the answer: if space in the log buffer is the issue, then increasing it surely should resolve it. Well, unfortunately even though log buffer space is simple, it’s not that simple.

Log buffer

Log buffer is an area in memory (SGA, to be more specific) where redo is temporarily stored before it can be written to disk. Oracle only uses redo for recovery, i.e. it’s not needed for normal database operation. This “write and forget” approach (as Jonathan Lewis puts it) is what makes it so simple. It’s pretty much like the classical middle school problem with a pool, a tube which pours the water into the pool, and a drain, through which the water leaves the pool. If the water arrives via the tube at a higher rate than it leaves the pool via the drain, the pool would inevitably get full, no matter how big it is.

So most of log buffer space issues are caused either by too much redo being produced (“too much water poured”) or by slow log writer (“drain clogged”) and not by the size of the log buffer itself (which was basically taken out of DBA’s control since 10gR2 anyway).

Ways to decrease redo generation or improve redo throughput are well known (I will list some of them below), but instead of blindly trying them all, it’s more efficient to understand which of the two scenarios is taking place and why. Following quantities would be useful for that:
– redo generation rate
– redo filesystem latency (actual and expected)
– expected redo filesystem throughput
– redo write rate (number of redo write operations per unit time).

 Metrics

Many of quantities listed above can be found in AWR:  redo per second can be found in “Load Profile” section, redo filesystem latency, log file parallel write (the average tells you about redo filesystem latency, while the number of events tells you how many times redo was written) in “foreground events”. I also use a report that gives me a summary of many redo-related metrics from AWR and ASH (see Appendix).

To find what kind of I/O your redo filesystem can handle, you may:
– run some I/O benchmarks (SLOB is probably the best choice if you’re on Linux, there are also some workarounds for using it on other platforms)
– read your hardware specifications and published benchmark results
– talk to your SA and/or storage specialist
– all of the above.

Some useful information is also available in OEM (convenient for real-time monitoring): go to Performance Home, scroll down past the “Average Active Sessions” graph, then click on the “I/O” tab. It will give you database I/O stats for all processes — click on “LGWR” to chose only those related to redo:

OEM_redo_metrics_LGWR

 

Troubleshooting

The natural first step would be analyzing redo generation rate: is it consistent with the usual values? Is it consistent with what you know about the user activity? The size of the redo generated by a DML statement is typically similar to the amount of new information introduced by it. In case of an INSERT, it’s the size of the inserted row(s). In case of an UPDATE, it’s the combined size of updated column values. In case of a DELETE it would be minimal, as the only thing needed to repeat a DELETE is the rowid of the deleted row. So if you know what kind of DML is being run, you can understand whether the actual redo per second makes sense.

If redo generation rate is higher than expected, it could be due to hot backup running (which causes the full block image to be taken in redo, not just the part that was changed). Another common cause of excessive redo generation is row-by-row processing. On tablespaces with small (smaller than 8k) block sizes, bulk DML statements can be internally processed on the row-by-row basis (see here). If that’s the case, then a very significant reduction in redo generation can be achieved by simply moving objects to a tablespace with larger (8k and up) block size. If some processes that generate massive amounts of redo, but the tables they write to don’t need to be recoverable, then redo generation can be reduced by redefining those table via NOLOGGING and using direct path inserts (conventional path inserts generate redo in any case). If some tables are used as “scratchpads” or “staging areas”, they should be defined as global temporary tables (GTTs), this also helps reduce redo generation rate.

If redo generation is fine, then redo filesystem latency and I/O may give some answers:
– what % of time is LGWR idle?
– what % of time it’s busy with things other than I/O?
– what is average log file parallel write?
– how does it compare to expected I/O latency (e.g. from I/O benchmarks or based on hardware specifications)?
– how many redo writes per second is Oracle doing?
– how many IOPS (I/O operations per second) is this kind of storage supposed to be able to handle?
– how many MB/s of redo is Oracle writing?
– how does that compare to expected redo bandwidth (from I/O benchmarks or based on hardware specifications)?
When going through that list it’s important not to forget impact of write cache (if any), synchronous storage-level replication latency (if any) and anything else that could affect either latency or throughput.

If LGWR is spending a considerable amount of time (5-10% and higher) on things other than log file parallel write (e.g. CPU or control file parallel write), that this should be looked into. CPU issues are capable of causing log buffer space problems (as well as log file sync), although it’s not very common. Non-I/O waits (control file parallel write, log file single write) can be related to excessive log file switching itself, or contention incurred during such switches (enq: CF – contention waits).

If you don’t find any such suspicious waits, check if you are approaching I/O saturation. If you find that you are not anywhere near saturating the I/O, then you’re probably missing something: e.g. some unknown I/O activity against the redo filesystem (e.g. reads by log miner, database replication software, log archive process?), or some bottleneck on the OS side (like serialized log parallelism).

Summary

Log buffer space problem typically results from either too much redo being generated, or LGWR writing redo too slowly. The most efficient way to address this problem is quantitative analysis based on following quantities:
– how much redo you can reasonably expect based on user activity
– how much redo is actually being generated
– how much redo can be handled by the redo storage in principle.
This approach allows to quickly narrow down the list of potential suspects and get to the bottom of the issue within minimum time. In order to be able to use that approach, you need to be familiar with your storage (at least on the basic levels) and know your workload.

Appendix: Redo stats AWR report

with 
lfs as
(
  select e.snap_id,
         e.total_waits - lag(e.total_waits) over (partition by e.event_name order by e.snap_id) waits_delta,
         e.time_waited_micro - lag(e.time_waited_micro) OVER (PARTITION BY e.event_name ORDER BY e.snap_id) time_delta
  from dba_hist_system_event e
  where e.event_name = 'log file sync'
),
lfpw as
(
  select e.snap_id,
         e.total_waits - lag(e.total_waits) over (partition by e.event_name order by e.snap_id) waits_delta,
         e.time_waited_micro - lag(e.time_waited_micro) OVER (PARTITION BY e.event_name ORDER BY e.snap_id) time_delta
  from dba_hist_system_event e
  where e.event_name = 'log file parallel write'
),
redo as
(
  SELECT  snap_id,
          (VALUE - lag(VALUE) OVER (PARTITION BY stat_name ORDER BY snap_id))/1024/1024 redo_size
  FROM dba_hist_sysstat
  WHERE stat_name = 'redo size'
  ORDER BY snap_id DESC
),
snap as
(
  select snap_id,
         trunc(begin_interval_time, 'mi') begin_interval_time,
         end_interval_time - begin_interval_time interval_duration
  from dba_hist_snapshot
),
sn as
(
  select snap_id,
         begin_interval_time,
         extract(hour from interval_duration)*3600+
         extract(minute from interval_duration)*60+
         extract(second from interval_duration) seconds_in_snap
  from snap
),
ash as
(
  select snap_id, max(active_sess) max_concurrency
  from
  (
    select snap_id, sample_time, count(*) active_sess
    from dba_hist_active_sess_history ash
    where event = 'log file sync'
    group by snap_id, sample_time
  )
  group by snap_id
),
requests as
(
  select snap_id, avg(p3) avg_lfpw_requests
  from dba_hist_active_sess_history ash
  where event = 'log file parallel write'
  group by snap_id
)
select begin_interval_time,
       round(redo.redo_size/seconds_in_snap,2) redo_gen_MB_per_sec,
       round(100*lfpw.time_delta/1e6/seconds_in_snap) lgwr_pct_busy,
       round(avg_lfpw_requests, 2) avg_requests_per_log_write,
       round(1e6*redo.redo_size/lfpw.time_delta, 2) redo_write_speed,
       round(redo.redo_size/lfpw.waits_delta, 2) avg_redo_write_size,
       round(lfs.time_delta/lfs.waits_delta/1000,2) avg_lfs,
       round(lfpw.time_delta/lfpw.waits_delta/1000,2) avg_lfpw,
       lfs.waits_delta lfs_num_waits,
       lfpw.waits_delta lfpw_num_waits,
       max_concurrency
from lfs,
     lfpw,
     sn,
     redo,
     ash,
     requests
where lfs.snap_id (+) = sn.snap_id
and lfpw.snap_id (+) = sn.snap_id
and redo.snap_id (+) = sn.snap_id
and ash.snap_id = sn.snap_id
and requests.snap_id = sn.snap_id
order by begin_interval_time desc

Note that it contains some similar, but different quantities: “redo_gen_mb_per_sec” is redo written divided by number of seconds in the snapshot, while “redo_write_speed” is redo written divided by the number of seconds redo writing took (both quantities in MB/s). “Avg_requests_per_write” is calculated from “requests” parameter of “log file parallel write” statistics in ASH and it tells me how many OS I/O requests (on the average) were submitted per one event. “LFS” and “LFPW” abbreviations mean “log file sync” and “log file parallel write”, max_concurrency tells me max number of sessions per snapshot concurrently waiting on log file sync.

Advertisements

8 thoughts on “Log buffer space”

  1. The other common reason for slow log writing is cpu starvation (Kevin Closson has a nice blog about that), which also should be considered (and which often confuses folks who look at the available write bandwidth and see that it is well sufficient, leading to confusion unless cpu is considered). The other problem, which is easily handled by adding more redo log groups and/or acreage, is wrapping logs and having to wait for arch to complete freeing a group for new writes (drain is clogged). Nice diagnostic queries for the other issues. Finally, when you can shift some load away from peak redo usage the buffer may be able to transiently get you through to a point where your available cpu and write speeds are in a relative period of catching up.

    1. Hi,

      thanks for your comment. I’m well familiar with Kevin’s article. I think that the cases when bad LGWR performance is caused by CPU starvation and/or CPU scheduling issues do exist, but they are not so common as many experts believe.

      The reason why the CPU theory is so popular is that many people falsely assume that average log file parallel write represents the part of the log file sync caused by I/O. I’ve addressed this topic in my earlier post:

      https://savvinov.com/2014/09/09/how-not-to-troubleshoot-log-file-sync-waits/

      or you can also see this topic explored by James Morle:

      http://www.scaleabilities.co.uk/2010/06/02/log-file-sync-and-awr-not-good-bedfellows/

      I agree that I should have probably mentioned log switches, but I’m planning on publishing a separate post.

      Best regards,
      Nikolay

    1. Hi Hemant,

      I think it’s also possible to have this scenario with synchronous DataGuard replication, although “log file sync” is a more common manifestation of replication network latency.

      Best regards,
      Nikolay

    2. Hemant,

      having some time to think about what you said, I had to reconsider my position. With log file sync waits, it’s easy to see how Data Guard would become a major factor here (with Maximum Availability, your change is not really committed until it’s also committed in the standby database). But I cannot come up with a similar scenario for log buffer space waits. I don’t think that Data Guard would need redo to be in the log buffer in order to copy it to the standby database (because it can be just as easily copied from the redo log files). So off the top of my head, I can’t come up with a realistic scenario for synchronous Data Guard to cause log buffer space.

      Those are, of course, just theoretical considerations. I wish I could run some tests — but unfortunately I haven’t worked with Data Guard for a very long time and there’s no such setup available for testing to me at the moment. If you can come up with such a scenario yourself, or if you ever witness log buffer space due to synchronous Data Guard setup, please let me know.

  2. Hi Savvinov,
    Thank you very much for sharing this interesting post.
    I will just comment on your statement :” I don’t think that Data Guard would need redo to be in the log buffer in order to copy it to the standby database (because it can be just as easily copied from the redo log files)” : I’m not expert in dataguard, but what I understand from reading around , is that, in sync mode , it is indeed based on reading from buffer , because it is even not written to redo logs until it transfer was already initiated.
    I will just copy these excerpts from
    http://www.dba-oracle.com/t_oracledataguard_197_transport_copy_redo_logs.htm :

    SYNC is the synchronized transfers of redo data from the primary to the standby site. When synchronous transfer is in use, the LGWR does not write any redo entry in the online log file of the primary site until the initiated log transfer has passed the network layer successfully. The LGWR will not wait for acknowledgment receipts of the disk I/O operation of redo data on the standby site.

    If more than one standby site is in use and LGWR is serving all of these for log transfer, a synchronous transfer mode will cause the LGWR to wait until all of the archival destinations receive data over the network layer. This may cause a severe performance issue on the primary database.”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s