A few weeks ago I wrote a post about log parallelism causing excessive log file sync waits. Ever since, I've been finding more and more examples of how this bug affects OLTP and hybrid databases (and even some data warehouses)! For example, my current employer is a large organization with several thousand databases (set up at different times by different teams), and according to the studies I conducted on a sample of a few dozen databases, no less than 15-20% of the total Oracle database real estate has this problem. In several cases the scale of the problem is simply scary: e.g. I found a database that spends 43% of its time on log file sync (!) on average, reaching up to 78% (!!) during peak workloads. Based on the feedback to my posts here and on LinkedIn, the situation in other organizations is no better. It looks like “log file sync” is a very widespread problem, and log parallelism is one of its main causes (if not the main cause).
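If you want a rough idea of where your own databases stand, a simple (and admittedly crude) sketch is to compare the cumulative time spent on “log file sync” with total DB time, for example along these lines (both views are cumulative since instance startup, so for peak-period figures you would compare AWR snapshot deltas instead):

SELECT ROUND(e.time_waited_micro / 1e6)              AS lfs_seconds,
       ROUND(t.value / 1e6)                          AS db_time_seconds,
       ROUND(100 * e.time_waited_micro / t.value, 1) AS lfs_pct_of_db_time
FROM   v$system_event   e,
       v$sys_time_model t
WHERE  e.event     = 'log file sync'
AND    t.stat_name = 'DB time';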
This is why I am happy to announce that Oracle development has opened a bug for this issue and assigned it a number (19959089). The bug note doesn't contain much at this stage, but at least it does list setting “_log_parallelism_max” to 1 as a workaround, which should encourage more people to test and apply this solution. It will probably still be necessary to get Oracle Support to okay changing the underscore parameter, but a reference to this bug should make that approval much simpler to obtain. Hopefully, when the bug investigation is concluded, there will be a detailed official note and/or a patch available that would eliminate the need to raise an SR. I'll post about it here when that happens.
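For reference, the workaround itself is a one-liner, something like the statement below (assuming the instance uses an spfile; in the versions I have seen the parameter is static, so a restart is needed for it to take effect, and as always an underscore parameter should only be changed after Oracle Support has blessed it for your specific case):

ALTER SYSTEM SET "_log_parallelism_max" = 1 SCOPE = SPFILE;
-- restart the instance for the new value to take effect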
Thanks for all of these LFS-related articles. We have many DBs with the same problem. All DBs are 11.2.0.4 on Solaris SPARC. LFS waits are 15-20% on average (3-5K commits per second, 32-96 CPUs).
Setting “_use_adaptive_log_file_sync” = false helped us a little on test DBs.
Hi Konstantin,
thanks for your comment. I know that “_use_adaptive_log_file_sync” can be useful in some cases, but I haven't yet seen it make a difference as big as log parallelism does (x3 in our case).
If your databases use synchronous storage-level replication, you might want to check whether its latency is within the expected range, as in my experience synchronous replication is the second most common cause of redo-related performance issues like LFS/LBS.
After we obtained the initial x3 improvement in LFS by setting _log_parallelism_max to 1, our SAs were able to detect a networking issue that was affecting the replication latency, and fixing it improved our LFS figures by another x2.5 (so the total effect was 80 ms => 11 ms).
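In case it helps anyone doing a similar check: a quick way to eyeball redo write latency (which on synchronously replicated storage includes the round trip to the remote site) is the wait-time distribution of LGWR's “log file parallel write” event, e.g.:

SELECT wait_time_milli, wait_count
FROM   v$event_histogram
WHERE  event = 'log file parallel write'
ORDER  BY wait_time_milli;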
Best regards,
Nikolay