Excessive commits?

Let’s consider a hypothetical scenario. Suppose you have a process A that you want to run faster. This process doesn’t commit (much), so it doesn’t wait on log file sync. However, there is another multi-threaded process, B, that commits very frequently, and spends a lot of time on “log file sync”. You don’t care about the process B, your only goal is to make A go faster. After exhausting your tuning arsenal (to no avail), you go to a production DBA. He looks at the AWR report and says “you’re committing too much”. “Look how much time the system spends on log file sync”. You tell him that the process A doesn’t commit much and doesn’t suffer from log file sync waits, but the DBA insists: “Even so, reducing commits would improve the database performance health in general, and by doing so it would benefit your process as well. Besides, getting rid of all that log file sync noise would help us see the problem with process A more clearly”. You are convinced, and after spending some time with the code, you find a bunch of unnecessary commits inside loops. You remove them. You reduce the commit rate per second by several orders of magnitude! Your database is much healthier now! And your process A will now run… almost four times slower than it did before.

If the end of this story surprises you, it really shouldn’t. Let me show you AWR exceprts that simulate this situation:

Continue reading “Excessive commits?” →

Followup on the 4k DML bug

Another piece of good news — Oracle has opened a bug for yet another anomaly I reported earlier in my blog: row-by-row processing of bulk DML when the block size on the target table is less than the default 8k. So it’s now officially bug 20039770 – “DML SLOW WITH 4K BLOCK SIZE VS 8K BLOCK SIZE”. Their bug description seems a bit off (because 2K shows the same behavior as 4K, and 16/32K as 8K, so it’s not really a case of “4K VS 8K”), but I’m sure they update it accordingly in the course of the investigation by the development team. Unlike the log parallelism bug, this one is not yet open to public (maybe because they don’t have anything to put into the bug note yet, not even a workaround), I’ll post an update when that changes.

If you ever work with small block sizes then I highly recommend you familiarize yourself with this bug, because its impact on DML performance is quite big (I observed x3-x7 effect, but it could be even larger in a different setup). It makes rewrites of row-by-row logic via bulk DML pretty much useless, so if I had known about it earlier, it would have saved me a couple of weeks!

I will post another blog when Oracle concludes their investigation.

Log parallelism bug now has a number

A few weeks ago I wrote a post about log parallelism causing excessive log file sync waits. Ever since, I’m finding more and more examples how this bug affects OLTP and hybrid databases (and even some data warehouses)! For example, my current employer is a large organization that has several thousand databases (set up at different time by different teams), and according to the studies I conducted on a sample of a few dozen databases, no less than 15-20% of the total Oracle database real estate have this problem. In several cases the scale of the problem is simply scary: e.g. I found a database that spends 43% of its time on log file sync (!) on the average, reaching up to 78% (!!) during peak workloads. Based on the feedback to my posts here and on LinkedIn, the situation in other organizations is no better. It looks like “log file sync” is a very wide-spread problem, and log parallelism is one of the main causes (if not the main cause).

This is why I am happy to announce that Oracle development has created a bug and assigned it a number (19959089). The bug note doesn’t have much at this stage, but at least it does list setting “_log_parallelism_max” to 1 as a workaround, which should encourage more people to test and apply this solution. It would probably still be necessary to get the Oracle support to okay changing the underscore parameter, but a reference to this bug should make it much simpler to obtain. Hopefully, when the bug investigation is concluded, there will be a detailed official note and/or a patch available that would eliminate the need to raise an SR. I’ll post a blog about it when this happens.