Another piece of good news — Oracle has opened a bug for yet another anomaly I reported earlier in my blog: row-by-row processing of bulk DML when the block size on the target table is less than the default 8k. So it’s now officially bug 20039770 – “DML SLOW WITH 4K BLOCK SIZE VS 8K BLOCK SIZE”. Their bug description seems a bit off (because 2K shows the same behavior as 4K, and 16/32K as 8K, so it’s not really a case of “4K VS 8K”), but I’m sure they update it accordingly in the course of the investigation by the development team. Unlike the log parallelism bug, this one is not yet open to public (maybe because they don’t have anything to put into the bug note yet, not even a workaround), I’ll post an update when that changes.
If you ever work with small block sizes then I highly recommend you familiarize yourself with this bug, because its impact on DML performance is quite big (I observed x3-x7 effect, but it could be even larger in a different setup). It makes rewrites of row-by-row logic via bulk DML pretty much useless, so if I had known about it earlier, it would have saved me a couple of weeks!
I will post another blog when Oracle concludes their investigation.
A few weeks ago I wrote a post about log parallelism causing excessive log file sync waits. Ever since, I’m finding more and more examples how this bug affects OLTP and hybrid databases (and even some data warehouses)! For example, my current employer is a large organization that has several thousand databases (set up at different time by different teams), and according to the studies I conducted on a sample of a few dozen databases, no less than 15-20% of the total Oracle database real estate have this problem. In several cases the scale of the problem is simply scary: e.g. I found a database that spends 43% of its time on log file sync (!) on the average, reaching up to 78% (!!) during peak workloads. Based on the feedback to my posts here and on LinkedIn, the situation in other organizations is no better. It looks like “log file sync” is a very wide-spread problem, and log parallelism is one of the main causes (if not the main cause).
This is why I am happy to announce that Oracle development has created a bug and assigned it a number (19959089). The bug note doesn’t have much at this stage, but at least it does list setting “_log_parallelism_max” to 1 as a workaround, which should encourage more people to test and apply this solution. It would probably still be necessary to get the Oracle support to okay changing the underscore parameter, but a reference to this bug should make it much simpler to obtain. Hopefully, when the bug investigation is concluded, there will be a detailed official note and/or a patch available that would eliminate the need to raise an SR. I’ll post a blog about it when this happens.
While 8k is the default block size, Oracle supports other block sizes, too. Smaller block sizes (more commonly, 4k) are encouraged for OLTP workloads to reduce concurrency, and even smaller block size, 2k, is recommended for databases running on “Advanced format” (or “4K”) storage. Oracle documentation warns us of possible implications when storing larger rows in 2k/4k blocks, such as potentially larger space overhead due to metadata, or even possibility of row chaining. What it doesn’t warn us about, is that there is also a big difference in the way DML operations work, which has very serious implications for their performance.
For many Oracle database patches there is an option to apply them “online”, i.e. without stopping the database and related services (listener, ASM etc.). This is very convenient when downtime is impossible or difficult to arrange. However, this convenience comes at a certain price, including some performance penalties. Such side effects of hot patching are not well understood, even by Oracle support engineers.
Last week, when I was tuning a SQL statement at a 10.2.0.4 database, I came across a rather interesting issue.
The SQL was performing poorly because of an inefficient plan (wrong join order), and it was relatively straighforward to determine the reason for that: a table partition appeared empty on the plan, while in fact it contained a few hundred thousand records. The table was range-partitioned on a timestamp column (1 partition = 1 day).
As the development team explained to me, they were relying on a rather sophisticated maintenance procedure that was looking over a previous period to chose a suitable partition as the source, and then copied its stats into a newly created partition before it was filled with data to prevent it from being analyzed as empty. Continue reading