Last week I participated in Oracle’s Real World Performance event — four days of lectures, quizzes, live demos and hands-on exercises. It was quite interesting, even more so than I expected it to be.
Understandably, a lot of time was spent discussing the perils of row-by-row processing. After all, it was Real World Performance, so it was based on the performance problems that the authors of the course face most often. And many, if not most, performance problems in the real world come from poor coding habits, in particular, from an OLTP or object-oriented mindset brought by inexperienced developers into the DW world.
We were shown a lot of demos where several implementations of the same task were run side by side, and the set-based algorithm inevitably won by a comfortable margin. It all makes perfect sense: SQL is a language built around set processing, so it is the proper tool for most tasks in the database world, at least as far as DW is concerned.
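To make the contrast concrete, here is a minimal sketch of the kind of comparison shown in such demos. The table and column names (big_t, id, amt) are made up for illustration; the point is the shape of the code, not the specific task.

```sql
-- Row-by-row version: one SQL execution (and one PL/SQL-to-SQL context
-- switch) per row of the hypothetical table big_t.
BEGIN
  FOR r IN (SELECT id FROM big_t) LOOP
    UPDATE big_t SET amt = amt * 2 WHERE id = r.id;
  END LOOP;
  COMMIT;
END;
/

-- Set-based version: a single statement, the engine works on the whole set.
UPDATE big_t SET amt = amt * 2;
COMMIT;
```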
Top Oracle experts have been talking about this for decades, and in the focus on the main message some important details tend to get forgotten, which is what prompted me to write a blog post on this subject.
First off, there is an important difference between set-based language constructs in SQL and the actual implementation under the hood. As far as performance is concerned, it is the latter that matters, and failure to make the distinction can cost dearly. Let me give a specific example: rewriting row-by-row inserts inside a loop as a single INSERT/SELECT statement can yield disappointing results if the sum of column lengths exceeds the block size, because in that case the insert is handled on a row-by-row basis internally (see here for more details).
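For clarity, this is the kind of rewrite in question; the tables src and tgt and their columns are hypothetical, and whether the second form actually wins depends on the internal handling described above.

```sql
-- Row-by-row inserts inside a loop:
BEGIN
  FOR r IN (SELECT col1, col2 FROM src) LOOP
    INSERT INTO tgt (col1, col2) VALUES (r.col1, r.col2);
  END LOOP;
  COMMIT;
END;
/

-- Set-based rewrite: usually much faster, but if the sum of the column
-- lengths exceeds the block size, the insert is processed row by row
-- internally and the expected gain largely disappears.
INSERT INTO tgt (col1, col2)
SELECT col1, col2 FROM src;
COMMIT;
```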
In many cases, the set-based side of things is limited to the way the data is acquired and/or prepared. For example, in a hash join, Oracle uses bulk methods to build the hash table, but the probe phase of the join is done on a row-by-row basis. Since memory reads are much faster than physical I/O, the cost of the probe phase is often negligible, but that's not always the case, especially when the number of rows per block is high and the physical I/O is relatively fast. On an Exadata machine, it is quite typical for a full table scan or a hash join to be dominated by CPU time.
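A small hypothetical example of such a join (dim_table, fact_table and their columns are invented names, and the hints are there only to pin down the join method for the sake of the discussion):

```sql
-- The build phase reads the smaller input (d) in bulk and builds an
-- in-memory hash table; the probe phase then looks up that hash table
-- once per row of the larger input (f), which is inherently row-by-row
-- work and can become CPU-bound when the scan itself is fast.
SELECT /*+ LEADING(d f) USE_HASH(f) */
       d.dim_name, SUM(f.amount)
FROM   dim_table d
JOIN   fact_table f ON f.dim_id = d.dim_id
GROUP BY d.dim_name;
```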
A nasty side effect of this ongoing campaign in favor of set-based processing (which, I repeat, is a just and noble cause when not taken to the extreme) is that people start to view PL/SQL as a "slow" language, which is not fair: there are cases when it can be as efficient as (if not more efficient than) SQL itself. Bryn Llewellyn and Toon Koppelaars have given a few interesting talks and published some interesting material on the subject (see e.g. here).
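As one illustration of why "PL/SQL is slow" is too sweeping, here is a minimal sketch using bulk binds; src, tgt and their columns are again hypothetical. With BULK COLLECT and FORALL, the number of round-trips between the PL/SQL and SQL engines stays small, so procedural code can remain competitive when per-row logic is genuinely needed.

```sql
DECLARE
  TYPE t_ids IS TABLE OF src.id%TYPE;
  l_ids t_ids;
BEGIN
  -- Fetch the whole working set in one bulk operation.
  SELECT id BULK COLLECT INTO l_ids
  FROM   src
  WHERE  status = 'NEW';

  -- Apply the change with a single bulk-bound DML call.
  FORALL i IN 1 .. l_ids.COUNT
    UPDATE tgt SET processed = 'Y' WHERE id = l_ids(i);

  COMMIT;
END;
/
```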
So there is a legitimate niche for row-based processing, and it's not just for small data sets. In some cases it is exactly the opposite: for example, a hash join is theoretically the more set-oriented way of doing things and thus more appropriate for joining large data sets. In practice, however, it may fail with "unable to extend temp segment" (ORA-01652), whereas nested loops, despite their row-by-row nature, would be able to finish the job.
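If it helps to see what that looks like, here is one way to steer such a join toward nested loops with hints; the tables t1 and t2, their columns, and the index name t2_id_ix are all hypothetical, and this is a sketch of the idea rather than a recommendation.

```sql
-- Forcing nested loops trades elapsed time for a much smaller TEMP
-- footprint: the join is driven row by row through an index on t2
-- instead of building a hash table that may spill to TEMP.
SELECT /*+ LEADING(t1) USE_NL(t2) INDEX(t2 t2_id_ix) */
       t1.key_col, t2.payload
FROM   t1
JOIN   t2 ON t2.id = t1.id;
```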
Maybe one day we will have a special computer chip for each set operation, and then we'd finally be able to do everything in a set-based way. But for now, row-by-row processing still has its place.