Except for a few special cases, optimizing SQL is about minimizing I/O. And by “I/O” we normally mean physical I/O, because everybody knows that logical I/O (LIO) is much, much faster. But how much faster exactly? For a long time this question has been bothering me. There seems to have been little research in this area: basically, the only thorough investigation I managed to find on the subject was one by Cary Millsap and co-authors. So I conducted some research of my own. Before presenting my findings, here’s a summary of Cary’s paper: the difference between PIO and LIO time is not the difference between disk access and memory access; it’s much smaller (37x vs 10,000x per their measurements), and therefore performance optimization needs to address LIO as well as physical I/O (PIO).
There are several reasons why this number cannot be very reliable. The most obvious one is that the paper was published over 10 years ago, and there have been a lot of changes in hardware since then. But more importantly, LIO is a much trickier concept than PIO, and its complexity cannot be adequately described by a single number.
Let me explain a bit. We can talk about PIO time because it’s independent of the contents: when you’re reading an 8kB block, whether it’s full of 0’s, or 1’s, or random numbers, the mechanical latencies are going to be the same. With LIO, that’s not quite the case, because the time spent on processing the contents of a block is: a) comparable with the time spent on reading it; b) very difficult to distinguish from it from an instrumentation point of view. Besides, LIO time is greatly influenced by concurrent activity (whether or not a read-consistent version needs to be built) and by various obscure optimizations (like “fastpath”) with little or no documentation about them. Because of all that, one should keep in mind that “LIO time” is not as clearly defined a concept as PIO time. Still, some simple cases can be analyzed, and some useful conclusions can be drawn from that analysis.
So what I did was the following: I took measurements with several different combinations of row length and number of columns, so as to find the minimal LIO time (the “pure” LIO time, so to say) and also to see how much it changes as the table structure becomes more complex. The table below summarizes the results for two setups: A, a Windows 7 PC, and B, a Solaris M-8000 server, both running Oracle 11gR2. A few words about the setup: I used tables of 1, 5 and 25 equally sized varchar2 columns, with row lengths of 25, 250 and 2500 bytes (i.e. 5 columns and 2500 bytes means 5 columns of 500 bytes each). The total “payload” (number of rows times average row length) was the same in all cases: 1GB. The actual table size, however, varied from test case to test case because of differences in relative overhead and other irregularities, so LIO time was measured by dividing total CPU time by the number of consistent gets. In order to minimize instrumentation error, I preloaded all tables into the buffer cache, and cross-checked the results via several sources. For illustration, one test case looked roughly as shown below.
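Here is a sketch of one such test case; the table name and exact row count are hypothetical (not the ones from my actual scripts), but the measurement method is the one described above:

-- test case: 5 varchar2 columns x 500 bytes = 2500-byte rows, ~1GB of payload
create table t_2500_5 (c1 varchar2(500), c2 varchar2(500), c3 varchar2(500),
                       c4 varchar2(500), c5 varchar2(500));

insert into t_2500_5
select rpad('x', 500, 'x'), rpad('x', 500, 'x'), rpad('x', 500, 'x'),
       rpad('x', 500, 'x'), rpad('x', 500, 'x')
from dual connect by level <= 400000;  -- 400,000 rows x 2,500 bytes ~ 1GB
commit;

-- after preloading the table into the buffer cache, take the deltas of
-- "CPU used by this session" (in centiseconds) and "consistent gets"
-- around a query, e.g. a full scan:
select /*+ full(t) */ count(*) from t_2500_5 t;

select sn.name, st.value
from v$mystat st, v$statname sn
where sn.statistic# = st.statistic#
and sn.name in ('CPU used by this session', 'consistent gets');

-- LIO time in microseconds ~= (delta CPU, centiseconds) * 10,000
--                             / (delta consistent gets)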
To make sure that I minimized artifacts from the test setup, I also did the following: 1) checked that the results are independent of the way the buffer cache is populated with table contents (a full table scan, and randomly ordered reads by rowid interspersed with reads from other segments, gave results that differed by less than 10%); 2) checked that the same was also true for the way data was read from the buffer cache (a full table scan and reads by rowid also gave roughly the same results; see the sketch below).
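The rowid-based reads were done along these lines (again a sketch, using the same hypothetical table name as above):

-- read the whole table row by row, in random rowid order
declare
  v t_2500_5%rowtype;
begin
  for r in (select rowid rid from t_2500_5 order by dbms_random.value) loop
    select * into v from t_2500_5 where rowid = r.rid;
  end loop;
end;
/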
row length   # columns   LIO (microsec), A   LIO (microsec), B
      2500           1                1.44                5.04
      2500           5                1.34                5.44
      2500          25                1.43                5.39
       250           1                2.06                8.87
       250           5                4.41                8.97
       250          25                2.50                8.64
        25           1                5.32                8.56
        25           5                5.52                8.29
        25          25                6.85                8.94
Looking at the results table, I make the following observations:
1) The smaller the row length (i.e. the more densely rows are packed into blocks), the more expensive a LIO is
2) The more columns, the more expensive a LIO is (although there are a couple of anomalies here)
3) Row density affects LIO cost more than the number of columns (although this might be different with other data types)
4) Cost of a LIO is different on different machines (I think that it depends primarily on memory characteristics — e.g. NUMA effects could explain why a plain PC outperforms an expensive enterprise server)
5) In all cases LIO time is much lower than the 53 microseconds from the paper mentioned above. So either LIO has become faster over the last decade, or instrumentation errors led to it being overestimated in Millsap’s paper.
Whatever the reason, does it mean that LIO matters less? Not really. First of all, even 5-9 microseconds per LIO is not that small a number, especially if we take into consideration the fact that LIO doesn’t seem to benefit much from “multiblock reads”. Consider a full table scan with mbrc = 128 and an average “db file scattered read” of 6 ms (which corresponds to about 160 MB/s, a fairly standard data transfer rate). On the other hand, the 128 LIOs for the same blocks can easily take 1 ms or more. So the difference between PIO and LIO would only be a factor of about 6, and not several orders of magnitude as most of us would expect. And that doesn’t even include the cost of actually reconstructing the read-consistent version when it’s necessary because of concurrent activity, or potential contention for latches!
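To spell the arithmetic out (taking ~8 microseconds per LIO, close to the setup B figures above):

PIO: one “db file scattered read” of 128 blocks ≈ 6,000 microseconds
LIO: 128 consistent gets × ~8 microseconds ≈ 1,000 microseconds
PIO/LIO ≈ 6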
If anything, LIO is becoming more important nowadays, because:
– enterprise-grade storage comes with a big cache of its own, so physical reads are often measured in hundreds or even tens of microseconds. I’m not exaggerating: I’ve seen systems with a long-term average “physical read” time of 80 microseconds!
– low-latency flash storage is becoming more popular by the day, driving the difference between PIO and LIO further down
– memory is rapidly becoming cheaper, which means caches are rapidly getting bigger
– the advent of the in-memory option in Oracle means that people will rely on memory even more
– high per-CPU-core licensing costs for in-memory translate the resource cost of LIO into $$$ in a very direct way
– the spread of virtualization technologies makes it easier to put freed CPU resources to good use, which means there will be demand for non-SLA optimization.
To put it simply: 10 years ago, reducing CPU utilization from 80% to 50% meant that you’d be wasting more CPU cycles. Now those CPU cycles can be quickly and cheaply allocated to another database (or application) where they bring value to the business. So there will be a good reason to tune something to reduce resource usage even without any users complaining (and no one will accuse you of having Compulsive Tuning Disorder!). But that’s a topic for another post.
LIO time depends on block structure, hardware and other factors. Typical values are 1-10 microseconds, which is much lower than the popular estimate of 50 microseconds per LIO. Nevertheless, in many situations the time spent on LIO can be comparable to PIO time, so reducing LIO is very important. And because of recent technological and other changes, the motivation for LIO optimization is likely to increase in the future.