SQL Performance, Part II — Disk I/O: metrics and scales

While query performance depends on a large number of things, overall scale of query performance for a given database is generally set by disk I/O speed. The most common type of a storage device used in databases is still a hard disk drive (HDD), so let’s consider how it works.

HDD uses magnetic state of a very small region on a disk to encode “0” or “1” (i.e. carry 1 bit of information). A magnetic head placed on a moving attenuator arm is used to write or read information on that region. The head needs to be immediately above the magnetic region to read or write operation, so the head and the disk surface (called the platter) need to constantly move with respect to each other.  This is achieved by rotating the disk (around its spindle) and also by moving the attenuator arm in the transverse direction. Magnetic regions are organized in form of circular tracks. Thanks to the rotation (typically, at 5,400 – 15,000 rotations per minute, or rpm), information from a track can be sequentially read at a relatively high speed (up to hundreds of megabytes per second). This speed can be accurately calculated if we know how much information a track carries, i.e. the linear density, which is often measured in sectors per track, where a sector is typically 512 bytes in size (although some modern drive use a more recent 4k format, i.e. the sector is 4 kilobytes in size). For example, a disk with 2,500 sectors per track (and the traditional 512 byte sector size) rotating at 7,200 rpm will read 2,500 x 120 x 512  bytes per second, i.e. the transfer rate will be about 150 MB/s. HDDs typically have same performance in terms of reads and writes.

When we need to read a small amount of data which is randomly scattered on the surface of the disk, sequential reading is not a very good strategy. Instead, we can jump from one track to another, skipping the data we’re not interested in. In this case we’ll need to spend time on the transverse motion from one track to another (this operation is called seek) and then wait until the disk rotates so that the header is above the right sector. The seek time is typically between 4 and 15 milliseconds (enterprise-grade disks used in production databases tend to be in the lower part of that range). The second part of the delay, the one due to the rotation (the rotation latency) depends entirely on the rotation speed of the disk. For example, for a 7,200 rpm disk there will be 120 rotations per second, so a full rotation will take about 8 milliseconds, and therefore average rotational latency will be about 4 milliseconds (as it can be anywhere between no rotation at all – if we happened to get to the right sector directory – and a full rotation, if we just missed it, so on the average it will be half rotation). Random access time is the sum of the seek time and the rotational latency. For high-end drives it can be as low as 6-8 milliseconds, for less expensive drives (used in commodity hardware) it can be 10-20 milliseconds.

The inverse of the random access time gives the number of random I/O operations the disk can perform per second, i.e. I/O per second, or IOPS.  Databases typically use arrays of disks, where the total IOPS is the sum over all individual disks. E.g. for 1 hard drive, 10 ms access time means 100 IOPS. An array that consists of 100 such disks can do 10,000 IOPS.

If we perform random reads in amounts of 8 kB (the standard Oracle block size), then e.g. for a 6 ms drive we’ll get 167 IOPS times 8 kB = 1.3 MB/s, i.e. the amount of data transferred per unit time will be two orders of magnitude lower. The fundamental advantage of random access is that it allows one to be selective, i.e. instead of reading a large chunk (e.g. table, or table partition) of data and filtering out everything that it not needed, one can selectively read a much smaller amount information, resulting in overall performance benefit. So the key metric for determining efficiency of sequential vs. random access is selectivity, i.e. % of useful data vs. all data. However, random data access is only possible when physical locations of required data can be obtained, which isn’t always the case.

Theoretically, disk performance characteristics can be calculated from its rotation speed, and its geometry (sectors per track, tracks per inch, inner diameter etc.) . In practice, however, those numbers aren’t always available. For example, 63 sectors per track often shown by BIOS has nothing to do with the reality for modern disks, it’s some virtual numbers used for compatibility. The real number is much higher (a few thousands) and isn’t always available, so you may have to either rely on performance metrics provided by the disk manufacturer, or run some benchmark tests.

HDDs often come with some cache, which provides much faster access time. This cache can bias test results and cause the disks to appear faster than they actually are (so when benchmarking, one needs to minimize the impact of the cache). Also, disk arrays are typically configured to have some redundancy (RAID-5, RAID-6, RAID-10) which causes “write penalties” and breaks read/write symmetry – another thing to consider during benchmarking.

While HDD is the most popular storage device, recently, solid state disks (SSD) are getting more popular. They provide much faster access times (1ms or better), and also higher data transfer rates (up to 2GB/s), but in addition to be significantly more expensive than HDD, they also have a range of other problems (lower lifetime, performance declines over time etc.). If you are interested to learn about SSD storage, I recommend you and excellent blog series by flashdba, “Storage for DBAs”.

To sum up:

  • disk access time is typically measured in milliseconds (6-8 ms for high end drives)
  • I/O transfer rate is typically measured in hundreds of megabytes per second per disk
  • optimal access method depends on selectivity.

2 thoughts on “SQL Performance, Part II — Disk I/O: metrics and scales”

    1. Hi Chris, you are very much welcome. It’s really frustrating how little attention is paid to storage matters in database performance texts. So not just myself, but the entire Oracle community owes you a debt of gratitude for addressing this issue with your fantastic mini(actually, not so mini) series.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s