Friday, March 30, 2012

Just What Are Blocking Reads and How Did They Get in My Database?

If you have been around Oracle for very long, I am sure you have heard that in Oracle readers don’t block other readers and readers don’t block writers. Of course, that statement refers to what happens within Oracle’s locking structures inside its memory caches. What happens within the physical storage is a completely different story.

Even if you hand Oracle raw devices (which means the Oracle kernel handles the IO instead of passing it off to the operating system) you will still get blocking reads (and blocking writes) if you are using standard hard disk drives. Imagine it this way: a user initiates a full table scan with a multi-block read count of 64 and a block size of 8 kilobytes, so each read request is 512 kilobytes. The disk subsystem is a RAID10 set with a stripe depth of 64 kilobytes across 8 disks, for a stripe width of 512 kilobytes. Every read request therefore spans the full stripe and touches all 8 disks. What happens to disk access while that full table scan of a multi-megabyte table is running? Look at Figure 1.



Figure 1: A Blocking Read

So, even if the second user in a blocking read situation is going after totally unrelated data, they have to wait until the first user finishes the read before they get access to the disks. Now multiply this by the number of full table scans in your system and the number of users trying to access data and you can see the issue. How can blocking reads be mitigated? Well, in a hard-drive-based system you need to align the stripe depth to the size of the read issued by a full table scan, to minimize the number of disks involved in each read. Look at Figure 2.



Figure 2: Non-Blocking Read

In the situation in Figure 2 the disk stripe depth is aligned to the product of the db file multi-block read count setting and the block size. This allows each of the first user’s reads to tie up only one disk at a time. Now when user two tries to get access, as long as they are going for data on a different disk, they can get to their data with no problem.
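The difference between the two layouts can be sketched with a little arithmetic. The snippet below is only an illustrative model, not anything Oracle or a RAID controller actually runs: the function name `disks_touched` is hypothetical, and it assumes simple round-robin striping that ignores RAID10 mirroring and any volume-manager offsets.

```python
def disks_touched(offset_bytes, read_bytes, stripe_depth_bytes, num_disks):
    """Return the set of disk indexes that one contiguous read lands on.

    Simplified, assumed model: chunks of stripe_depth_bytes are laid out
    round-robin across num_disks; mirroring is ignored.
    """
    first_chunk = offset_bytes // stripe_depth_bytes
    last_chunk = (offset_bytes + read_bytes - 1) // stripe_depth_bytes
    return {chunk % num_disks for chunk in range(first_chunk, last_chunk + 1)}


KB = 1024
block_size = 8 * KB
multiblock_read_count = 64
read_size = block_size * multiblock_read_count  # 512 KB per multi-block read

# Figure 1 layout: 64 KB stripe depth, so one read is spread over every disk.
narrow = disks_touched(0, read_size, 64 * KB, 8)

# Figure 2 layout: stripe depth equals the read size, so one read stays on one disk.
aligned = disks_touched(0, read_size, read_size, 8)

print(len(narrow), len(aligned))  # prints "8 1"
```

In this model the next 512-kilobyte read in the aligned layout lands on the next disk, which leaves the other seven free for a second user.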

ASM uses a 1 megabyte stripe depth for data in a normal setup and 4 megabytes for an Exadata cell for just this purpose: to prevent reads from blocking other reads or writes. But why does this collision happen? The underlying cause is shown in Figure 3.




Figure 3: The Cause of Read Blocking

The ultimate cause of blocking reads and writes is the rotating disk and moving armature inside hard disks. Because the platters must rotate and the armature must move into position to read or write individual data sectors, each discrete read or write by one user blocks access to that disk for all other users. Of course, the only way to get rid of blocking reads and writes, when not accessing the same data block, is by using solid-state memory that has no need to rotate a platter or move an armature. If you could create a read “surface” that could read any point on the disk surface without repositioning the disk, you could get near-memory speeds from disk-based systems. Unfortunately you can’t do this with current technologies, and, quite frankly, why would you want to? Modern flash technology provides read latency of 250 microseconds or less and, with a little help from DDR buffering and advanced write algorithms, write times of 85 microseconds or less.
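To put rough numbers on the mechanical delay, here is a back-of-the-envelope sketch. The average rotational latency of a disk is half a revolution; the drive speed and seek time used below are assumptions chosen for illustration, not figures from this post. Only the 250 microsecond flash read latency comes from the text above.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for a sector to come under the head: half a revolution."""
    return 0.5 * 60_000.0 / rpm


def hdd_service_time_ms(rpm, avg_seek_ms):
    """Seek plus rotational delay for one random access (transfer time ignored)."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


FLASH_READ_MS = 0.250    # 250 microseconds, the figure quoted in the post

ASSUMED_RPM = 15_000     # assumption: a 15K RPM drive
ASSUMED_SEEK_MS = 3.5    # assumption: illustrative average seek time

hdd_ms = hdd_service_time_ms(ASSUMED_RPM, ASSUMED_SEEK_MS)
ratio = hdd_ms / FLASH_READ_MS

print(f"HDD random access: about {hdd_ms:.1f} ms")
print(f"Flash read:        {FLASH_READ_MS:.3f} ms")
print(f"Ratio:             about {ratio:.0f}x")
```

Under these assumed inputs, every random access holds the drive for several milliseconds during which nobody else can use it, which is the window in which a second user gets blocked.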

But what if you don’t want to replace your entire SAN system with $20/GB flash technology? Well, in my blog entry next week I’ll tell you how to eliminate read blocking and still benefit from flash while utilizing your existing SAN or disk technology.

2 comments:

  1. Hi Mike,

    Are you giving any credit to command queuing in your concurrent I/O scenario? A disk can actually be fulfilling several tagged requests in a single rotation so in that case multiple readers may not be blocking each other. It seems your scenario presumes modern drives only do one thing at a time?

    Replies
    1. Yes, command queuing reduces the problem but doesn't eliminate it. Anytime there is a physical movement required to read or write a data block then someone who is trying to access that drive at that moment will be blocked. Each drive consists of multiple heads and multiple drive platters as shown in the image, but the heads are fixed to a single armature and move as a single unit. As long as the disk firmware can align the reads/writes with the heads and the data locations on the disk then yes, multiple operations can occur. This does reduce the contention, but doesn't eliminate it.
