Thursday, April 19, 2012

ASM Features best used with SSD

ASM in Oracle 11g includes advances that can exploit the performance of SSDs. Preferred Mirror Read and Fast Mirror Resync are the two most prominent features in this category.

Preferred Read Failure Groups

The Preferred Read is not a new idea, but it is newly implemented in Oracle 11g's volume management. The concept is to read from the storage that can present the needed data at the lower latency. Initially, this was designed for WAN or site-specific storage in order to avoid higher-latency site connections. By restricting data reads to the local storage, the application can service requests at nominal read speeds while writes are the only traffic that must traverse the long-haul site link. The feature is available in the volume managers bundled with most operating systems, and in Symantec/Veritas Volume Manager under the name Preferred Plex.

As each node of a cluster runs its own ASM instance, each node can define its own preferred read failgroup, allowing every node to operate at top performance against its local storage.

Preferred read failure groups are assigned with the initialization parameter ASM_PREFERRED_READ_FAILURE_GROUPS. Each entry takes the form DISKGROUP.FAILGROUP, and multiple diskgroups are specified as a comma-separated list.
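As a minimal sketch, assuming a diskgroup named DATA with an SSD failgroup named SSD_FG and an ASM instance named +ASM1 (all names illustrative, not taken from an actual configuration), the parameter could be set and verified like this:

ALTER SYSTEM SET ASM_PREFERRED_READ_FAILURE_GROUPS = 'DATA.SSD_FG' SID='+ASM1';
-- Confirm which disks this instance now treats as preferred reads
SELECT path, preferred_read FROM v$asm_disk;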

The advantage of Preferred Reads for SSD is not in avoiding intersite links. Since the concept of preferred reads is to read from the lower-latency mirror, the same strategy can be employed by preferring reads from an SSD that is mirrored to HDD. This allows SSDs to be deployed with full redundancy at a much lower cost by mirroring to HDD.
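A hedged sketch of such a diskgroup, using illustrative device paths and failgroup names:

CREATE DISKGROUP DATA NORMAL REDUNDANCY
  FAILGROUP SSD_FG DISK '/dev/oracleasm/ssd1', '/dev/oracleasm/ssd2'
  FAILGROUP HDD_FG DISK '/dev/oracleasm/hdd1', '/dev/oracleasm/hdd2';

With NORMAL REDUNDANCY, each extent is mirrored across the two failgroups, so one copy always lands on flash and one on disk.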

The preferred read allows for an overall increase in performance. Reads will be serviced at the speed of the SSD, where SSDs excel. Writes, on the other hand, will be as fast as the slowest mirror member. This is not a problem for tables and indexes: only reads result in foreground waits, while writes are performed in the background by the database writer (DBWn). Also, since there is no longer read+write contention on the disks, they can devote their performance solely to writes. This also amplifies the gains from write-cached arrays, as it allows the disks to service the database writer's bursts of writes. This architecture is not recommended for redo, undo, or temp, where writes result in foreground waits.

The Preferred Read does not change the application’s read/write ratio. In order for the application to scale with the SSD, the writes must not become the bottleneck prematurely. An excellent option is to deploy a small diskgroup composed of RAM SSDs for the database logs, and deploy FLASH SSDs alongside disks with the preferred read mirror setting for the tables and indexes. This way all of the blocking I/O requests are served from SSD and the HDD merely provides inexpensive redundancy.
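A sketch of that layout, with hypothetical names and device paths (the mirrored DATA diskgroup and its preferred-read setting as sketched earlier, plus a small diskgroup on RAM SSD devices for the logs):

CREATE DISKGROUP REDO NORMAL REDUNDANCY
  FAILGROUP RAM1 DISK '/dev/oracleasm/ramssd1'
  FAILGROUP RAM2 DISK '/dev/oracleasm/ramssd2';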

Fast Mirror Resync

Oracle 11g includes Fast Mirror Resync, which tracks changed extents for a given repair window, 3.6 hours by default. If a disk goes offline, ASM tracks the extents associated with that disk. When the disk is recovered and brought online (knowing that all of its previous data is still intact), the list of changed extents is applied to the recovered disk. This is extremely useful when working with very large LUNs.
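The window is controlled by the diskgroup attribute DISK_REPAIR_TIME (discussed below); a quick way to check the current value per diskgroup, assuming the 11g ASM views:

SELECT dg.name, a.value
  FROM v$asm_attribute a, v$asm_diskgroup dg
 WHERE a.group_number = dg.group_number
   AND a.name = 'disk_repair_time';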

For all systems (SSDs and spinning disks) that undergo offline maintenance or code updates, Fast Mirror Resync reduces maintenance impact dramatically. The system can be taken offline, quickly serviced, and added back to ASM, and only the extents that changed while the service was performed are synchronized. Without Fast Mirror Resync, ASM would have to completely rebuild the mirror, which could take a long time.

The window for disk recovery can be set at diskgroup scope with the DISK_REPAIR_TIME attribute, or it can be set manually when offlining a disk for scheduled maintenance. If a system is scheduled for 1 hour of maintenance, be sure to allow for unforeseen issues with a larger maintenance window and a longer “DROP AFTER” clause.

ALTER DISKGROUP TIER0 OFFLINE DISK SSD01 DROP AFTER 4h;

This would allow a recovery window of 4 hours for the expected hour of maintenance. When the LUN is reinstated with the ONLINE DISK command, all changed extents are resynchronized to the recovered disk.
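A sketch of the surrounding commands, reusing the TIER0 and SSD01 names from the example above (the attribute value is illustrative):

-- Set the diskgroup-wide repair window
ALTER DISKGROUP TIER0 SET ATTRIBUTE 'disk_repair_time' = '4h';
-- Bring the serviced disk back; only the stale extents are resynchronized
ALTER DISKGROUP TIER0 ONLINE DISK SSD01;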

Conclusion

The high availability, performance gains, and ease of management make ASM a win for storage management. Preventing hot spots on disks leads to lower mechanical costs. Ease of management lowers operations costs. With higher performance, the fundamental cost per transaction is also reduced, and, by using an industry-leading database engine to serve storage, security and availability are not compromised.

The fundamental complement of ASM with tiered storage extracts more value from the investment. SSD technology with ASM management can also ensure uptime without interruption to day-to-day operations, even during maintenance. More transactions through lower latency, and higher throughput from unburdened disk storage, illustrate the raw performance gain of using ASM with tiered storage.
