Friday, April 6, 2012

A Holistic Look at the Oracle Memory Layer

The memory layer is the actual area where the database caches, pools, and other memory structures reside. As such, the memory layer is holistic in nature since any changes there are immediately felt on all other areas. Figure 1 is a diagram of the various memory areas in Oracle11g.



Figure 1: Oracle11g memory Areas

Insufficient Memory Resources

Generally speaking, insufficient memory resource shows up as excessive physical reads. Usually the first indication in a properly designed system will be excessive db file sequential reads. If a system is using automatic memory management it may also show as numerous deferred actions or excessive swapping of memory between the shared pool components and the database cache areas. Figure 2 shows an example of memory starvation.



Figure 2: Memory Starvation Indication

Another indication is when the cache analysis section of the AWR or Statspack indicates a large amount of change in physical reads from increasing the cache size as shown in Figure 3.



Figure 3: Buffer Pool Advisory

As you can see in the example shown, doubling the cache size (if possible) would result in at least a 50% reduction in physical reads.

Correcting insufficient memory resources may be as easy as increasing the settings for AMM or the floor values for the various components in the Oracle memory. However, if the system memory has been exhausted then buying more memory and inserting it in the system is indicated. Many times this may not be enough.

Access to memory and the back path to the IO subsystem can also be stressed if there is too much memory. This can be helped by switching to a RAC based system, thus spreading the memory and IO pathways amongst several servers. In a RAC environment excessive interconnect traffic can be a sign that the individual servers have insufficient memory.

Too large a buffer area can result in excessive latching and CPU overhead. It is usually easily detected by the same areas of the AWR or Statspack reports showing the estimated physical reads don’t change starting at a range below our current settings.

Improper Memory Configuration

Within the Oracle memory area there are multiple caches and pools. The wary DBA should be utilizing proper settings for the shared pool database cache, keep, recycle, and multiple block size caches. Additional settings for the large pool, Java pool, and streams pool should be used as needed in a particular environment. Even when AMM is used, floor values should be set for these parameters to ensure minimum levels of performance are met.

When using SGA_TARGET and SGA_MAX_SIZE or MEMORY_TARGET and MEMORY_MAX_SIZE, be sure they aren’t set equal to each other, because this leads to memory thrashing as the internal memory pools and caches are forced to release and reclaim memory, leading to deferred actions. I usually recommend that at least 1-2 gigabytes be between either set of settings. Of course, as you gain operational experience for your system’s needs this gap can be reduced, or increased. The dynamic performance view, v$sga_resize_ops and the AWR resize operations section can be utilized to see if you need to rethink the floor settings for specific parameters.

A typical v$sga_resize_ops query:

select
COMPONENT, OPER_TYPE, OPER_MODE,
INITIAL_SIZE, TARGET_SIZE, FINAL_SIZE,
STATUS,
to_char(START_TIME,'mmdd hh24:mi') start_time,
to_char(END_TIME,'mmdd hh24:mi') end_time
from V$SGA_RESIZE_OPS order by start_time
/

Figure 4 shows an example output from the memory resize query.



Figure 4: Resize Operations Query Results

In some databases, notably data warehouse (DWH) and decision support (DSS), it may be impossible to provide enough cache to support all operations. However, in many cases the developers or the DBAs or both may not be aware of all the options Oracle provides. Many times when reviewing the AWR or Statspack for DWH systems I see they are not utilizing star joins. Star joins will reduce 30 minute queries to 30 seconds when utilized correctly.

In the case where you must have db file scattered reads (full table or index scans), you have to make sure the IO subsystem is properly sized to handle the load. As was said in the section on physical IO, to get adequate performance from disk based systems it may be necessary to over buy the number of disks needed to the tune of 30-40 or more times the number of disks needed for storage capacity alone to get the needed number of IOPS. The major cause of the need for all of these drives is the contention and queuing that result from too-few spindles. By utilizing low latency IO subsystems that don’t block, such as SSDs, you don’t need to over buy storage capacity to meet IOPS needs.

In the case where is it not economical or feasible to add memory to a server or servers (RAC) then replacing some of the high latency disks with low latency IO subsystems such as large cache or SSDs is a method to get higher performance from your system.

Sort Areas

Another area of memory that people tend to set and forget is the setting for PGA_AGGREGATE_TARGET (PGAT). This PGAT setting controls the SORT_AREA_SIZE, HASH_AREA_SIZE and various other sort and hash parameters. One temporary activity that isn’t controlled are the sizes of the BITMAP_MERGE_AREA and BITMAP_SORT_AREA sizes which can be of critical importance if you are using bitmap indexes. I usually look at the instance activity and PGA histogram sections of the AWR or Statspack reports. Some example report sections are shown in Figure 5.



Figure 5: Example Sort Statistics

Notice that the majority of sorts in Figure 5 are in the range of 2-16 megabyte, these should easily be covered by the PGAT setting, 1.0 gb in this database, so why are we getting these small sorts? Processes that do direct IO to and from the PGA may avoid the PGAT and use the traditional SORT_AREA_SIZE setting. In this case you may to go ahead and set the SORT_AREA_SIZE to 16 megabytes to handle these sort needs. A majority of the rest of the sorts are equal to or less than 512 megabytes which should be handled if we boost the setting to 4 gigabytes. Also, in the case of the above report, the temporary tablespace was the number one source of IO in the system as shown in Figure 6.



Figure 6: Temporary Tablespace #1 Source of IO

The PGA Memory Advisor section from the same report is shown in Figure 7.



Figure 7: Example PGA Advisory

When looking at this PGA Memory Advisory section you need to pay attention to the Estd PGA Overalloc Count column. In this report the value for the Overallocation count column doesn’t go to zero until 2400 megabytes. I have found that this section, when followed, tends to under allocate what you need. In this case we have determined that 4 gb is actually a better number. There have been times when there have been sorts (other than those accountable to non-PGAT controlled processes) in the range where the PGAT should have handled and yet the Overallocation count said everything was fine.

DB Flash Cache

If you are running on Solaris or Oracle Linux you can utilize the new 11gR2 DB Flash cache. The DB flash cache can be thought of as an L2 cache for the normal database default memory cache. If you have some spare memory and db sequential reads are your majority wait event, use of a DB flash cache may be the solution to holistically improve the performance of the system. Usually it should be set to between 2 to 10 times the size of your default DB cache size.

The DB Flash Cache is maintained by the DB writer processes. If you configure the DB Flash cache I suggest doubling the number of the DB writer processes to facilitate the management of the DB Flash cache. Usually the DB Flash cache will be on a PCIe based Flash card such as the RamSan70. However, there is no reason why you can’t utilize an external Flash appliance such as the RamSan 630/700/800 series for the DB Flash cache. The only caveat is that the DB Flash cache assignment device must have a latency that is very much less than the main source of the cached blocks or you will not get good performance improvements.

No comments:

Post a Comment