In this next Holistic tuning blog, I will examine database structures. The structure layer consists of the internal physical structures that make up the database. In the beginning it was just tables and indexes; now of course we have several types of tables, several types of indexes, materialized views, LOBs, objects, and many other structures to be concerned about.
Improperly Designed Tables
Even though processors have gotten more powerful and disks and storage arrays more capable, you can still defeat them with improper table design. Generally, improper table designs fall into two broad categories: insufficiently normalized and over-normalized designs.
Insufficiently normalized tables result in repeating values within a table. Insufficient normalization usually produces excessive IO, excessive sorts, and excessive use of DISTINCT.
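As a minimal sketch of the first category (all table and column names here are hypothetical), an insufficiently normalized orders table repeats customer data on every row, so even getting a clean customer list forces a scan and sort through DISTINCT:

    -- Insufficiently normalized: customer data repeats on every order row
    CREATE TABLE orders_flat (
      order_id     NUMBER PRIMARY KEY,
      cust_name    VARCHAR2(100),
      cust_address VARCHAR2(200),
      order_date   DATE
    );

    -- A simple customer list now requires scanning and sorting all orders
    SELECT DISTINCT cust_name, cust_address FROM orders_flat;

    -- Normalized: customer data stored once, referenced by key
    CREATE TABLE customers (
      cust_id      NUMBER PRIMARY KEY,
      cust_name    VARCHAR2(100),
      cust_address VARCHAR2(200)
    );

    CREATE TABLE orders (
      order_id   NUMBER PRIMARY KEY,
      cust_id    NUMBER REFERENCES customers(cust_id),
      order_date DATE
    );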
Over-normalized tables usually result in too many joins, excessive sorts caused by Cartesian products from insufficient WHERE clauses, and overly complex SQL.
In most cases Oracle will not produce an efficient execution plan for a join of more than six tables unless you have a data warehouse and utilize a star join path. The undocumented parameter “_optimizer_max_permutations” limits the number of join paths the optimizer will consider to 2000. Since the number of permutations is the factorial of the number of tables in the join, you cross that limit between six tables (6! = 720) and seven tables (7! = 5040). If you have more than six tables, place the most important tables (i.e. the ones that reduce the result set the most) first in the FROM clause, since (at least the last time I reviewed this) the tables are evaluated from left to right as the optimizer works through possible paths.
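If you must join that many tables, one option is to take join order out of the optimizer's hands with the ORDERED hint, which joins the tables in the order they appear in the FROM clause. A hedged sketch (table and column names hypothetical):

    SELECT /*+ ORDERED */ sl.filter_col, bf.measure
      FROM small_lookup sl,   -- most selective table first
           mid_table    mt,
           big_fact     bf
     WHERE sl.lookup_id  = mt.lookup_id
       AND mt.mid_id     = bf.mid_id
       AND sl.filter_col = 'X';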
Use of flex fields can also be a show stopper for performance. Flex fields are fields whose content type is determined by the context of other values in the row. In one case I was called in to help create a data warehouse from an application that was based on Oracle Applications. They had used the “flex” fields in Oracle Applications to specify that a single column was one of: a number, a date, or a character field, depending on other values in the row; in fact, several different fields in the same row were used the same way. To resolve one entry into the companion data warehouse, we had to join this table to itself seven times.
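To give a feel for the pain, here is a hedged sketch (the flex_data table, its columns, and the date format are all hypothetical) of the self-joins needed to turn one row per attribute back into one row per entity; every additional flex attribute means yet another join:

    SELECT n.entity_id,
           TO_NUMBER(n.attr_value)              AS amount,   -- this row holds a number
           TO_DATE(d.attr_value, 'YYYY-MM-DD')  AS due_date, -- this row holds a date
           c.attr_value                         AS status    -- this row holds a code
      FROM flex_data n,
           flex_data d,
           flex_data c
     WHERE n.entity_id = d.entity_id
       AND d.entity_id = c.entity_id
       AND n.attr_type = 'AMOUNT'
       AND d.attr_type = 'DUE_DATE'
       AND c.attr_type = 'STATUS';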
Use of improperly typed fields can also cause performance and storage problems, for example using a large CHAR field for short character values. CHAR reserves the full declared length regardless of the length of the input data, which wastes storage space and requires extra programming to handle blank-padded comparison semantics. Another example is using CHAR or VARCHAR2 for numeric data, forcing a conversion (either implicit or explicit) on every use.
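A quick sketch of both problems (table and column names hypothetical):

    -- CHAR blank-pads to the declared length: a one-character flag
    -- stored in CHAR(100) carries 99 trailing spaces on every row
    CREATE TABLE bad_types (
      status_flag CHAR(100),      -- should be CHAR(1) or VARCHAR2(1)
      invoice_num VARCHAR2(20)    -- numeric data stored as characters
    );

    -- The numeric comparison forces an implicit TO_NUMBER on every row,
    -- which also prevents a normal index on invoice_num from being used
    SELECT * FROM bad_types WHERE invoice_num = 12345;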
Non-use of partitioning is another possible design flaw. Partitioning requires an extra-cost license, but it can be worth it in time and resources saved. The biggest benefit from partitioning is partition elimination: only the partitions needed to resolve the query are scanned. Partitions, when used properly (as in with a proper partition key), reduce IO requirements for the entire database by eliminating full table scans. On the other hand, if a partitioned table has an improper partition key it may make performance worse.
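A minimal sketch of partition elimination (table and partition boundaries hypothetical): range-partition on the column the queries actually filter on, and Oracle touches only the relevant partitions:

    CREATE TABLE sales (
      sale_id   NUMBER,
      sale_date DATE,
      amount    NUMBER
    )
    PARTITION BY RANGE (sale_date) (
      PARTITION p2023 VALUES LESS THAN (TO_DATE('01-JAN-2024','DD-MON-YYYY')),
      PARTITION p2024 VALUES LESS THAN (TO_DATE('01-JAN-2025','DD-MON-YYYY')),
      PARTITION pmax  VALUES LESS THAN (MAXVALUE)
    );

    -- Only partition p2024 is scanned; the rest are eliminated
    SELECT SUM(amount)
      FROM sales
     WHERE sale_date >= TO_DATE('01-JAN-2024','DD-MON-YYYY')
       AND sale_date <  TO_DATE('01-JUL-2024','DD-MON-YYYY');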
Use of Materialized Views
Materialized views, also known as summaries, allow you to pre-build reports, summaries, and other “end results” so that when someone issues a query that can use the materialized view, the results appear instantly. Proper use of materialized views is important to reduce IO and CPU usage as well as sort space usage. You need to determine whether you need “fresh” data or can stomach using “stale” data.
With fresh data you would want an “on commit” materialized view, which ties its refresh cycle to commits against the base tables of the materialized view. If the data doesn’t have to be fresh, you can use an “on demand” materialized view instead, refreshed on a schedule at whatever periodicity is needed (based on how stale the results can be).
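A hedged sketch of both flavors (table and view names hypothetical; note that fast on-commit refresh also requires a materialized view log on the base table, as shown):

    -- Fresh data: the view is refreshed as part of every commit on SALES
    CREATE MATERIALIZED VIEW LOG ON sales
      WITH ROWID, SEQUENCE (cust_id, amount)
      INCLUDING NEW VALUES;

    CREATE MATERIALIZED VIEW sales_by_cust
      REFRESH FAST ON COMMIT
      ENABLE QUERY REWRITE
      AS SELECT cust_id,
                SUM(amount)   AS total_amt,
                COUNT(amount) AS amt_cnt,   -- required for fast refresh of SUM
                COUNT(*)      AS row_cnt
           FROM sales
          GROUP BY cust_id;

    -- Stale-tolerant data: refresh once a day instead
    CREATE MATERIALIZED VIEW sales_by_cust_daily
      REFRESH COMPLETE
      START WITH SYSDATE NEXT SYSDATE + 1
      ENABLE QUERY REWRITE
      AS SELECT cust_id, SUM(amount) AS total_amt
           FROM sales
          GROUP BY cust_id;

    -- ...or refresh manually whenever needed:
    -- EXEC DBMS_MVIEW.REFRESH('SALES_BY_CUST_DAILY', 'C');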
Insufficient Indexes
Insufficient indexing results in full table scans, excessive sorting, and excessive CPU usage. Indexes, when used properly, are probably one of the most performance-enhancing features of Oracle.
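A quick hedged sketch (object names hypothetical) of confirming a missing-index problem and fixing it:

    -- Confirm the full table scan
    EXPLAIN PLAN FOR
      SELECT * FROM orders WHERE status = 'OPEN';
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
    -- the plan will typically show: TABLE ACCESS FULL on ORDERS

    -- Add the missing index; the plan should change to an INDEX RANGE SCAN
    CREATE INDEX orders_status_idx ON orders (status);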
Another area that can really affect holistic performance is moving referential integrity (RI) from the database into the application. A few years ago there was a large movement toward this type of design, to keep the application agnostic about the database layer. Unfortunately, the design was sometimes only partially implemented, leaving some database-defined RI mixed with application-based RI. In this mixed-RI case the foreign keys would often not be indexed, resulting in excessive locking from the application. Of course, this can also happen if you use database-based RI and don’t index the foreign keys.
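The fix is straightforward: index every foreign key whose parent table sees deletes or key updates. A hedged sketch (names hypothetical):

    ALTER TABLE order_items
      ADD CONSTRAINT oi_order_fk
      FOREIGN KEY (order_id) REFERENCES orders (order_id);

    -- Without this index, deleting a parent row in ORDERS takes a
    -- table-level lock on ORDER_ITEMS; with it, only the affected
    -- child rows are locked
    CREATE INDEX oi_order_fk_idx ON order_items (order_id);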
Over-Indexing
Over-indexing results in excessive times for DML operations such as INSERT, UPDATE, and DELETE, because the amount of IOPS required to perform each action is multiplied several fold with the addition of each index.
Several years ago Kevin Loney did a study of the effects of multiple indexes on INSERT, UPDATE, and DELETE operations. Essentially, the time to perform the operations increased by large amounts as the number of indexes increased, due to the increased number of IO operations required.
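One way to find indexes that only cost you DML time is Oracle's built-in index usage monitoring (V$OBJECT_USAGE through 11g; 12c and later also offer DBA_INDEX_USAGE). A hedged sketch with a hypothetical index name:

    ALTER INDEX orders_status_idx MONITORING USAGE;

    -- ...run a representative workload for a while, then check:
    SELECT index_name, used, start_monitoring
      FROM v$object_usage
     WHERE index_name = 'ORDERS_STATUS_IDX';

    ALTER INDEX orders_status_idx NOMONITORING USAGE;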
Use of an Incorrect Index
Using the wrong type of index can result in the index not being used at all, or being used inefficiently. For example, using a single bitmap index will usually result in poorer performance than using the equivalent B-tree index. Using a regular index when a function-based index is required will result in a full table scan. Another problem is improper index design, such as using a single concatenated index where separate single-column indexes would serve better. Single-column indexes can sometimes provide a greater advantage than one composite index by allowing a more diverse group of SQL statements to use them, either individually or in combination with each other. There are many other issues of this type; be sure to research the proper index to use in each situation.
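As a short sketch of the function-based case (names hypothetical): a regular index on last_name cannot satisfy a predicate that applies a function to the column, but a function-based index on the same expression can:

    -- Even with a normal index on last_name, this is a full table scan:
    SELECT * FROM employees WHERE UPPER(last_name) = 'AULT';

    -- A function-based index on the expression fixes it:
    CREATE INDEX emp_upper_name_idx ON employees (UPPER(last_name));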
Use of proper indexing, and making sure you aren’t over-indexing, reduces IO and CPU cycles and speeds up queries; it also makes sure you don’t excessively penalize DML (INSERT, UPDATE, and DELETE) operations.