Monday, June 11, 2012

The Myth of Database Independence


Introduction

It happens more times than we care to think about. A new project starts, much time and energy is spent on selecting the right database system, hardware is bought, specifications are developed and then the bomb from on high goes off…”This must be database independent”, come again?

The Grail of Database Independence

Database independent applications hold great lures for management, the thought of developing a single application and then, at will, being able to swap out the database underneath it all, say, from expensive Oracle to cheap MySQL.

Another argument for moving the logic from the database tier to the application is that then the company has greater control over the application and isn’t dependent on the database vendor. Isn’t this rather like buying a Mazerratti but replacing the steering with ropes and pulleys and the brakes with a pull-board nailed to the side of the drivers door so you aren’t dependent on the manufacturer, and oh by the way, replace that high-tech fuel injection system with a single barrel venturi carburetor so anyone can maintain it, and jerk out that electronic ignition, we want points and plugs, while you are at it, remove the computer and sensors as we really don’t want to be bothered with them.

Unfortunately, like the quest for the Holy Grail of old, the quest for a truly database independent application is almost always doomed to fail unless one is pure of code and deed. Why do I say this? Because you must code to the least common denominators in order to achieve database independence.

What Are least Common Denominators?

In math the least common denominator (LCD) is that number that will equally divide a group of numbers, for example for {6,27,93} the LCD is 3. But what does this least common denominator principle mean to database independence? Essentially the LCD for a group of databases is the largest set of features that is common to all of the database systems. For example, SQL Server, Oracle, MySQL all have mechanisms to populate a numeric key, SQL Server and MySQL have an auto increment feature, Oracle uses a sequence generator, therefore there would be no common function between the three for populating a number based key. In another case, all use ANSI standard SQL statements so you could use SQL with all three, as long as you limited your usage to pure ANSI standard SQL.

So What is the Big Deal?

Database independence means several things:
  1. Move application logic into the middle or upper tiers
  2. Use no database specific feature
  3. Use no database specific language

Let’s look at these three items for a start and see why they may be issues in many environments.

Move Application Logic into the Middle Tier

PL/SQL, T-SQL and other database languages were developed to provide application logic friendly coding at the lowest level, at the database itself. Oracle, MicroSoft and other database vendors have spent many dollars and hours (years) of effort to make these languages as internally efficient as possible so they can provide the performance needed. Now we are told we must move the logic to the middle tier or into the application itself, mainly because coders are too busy (lazy) to learn PL/SQL, T-SQL or a specific database language. What does this mean exactly?

  1. No triggers – triggers work off of internal PL/SQL or T-SQL logic
  2. No replication – since it can’t be done independent of database code
  3. No referential integrity – again, not all databases support this logic
  4. No packages, procedures, Java in the database – Not database independent if code is run internally
  5. No advanced queuing since it is dependent on database specific routines
  6. No virtual private databases – again this is dependent on internal codes and procedures
  7. No types, objects, methods, materialized views, summaries, dimensions, etc- not all databases support these constructs

All of the code that would be internalized or handled internally by the database in items 1-7 now must be written in external routines and moved to the middle tier or to the application itself. The coders must recreate the wheels of database development for each application.

Use no Database Specific Feature

For the other database systems this isn’t such a big deal, however, Oracle’s feature set is robust but you have to throw all of those features away if you buy into the database independent movement. What will you give up? Let’s just do a partial list:

  1. Bitmap Indexes
  2. Bitmap join indexes
  3. Function based indexes
  4. Advanced text based indexing
  5. Reverse key indexes
  6. Hash, list, composite partitions
  7. No PL/SQL triggers, functions, procedures or packages
  8. No types, methods, etc.
  9. Materialized views, summaries, dimensions
  10. No Java in the database
  11. No XML in the database (at least no parsing)
  12. Advanced security features (password checking, aging, logon restrictions, etc)
  13. Real Application Clusters
  14. Oracle standby features

Of course some times it isn’t as strict as this and may allow anything that is structural in nature to be utilized as long as it doesn’t affect the code base, however, you have to admit most of the items above, if not available, would require additional coding, so it really doesn’t change the list that much.

Use No Database Specific Language

This is the restriction that probable causes the most problems, the restriction on using no database specific language. In Oracle the database specific language is PL/SQL but this also covers the extensions to the ANSI Standard SQL in most cases. PL/SQL is the heart of Oracle, the DBMS packages are of course all written in PL/SQL, triggers are written in PL/SQL, functions, procedures and methods are also written in PL/SQL so you would not be able to use any of them.

If the rule against using a database specific language or extension extends to triggers, referential integrity and internals then you end up throwing away many performance enhancements. Let’s look at an example; PL/SQL developers will recognize the construct in Figure 1 as a key populating trigger.

CREATE OR REPLACE TRIGGER exam_pkey
BEFORE INSERT ON exam FOR EACH ROW
BEGIN
SELECT exam_seq.NEXTVAL
INTO :NEW.exam_pk
FROM DUAL;
END;

Figure 1: Example Simple PL/SQL Trigger

Now, what would the pseudo-code for an external program to do the same thing as this simple trigger look like? Let’s assume we use a central sequence table that stores the owner, table name, column name and last value used.

1. Send Select to database to get ID for row containing our owner, table, column name and last value
2. Implicitly lock row
3. Add 1 to the last value
4. Insert row with new value into table
5. Update row in sequence table
6. Release lock on sequence table
Figure 2: Psuedo-code for External Sequence Routine

The steps in bold in Figure 2 require at least one round trip to the database. In addition, while the table row for our sequence table is locked, no one can add rows to the base table. So rather than send a simple update to the database and letting it handle the nitty-gritty, we have to send: 1 Select, 1 lock, 1 Insert, 1 Update, 1 release lock commands to the database all requiring SQLNet roundtrips. The above pseudo-code was what we had to do back in early versions of Oracle before we had sequences and triggers! Database independent coding is actually taking a giant step backwards! We are talking at least 10 network roundtrips to do what Oracle does in 2, not to mention all the recursive SQL since we really can’t use bind variables the same way in each database.

Of course we also lose the efficiencies of cached sequences, the locking issues will slow down high transaction database applications and the additional network traffic will clog the networks.

This is just a simple example of what truly database independent coding would entail, imagine the more complex PL/SQL routines such as the streams, advanced queuing or DBMS_LOCK having to be recoded in C, C#, Java or some other language?

The Mystery

The real mystery here is why, if the application sponsors or developers really wanted database independence that they would choose Oracle, SQL Server, DB2 or Informix for their application. Of course maybe the database is the corporate standard and it is being forced upon them.

Refusing the use the best tools to get the job done, especially when they don’t cost you anything, is rather dense. Building a database independent application on top of Oracle is akin to ignoring a pile of precut, premeasured, prebuilt walls and roof sections and chopping down trees instead to build a home. The end-point will be a home (how livable will depend on the skill of the builders) but starting with trees you will take longer, have to do many more cuts and probably won’t end up with as well built a house.

The Solution

It is almost a surety that building a database independent application will take longer, be more complex and the resulting application will perform worse than if the strengths of the underlying database are utilized. The number of times a major application has been moved within a company from one database to another, especially from a more capable to a less capable database, is relatively few and almost always, the result is not as good as it seemed it would be.

A majority of issues that are used to drive toward database independence for applications are usually a result of:
  1. Buying too much database to begin with
  2. Hiring inexperienced developers
  3. Developers unwilling to learn database specific skills
  4. Being unable or unwilling to get needed skills
  5. Throne decisions

It is almost always cheaper to hire developers who have a complete understanding of your database (Oracle or something else) or to train your existing developers to understand your database features and capabilities than to reinvent the wheels already in the database system you own. It is guaranteed that in most cases the internalized database feature will perform better than anything you develop to run externally, across the network, in a separate tier.

The Issues with Database Independent Applications

The major issues with database independent applications are:
  1. True portability is almost impossible
  2. It is nearly impossible to achieve acceptable scalability
  3. It is nearly impossible to achieve acceptable performance
  4. Maintenance
  5. Retention of skill sets and knowledge

Achieving True Portability

In order to achieve full portability of code all database and application logic must be contained within the application itself, absolutely no database specific feature can be used. Face it, this just can’t be done except with the most basic of applications. All cross-database vendors have database specific routines that interface to the underlying databases, they also have database specific installation scripts for schemas and structures.

Scalability

As soon as you say “network round trip” you negate scalability, the more round trips an application requires the less it will scale. More calls to the database increases locks, latches and recursive SQL reducing scalability. The more round trips you have in a transaction, the less dependent you are on the server and database for scalability and you move the dependence to the network which now becomes the Achilles heel of your system.

Performance

When code is contained in the database the database does the work internally and many times the code will be further optimized as it is compiled by the internal compilers and interpreters, an example would be the internalization and optimization of referential integrity triggers in Oracle another would be the external compilation of PL/SQL by Oracle11g. Internal code is also pre-loaded after the first execution and has already been parsed and optimized on subsequent executions. When code calls Oracle externally, Oracle has to do soft or hard parses depending on how well the code is written, it also cannot use optimized internal routines for common functions such as sequences and referential integrity. This results in more latching, locking, and recursive SQL generating more load on the database server, and, more load on the application tier servers as well.

Even a perfectly written database independent application cannot out perform a perfectly written database specific application. Database independence is obtained at the cost of efficiency and performance.

Maintenance

Writing the complete application while not taking advantage of database feature sets means re-writing a lot of code that already exists. The code in database independent applications must be maintained by the group that writes it. Unfortunately in this day and age much of the code base is being written using offshore resources, this means the in-house staff, if present at all, now must maintain code that they didn’t even write! Worse yet, your application may be dependent on support once or twice removed (India is now offshoring to China.)

When database specific features, languages and capabilities are utilized this greatly reduces the maintenance needs for your staff, as the maintenance, bug fixes and other worries are now on the database manufacturer and not on your staff.

Retention of Knowledge and Skill Sets

Even if the application is fully developed in-house, can you guarantee that 5 years down the road you will have the expertise to maintain it? Think about how much companies where paying Cobol developers to come out of retirement to fix the Y2K problems. There will always be PL/SQL and Oracle experts, can you say as much for that wonderful new language dejur  you might be developing in right now?

People move on, get promoted, get fired, skill sets atrophy or are considered obsolete. That great deal on developing your application offshore in Y+ might seem a bargain today, but what about 5 years down the road when Z# is the skill set offered and there are no Y+ developers to be found. Take the long look ahead instead of to the next quarter.

Summary

Database independence is a myth, it doesn’t exist. Database independence will always mean poorer scalability and poorer performance than with an application that takes advantage of the strengths of its underlying database. Independent code increases maintenance, decreases reliability and increases dependence on your development teams (in house or off shore.) Just say no to database independence!

No comments:

Post a Comment