Introduction
It happens more times than we care to think about. A new
project starts, much time and energy is spent on selecting the right database
system, hardware is bought, specifications are developed and then the bomb from
on high goes off…”This must be database independent”, come again?
The Grail of Database Independence
Database independent applications hold great lures for
management, the thought of developing a single application and then, at will,
being able to swap out the database underneath it all, say, from expensive
Oracle to cheap MySQL.
Another argument for moving the logic from the database tier
to the application is that then the company has greater control over the
application and isn’t dependent on the database vendor. Isn’t this rather like
buying a Mazerratti but replacing the steering with ropes and pulleys and the
brakes with a pull-board nailed to the side of the drivers door so you aren’t
dependent on the manufacturer, and oh by the way, replace that high-tech fuel
injection system with a single barrel venturi carburetor so anyone can maintain
it, and jerk out that electronic ignition, we want points and plugs, while you
are at it, remove the computer and sensors as we really don’t want to be
bothered with them.
Unfortunately, like the quest for the Holy Grail of old, the
quest for a truly database independent application is almost always doomed to
fail unless one is pure of code and deed. Why do I say this? Because you must
code to the least common denominators in order to achieve database
independence.
What Are least Common Denominators?
In math the least common denominator (LCD) is that number
that will equally divide a group of numbers, for example for {6,27,93} the LCD
is 3. But what does this least common denominator principle mean to database
independence? Essentially the LCD for a group of databases is the largest set
of features that is common to all of the database systems. For example, SQL Server,
Oracle, MySQL all have mechanisms to populate a numeric key, SQL Server and
MySQL have an auto increment feature, Oracle uses a sequence generator,
therefore there would be no common function between the three for populating a
number based key. In another case, all use ANSI standard SQL statements so you
could use SQL with all three, as long as you limited your usage to pure ANSI
standard SQL.
So What is the Big Deal?
Database independence means several things:
- Move
application logic into the middle or upper tiers
- Use no
database specific feature
- Use no
database specific language
Let’s look at these three items for a start and see why they
may be issues in many environments.
Move Application Logic into the Middle Tier
PL/SQL, T-SQL and other database languages were developed to
provide application logic friendly coding at the lowest level, at the database
itself. Oracle, MicroSoft and other database vendors have spent many dollars
and hours (years) of effort to make these languages as internally efficient as
possible so they can provide the performance needed. Now we are told we must
move the logic to the middle tier or into the application itself, mainly
because coders are too busy (lazy) to learn PL/SQL, T-SQL or a specific
database language. What does this mean exactly?
- No
triggers – triggers work off of internal PL/SQL or T-SQL logic
- No
replication – since it can’t be done independent of database code
- No
referential integrity – again, not all databases support this logic
- No
packages, procedures, Java in the database – Not database independent if
code is run internally
- No
advanced queuing since it is dependent on database specific routines
- No
virtual private databases – again this is dependent on internal codes and
procedures
- No
types, objects, methods, materialized views, summaries, dimensions, etc-
not all databases support these constructs
All of the code that would be internalized or handled
internally by the database in items 1-7 now must be written in external
routines and moved to the middle tier or to the application itself. The coders
must recreate the wheels of database development for each application.
Use no Database Specific Feature
For the other database systems this isn’t such a big deal,
however, Oracle’s feature set is robust but you have to throw all of those
features away if you buy into the database independent movement. What will you
give up? Let’s just do a partial list:
- Bitmap
Indexes
- Bitmap
join indexes
- Function
based indexes
- Advanced
text based indexing
- Reverse
key indexes
- Hash,
list, composite partitions
- No
PL/SQL triggers, functions, procedures or packages
- No
types, methods, etc.
- Materialized
views, summaries, dimensions
- No
Java in the database
- No XML
in the database (at least no parsing)
- Advanced
security features (password checking, aging, logon restrictions, etc)
- Real
Application Clusters
- Oracle
standby features
Of course some times it isn’t as strict as this and may
allow anything that is structural in nature to be utilized as long as it
doesn’t affect the code base, however, you have to admit most of the items
above, if not available, would require additional coding, so it really doesn’t
change the list that much.
Use No Database Specific Language
This is the restriction that probable causes the most
problems, the restriction on using no database specific language. In Oracle the
database specific language is PL/SQL but this also covers the extensions to the
ANSI Standard SQL in most cases. PL/SQL is the heart of Oracle, the DBMS
packages are of course all written in PL/SQL, triggers are written in PL/SQL,
functions, procedures and methods are also written in PL/SQL so you would not
be able to use any of them.
If the rule against using a database specific language or
extension extends to triggers, referential integrity and internals then you end
up throwing away many performance enhancements. Let’s look at an example;
PL/SQL developers will recognize the construct in Figure 1 as a key populating
trigger.
CREATE OR REPLACE
TRIGGER exam_pkey
BEFORE INSERT ON
exam FOR EACH ROW
BEGIN
SELECT
exam_seq.NEXTVAL
INTO :NEW.exam_pk
FROM DUAL;
END;
Figure 1: Example Simple
PL/SQL Trigger
Now, what would the pseudo-code for an external program to
do the same thing as this simple trigger look like? Let’s assume we use a
central sequence table that stores the owner, table name, column name and last
value used.
1. Send Select to
database to get ID for row containing our owner, table, column name and last
value
2. Implicitly lock
row
3. Add
1 to the last value
4. Insert row with
new value into table
5. Update row in
sequence table
6. Release lock on
sequence table
Figure 2: Psuedo-code for External Sequence Routine
The steps in bold in Figure 2 require at least one round
trip to the database. In addition, while the table row for our sequence table
is locked, no one can add rows to the base table. So rather than send a simple
update to the database and letting it handle the nitty-gritty, we have to send:
1 Select, 1 lock, 1 Insert, 1 Update, 1 release lock commands to the database
all requiring SQLNet roundtrips. The above pseudo-code was what we had to do
back in early versions of Oracle before we had sequences and triggers! Database
independent coding is actually taking a giant step backwards! We are talking at
least 10 network roundtrips to do what Oracle does in 2, not to mention all the
recursive SQL since we really can’t use bind variables the same way in each
database.
Of course we also lose the efficiencies of cached sequences,
the locking issues will slow down high transaction database applications and
the additional network traffic will clog the networks.
This is just a simple example of what truly database
independent coding would entail, imagine the more complex PL/SQL routines such
as the streams, advanced queuing or DBMS_LOCK having to be recoded in C, C#,
Java or some other language?
The Mystery
The real mystery here is why, if the application sponsors or
developers really wanted database independence that they would choose Oracle,
SQL Server, DB2 or Informix for their application. Of course maybe the database
is the corporate standard and it is being forced upon them.
Refusing the use the best tools to get the job done,
especially when they don’t cost you anything, is rather dense. Building a
database independent application on top of Oracle is akin to ignoring a pile of
precut, premeasured, prebuilt walls and roof sections and chopping down trees
instead to build a home. The end-point will be a home (how livable will depend
on the skill of the builders) but starting with trees you will take longer,
have to do many more cuts and probably won’t end up with as well built a house.
The Solution
It is almost a surety that building a database independent
application will take longer, be more complex and the resulting application will
perform worse than if the strengths of the underlying database are utilized.
The number of times a major application has been moved within a company from
one database to another, especially from a more capable to a less capable
database, is relatively few and almost always, the result is not as good as it
seemed it would be.
A majority of issues that are used to drive toward database
independence for applications are usually a result of:
- Buying
too much database to begin with
- Hiring
inexperienced developers
- Developers
unwilling to learn database specific skills
- Being
unable or unwilling to get needed skills
- Throne
decisions
It is almost always cheaper to hire developers who have a
complete understanding of your database (Oracle or something else) or to train
your existing developers to understand your database features and capabilities
than to reinvent the wheels already in the database system you own. It is
guaranteed that in most cases the internalized database feature will perform
better than anything you develop to run externally, across the network, in a
separate tier.
The Issues with Database Independent Applications
The major issues with database independent applications are:
- True
portability is almost impossible
- It is
nearly impossible to achieve acceptable scalability
- It is
nearly impossible to achieve acceptable performance
- Maintenance
- Retention
of skill sets and knowledge
Achieving True Portability
In order to achieve full portability of code all database
and application logic must be contained within the application itself,
absolutely no database specific feature can be used. Face it, this just can’t
be done except with the most basic of applications. All cross-database vendors
have database specific routines that interface to the underlying databases,
they also have database specific installation scripts for schemas and
structures.
Scalability
As soon as you say “network round trip” you negate
scalability, the more round trips an application requires the less it will
scale. More calls to the database increases locks, latches and recursive SQL
reducing scalability. The more round trips you have in a transaction, the less
dependent you are on the server and database for scalability and you move the
dependence to the network which now becomes the Achilles heel of your system.
Performance
When code is contained in the database the database does the
work internally and many times the code will be further optimized as it is
compiled by the internal compilers and interpreters, an example would be the
internalization and optimization of referential integrity triggers in Oracle
another would be the external compilation of PL/SQL by Oracle11g. Internal code
is also pre-loaded after the first execution and has already been parsed and
optimized on subsequent executions. When code calls Oracle externally, Oracle
has to do soft or hard parses depending on how well the code is written, it
also cannot use optimized internal routines for common functions such as
sequences and referential integrity. This results in more latching, locking,
and recursive SQL generating more load on the database server, and, more load
on the application tier servers as well.
Even a perfectly written database independent application
cannot out perform a perfectly written database specific application. Database
independence is obtained at the cost of efficiency and performance.
Maintenance
Writing the complete application while not taking advantage
of database feature sets means re-writing a lot of code that already exists.
The code in database independent applications must be maintained by the group
that writes it. Unfortunately in this day and age much of the code base is
being written using offshore resources, this means the in-house staff, if
present at all, now must maintain code that they didn’t even write! Worse yet,
your application may be dependent on support once or twice removed (India is now offshoring to China.)
When database specific features, languages and capabilities
are utilized this greatly reduces the maintenance needs for your staff, as the
maintenance, bug fixes and other worries are now on the database manufacturer
and not on your staff.
Retention of Knowledge and Skill Sets
Even if the application is fully developed in-house, can you
guarantee that 5 years down the road you will have the expertise to maintain
it? Think about how much companies where paying Cobol developers to come out of
retirement to fix the Y2K problems. There will always be PL/SQL and Oracle
experts, can you say as much for that wonderful new language dejur you might be developing in right now?
People move on, get promoted, get fired, skill sets atrophy
or are considered obsolete. That great deal on developing your application
offshore in Y+ might seem a bargain today, but what about 5 years down the road
when Z# is the skill set offered and there are no Y+ developers to be found.
Take the long look ahead instead of to the next quarter.
Summary
Database independence is a myth, it doesn’t exist. Database
independence will always mean poorer scalability and poorer performance than
with an application that takes advantage of the strengths of its underlying
database. Independent code increases maintenance, decreases reliability and
increases dependence on your development teams (in house or off shore.) Just
say no to database independence!