Friday, May 18, 2012

Complete Information

A man wanted to hire a fruit picker. He went down to the local general store and inquired after available locals who would be willing to work hard and help get in his fruit. One old fella at the store told him about a man he used that was 10 times faster than the one he used before. Thinking "Boy! Ten times faster! I'll pay much less and get my fruit in just as fast!" Well, he hired the picker and discovered to his dismay that he wasn't any faster than pickers he had used before. Disgusted he went back to give the old timer a piece of his mind. The old timer explained, "Yep, he was ten times faster than my previous picker, of course my previous picker was 90 years old with arthritis!"

The purpose of the above illustration is just to stress that unless you know what it is that something is being compared to, you have no way to know if what you are being told makes sense or not. A case in point, whenever I do a presentation comparing performance, I always give the configuration used for both the before and after systems. By showing both the before and after systems the attendees or readers or viewers know exactly what I am comparing. Usually if at all possible I try to make the only difference be the thing I am trying to compare, for example, a hard drive based configuration verses and SSD one. In that case I would make sure I used the same or identical servers, memory and CPU as well the same interface and make sure that both systems where matched as far as storage capacity and bandwidth. It is the only fair way to compare two systems.

If I wanted to I could show statistics proving the TMS SSDs ran hundreds of times faster. Of course faster than what I wouldn't say. If I compared to a 2-CPU single core machine with 64 MB of memory running against a 5 disk RAID5 array of 7500 RPM drives capable of 1 Gb/s bandwidth and 1000 IOPS and 5 ms latency and then ran the comparison against a 8 CPU, 8 core per CPU machine with 2 Terabytes of memory and a RamSan630 with 10-4 Gb/s interfaces and 1,000,000 IOPS with 0.110 ms latency as the storage how much improved do you think the performance would be. I wouldn't be lying to you, but I would be omitting some critical details!

Now to the meat of it, I have attended many Exadata presentations and read many of the Exadata white papers. In all of the presentations and all of the whitepapers where they give their 5X, 10X, 26X times improvement comparisons they never tell you what the prior server setup contained. They aren't giving us the tools to be able to draw a fair conclusion from the information given. Come on Oracle, play fair! Show the before and after configurations so we can be the judge if it was a fair comparison or not. If a client is going from a resource constrained environment to one that is over provisioned for every resource (memory, CPU and IO) then naturally you will get significant performance improvements, if you don't then something is really wrong.

Tuesday, May 15, 2012

If Not Exadata?

Many companies are asking the question: If not an Exadata, then what should we buy? Let’s examine one of the alternatives to the Exadata, in this case the Exadata X2-8 since it is the top of the line.

In my previous blog I showed using numbers from Oracle’s own price sheets, that an Exadata X2-8 will cost around $12M dollars over a three year period considering initial cost, hardware support and software licensing. I didn’t include the required installation and consulting fees that go with that because they can vary depending on how many databases you move to the Exadata and the complexities of installation.

I didn’t talk about the specifications of the X2-8 from a performance point of view. Let’s examine the raw performance numbers. Figure 1 is taken from a presentation given by Greg Walters, Senior Technology Sales Consultant, Oracle, Inc. to the Indiana Oracle Users Group on April 11, 2011 shows the capabilities of the various Exadata configurations.

Figure 1: Exadata Performance Numbers

So for this blog I am primarily concerned with the numbers in the first column for the Exadata X2-8 Full Rack. Also, I assume that most will be buying the high performance disks so if we look at those specifications and meet or beat them, then we will also beat the low performance values as well. So the values we are concerned with are:  

Raw Disk Data Bandwidth: 25 GB/s 
Raw Flash Data Bandwidth: 75 GB/s 
Disk IOPS: 50,000 
Flash IOPS: 1,500,000 
Data Load Rates: 12 TB/hr 

  Pay attention to notes 2 and 3:

Note 2 says:

IOPS- based on peak IO requests of size 8K running SQL. Note that other products quote IOPS based on 2K, 4K or smaller IO sizes that are not relevant for databases.

So the actual value for IOPS is based on peak not steady state values. This is an important distinction since the system cannot sustain the peak value except for very short periods of time. The second part of the note is rather disingenuous as when the IO is passed to the OS the request is broken down into either 512 byte or 4K byte IO requests since most OS can only handle 512 byte or 4K IOs. A majority of modern disks (like those in the storage cells in Exadata) will only support 4K IO size so arguing that testing at 8K is more realistic is rather simplistic. In addition flash IO is usually done at 4K

Note 3 says:  

Actual performance will vary by application.

This is similar to mileage may vary and simply means that the numbers are based on ideal situations and the actual performance will probably be much less.

So now we have the performance characteristics of the Exadata. The question is: Are these based on measurement or on what the interface will provide? At 50K IOPS with an 8K block size You only get 0.38 GB/s, do the math: 50,000*8192/1024^3=0.3814. On the 1,500,000 IOPS from the flash: 1,500,000*8192/1024^3=11.44 GB/s so the highest bandwidth that can actually be attained at peak IOPS for both disk and Flash would be 11.82 GB/s.

Note 1 says that are not including any credit for either advanced or HCC compression.

Also notice they don’t tell you if the IOPS are based on 100% read, 80/20 read/write or 50/50 read/write, a key parameter is the mix of reads and writes if it is not specified the number given is useless.

One additional little thing they don’t tell you, is that the Flash cache is at the Cell level and is actually used as an Oracle optimized SAN cache. This is read-only. What does this mean? It means unless the data is relatively stable (non-changing) the actual IOPS from the cache could be quite a bit lower than advertised. Now in a data warehouse with data that doesn’t change I have no doubt they can get read numbers from the combined caches that reach that high at peak.

Ok, so now we have some performance numbers to compare to:

Disk IO bandwidth: 0.38 GB/s 
Flash IO Bandwidth: 11.44 GB/s 
Disk IOPS: 50,000 (read/write ratio unknown) 
Flash IOPS: 1,500,000 (since this is cache, read-only) 
Total IOPS: 1,550,000 (high estimate, it is unlikely you will get this level of IOPS)

So the total IOPS for the system is 1,550,000 IOPS and the total bandwidth is 11.82 GB/s. They quote a loading bandwidth of 12 TB/s but make the claim it is based on the CPUs more than the IO capabilities. So, if we provide adequate bandwidth and CPUs we should be able to match that easily.

I don’t care how advanced the disk is, a high-performance disk will be lucky to achieve 250 random IOPS. So, 14 Cells X 12 Disk/cell X 250= 42,000, now if you take the listed value of 300 for non-random IO then you get 50,400. In a test to achieve 100,000 IOPS from disks, EMC needed 496 disks yielding a value of 202 IOPS/disk, at that level their disk farm can only achieve close to 34,000 IOPS so again their numbers are rather optimistic.

Other specifications we need to pay attention to are the number of cores: 64/server with 2 servers so 128 cores and 1 TB of memory per server for a total of 2 TB. Also, the Exadata X2-8 system uses Infiniband so we will also use it in our configuration to provide similar bandwidth. So, to first deal with the servers, we will use the same ones that Oracle used in the X2-8, the Sunfire X4800 with 8-8 2.26 MHz core CPUs each and 1 TB of memory. This will run about $268,392.00. Now, is this the fastest or best server for this configuration? Probably not, but for purposes of comparison we will use it.

The needed Infiniband switches and cables and the associated cabinet we will need will probably be another $40K or so.

Now to the heart of the system, let’s examine the storage. Let’s get really radical and use pure SSD storage. This means we can do away with the flash cache altogether since putting a flash cache in front of flash storage would be redundant and would actually decrease performance. So we will need (from my previous blog) 30 TB of storage using the numbers provided by Oracle. That could be accomplished with 3 RamSan820 each with 10TB of HA configured eMLC flash. Each RamSan820 can provide 450,000 sustained read/ 400,000 sustained write IOPS with 2-2 Port QDR Infiniband interfaces; these RamSans would cost about $450K.

What would the specifications for this configuration look like?

Total servers: 2 
Total cores: 128 
Total memory: 2 TB 
Interface for IO: Infiniband 
Bandwidth: 12 GB/s from the interfaces, 5 GB/s sustained (by IOPS) 
Total Storage: 30 TB Total IOPS: 1,350,000 IOPS 80/20 read/write ratio doing 4K IOs (which by the way, map nicely to the standard IOs on the system). Peak rates would be much higher. 
Total cost with Oracle licenses and support for three years: Base: $7,062,392.00* + Support and licenses 2 additional years: $2,230,560.00=$9,292,952.00 for a savings of $2,618,808.00 over the three years.  

* Close to $6m of this cost is for Oracle core based licenses due to the 128 cores

You would also get a savings in support and license costs of $523,600.00 for each year after the first three in addition to the savings in power and AC costs.

Now, unless you are really consolidating a load of databases you will not need the full 128 CPUs, so you could save a bunch in license fees by reducing the number of cores (approximately $49K/core), while you can do that with the configuration I have discussed, you unfortunately cannot with the X2-8. You can do similar comparisons to the various X2-2 quarter, half and full racks and get similar savings.

Friday, May 11, 2012

Exadata - The Gift the Keeps on Taking

If your work is in the Oracle environment you have no doubt heard about the Oracle Exadata and the whole EXA-proliferation. It seems there is a Exa for everything, Exalogic for the cloud, Exalytics for the BI folks, Exadata for every database ever conceived. Oracle has certainly put those Sun engineers they bought to use over the last few years. Since I don’t usually deal with the cloud (yet) and BI isn’t really my game let’s discuss the Exadata database machines.

Essentially they come in two flavors, the Exadata X2-2 series in quarter, half and full rack configurations and the Exadata X2-8 which comes as a full rack only.Let’s take a look at the published prices for these various configurations. Now, bear in mind, Oracle will no doubt heavily discount license costs to get you on the hook I can’t say what any actual prices will end up, after all, one can no more think like an Oracle salesman than one can a used car salesman. Chart 1 shows the various configurations and costs.

Chart 1: Oracle Exadata Configurations

Of course the important numbers are towards the right where you see total hardware cost, total software cost and total support cost. The total support cost is for each year after the first year.

Chart 2: Oracle Hardware and Software Costs

Now the costs shown in Chart 2 come from the most current Oracle price lists available on the Web, just do a google search for “Oracle Engineered Systems Price List” and “Oracle Software Price List” and you can do the math like I did. Don’t forget to apply the 0.5 multicore discount!

Several years ago when I was not as wise as I am now, I purchased a timeshare. It seemed a good idea at the time, the salesman showed how the cost was eliminated by all the savings I would have over the years going on vacations to their wonderful resorts. Of course he didn’t say how often those wonderful resorts would not be available and brushed little things like maintenance fees under the rug, so to speak. It ended up for each month I would pay over $400 in maintenance fees and these could go up several percent a year, forever. Needless to say I am divesting myself of that timeshare.

Why am I bringing this up? This is identical to the maintenance and upgrade fees you will be paying with the various Exadata setups.

Chart 3: Exadata Support Costs

Unless you really need those 128 CPUs in the X2-8 or the total number of Exadata Cells you need to purchase with the various configurations why pay a yearly fee forever for capacity you aren’t using? With the Oracle suggested configurations you will end up paying for 100% of the disks when you can only utilize about 33% or less of their capacity. Talk about unavailability! So we have seen the short term costs but what about down the road? One ROI is the three year cost, Chart 4 shows the projected 3 year costs for the various Exadata platforms. Again, these are with published, full price numbers, if your are smart you will never pay these amounts.
Chart 4: 3 Year Costs Now, if you are consolidating dozens or hundreds of databases, have a huge data warehouse with a simple design and lots of duplicate row entries, or just have a huge pot of money to spend, the Exadata platform may be the best fit. But for most of us the Exadata is too expensive now, and too expensive in future costs. Exadata, the gift that keeps on taking.

Thursday, May 3, 2012

Big Data

A couple of years ago I kidded that soon someone would come up with a Chaos Database system where you dumped everything into a big pile and let the CPU sort it out. It appears that the day has come. I am attending the Enterprise Data World where data architects, data managers and various folks interested in DWH, DSS, BI meet and greet. Here the big buzz is about big data. The best definition I have seen so far is it is essentially columnar designated data. For example, a document would be stored with a unique id tag, a column group, a column group description and a timestamp (actually at linear sequence of some kind) and the data package which would be the document. Now inside the document it is hoped there are XML tags which parse out the information contained, if not, some kind of index is created on the document. So, instead of a table you have a single column with multiple data pieces and a package, where the package contains the information and the rest tells the system how it fits into a predefined data taxonomy or oncology. You also have a bunch of idividual index structures for the packages which aren't self intelligent. A taxonomy is essentially a definition of a specific data set. For example, animal breaks down into species, which break into subspecies which breaks into sexes. The tag portion of the columnar data puts the data into the particular spot in the data taxonomy where it fits. Once all your individual big data pieces have been encoded with their tags and stored, a language such as Hadoop is used to access it using MapReduce statements which is essentially a map of taxonomy objects and what to do with them if you find them. This is hand coded. Of course all of this, Big Data, Hadoop, NoSQL (not only SQL) and all the new hoopla is in its beginning, it is around Oracle2.0 level. It seems to be the revenge of the Data Architect and Programmer against the relative ease of database programming as it exists today. It would seem that defining a table and using a blob or XML object for the package with the same type of column definitions as in the Big Data paradigm would give the same benefits but allow use of existing data query tools. I pose the question, how much of all this "New Paradigm" is re-labeled old technology? Do we really need completely new databases and structures to handle this? Of course with each column becoming a tagged data object the size of our database will also become Big. What required a table with 5 columns will now require an additional 3-4 , for lack of a better term, columns to describe each main column. This seems to indicate that data volumes will increase by a factor of 4 or more. As an employee in a company that makes storage technology, I applaud this, however, part of me asks is it really needed.