It’s Obvious. It’s in the Data.
January 8, 2014
Bill Cole, Competitive Sales Specialist, Information Management, IBM
You’ve had that experience, right? Somebody says that the answer is in the data so you look harder and all you see is stuff. There’s not a pattern within a grenade blast of this data. Maybe if you had a bit more time you’d find it. Or maybe having the data in the right format would make a difference.
We all know the traditional relational database isn’t a great platform for analyzing mass quantities of data. Your OLTP relational database is built for processing small-ish transactions and maintaining data integrity in the face of an onslaught of concurrent users, all without regard to disk space or processor utilization. Abuse the resources to get the performance you need! To paraphrase Admiral Farragut: Ignore the checkbook, full speed ahead!
So we learned to build special-purpose structures for our non-transactional needs, and then manage the fallout as we tried to find anything that even smelled like (consistent) performance. Each step forward in the data warehouse arena was a struggle. We demanded resources or explained away failures with a wave of a disk drive or processor.
This situation was clearly not good for our mission of analyzing great chunks of data in a reasonable time. Subsets of data – data marts – were used to work around our limitations. But this meant we were either replicating data or losing some data that might be useful in other queries. Clearly not the best of situations.
Our friends out in Almaden studied the problem and found that column-oriented tables were the best basis for a solution. After all, we were gathering up large quantities of raw data and analyzing it, not processing OLTP transactions. There would be little need for those annoying special-purpose structures. Nor would we need any indexes. All this would save lots of space and reduce processing time, too, so we could achieve not only predictable performance but VERY good performance. The kind of performance our friends in the business needed to build better relationships with suppliers and customers.
The implementation of the new analytics platform is in DB2 10.5 with BLU Acceleration. (The answer to why “BLU” is in an earlier blog entry.) The very cool thing is that BLU is an option you can choose for either the entire database or just the analytics tables. So you can have your traditional row-oriented tables and the column-oriented tables in a single database if that suits your design. No need to learn and maintain a whole new technology just for your analytics.
And we can’t forget the synergy with Cognos. After all, the two products are developed just a few miles from each other. Turns out the Cognos folks help the DB2 team by sharing typical analytics queries and the DB2 team uses those examples to tune the query engine. Nice! Of course, this helps out with the queries we build ourselves or through – gasp! – other products. Oh well, DB2 is there to make us all look good.
A quick refresher on column-oriented data. The easiest way for me to think about it is that we’ve stood the database on its side so that instead of seeing everything in rows we’re seeing the data in columns grouped together. A typical description of a table has the column names running across the top of the page, which is analogous to the way data is stored in most relational databases. However, the column-oriented table has the data for a column grouped together, and the rows are built by assembling the data from the columns. Not ideal for OLTP but excellent for processing gobs of data when a query touches only a handful of columns. (There’s a fuller discussion of this in a previous blog post.) No need for indexes since we’re not looking for individual rows.
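If it helps to see the "stood on its side" idea in code, here is a minimal sketch in plain Python. It is only an illustration of the layout, not how DB2 or BLU Acceleration actually stores data, and the table, column names, and helper functions are all made up for the example.

```python
# Hypothetical sales table, stored the row-oriented way: one record per row.
rows = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 75.5},
    {"order_id": 3, "region": "East", "amount": 210.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def rebuild_row(columns, i):
    """Reassemble row i by taking position i from every column."""
    return {name: values[i] for name, values in columns.items()}


# The same table, stood on its side: one list per column.
columns = to_columns(rows)

# An analytic question ("what is the total amount?") needs only one column.
# The column layout reads a single list; the row layout visits every record.
total_from_columns = sum(columns["amount"])
total_from_rows = sum(row["amount"] for row in rows)

print(total_from_columns, total_from_rows)
print(rebuild_row(columns, 1))
```

Both totals come out the same, of course. The point is that the column version never touches order_id or region, which is where the savings come from when the table has hundreds of columns and millions of rows.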
The sort of performance users have reported with DB2 and BLU Acceleration is nothing short of amazing. Double-digit improvements in throughput. And it’s this reliably predictable performance that allows us to build those applications that require sub-second analysis. You know the ones I’m talking about. While you are on the phone or on a web site, the agent or the site offers you options based on YOUR previous interactions, not just options for any random caller or user. The options are specific because we can analyze the data in the time you’re on the phone or on the site.
Finally, I’m told the mark of genius is being able to connect seemingly random dots into a pattern. You know those folks who are at the conclusion while the rest of us are still just looking at the dots. You don’t need a genius if you’ve got BLU! You’ll find that pattern/information gem in record time, too. And you’ll show the business that you’re delivering the data they need when they need it.
Learn more about the innovative technology in BLU Acceleration through this video series on YouTube!