DB2 Advantages Over Oracle in Compression
September 25, 2012
Chris Eaton, Technical Specialist, IBM
I have previously posted about compression and the new adaptive compression capabilities in DB2 10. In this post I want to clearly articulate why DB2 is the leader in compression capabilities and how that leadership translates into big savings for clients along with big performance gains for database workloads.
What is compression and what’s available in DB2?
No matter what form of compression we look at, they all work on the same basic premise: find repeating patterns in a larger body of data and replace those patterns with smaller symbols. That's compression in a nutshell. The algorithm that delivers the best compression is the one that finds the most repeating patterns. For databases, however, you have to weigh the cost of finding and replacing patterns against the benefit of smaller objects. DB2 leverages a number of different techniques to compress out the most patterns while at the same time actually improving performance.
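To make the premise concrete, here is a toy sketch of dictionary-based compression: the most frequent values get short symbols, rarer ones stay as-is. This is an illustration only, not DB2's actual encoder; the data, symbol format, and `max_entries` cutoff are all invented for the example.

```python
from collections import Counter

def build_dictionary(rows, max_entries=2):
    """Map the most frequent values to short one-byte-prefixed symbols.

    A real compressor works on byte patterns; whole column values keep
    the toy readable.
    """
    counts = Counter(value for row in rows for value in row)
    return {pattern: f"\x00{i}" for i, (pattern, _count)
            in enumerate(counts.most_common(max_entries))}

def compress(rows, dictionary):
    """Replace dictionary patterns with their symbols; leave the rest."""
    return [[dictionary.get(value, value) for value in row] for row in rows]

rows = [("Toronto", "Canada"), ("Toronto", "Canada"), ("Markham", "Canada")]
dictionary = build_dictionary(rows)
compressed = compress(rows, dictionary)
# "Canada" and "Toronto" repeat often enough to earn symbols;
# "Markham" appears once and is stored uncompressed.
```

The trade-off the paragraph describes is visible even here: building the dictionary costs a scan over the data, and the payoff depends entirely on how many patterns actually repeat.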
How do we use more CPU to compress/decompress yet get better performance?
DB2 offers more compression algorithms than any other DBMS vendor: two for table compression, three for index compression, two for temporary table compression, and one for log archive and backup compression. The beauty of the DB2 10 solution is that you, as a user, don't pick the algorithms; DB2 looks at the data, decides on the best way to compress it, and presto: smaller tables, indexes, temporary objects, and log files. With table compression ratios in the range of 7x, objects are significantly smaller than their uncompressed counterparts, so every I/O brings seven times more rows into memory, and every GB of memory holds seven times more rows (because DB2 keeps rows compressed in memory as well as on disk). By doing up to seven times fewer I/Os we are actually seeing large performance boosts at most clients. Yes, we use more CPU to do that, but most clients don't run their Linux, UNIX, or Windows servers at 100% CPU utilization all the time. Quite the contrary: most sit at around 40% utilization, so there is plenty of CPU headroom to absorb the 5-7% compression/decompression overhead, whereas I/O bandwidth is at a premium. So compression is a win/win.
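The back-of-envelope math behind the 7x claim can be sketched as follows. The page size, row size, and row count here are assumed for illustration, not measured DB2 figures; the point is simply that a 7x smaller row means roughly 7x more rows per page, and therefore roughly 7x fewer page I/Os for the same scan.

```python
PAGE_SIZE = 4096           # bytes per data page (assumed)
ROW_SIZE = 128             # average uncompressed row size in bytes (assumed)
COMPRESSION_RATIO = 7      # rows shrink ~7x under compression

rows_per_page_uncompressed = PAGE_SIZE // ROW_SIZE
rows_per_page_compressed = PAGE_SIZE // (ROW_SIZE // COMPRESSION_RATIO)

TOTAL_ROWS = 10_000_000
# Ceiling division: a full table scan touches every page that holds rows.
ios_uncompressed = -(-TOTAL_ROWS // rows_per_page_uncompressed)
ios_compressed = -(-TOTAL_ROWS // rows_per_page_compressed)
# The scan touches ~7x fewer pages; the extra CPU spent decompressing
# is traded for that saved I/O, which is the scarcer resource at ~40%
# CPU utilization.
```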
What sets DB2 apart?
As I mentioned, we have a lot of algorithms to leverage to find the most patterns. In fact, compared to Oracle 11gR2 we have a huge advantage. The reason is that Oracle (and Microsoft, as well as Sybase) all use what is called a page-level dictionary, looking for repeating patterns only within a single 4K, 8K, 16K, or 32K data page. DB2, by contrast, uses a page-level approach plus a global, table-level scan for repeating patterns. This two-level compression approach means DB2 finds more patterns and therefore gets better compression. Let me end with an analogy. How many people in the department you work in have a birth date of June 15? I would bet that the vast majority of readers of this blog do not share my birth date (June 15). But how many people on the planet share my birthday? Millions, I'm sure. Why? Because if I only look at a small amount of data (one page worth), I'm not likely to find the repeating patterns I would find if I looked at a much larger set of data (a table worth, or a planet worth). Since DB2 does both, we deliver superior compression and improved performance.
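The birthday analogy can be played out in a few lines. The sketch below uses invented birth-date data spread across three small "pages" (it is not DB2's actual encoder): within any one page every value is unique, so a page-level dictionary finds nothing to compress, while a table-level scan spots the date that repeats across pages.

```python
from collections import Counter

# Hypothetical birth-date column, already laid out on three data pages.
pages = [
    ["1982-06-15", "1990-01-02", "1975-03-30"],   # page 1
    ["1982-06-15", "1988-11-11", "2001-07-04"],   # page 2
    ["1982-06-15", "1969-12-25", "1993-05-19"],   # page 3
]

def repeats(values):
    """Values appearing more than once, i.e. compressible patterns."""
    return {v for v, n in Counter(values).items() if n > 1}

# Page-level dictionary: each page examined in isolation.
page_level_patterns = set().union(*(repeats(page) for page in pages))

# Table-level dictionary: one scan over every row in the table.
table_level_patterns = repeats([v for page in pages for v in page])

# Within a single page every date is unique, so the page-level scan
# finds nothing; across the table "1982-06-15" repeats three times.
```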
Tune in to this webcast this week for more on compression.
Look for more of my blogs at Toolbox.
Chris Eaton is a worldwide technical specialist for IBM's Information Management products, focused on Database Technology, Information Governance and Workload Optimization. Chris has been working with DB2 on the Linux, UNIX, and Windows platform for over 19 years. He is the author of several books in the data management space and is an international award-winning speaker.