Stay Off the Front Page, with Backups

Bill Cole – Competitive Sales Specialist, Information Management, IBM

My first company made the front page of ComputerWorld a long while back.  The front page!  Why? you ask.  Our System/370 was hanging from a crane above the building after the computer room was demolished by a tornado that ravaged part of Omaha.  It was an ignominious end for my baby.  After all, I’d helped the CEs and SEs get it into the building and then put it together late one Saturday night.  We won’t mention the incident that required us to break into the hardware spares at the local office.  I’m sure all of those guys are retired now anyway.

Fortunately, our CEO (yes, the CEO!) thought it a good idea to have backups and store them in a safe in the basement. It was a simpler time. So how long did it take to get back in business? Less than a week. CICS was back up running the Chicago plant and we were doing our typical back-office processing (AR, AP, Payroll, etc.). That was my introduction to the value of backups and disaster recovery planning. Syd (the CEO) was right to plan for it at a time when few of us really thought much about such things.

Let’s define our terms before we get too far into this. Backups are typically for recovering something at a point in time. Disaster recovery comes in two sizes: little disasters, like a failed machine, disk drive, or network card; and big disasters, such as your data center not seeming to be there any more. There are the ones in between, too: the production SAN lost a disk, or the production system lost an LPAR. We’ll deal with these, and with high availability (which is different from merely being highly available), in another installment.

Having a backup is not the same as being ready for disaster recovery. Self-evident? You might be surprised at how many folks don’t see the difference. Or didn’t.

Backups seem to be pretty simple when you first start to think about them.  In a batch world, that’s pretty much true.  In the worlds most of us occupy today, backups aren’t simple at all.  A single strategy for every database leads you down the wrong road.  Or an unnecessarily expensive one.

Backups have to be designed with recovery in mind. What’s your window for getting back to production status? A few minutes, a few hours, or days? Can you run in a degraded mode for a period of time? How long before the wolves come nipping at your heels? Without the answers to these questions, you can wander off into the backup weeds and get it exactly wrong. So the first step is to understand the time frame you’ve got for returning to production operations. And factor in just how much data you can afford to lose. One transaction? A day? Probably somewhere in the middle.
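
To make that trade-off concrete, here’s a back-of-the-envelope sketch in Python. Every number in it is a made-up placeholder; substitute the restore and log-replay rates you’ve actually measured on your own hardware.

    # Back-of-the-envelope recovery-time estimate. All numbers are
    # hypothetical placeholders; measure your own restore and replay rates.
    base_restore_hours = 2.0       # time to restore the last full backup image
    logs_per_day = 48              # archived log files produced per day
    replay_minutes_per_log = 1.5   # measured log-replay rate
    days_since_full_backup = 7     # age of the newest full backup

    replay_hours = logs_per_day * days_since_full_backup * replay_minutes_per_log / 60
    total_hours = base_restore_hours + replay_hours
    print(f"Estimated recovery time: {total_hours:.1f} hours")

With these particular numbers you’d be looking at about 10.4 hours. If the business expects you back in four, weekly full backups alone won’t cut it.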

A second question is about the level of automation you want to deal with. I mean real automation, not six hundred Perl scripts that need to be run by seven different people in just the right order (during a full moon, etc.). You’ll probably shoot for something in between completely automatic and completely manual. Both extremes are scary, but for very different reasons.
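
At its most minimal, real automation means a single entry point that runs every step in order and stops dead on the first failure. The sketch below assumes a DB2 for LUW instance environment is already set up, and the commands themselves are placeholders; the point is that the ordering lives in one place rather than in seven people’s heads.

    import subprocess
    import sys

    # Placeholder steps; substitute your real backup and shipping commands.
    STEPS = [
        ["db2", "backup", "db", "PROD", "online", "to", "/backups", "compress"],
        ["rsync", "-a", "/backups/", "standby:/backups/"],
    ]

    for cmd in STEPS:
        print("running:", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            sys.exit("step failed: " + " ".join(cmd))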

And we’ll confine ourselves to the database for today.  Backing up the complete environment is a topic for another forum or paper.  I know a company with a department that churns out white papers and certifies methodologies for this sort of thing – because I contributed to it.  LOL.

Choice one: Cold backups of everything. OMG, that’s so twentieth century! This might be fine for second- and third-tier environments where a short outage is acceptable. Disk-to-disk-to-tape will shorten the time.
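
Here’s a minimal cold-backup sketch, assuming a DB2 for LUW database called PROD (the names and paths are placeholders). Cold means offline: no connections while the image is being taken, hence the deactivate first.

    import subprocess

    def cold_backup(db="PROD", target="/backups"):
        # DEACTIVATE fails if applications are still connected, so in real
        # life you'd quiesce or force connections off first.
        for cmd in (
            ["db2", "deactivate", "db", db],            # no more connections
            ["db2", "backup", "db", db, "to", target],  # offline (cold) image
            ["db2", "activate", "db", db],              # back in business
        ):
            subprocess.run(cmd, check=True)

    cold_backup()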

Choice two: Warm backups of everything with hot incrementals for the changes. This one really has to be done on a very firm schedule, since restoring the base image plus a week of logs might be a longer process than you have time for.

  • Choice two-A: Monthly or weekly cold backups and daily incrementals. Or incrementals more often, depending on your recovery options. I’ve accomplished this one by having a process that watches for log file completions and then FTPs each completed log file to another system for backing up. Or you can write a script that does a remote copy of the log files directly to the backup system (less chance of random corruption, in my experience). There’s a sketch of the log-watcher idea after this list.
  • Choice two-B: Same as above, but shipping the logs to a standby database that ingests them in recovery mode.
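
Here’s the promised sketch of the log-watcher, assuming DB2 is archiving completed logs to a directory and the standby’s landing spot is reachable as a mount (all paths are placeholders). Only completed logs appear in the archive directory, so a plain copy is safe.

    import shutil
    import time
    from pathlib import Path

    ARCHIVE_DIR = Path("/db2/archive/PROD")   # where DB2 archives completed logs
    SHIP_DIR = Path("/mnt/standby/logs")      # e.g. an NFS mount on the standby

    shipped = set()
    while True:
        for log in sorted(ARCHIVE_DIR.glob("S*.LOG")):   # DB2 logs look like S0000123.LOG
            if log.name not in shipped:
                shutil.copy2(log, SHIP_DIR / log.name)
                shipped.add(log.name)
        time.sleep(60)   # poll once a minute; tighten to shrink your loss window

For choice two-B, the standby side is then essentially a loop around ROLLFORWARD DATABASE ... TO END OF LOGS, applying whatever arrives.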

Choice three: Write every transaction to your own application-level log so you can replay it for recovery. Hmmm. How do you sell that one to the application developers? And won’t there be a performance hit?
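
A toy sketch of what choice three implies, with every name in it hypothetical: each business transaction gets appended to a journal before it’s applied, which is exactly the extra synchronous write the developers will complain about.

    import json
    import time

    def journaled(journal_path, apply_fn):
        """Wrap apply_fn so every call is recorded before it runs."""
        def wrapper(*args, **kwargs):
            record = {"ts": time.time(), "fn": apply_fn.__name__,
                      "args": args, "kwargs": kwargs}
            with open(journal_path, "a") as journal:
                journal.write(json.dumps(record) + "\n")  # the performance hit lives here
            return apply_fn(*args, **kwargs)
        return wrapper

    def place_order(customer_id, item, qty):
        ...  # the real database work goes here

    place_order = journaled("/var/log/app/journal.ndjson", place_order)

Recovery is then a matter of replaying the journal, in order, against a restored base image.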

There’s a fourth choice.  Read on!

We have software to do all of these, of course, so there’s little need for the heroic measures involved in building your own versions. I know what you’re thinking: there’s a license cost. Okay, I see that. However, we know our software works because we have tested it over and over, with all sorts of databases, across every version and release. Can you say the same thing about your scripts?

Here’s a short synopsis of IBM’s products:

QREP (Q Replication) – replicates transactions to a remote database, using MQ to transmit messages reliably.

CDC (Change Data Capture) – replicates transactions to a remote database using proprietary TCP/IP messaging. More on CDC in another installment. Another useful capability within CDC is building files for DataStage to use for reloading the database, which seems a pretty interesting option.

HADR (High Availability Disaster Recovery) – replicates transactions to a remote database. HADR can be tuned to lose no transactions at all (which imposes an overhead on performance, of course). You can choose to tolerate losing more by configuring asynchronous replication, and you can tune how long that window is. One of the more esoteric configurations is a cascade of standby databases. And one of the things you can do with your standby databases is reporting. I’m a big fan of this option since I hate the thought of servers simply waiting for something to fail without providing any real value.
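
For the curious, here’s a minimal sketch of pointing a primary at its standby, assuming the standby was already seeded from a restored backup image; the host names, ports, and instance name are placeholders. The matching commands (with local and remote swapped) run on the standby, which is started first with START HADR ... AS STANDBY.

    import subprocess

    DB = "PROD"
    CFG = {
        "HADR_LOCAL_HOST":  "prod-a.example.com",
        "HADR_REMOTE_HOST": "prod-b.example.com",
        "HADR_LOCAL_SVC":   "50100",
        "HADR_REMOTE_SVC":  "50100",
        "HADR_REMOTE_INST": "db2inst1",
        # SYNC loses nothing; NEARSYNC/ASYNC/SUPERASYNC widen the loss window
        "HADR_SYNCMODE":    "NEARSYNC",
    }

    for param, value in CFG.items():
        subprocess.run(["db2", "update", "db", "cfg", "for", DB,
                        "using", param, value], check=True)

    subprocess.run(["db2", "start", "hadr", "on", "db", DB, "as", "primary"],
                   check=True)

Reporting on the standby is the reads-on-standby feature, enabled through the DB2_HADR_ROS registry variable.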

Note that all of them are about transactions, not simply copying raw logs. After all, it’s the transactions we’re after. And none of them requires any changes to application code, so any and all applications should work without worrying you. You stay focused on adding value to the business rather than simply playing with the technology (no matter how much fun that is, eh?).

Full-out paranoia: Combine several of the above into your strategy. I had three different methods of backup for a very large online auction house (no, not the one you’re thinking about). A full warm backup. Remote copies of the logs. Log shipping with recovery. One of the backups had to be right!

Finally, Syd, the CEO I mentioned above, loved IBM technology, which meant we were on the bleeding edge of a lot of things. IBM loved us. We were the local poster children for the way to run a modest IBM shop. So you can see that I was spoiled from the beginning. Syd gave me several very challenging assignments that affected the future of the company; he liked the outcomes and eventually moved me to another of his companies, in Atlanta, to start their IT shop. It’s a move I have never regretted, and those are lessons I have never forgotten.

Learn more about the new version of DB2 for LUW

Follow Bill Cole on Twitter: @billcole_ibm

Download DB2 10.5 to experience all these capabilities
