Achieving High Availability with PureData System for Transactions

Kelly Schlamb, DB2 pureScale and PureData Systems Specialist, IBM

A short time ago, I wrote about improving IT productivity with IBM PureData System for Transactions and I mentioned a couple of new white papers and solution briefs on that topic.  Today, I’d like to highlight another one of these new papers: Achieving high availability with PureData System for Transactions.

I’ve recently been meeting with a lot of different companies and organizations to talk about DB2 pureScale and PureData System for Transactions. While there’s a lot of interest and discussion around performance and scalability, the primary reason that I’m usually there is to talk about high availability and how they can achieve higher levels than what they’re seeing today. One thing I’m finding is that there are a lot of different interpretations of what high availability means (and I’m not going to argue here over what the correct definition is). To some, it’s simply a matter of what happens when some sort of localized unplanned outage occurs, like a failure of their production server or a component of that server. How can downtime be minimized in that case? Others extend this discussion to include planned outages, such as maintenance operations or adding more capacity to the system. And others include disaster recovery under the high availability umbrella as well (many keep the two as distinctly separate topics, but that’s just semantics). For them, it’s not enough to be protected in the event of a hardware component failure in the production system; what would happen if the entire data center were to experience an outage?

Finally, availability could also include a discussion on performance. (I don’t mean to imply that this is an exhaustive list; when it comes to keeping the business available and running, there may be other things that come into the equation as well.) There is typically an expectation of performance and response time associated with transactions, especially those executed on behalf of customers, users, and business processes. If a customer clicks on a button on a website and it doesn’t come back quickly, it may not be distinguishable from an outage and the customer may leave that site, choosing to go to a competitor instead.

It should be pointed out that not every database requires the highest levels of availability. It might not be a big deal to an organization if a particular departmental database is offline for 20 minutes, or an hour, or even the entire day. But there are certainly some business-critical databases that are considered “tier 1” that do require the highest availability possible. Therefore, it is important to understand the availability requirements that your organization has. But I’m likely already preaching to the choir here and you’re reading this because you do have a need and you understand the ramifications to your business if these needs aren’t met. With respect to the companies I’ve been meeting with, just hearing about what kinds of systems they depend on, from both an internal and external perspective, and what it means to them if there’s an interruption in service, has been fascinating. Of course, I’m sympathetic to their plight, but as a consumer and a user I still have very high expectations around service. I get pretty mad when I can’t make an online trade, check the status of my travel reward accounts, or even order a pizza online, especially when I know what those companies could be doing to provide better availability to their users. 🙂

Those things I mentioned above, namely high availability, disaster recovery, and performance (through autonomics), are all discussed in the paper in the context of PureData System for Transactions. PureData System for Transactions is a reliable and resilient expert integrated system designed for high-availability, high-throughput online transaction processing (OLTP). It has built-in redundancies to continue operating in the event of a component failure, disaster recovery capabilities to handle complete system unavailability, and autonomic features to dynamically manage utilization and performance of the system. Redundancies include power, compute nodes, storage, and networking (including the switches and adapters). In the case of a component failure, a redundant component keeps the system available. And if there is some sort of data center outage (planned or unplanned), a standby system at another site can take over for the downed system. This can be accomplished via DB2’s HADR feature (remember that DB2 pureScale is the database environment within the system) or through replication technology such as Q Replication or Change Data Capture (CDC), part of IBM InfoSphere Data Replication (IIDR).
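
To put rough numbers on why built-in redundancy matters, here is a minimal Python sketch. It is not taken from the paper and says nothing about the actual figures for PureData System for Transactions. It assumes component failures are independent and that any single surviving copy is enough to keep the service up, and the function names and the 99% figure are purely illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of a group of redundant components.

    Assumes failures are independent and that one working copy is enough,
    so the group is down only when every copy is down at the same time.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - component_availability) ** copies


def annual_downtime_minutes(availability: float) -> float:
    """Expected minutes of downtime per (non-leap) year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    single = 0.99  # illustrative value only, not a measured number
    for copies in (1, 2, 3):
        group = parallel_availability(single, copies)
        print(f"{copies} copy/copies: availability {group:.6f}, "
              f"about {annual_downtime_minutes(group):.1f} minutes down per year")
```

Under those assumptions, each added copy multiplies the chance of a full outage by the failure probability of a single component, which is why duplicating power, compute, storage, and networking pays off so quickly. Real failures are often correlated (a data center outage takes out every local copy at once), which is where the standby system at another site comes in.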

Just a reminder that the IDUG North America 2014 conference will be taking place in Phoenix next month from May 12-16. Being in a city that just got snowed on this morning, I’m very much looking forward to some hot weather for a change. Various DB2, pureScale, and PureData topics are on the agenda. And since I’m not above giving myself a shameless plug, come by and see me at my session: A DB2 DBA’s Guide to pureScale (session G05). Click here for more details on the conference. Also, check out Melanie Stopfer’s article on IDUG.  Hope to see you there!
