Migrating a DB2 database from a Big Endian environment to a Little Endian environment


By Roger Sanders, DB2 for LUW Offering Manager, IBM

What Is Big-Endian and Little-Endian?

Big-endian and little-endian are terms used to describe the order in which a sequence of bytes is stored in computer memory and, if desired, written to disk. (Interestingly, the terms come from Jonathan Swift’s Gulliver’s Travels, where the Big-Endians were a political faction who broke their boiled eggs on the larger end, defying the Emperor’s edict that all eggs be broken on the smaller end; the Little-Endians were the Lilliputians who complied with the Emperor’s law.)

Specifically, big-endian refers to the order where the most significant byte (MSB) in a sequence (i.e., the “big end”) is stored at the lowest memory address and the remaining bytes follow in decreasing order of significance. Figure 1 illustrates how a 32-bit integer would be stored if the big-endian byte order is used.

Figure 1. Big-endian byte order

For people who are accustomed to reading from left to right, big-endian seems like a natural way to store a string of characters or numbers; since data is stored in the order in which it would normally be presented, programmers can easily read and translate octal or hexadecimal data dumps. Another advantage of using big-endian storage is that the size of a number can be more easily estimated because the most significant digit comes first. It is also easy to tell whether a number is positive or negative; this can be determined by examining the most significant bit of the byte stored at the lowest memory address (that is, the most significant byte).

Little-endian, on the other hand, refers to the order where the least significant byte (LSB) in a sequence (i.e., the “little end”) is stored at the lowest memory address and the remaining bytes follow in increasing order of significance. Figure 2 illustrates how the same 32-bit integer presented earlier would be stored if the little-endian byte order were used.

Figure 2. Little-endian byte order

One argument for using the little-endian byte order is that the same value can be read from memory, at different lengths, without having to change addresses—in other words, the address of a value in memory remains the same, regardless of whether a 32-bit, 16-bit, or 8-bit value is read. For instance, the number 12 could be read as a 32-bit integer or an 8-bit character, simply by changing the fetch instruction used. Consequently, mathematical functions involving multiple precisions are much easier to write.

Little-endian byte ordering also aids in the addition and subtraction of multi-byte numbers. When performing such operations, the computer must start with the least significant byte to see if there is a carry to a more significant byte—much like an individual will start with the rightmost digit when doing longhand addition to allow for any carryovers that may take place. By fetching bytes sequentially from memory, starting with the least significant byte, the computer can start doing the necessary arithmetic while the remaining bytes are read. This parallelism results in better performance; if the system had to wait until all bytes were fetched from memory, or fetch them in reverse order (which would be the case with big-endian), the operation would take longer.

IBM mainframes and most RISC-based servers (such as IBM Power Systems and Oracle SPARC servers), along with Hewlett-Packard Integrity servers running HP-UX, utilize big-endian byte ordering. Computers with Intel and AMD x86 processors (CPUs) use little-endian byte ordering instead.

It is important to note that endianness describes the order of bytes, not the order of bits within a byte; regardless of whether big-endian or little-endian byte ordering is used, there is no attempt to reverse the order of the bits that make up a single byte. So whether the hexadecimal value ‘CD’, for example, is stored at the lowest memory address or the highest memory address, the bit pattern for that byte will always be: 1100 1101
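You can actually observe byte ordering from within DB2 itself. The HEX scalar function returns the hexadecimal representation of a value’s internal form, so the same statement produces differently ordered output on big-endian and little-endian servers. This is an illustrative sketch; the values shown are what you would typically expect, not output captured from a particular system:

  db2 "VALUES HEX(INTEGER(1))"

On a little-endian Intel or AMD server, the result is typically 01000000 (least significant byte first); on a big-endian Power or mainframe server, it is typically 00000001 (most significant byte first).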

Moving a DB2 Database To a System With a Different Endian Format

One of the easiest ways to move a DB2 database from one platform to another is by creating a full, offline backup image of the database to be moved and restoring that image onto the new platform. However, this process can only be used if the endianness of the source and target platforms is the same. A change in endian format requires a complete unload and reload of the database, which can be done using the DB2 data movement utilities. Replication-based technologies like SQL Replication, Q Replication, and Change Data Capture (CDC), which transform log records into SQL statements that can be applied to a target database, can be used for these types of migrations as well. On the other hand, DB2 High Availability Disaster Recovery (HADR) cannot be used, because HADR replicates the internal format of the data and thereby maintains the underlying endian format.
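For reference, the backup-and-restore approach between two servers that share the same endianness comes down to a pair of commands. This is a minimal sketch, assuming a database named MYDB and a staging directory of /db2/backups that exists on both servers:

  db2 "BACKUP DATABASE mydb TO /db2/backups"

After the backup image has been copied to the target server:

  db2 "RESTORE DATABASE mydb FROM /db2/backups"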

The DB2 Data Movement Utilities (and the File Formats They Support)

DB2 comes equipped with several utilities that can be used to transfer data between databases and external files. This set of utilities consists of:

  • The Export utility: Extracts data from a database using an SQL query or an XQuery statement, and copies that information to an external file.
  • The Import utility: Copies data from an external file to a table, hierarchy, view, or nickname using INSERT SQL statements. If the object receiving the data is already populated, the input data can either replace or be appended to the existing data.
  • The Load utility: Efficiently moves large quantities of data from an external file, named pipe, device, or cursor into a target table. The Load utility is faster than the Import utility because it writes formatted pages directly into the database instead of performing individual INSERT operations.
  • The Ingest utility: A high-speed, client-side utility that streams data from files and named pipes into target tables.

Along with these built-in utilities, IBM InfoSphere Optim High Performance Unload for DB2 for Linux, UNIX and Windows, an add-on tool that must be purchased separately, can be used to rapidly unload, extract, and repartition data in a DB2 database. Designed to improve data availability, mitigate risk, and accelerate database migrations, this tool helps DBAs work with very large quantities of data with less effort and faster results.
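To make the roles of the four built-in utilities concrete, here is a hedged sketch of typical invocations; the table name DEPARTMENT and the file name department.del are assumptions used only for illustration:

  db2 "EXPORT TO department.del OF DEL SELECT * FROM department"
  db2 "IMPORT FROM department.del OF DEL INSERT INTO department"
  db2 "LOAD FROM department.del OF DEL REPLACE INTO department"
  db2 "INGEST FROM FILE department.del FORMAT DELIMITED INSERT INTO department"

Export writes the query result to the file; Import inserts the file’s rows into an existing table; Load replaces the table’s contents by writing formatted pages directly; and Ingest streams the same file into the table from the client side.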

Regardless of which utility is used, data can only be written to or read from files that utilize one of the following formats:

  • Delimited ASCII
  • Non-delimited or fixed-length ASCII
  • PC Integrated Exchange Format
  • Extensible Markup Language (IBM InfoSphere Optim High Performance Unload for DB2 for Linux, UNIX and Windows only.)

Delimited ASCII (DEL)

The delimited ASCII file format is used by a wide variety of software applications to exchange data. With this format, data values typically vary in length, and a delimiter, which is a unique character not found in the data values themselves, is used to separate individual values and rows. In practice, delimited ASCII files typically use three distinct delimiters:

  • Column delimiters. Characters that are used to mark the beginning or end of a data value. Commas (,) are typically used as column delimiter characters.
  • Row delimiters. Characters that are used to mark the end of a single record or row. On UNIX systems, the newline character (0x0A) is typically used as the row delimiter; on Windows systems, the carriage return/line feed characters (0x0D 0x0A) are normally used instead.
  • Character delimiters. Characters that are used to mark the beginning and end of character data values. Single quotes (') and double quotes (") are typically used as character delimiter characters.

Typically, when data is written to a delimited ASCII file, rows are streamed into the file, one after another. The appropriate column delimiter is used to separate each column’s data values, the appropriate row delimiter is used to separate each individual record (row), and all character and character string values are enclosed with the appropriate character delimiters. Numeric values are represented by their ASCII equivalent—the period character (.) is used to denote the decimal point (if appropriate); real values are represented with scientific notation (E); negative values are preceded by the minus character (-); and positive values may or may not be preceded by the plus character (+).

For instance, if the comma character is used as the column delimiter, the carriage return/line feed character is used as the row delimiter, and the double quote character is used as the character delimiter, the contents of a delimited ASCII file might look something like this:

10,"Headquarters",860,"Corporate","New York"
15,"Research",150,"Eastern","Boston"
20,"Legal",40,"Eastern","Washington"
38,"Support Center 1",80,"Eastern","Atlanta"
42,"Manufacturing",100,"Midwest","Chicago"
51,"Training Center",34,"Midwest","Dallas"
66,"Support Center 2",112,"Western","San Francisco"
84,"Distribution",290,"Western","Denver"
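A file like the one shown above could be produced by the Export utility, since the comma, the double quote, and the newline are the default column, character, and row delimiters for DEL files. A sketch, assuming a table named DEPARTMENT that holds these rows (the column names are likewise assumptions):

  db2 "EXPORT TO department.del OF DEL SELECT deptnumb, deptname, manager, division, location FROM department"

MODIFIED BY options such as coldel and chardel can be added to choose different delimiters.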

Non-Delimited ASCII (ASC)

With the non-delimited ASCII file format, data values have a fixed length, and the position of each value in the file determines which column and row a particular value belongs to.

When data is written to a non-delimited ASCII file, rows are streamed into the file, one after another, and each column’s data value is written using a fixed number of bytes. (If a data value is smaller than the fixed length allotted for a particular column, it is padded with blanks.) As with delimited ASCII files, a row delimiter is used to separate each individual record (row): on UNIX systems the newline character (0x0A) is typically used; on Windows systems, the carriage return/line feed characters (0x0D 0x0A) are used instead. Numeric values are treated the same as when they are stored in delimited ASCII format files.

Thus, a simple non-delimited ASCII file might look something like this:

10Headquarters      860Corporate New York
15Research          150Eastern   Boston
20Legal             40 Eastern   Washington
38Support Center 1  80 Eastern   Atlanta
42Manufacturing     100Midwest   Chicago
51Training Center   34 Midwest   Dallas
66Support Center 2  112Western   San Francisco
84Distribution      290Western   Denver
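Because a non-delimited file contains no delimiters, the utility reading it must be told the starting and ending byte positions of each column. With the Import utility this is done using METHOD L; the positions below are assumptions that match the sample layout shown above, and the table name DEPARTMENT is likewise assumed:

  db2 "IMPORT FROM department.asc OF ASC METHOD L (1 2, 3 20, 21 23, 24 33, 34 46) INSERT INTO department"

Each pair of numbers identifies the first and last byte of one column in every record.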

 

PC Integrated Exchange Format (IXF)

The PC Integrated Exchange Format file format is a special file format that is used almost exclusively to move data between different DB2 databases. Typically, when data is written to a PC Integrated Exchange Format file, rows are streamed into the file, one after another, as an unbroken sequence of variable-length records. Character data values are stored in their original ASCII representation (without additional padding), and numeric values are stored as either packed decimal values or as binary values, depending upon the data type used to store them in the database. Along with data, table definitions and associated index definitions are also stored in PC Integrated Exchange Format files. Thus, tables and any corresponding indexes can be both defined and populated when this file format is used.
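Because an IXF file carries the table definition along with the data, the target table can be created directly from the file. A minimal sketch, with the table and file names assumed:

  db2 "EXPORT TO department.ixf OF IXF SELECT * FROM department"
  db2 "IMPORT FROM department.ixf OF IXF CREATE INTO department"

The CREATE option recreates the table (and, where possible, its indexes) before populating it; INSERT or REPLACE can be used instead when the table already exists on the target database.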

Extensible Markup Language (XML)

Extensible Markup Language (XML) is a simple, yet flexible text format that provides a neutral way to exchange data between different devices, systems, and applications. Originally designed to meet the challenges of large-scale electronic publishing, XML is playing an increasingly important role in the exchange of data on the web and throughout companies. XML data is maintained in a self-describing format that is hierarchical in nature. Thus, a very simple XML file might look something like this:

<?xml version="1.0" encoding="UTF-8" ?>
<customerinfo>
  <name>John Doe</name>
  <addr country="United States">
    <street>25 East Creek Drive</street>
    <city>Raleigh</city>
    <state-prov>North Carolina</state-prov>
    <zip-pcode>27603</zip-pcode>
  </addr>
  <phone type="work">919-555-1212</phone>
  <email>john.doe@xyz.com</email>
</customerinfo>

As noted earlier, only IBM InfoSphere Optim High Performance Unload for DB2 for Linux, UNIX and Windows can work with XML files.

db2move and db2look

As you might imagine, the Export utility, together with the Import utility or the Load utility, can be used to copy a table from one database to another. These same tools can also be used to move an entire database from one platform to another, one table at a time. But a more efficient way to move an entire DB2 database is by using the db2move utility. This utility queries the system catalog of a specified database and compiles a list of all user tables found. It then exports the contents and definition of each table to individual PC Integrated Exchange Format (IXF) formatted files. The set of files produced can then be imported or loaded into another DB2 database on the same system, or they can be transferred to another server and imported or loaded into a DB2 database residing there.

The db2move utility can be run in one of four different modes: EXPORT, IMPORT, LOAD, or COPY. When run in EXPORT mode, db2move utilizes the Export utility to extract data from a database’s tables and externalize it to a set of files. It also generates a file named db2move.lst that contains the names of all of the tables that were processed, along with the names of the files that each table’s data was written to. The db2move utility may also produce one or more message files containing warning or error messages that were generated as a result of the Export operation.
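For example, exporting every user table in a database named SAMPLE (an assumed database name) is a single command, run from the directory in which the output files should be written:

  db2move sample EXPORT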

When run in IMPORT mode, db2move uses the file db2move.lst to establish a link between the PC Integrated Exchange Format (IXF) formatted files needed and the tables that are to be populated. It then invokes the Import utility to recreate each table and its associated indexes using the information stored in the external files.

And, when run in LOAD mode, db2move invokes the Load utility to populate tables that already exist with data stored in PC Integrated Exchange Format (IXF) formatted files. (LOAD mode should never be used to populate a database that does not already contain table definitions.) Again, the file db2move.lst is used to establish a link between the external files used and the tables into which their data is to be loaded.
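On the target server, after the exported files and db2move.lst have been copied into the current directory, either mode is again a single command (TARGETDB is an assumed database name):

  db2move targetdb IMPORT

or, if the table definitions already exist:

  db2move targetdb LOAD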

Unfortunately, the db2move utility can only be used to move table and index objects. And if the database to be migrated contains other objects such as aliases, views, triggers, user-defined data types (UDTs), user-defined functions (UDFs), and stored procedures, you must duplicate those objects in the target database as well. That’s where the db2look utility comes in handy. When invoked, db2look can reverse-engineer an existing database and produce a set of Data Definition Language (DDL) SQL statements that can then be used to recreate all of the data objects found in the database that was analyzed. The db2look utility can also collect environment registry variable settings, configuration parameter settings, and statistical (RUNSTATS) information, which can be used to duplicate a DB2 environment on another system.
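A typical invocation gathers the DDL, along with the configuration and statistics information mentioned above, into a single script; the database name SAMPLE and the output file name are assumptions:

  db2look -d sample -e -l -x -f -m -o sample_ddl.sql

The generated script can then be run against the new database with db2 -tvf sample_ddl.sql before db2move is used to populate the tables it creates.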

 

Mastering the DB2 10.1 Certification Exam – Part 2: Security

It’s hard to argue against the benefits of becoming a DB2 Certified Professional. Aside from gaining a better understanding of DB2, it helps keep you up to date with the latest versions of the product.  It also gives you professional credentials that you can put on your resume to show that you know what you say you know.

But many people are reluctant to put in the time and effort it takes to prepare for the exams. Some just don’t like taking tests, others don’t feel they have the time or money to prepare. That’s where we come in – the DB2 team has put together a great list of resources to help you conquer the certification exams.

We caught up with Anas Mosaad and Mohamed El-Bishbeashy, who are part of the DB2 team that developed the DB2 10.1 Fundamentals Certification Exam 610 Prep, a six-part tutorial series aimed at helping DBAs prepare for the certification exam.

What products are focused on in this tutorial?

In this tutorial, we’ve focused entirely on DB2 10.1 for Linux, UNIX, and Windows (LUW).

Tell us a little about what students can hope to learn from this tutorial.

It is the second in a series of six tutorials designed to help you prepare for the DB2 Fundamentals Exam (610). It puts in your hands all the details needed to successfully answer the security-related questions on the exam. It introduces the concepts of authentication, authorization, privileges, and roles as they relate to DB2 10.1. It also introduces granular access control and trusted contexts.

Why should a DBA be interested in this certification?

IBM professional certifications are recognized worldwide, so you will get recognized! In addition, this one is the first milestone in the advanced DB2 certification paths (development, DBA, and advanced DBA). It acknowledges that you are knowledgeable about the fundamental concepts of DB2 10.1. It shows that you have in-depth knowledge of the basic to intermediate tasks required in day-to-day administration, know basic SQL (Structured Query Language), understand which additional products are available with DB2 10.1, understand how to create databases and database objects, and have a basic knowledge of database security and transaction isolation.

Do you have any special tips?

Absolutely, here are a few of our favorite tips for preparing for the certification exam:

  • Practice with DB2
  • If you don’t have access to DB2, download the fully functional DB2 Express-C for free
  • Read the whole tutorial before taking the exam
  • Be a friend of DB2 Knowledge Center (formerly infocenter)
  • When in doubt, don’t hesitate: post and collaborate in the forums.

For more information:

DB2 10.1 fundamentals certification exam 610 prep, Part 2: DB2 security

The entire series of tutorials for Exam 610: DB2 Fundamentals includes the following:
Part 1: Planning
Part 2: DB2 security
Part 3: Working with databases and database objects
Part 4: Working with DB2 Data using SQL
Part 5: Working with tables, views, and indexes
Part 6: Data concurrency

About the authors:
Anas Mosaad, a DB2 solutions migration consultant with IBM Egypt, has more than eight years of experience in the software development industry. He is a member of IBM’s Information Management Technology Ecosystem Team focusing on enabling and porting customer, business partner, and ISV solutions to the IBM Information Management portfolio, which includes DB2, Netezza, and BigInsights. Anas’ expertise includes portal and J2EE, database design, tuning, and database application development.

Mohamed El-Bishbeashy is an IM specialist for the IBM Cairo Technology Development Center (C-TDC), Software Group. He has 12+ years of experience in the software development industry (8 of those with IBM). His technical experience includes application and product development, DB2 administration, and persistence layer design and development. Mohamed is an IBM Certified Advanced DBA and IBM Certified Application Developer. He also has experience in other IM areas, including PureData System for Analytics (Netezza), BigInsights, and InfoSphere Information Server.

Exclusive Opportunity to Influence IBM Product Usability: Looking for Participants for Usability Test Sessions – Data Warehousing and Analytics

By Arno C. Huang, CPE
Designer, IBM Information Management Design
The IBM Design Team is seeking people with a variety of database, data warehousing and analytics backgrounds to participate in usability test sessions. We are currently looking for people who work in one of the following roles: DBA, Architect, Data Scientist, Business Analyst or Developer. As a test participant, you will provide your feedback about current or future designs we are considering, thus making an impact on the design of an IBM product and letting us know what is important to you.

Participating in a study typically consists of a web conference or on-site meeting scheduled around your availability. IBM will provide you with an honorarium for your participation. There are several upcoming sessions, so we’ll help you find one that best suits your schedule. If you are interested, please contact Arno C. Huang at achuang@us.ibm.com.

Improve IT Productivity with IBM PureData System for Transactions


Kelly Schlamb, DB2 pureScale and PureData Systems Specialist, IBM
I’m a command line kind of guy, always have been. When I’m loading a presentation or a spreadsheet on my laptop, I don’t open the application or the file explorer and work my way through it to find the file in question and double click the icon to open it. Instead, I open a command line window (one of the few icons on my desktop), navigate to the directory I know where the file is (or will do a command line file search to find it) and I’ll execute/open the file directly from there. When up in front of a crowd, I can see the occasional look of wonder at that, and while I’d like to think it’s them thinking “wow, he’s really going deep there… very impressive skills”, in reality it’s probably more like “what is this caveman thinking… doesn’t he know there are easier, more intuitive ways of accomplishing that?!?”

The same goes for managing and monitoring the systems I’ve been responsible for in the past. Where possible, I’ve used command line interfaces, I’ve written scripts, and I’ve visually pored through raw data to investigate problems. But inevitably I’d end up doing something wrong, like missing a step, doing something out of order, or missing some important output – leaving things not working or not performing as expected. Over the years, I’ve considered that part of the fun and challenge of the job. How do I fix this problem? But nowadays, I don’t find it so fun. In fact, I find it extremely frustrating. Things have gotten more complex and there are more demands on my time. I have much more important things to do than figure out why the latest piece of software isn’t interacting with the hardware or other software on my system in the way it is supposed to. When I try to do things on my own now, any problem is immediately met with an “argh!” followed by a Google search, hoping to find others who are trying to do what I’m doing and have a solution for it.

When I look at enterprise-class systems today, there’s just no way that some of the old techniques of implementation, configuration, tuning, and maintenance are going to be effective. Systems are getting larger and more complex. Can anybody tell me that they enjoy installing fix packs from a command line, or ensuring that all of the prerequisite software is at exactly the right level before proceeding with an installation of some modern piece of software (or multiple pieces that all need to work together, which is fairly typical today)? Or feel extremely confident in getting it all right? And you’ve all heard about the demands placed on IT today by “Big Data”. Most DBAs, system administrators, and other IT staff are just struggling to keep the current systems functioning, not able to give much thought to implementing new projects to handle the onslaught of all this new information. The thought of bringing a new application and database up, especially one that requires high availability and/or scalability, is pretty daunting. As is the work to grow out such a system when more demands are placed on it.

It’s for these reasons and others that IBM introduced PureSystems. Specifically, I’d like to talk here about IBM PureData System for Transactions. It’s an Expert Integrated System that is designed to ensure that the database environment is highly available, scalable, and flexible to meet today’s and tomorrow’s online transaction processing demands. These systems are a complete package and they include the hardware, storage, networking, operating system, database management software, cluster management software, and the tools. It is all pre-integrated, pre-configured, and pre-tested. If you’ve ever tried to manually stand up a new system, including all of the networking stuff that goes into a clustered database environment, you’ll greatly appreciate the simplicity that this brings.

The system is also optimized for transaction processing workloads, having been built to capture and automate what experts do when deploying, managing, monitoring, and maintaining these types of systems. System administration and maintenance is all done through an integrated systems console, which simplifies a lot of the operational work that system administrators and database administrators need to do on a day-to-day basis. What? Didn’t I just say above that I don’t like GUIs? No, I didn’t quite say that. Yeah, I still like those opportunities for hands-on, low-level interactions with a system, but it’s hard not to appreciate something that is going to streamline everything I need to do to manage a system and at the same time keep my “argh” moments down to a minimum. The fact that I can deploy a DB2 pureScale cluster within the system in about an hour and deploy a database in minutes (which, by the way, also automatically sets it up for performance monitoring) with just a few clicks is enough to make me love my mouse.

IBM has recently released some white papers and solution briefs around this system and a couple of them talk to these same points that I mentioned above. To see how the system can improve your productivity and efficiency, allowing your organization to focus on the more important matters at hand, I suggest you give them a read:

Improve IT productivity with IBM PureData System for Transactions solution brief
Four strategies to improve IT staff productivity white paper

The four strategies described in these papers, which speak to the capabilities of PureData System for Transactions, are:

  • Simplify and accelerate deployment of high availability clusters and databases
  • Streamline systems management
  • Reduce maintenance time and risk
  • Scale capacity without incurring downtime

I suspect that I won’t be changing my command line and management/maintenance habits on my laptop and PCs any time soon, but when it comes to this system, I’m very happy to come out of my cave.