The value of common database tools and linked processes for Db2, DevOps, and Cloud

by Michael Connor, Analytics Offering Management

Today we released DB2 V11 for Linux, UNIX and Windows. The release includes updates to Data Server Manager (DSM) V2.1, Data Server Driver connectivity V11, and Advanced Recovery Feature (ARF) V11. As many of you may be aware, two years ago we set out to completely rethink our tooling strategy. The market was telling us we needed to focus more on a simplified user experience, provide a web console that serves both power and casual users, and deliver deep database support for production applications. In March 2015, we delivered our first iteration of Data Server Manager as part of 10.5. This year we have again extended the capability of this valuable platform and extended support across a number of IBM data stores, including DB2, dashDB, DB2 on Cloud, and BigInsights.

First let’s talk about some of the drivers we hear related to Database Delivery.

  1. The LOB and LOB developer communities want access to mission-critical data and want to extend that data through new customer-facing OLTP applications.
  2. Business analysts are using more data than ever to generate and enhance customer value through analytic applications.
  3. These new roles need on-demand access to data across all aspects of the delivery lifecycle, from idea inception to production delivery and support.
  4. While timelines are shorter, data volumes are larger, and the lifecycle is faster, quality cannot suffer.

Therefore, the DBA, Development, Testing, and Production support roles are now participating in activities known as Continuous Delivery, Continuous Testing, and DevOps, with the goal of improving customer service and decreasing cycle and delivery times without decreasing quality.

Some areas that are addressed by our broader solutions for Continuous Delivery, Testing, and DevOps include:

  • High Performance Unload of production data and selective data environment, including test data environment restore with DB2 Recovery Expert
  • Simplified test data management addressing discovery, subsetting, masking, and refresh with Test Data Management.
  • Automated driving of application test and performance based workloads with Rational Functional and Performance Tester.
  • Release Management and Deployment automation with Rational UrbanCode.

And finally, areas improved with our latest DB2 releases include:

  • SQL Development and execution with Data Server Manager
  • Test and Deployment Data Server Monitoring with Data Server Manager
  • SQL capture and analysis with Data Server Manager
  • Client and application Data Access, Workload and Failover management with Data Server Drivers

The benefits of considering a Continuous Delivery, Testing, and DevOps solution include reduced cycle times, lower risk of failure, improved application performance, and reduced risk of downtime.

With the V11 Releases we have delivered enhancements including:

  • DSM: DB2 LUW V11 support and monitoring improvements for pureScale applications, extended query history analysis
  • ARF: DB2 LUW V11 support and improvements for Analytics usage with BLU Acceleration
  • DS Driver (Also DB2 Connect): Manageability improvements, Performance enhancements, and extended driver support now for iMAC applications.

Many of the improvements noted above are also available for dashDB Local, our private Cloud offering currently in preview, which leverages DSM as an integral component of its dashboard, and for our public Cloud offering, DB2 on Cloud.

For more details, read the announcement: http://www-01.ibm.com/common/ssi/ShowDoc.wss?docURL=/common/ssi/rep_ca/9/872/ENUSAP16-0139/index.html&lang=en&request_locale=en

Also check out the DB2 LUW Landing Page:  http://www.ibm.com/analytics/us/en/technology/db2/db2-linux-unix-windows.html

 

Blogger: Michael Connor, with Analytics Offering Management, joined IBM in 2001 and focused early in his IBM career on launching the z/OS Development Tooling business centered on Rational Developer for z. Since moving to Analytics in 2013, Michael has led the team responsible for Core Database Tooling.

Migrating a DB2 database from a Big Endian environment to a Little Endian environment

By Roger Sanders, DB2 for LUW Offering Manager, IBM

What Is Big-Endian and Little-Endian?

Big-endian and little-endian are terms that describe the order in which a sequence of bytes is stored in computer memory and, if desired, written to disk. (Interestingly, the terms come from Jonathan Swift’s Gulliver’s Travels, where the Big Endians were a political faction who broke their boiled eggs on the larger end, defying the Emperor’s edict that all eggs be broken on the smaller end; the Little Endians were the Lilliputians who complied with the Emperor’s law.)

Specifically, big-endian refers to the order where the most significant byte (MSB) in a sequence (i.e., the “big end”) is stored at the lowest memory address and the remaining bytes follow in decreasing order of significance. Figure 1 illustrates how a 32-bit integer would be stored if the big-endian byte order is used.

Figure 1. Big-endian byte order

For people who are accustomed to reading from left to right, big-endian seems like a natural way to store a string of characters or numbers; since data is stored in the order in which it would normally be presented, programmers can easily read and translate octal or hexadecimal data dumps. Another advantage of using big-endian storage is that the size of a number can be more easily estimated because the most significant digit comes first. It is also easy to tell whether a number is positive or negative: the sign can be obtained by examining the most significant bit of the byte at offset 0 (the byte at the lowest address).

Little-endian, on the other hand, refers to the order where the least significant byte (LSB) in a sequence (i.e., the “little end”) is stored at the lowest memory address and the remaining bytes follow in increasing order of significance. Figure 2 illustrates how the same 32-bit integer presented earlier would be stored if the little-endian byte order were used.

Figure 2. Little-endian byte order
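To make the two layouts concrete, here is a minimal Python sketch that packs one 32-bit integer both ways using the standard struct module. The sample value 0x0A0B0C0D and the helper name hex_bytes are chosen purely for illustration and do not come from the figures.

```python
import struct

value = 0x0A0B0C0D  # hypothetical 32-bit sample value

# ">I" packs an unsigned 32-bit integer in big-endian order;
# "<I" packs the same integer in little-endian order.
big_endian = struct.pack(">I", value)
little_endian = struct.pack("<I", value)


def hex_bytes(data: bytes) -> str:
    """Render bytes from the lowest to the highest address as hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


print("big-endian:   ", hex_bytes(big_endian))     # 0A 0B 0C 0D
print("little-endian:", hex_bytes(little_endian))  # 0D 0C 0B 0A
```

The same four bytes appear in both cases; only their order in memory differs.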

One argument for using the little-endian byte order is that the same value can be read from memory, at different lengths, without having to change addresses—in other words, the address of a value in memory remains the same, regardless of whether a 32-bit, 16-bit, or 8-bit value is read. For instance, the number 12 could be read as a 32-bit integer or an 8-bit character, simply by changing the fetch instruction used. Consequently, mathematical functions involving multiple precisions are much easier to write.
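As a rough illustration of that argument, the Python sketch below stores the number 12 as a 32-bit little-endian integer and reads it back from the same offset at three different widths. The variable names are illustrative only.

```python
import struct

# The number 12 stored as a little-endian 32-bit integer.
memory = struct.pack("<I", 12)

# Read from the same starting offset (0) at three different widths.
as_8_bit = struct.unpack_from("<B", memory, 0)[0]
as_16_bit = struct.unpack_from("<H", memory, 0)[0]
as_32_bit = struct.unpack_from("<I", memory, 0)[0]
print(as_8_bit, as_16_bit, as_32_bit)  # 12 12 12

# With big-endian storage, an 8-bit read at offset 0 sees the most
# significant byte instead, so the address would have to change.
big_memory = struct.pack(">I", 12)
big_as_8_bit = struct.unpack_from(">B", big_memory, 0)[0]
print(big_as_8_bit)  # 0
```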

Little-endian byte ordering also aids in the addition and subtraction of multi-byte numbers. When performing such operations, the computer must start with the least significant byte to see if there is a carry to a more significant byte—much like an individual will start with the rightmost digit when doing longhand addition to allow for any carryovers that may take place. By fetching bytes sequentially from memory, starting with the least significant byte, the computer can start doing the necessary arithmetic while the remaining bytes are read. This parallelism results in better performance; if the system had to wait until all bytes were fetched from memory, or fetch them in reverse order (which would be the case with big-endian), the operation would take longer.
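The carry propagation described above can be sketched in Python as follows. This is a software model of the idea, not a description of how any particular CPU implements addition, and the function name add_little_endian is made up for this example.

```python
def add_little_endian(a: bytes, b: bytes) -> bytes:
    """Add two equal-length little-endian unsigned integers byte by byte."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    result = bytearray()
    carry = 0
    # Bytes arrive in memory order, which is already least significant first,
    # so each byte can be processed as soon as it is fetched.
    for byte_a, byte_b in zip(a, b):
        total = byte_a + byte_b + carry
        result.append(total & 0xFF)  # keep the low 8 bits
        carry = total >> 8           # carry into the next, more significant byte
    # A final carry is discarded, as in fixed-width arithmetic.
    return bytes(result)


x = (0x01FF).to_bytes(4, "little")
y = (0x0001).to_bytes(4, "little")
total = add_little_endian(x, y)
print(int.from_bytes(total, "little"))  # 512
```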

IBM mainframes and most RISC-based computers (such as IBM Power Systems running AIX, Hewlett-Packard servers running HP-UX, and Oracle SPARC servers) utilize big-endian byte ordering. Computers with Intel and AMD processors (CPUs) use little-endian byte ordering instead.

It is important to note that regardless of whether big-endian or little-endian byte ordering is used, the bits within each byte are usually stored as big-endian. That is, there is no attempt to reverse the order of the bit stream that is represented by a single byte. So, whether the hexadecimal value ‘CD’ for example, is stored at the lowest memory address or the highest memory address, the bit order for the byte will always be: 1100 1101
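A short Python sketch can confirm this: the byte 0xCD keeps the same bit pattern whether it ends up at the lowest or the highest address of a 16-bit value. The 16-bit value 0x00CD is just a convenient example.

```python
import struct

byte_value = 0xCD
bits = format(byte_value, "08b")
grouped = f"{bits[:4]} {bits[4:]}"
print(grouped)  # 1100 1101

# Store 0x00CD as a 16-bit integer in both byte orders.
big = struct.pack(">H", 0x00CD)     # 0xCD lands at the highest address
little = struct.pack("<H", 0x00CD)  # 0xCD lands at the lowest address

# The byte moves, but its bits are never reversed.
print(format(big[1], "08b"), format(little[0], "08b"))
```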

Moving a DB2 Database To a System With a Different Endian Format

One of the easiest ways to move a DB2 database from one platform to another is by creating a full, offline backup image of the database to be moved and restoring that image onto the new platform. However, this process can only be used if the endianness of the source and target platform is the same. A change in endian format requires a complete unload and reload of the database, which can be done using the DB2 data movement utilities. Replication-based technologies like SQL Replication, Q Replication, and Change Data Capture (CDC), which transform log records into SQL statements that can be applied to a target database, can be used for these types of migrations as well. On the other hand, DB2 High Availability Disaster Recovery (HADR) cannot be used because HADR replicates the internal format of the data thereby maintaining the underlying endian format.
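The decision described above can be summarized in a small Python sketch. The function name and the option strings are invented for this illustration; they simply restate the rules from the paragraph.

```python
from typing import List


def migration_options(source_byteorder: str, target_byteorder: str) -> List[str]:
    """Return the migration approaches that fit a source/target platform pair."""
    for order in (source_byteorder, target_byteorder):
        if order not in ("big", "little"):
            raise ValueError(f"unknown byte order: {order!r}")

    # These approaches work regardless of the endian format.
    options = [
        "unload and reload with the data movement utilities",
        "replication (SQL Replication, Q Replication, or CDC)",
    ]
    if source_byteorder == target_byteorder:
        # Backup and restore is only possible when the endian format matches.
        options.insert(0, "offline backup and restore")
    return options


print(migration_options("big", "little"))
```

On any given machine, Python reports the native byte order as sys.byteorder, which could be used to supply one of the two arguments.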

The DB2 Data Movement Utilities (and the File Formats They Support)

DB2 comes equipped with several utilities that can be used to transfer data between databases and external files. This set of utilities consists of:

  • The Export utility: Extracts data from a database using an SQL query or an XQuery statement, and copies that information to an external file.
  • The Import utility: Copies data from an external file to a table, hierarchy, view, or nickname using INSERT SQL statements. If the object receiving the data is already populated, the input data can either replace or be appended to the existing data.
  • The Load utility: Efficiently moves large quantities of data from an external file, named pipe, device, or cursor into a target table. The Load utility is faster than the Import utility because it writes formatted pages directly into the database, instead of performing multiple INSERT operations.
  • The Ingest utility: A high-speed, client-side utility that streams data from files and named pipes into target tables.
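As a rough sketch of how the first three utilities are typically invoked, the Python helpers below assemble basic EXPORT, IMPORT, and LOAD command strings of the kind passed to the DB2 command line processor. The helper names, the file name dept.ixf, and the table name department are hypothetical, and only the simplest INSERT and REPLACE forms are shown; real commands often need additional options.

```python
def export_command(file_name: str, file_type: str, query: str) -> str:
    """Build a basic EXPORT command for the given query."""
    return f"EXPORT TO {file_name} OF {file_type} {query}"


def import_command(file_name: str, file_type: str, table: str,
                   replace: bool = False) -> str:
    """Build a basic IMPORT command that appends to or replaces table data."""
    action = "REPLACE" if replace else "INSERT"
    return f"IMPORT FROM {file_name} OF {file_type} {action} INTO {table}"


def load_command(file_name: str, file_type: str, table: str,
                 replace: bool = False) -> str:
    """Build a basic LOAD command that appends to or replaces table data."""
    action = "REPLACE" if replace else "INSERT"
    return f"LOAD FROM {file_name} OF {file_type} {action} INTO {table}"


# Hypothetical file and table names, for illustration only.
print(export_command("dept.ixf", "IXF", "SELECT * FROM department"))
print(import_command("dept.ixf", "IXF", "department"))
print(load_command("dept.ixf", "IXF", "department", replace=True))
```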

Along with these built-in utilities, IBM InfoSphere Optim High Performance Unload for DB2 for Linux, UNIX and Windows, an add-on tool that must be purchased separately, can be used to rapidly unload, extract, and repartition data in a DB2 database. Designed to improve data availability, mitigate risk, and accelerate database migrations, this tool helps DBAs work with very large quantities of data with less effort and faster results.

Regardless of which utility is used, data can only be written to or read from files that utilize one of the following formats:

  • Delimited ASCII
  • Non-delimited or fixed-length ASCII
  • PC Integrated Exchange Format
  • Extensible Markup Language (IBM InfoSphere Optim High Performance Unload for DB2 for Linux, UNIX and Windows only.)

Delimited ASCII (DEL)

The delimited ASCII file format is used by a wide variety of software applications to exchange data. With this format, data values typically vary in length, and a delimiter, which is a unique character not found in the data values themselves, is used to separate individual values and rows. Actually, delimited ASCII format files typically use three distinct delimiters:

  • Column delimiters. Characters that are used to mark the beginning or end of a data value. Commas (,) are typically used as column delimiter characters.
  • Row delimiters. Characters that are used to mark the end of a single record or row. On UNIX systems, the new line character (0x0A) is typically used as the row delimiter; on Windows systems, the carriage return/linefeed characters (0x0D–0x0A) are normally used instead.
  • Character delimiters. Characters that are used to mark the beginning and end of character data values. Single quotes (') and double quotes (") are typically used as character delimiter characters.

Typically, when data is written to a delimited ASCII file, rows are streamed into the file, one after another. The appropriate column delimiter is used to separate each column’s data values, the appropriate row delimiter is used to separate each individual record (row), and all character and character string values are enclosed with the appropriate character delimiters. Numeric values are represented by their ASCII equivalent—the period character (.) is used to denote the decimal point (if appropriate); real values are represented with scientific notation (E); negative values are preceded by the minus character (-); and positive values may or may not be preceded by the plus character (+).

For instance, if the comma character is used as the column delimiter, the carriage return/line feed character is used as the row delimiter, and the double quote character is used as the character delimiter, the contents of a delimited ASCII file might look something like this:

10,"Headquarters",860,"Corporate","New York"
15,"Research",150,"Eastern","Boston"
20,"Legal",40,"Eastern","Washington"
38,"Support Center 1",80,"Eastern","Atlanta"
42,"Manufacturing",100,"Midwest","Chicago"
51,"Training Center",34,"Midwest","Dallas"
66,"Support Center 2",112,"Western","San Francisco"
84,"Distribution",290,"Western","Denver"
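To show how the three delimiters work together, here is a minimal Python sketch that reads a few rows in this format with the standard csv module. It is only an illustration of the file layout, not of how the DB2 utilities parse such files, and the tuple layout chosen for each row is an assumption.

```python
import csv
import io

# Three sample rows: comma column delimiter, CR/LF row delimiter,
# double quote character delimiter.
del_data = (
    '10,"Headquarters",860,"Corporate","New York"\r\n'
    '15,"Research",150,"Eastern","Boston"\r\n'
    '20,"Legal",40,"Eastern","Washington"\r\n'
)

# newline="" leaves the CR/LF pairs intact so the csv module can handle them.
reader = csv.reader(io.StringIO(del_data, newline=""), delimiter=",", quotechar='"')

# Convert the two numeric columns; the character columns arrive unquoted.
rows = [(int(r[0]), r[1], int(r[2]), r[3], r[4]) for r in reader]

for row in rows:
    print(row)
```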

Non-Delimited ASCII (ASC)

With the non-delimited ASCII file format, data values have a fixed length, and the position of each value in the file determines which column and row a particular value belongs to.

When data is written to a non-delimited ASCII file, rows are streamed into the file, one after another, and each column’s data value is written using a fixed number of bytes. (If a data value is smaller than the fixed length allotted for a particular column, it is padded with blanks.) As with delimited ASCII files, a row delimiter is used to separate each individual record (row): on UNIX systems the new line character (0x0A) is typically used; on Windows systems, the carriage return/linefeed characters (0x0D–0x0A) are used instead. Numeric values are treated the same as when they are stored in delimited ASCII format files.

Thus, a simple non-delimited ASCII file might look something like this:

10Headquarters     860Corporate New York
15Research         150Eastern   Boston
20Legal            40 Eastern   Washington
38Support Center 1 80 Eastern   Atlanta
42Manufacturing    100Midwest   Chicago
51Training Center  34 Midwest   Dallas
66Support Center 2 112Western   San Francisco
84Distribution     290Western   Denver
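Because position alone identifies each value, reading such a file comes down to slicing each row at known offsets. The Python sketch below writes one row and parses it back; the column names and widths in LAYOUT are assumptions made for this example, since the article does not specify them.

```python
# Hypothetical column names and widths, chosen for illustration.
LAYOUT = [("id", 2), ("name", 17), ("staff", 3), ("region", 10), ("city", 13)]


def format_row(values) -> str:
    """Write each value left-justified and blank-padded to its fixed width."""
    return "".join(str(v).ljust(width) for v, (_, width) in zip(values, LAYOUT))


def parse_row(line: str) -> dict:
    """Slice a fixed-length row back into named values, dropping the padding."""
    record = {}
    offset = 0
    for name, width in LAYOUT:
        record[name] = line[offset:offset + width].rstrip()
        offset += width
    return record


line = format_row([38, "Support Center 1", 80, "Eastern", "Atlanta"])
record = parse_row(line)
print(repr(line))
print(record)
```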

 

PC Integrated Exchange Format (IXF)

The PC Integrated Exchange Format is a special file format that is used almost exclusively to move data between different DB2 databases. Typically, when data is written to a PC Integrated Exchange Format file, rows are streamed into the file, one after another, as an unbroken sequence of variable-length records. Character data values are stored in their original ASCII representation (without additional padding), and numeric values are stored as either packed decimal values or as binary values, depending upon the data type used to store them in the database. Along with data, table definitions and associated index definitions are also stored in PC Integrated Exchange Format files. Thus, tables and any corresponding indexes can be both defined and populated when this file format is used.
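For readers unfamiliar with packed decimal, the Python sketch below shows the general convention: two decimal digits per byte, with the sign carried in the final half-byte (C for positive, D for negative). This illustrates packed decimal in general; the exact record layout inside a PC Integrated Exchange Format file is not covered here, and the function name pack_decimal is made up for this example.

```python
def pack_decimal(value: int) -> bytes:
    """Encode an integer as generic packed decimal (sign in the last nibble)."""
    digits = str(abs(value))
    sign = "D" if value < 0 else "C"
    nibbles = digits + sign
    if len(nibbles) % 2:
        # Pad with a leading zero so the nibbles fill whole bytes.
        nibbles = "0" + nibbles
    return bytes.fromhex(nibbles)


print(pack_decimal(123).hex().upper())  # 123C
print(pack_decimal(-45).hex().upper())  # 045D
```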

Extensible Markup Language (XML)

Extensible Markup Language (XML) is a simple, yet flexible text format that provides a neutral way to exchange data between different devices, systems, and applications. Originally designed to meet the challenges of large-scale electronic publishing, XML is playing an increasingly important role in the exchange of data on the web and throughout companies. XML data is maintained in a self-describing format that is hierarchical in nature. Thus, a very simple XML file might look something like this:

<?xml version="1.0" encoding="UTF-8" ?>
<customerinfo>
  <name>John Doe</name>
  <addr country="United States">
    <street>25 East Creek Drive</street>
    <city>Raleigh</city>
    <state-prov>North Carolina</state-prov>
    <zip-pcode>27603</zip-pcode>
  </addr>
  <phone type="work">919-555-1212</phone>
  <email>john.doe@xyz.com</email>
</customerinfo>
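To illustrate the self-describing, hierarchical nature of such a document, here is a minimal Python sketch that parses a trimmed copy of it with the standard xml.etree.ElementTree module and pulls out a few values by name. This is only a generic XML example and says nothing about how DB2 tooling processes XML files.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the sample document, supplied as bytes so that the
# encoding declaration is honored by the parser.
xml_bytes = b"""<?xml version="1.0" encoding="UTF-8" ?>
<customerinfo>
  <name>John Doe</name>
  <addr country="United States">
    <street>25 East Creek Drive</street>
    <city>Raleigh</city>
  </addr>
  <phone type="work">919-555-1212</phone>
</customerinfo>
"""

root = ET.fromstring(xml_bytes)

name = root.findtext("name")              # child element text
country = root.find("addr").get("country")  # attribute on a child element
city = root.findtext("addr/city")         # nested element via a path
phone_type = root.find("phone").get("type")

print(name, country, city, phone_type)
```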

As noted earlier, only IBM InfoSphere Optim High Performance Unload for DB2 for Linux, UNIX and Windows can work with XML files.

db2move and db2look

As you might imagine, the Export utility, together with the Import utility or the Load utility, can be used to copy a table from one database to another. These same tools can also be used to move an entire database from one platform to another, one table at a time. But a more efficient way to move an entire DB2 database is by using the db2move utility. This utility queries the system catalog of a specified database and compiles a list of all user tables found. Then it exports the contents and definition of each table found to individual PC Integrated Exchange Format (IXF) formatted files. The set of files produced can then be imported or loaded into another DB2 database on the same system, or they can be transferred to another server and be imported or loaded to a DB2 database residing there.

The db2move utility can be run in one of four different modes: EXPORT, IMPORT, LOAD, or COPY. When run in EXPORT mode, db2move utilizes the Export utility to extract data from a database’s tables and externalize it to a set of files. It also generates a file named db2move.lst that contains the names of all of the tables that were processed, along with the names of the files that each table’s data was written to. The db2move utility may also produce one or more message files containing warning or error messages that were generated as a result of the Export operation.

When run in IMPORT mode, db2move uses the file db2move.lst to establish a link between the PC Integrated Exchange Format (IXF) formatted files needed and the tables into which data is to be populated. It then invokes the Import utility to recreate each table and its associated indexes using information stored in the external files.

And, when run in LOAD mode, db2move invokes the Load utility to populate tables that already exist with data stored in PC Integrated Exchange Format (IXF) formatted files. (LOAD mode should never be used to populate a database that does not already contain table definitions.) Again, the file db2move.lst is used to establish a link between the external files used and the tables into which their data is to be loaded.
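A typical cross-platform move therefore runs db2move once in EXPORT mode on the source system and once in IMPORT or LOAD mode on the target. The Python sketch below only assembles the command lines in the basic form db2move database-name action; the helper name and the database name SAMPLE are assumptions, COPY mode is left out because it needs additional options, and nothing is executed.

```python
from typing import List

# The three modes described above; COPY is intentionally not covered here.
SUPPORTED_MODES = ("export", "import", "load")


def db2move_command(database: str, mode: str) -> List[str]:
    """Build a basic db2move command line as an argument list."""
    action = mode.lower()
    if action not in SUPPORTED_MODES:
        raise ValueError(f"unsupported db2move mode: {mode!r}")
    return ["db2move", database, action]


# Hypothetical database name, for illustration only.
source_step = db2move_command("SAMPLE", "EXPORT")  # run on the source system
target_step = db2move_command("SAMPLE", "IMPORT")  # run on the target system

print(" ".join(source_step))
print(" ".join(target_step))
```

On a machine where DB2 is installed, each list could be handed to subprocess.run from the directory that should hold the IXF files and db2move.lst.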

Unfortunately, the db2move utility can only be used to move table and index objects. If the database to be migrated contains other objects such as aliases, views, triggers, user-defined data types (UDTs), user-defined functions (UDFs), and stored procedures, you must duplicate those objects in the target database as well. That’s where the db2look utility comes in handy. When invoked, db2look can reverse-engineer an existing database and produce a set of Data Definition Language (DDL) SQL statements that can then be used to recreate all of the data objects found in the database that was analyzed. The db2look utility can also collect environment registry variable settings, configuration parameter settings, and statistical (RUNSTATS) information, which can be used to duplicate a DB2 environment on another system.
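As a hedged sketch of a db2look invocation, the Python helper below assembles a command line using the -d (database), -e (extract DDL), -o (output file), -f (configuration parameters and registry variables), and -m (statistics in mimic mode) options. The helper name, the database name SAMPLE, and the file name sample.ddl are hypothetical, and the command is only built, not run.

```python
from typing import List


def db2look_command(database: str, output_file: str,
                    include_config: bool = False,
                    include_statistics: bool = False) -> List[str]:
    """Build a db2look command line that extracts DDL to a file."""
    command = ["db2look", "-d", database, "-e", "-o", output_file]
    if include_config:
        command.append("-f")  # configuration parameters and registry variables
    if include_statistics:
        command.append("-m")  # mimic mode: statements that reproduce statistics
    return command


# Hypothetical database and file names, for illustration only.
ddl_only = db2look_command("SAMPLE", "sample.ddl")
full = db2look_command("SAMPLE", "sample.ddl",
                       include_config=True, include_statistics=True)

print(" ".join(ddl_only))
print(" ".join(full))
```

The generated DDL file would then be run against the target database before db2move is used in LOAD mode, since LOAD mode expects the tables to exist already.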