Fraud detection? Not so elementary, my dear! (Part 2)
April 24, 2014
Radha Gowda, Technical Marketing, IBM Analytics
The first part of this blog gave an overview of the IBM Watson Foundations portfolio and DB2 solutions for financial fraud detection. In this part, we’ll go over the DB2 warehouse features that help detect fraud in near-real time.
Figure 1: DB2 warehouse for operational analytics
Data warehouses integrate data from one or more disparate sources to provide a single view of the business, and make that single repository available to all levels of the business for analysis. To support today’s workloads, the data warehouse architecture must optimize both traditional deep analytic queries and shorter transactional-type queries. It must be able to scale out as data volumes explode without compromising either performance or storage efficiency. And it must have the capacity to load and update data in real time. DB2 for Linux, UNIX and Windows offers you all these capabilities and more to help you build a scalable, high-performing warehouse for near-real-time fraud detection.
DB2 warehouse components are organized into six major categories as shown in Figure 2. We shall discuss only the highlighted ones that help make near-real-time fraud detection a reality.
Figure 2: Warehouse components available in DB2 advanced editions
As we discussed before, fraud detection is knowledge intensive. It involves sifting through vast amounts of data to identify and verify patterns, and constructing fraud models to help with real-time detection of fraudulent activity.
DB2 offers embedded analytics, in the form of OLAP and data mining.
Data Mining enables you to analyze patterns and make predictions. Unlike solutions that require end users to extract data from the warehouse, independently analyze it and then send the results back to the warehouse, DB2 provides embedded data mining, modeling, and scoring capabilities.
Modeling – the process starts with historical data being gathered and run through a series of mathematical functions that classify, cluster and segment the data. It automatically finds associations and business rules in the data that may signify interesting patterns (imagine customers’ credit card purchasing patterns). These business rules are then collected into a model, which can contain anywhere from a few rules to tens of thousands.
Visualization helps analysts evaluate the business rules to make sure that they are accurate.
Scoring involves applying the verified business rules to current data to help predict transactions that are likely to be fraudulent in real time.
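The model-then-score lifecycle above can be sketched in miniature. This is only an illustration, not DB2’s mining implementation: the customer data, the per-customer baseline “rule,” and the three-sigma threshold are all hypothetical stand-ins for the much richer rules a real mining model holds.

```python
from statistics import mean, stdev

# Hypothetical historical transaction amounts per customer (the "gathered
# historical data" the modeling step starts from).
history = {
    "cust1": [25.0, 30.0, 27.5, 22.0, 28.0, 31.0],
    "cust2": [500.0, 450.0, 520.0, 480.0, 510.0, 495.0],
}

def build_model(history):
    """Modeling step: derive one simple rule per customer (baseline mean
    and spread). A real mining model collects thousands of such rules."""
    return {cust: (mean(amts), stdev(amts)) for cust, amts in history.items()}

def score(model, customer, amount, threshold=3.0):
    """Scoring step: apply the learned rule to a current transaction and
    flag it if it deviates strongly from the customer's baseline."""
    mu, sigma = model[customer]
    return abs(amount - mu) > threshold * sigma

model = build_model(history)
print(score(model, "cust1", 27.0))   # typical purchase -> False
print(score(model, "cust1", 900.0))  # far outside baseline -> True
```

The key point the sketch preserves is that modeling runs once over historical data, while scoring is cheap enough to apply to every incoming transaction.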
For example, consider credit card spending patterns outside the norm. While outlier rules (which detect deviations in large data sets) can be applied to a banking transaction as it enters the system to help predict whether it is fraudulent, outlier handling is not usually automatic: an expert needs to take a closer look and decide whether to act. This is where Cognos helps, generating reports that visualize the outliers so a human expert can understand the nature of each one.
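A minimal sketch of what an outlier rule does, using the common interquartile-range (IQR) criterion on a batch of transaction amounts. The data and the choice of IQR are illustrative assumptions, not DB2’s actual algorithm; the output is the candidate list a human expert would then review.

```python
def quartiles(values):
    """Return (Q1, Q3) using linear interpolation between sorted values."""
    s = sorted(values)
    def q(p):
        idx = p * (len(s) - 1)
        lo, hi = int(idx), min(int(idx) + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (idx - lo)
    return q(0.25), q(0.75)

def find_outliers(amounts, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] for expert review."""
    q1, q3 = quartiles(amounts)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [a for a in amounts if a < lo or a > hi]

amounts = [20, 22, 25, 21, 24, 23, 26, 950]  # one suspicious charge
print(find_outliers(amounts))  # -> [950]
```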
DB2 supports standard data mining model algorithms such as clustering, associations, classification and prediction; additional algorithms may be imported in industry-standard Predictive Model Markup Language (PMML) format from other PMML-compliant data mining applications including SAS and SPSS. This capability enables high-volume, high-speed, parallelized scoring of data in DB2 using third-party models.
Cubing Services provides decision makers with a multidimensional view of data stored in a relational database. It supports OLAP capabilities within the data warehouse and simplifies queries that run against large, complex data stores. The multidimensional view makes it easier to discover and understand the relationships in your data for better business decisions. In addition, Cubing Services cubes are first-class data providers to the Cognos Business Intelligence platform, incorporating predictive and analytic insights into Cognos reports.
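The core idea behind a cube can be sketched with a few lines of code: each fact contributes not only to its detail cell but to rollup cells where a dimension is collapsed to “all members.” The fact rows and dimensions here are hypothetical, and real cubes handle hierarchies and many measures.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (region, month, amount).
facts = [
    ("EU", "Jan", 100), ("EU", "Feb", 150),
    ("US", "Jan", 200), ("US", "Feb", 250),
]

def build_cube(facts):
    """Pre-aggregate the measure along every combination of dimension
    values, using "*" for the 'all members' rollup of a dimension."""
    cube = defaultdict(float)
    for region, month, amount in facts:
        for r, m in product((region, "*"), (month, "*")):
            cube[(r, m)] += amount
    return dict(cube)

cube = build_cube(facts)
print(cube[("EU", "*")])  # total EU spend across all months -> 250.0
print(cube[("*", "*")])   # grand total -> 700.0
```

Because the rollups are computed ahead of time, a multidimensional query becomes a cell lookup instead of a scan over the fact table, which is why cubes simplify and speed up analytic queries.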
Unstructured Data – up to 80 percent of the data within an organization is unstructured. DB2 can extract information from your unstructured business text and correlate it with your structured data to increase business insight into customer issues. DB2 also allows you to process unstructured data and create multidimensional reports using OLAP capabilities. In addition, unstructured data can be integrated into data mining models to broaden predictive capabilities.
DB2 Spatial Extender allows you to store, manage, and analyze spatial data in DB2, which along with business data in a data warehouse helps with fraud analysis.
Temporal Data helps you implement time-based queries quickly and easily. Historical trend analysis and point-in-time queries can be constructed by using the history tables and SQL period specifications that are part of the database engine.
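The idea behind a point-in-time (“AS OF”) query can be sketched as follows. This is not DB2 SQL; the account, the credit-limit column, and the validity dates are illustrative assumptions showing how versioned rows with validity periods turn historical questions into simple filters.

```python
from datetime import date

# Each row version carries a validity period, mimicking the history
# table a temporal feature maintains. Data is hypothetical.
history = [
    # (account, credit_limit, valid_from, valid_to)
    ("A1", 1000, date(2013, 1, 1), date(2013, 7, 1)),
    ("A1", 5000, date(2013, 7, 1), date(9999, 12, 31)),
]

def as_of(history, account, when):
    """Return the credit limit in effect for `account` on date `when`."""
    for acct, limit, start, end in history:
        if acct == account and start <= when < end:
            return limit
    return None

print(as_of(history, "A1", date(2013, 3, 15)))  # -> 1000
print(as_of(history, "A1", date(2014, 1, 1)))   # -> 5000
```

For fraud analysis, this is what lets an investigator ask what a customer’s limits or profile looked like at the moment a suspicious transaction occurred, not just what they look like today.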
Database Partitioning Feature (DPF) for row-based data stores – As data volume grows over time, the data can become skewed and fragmented, degrading performance. DPF distributes table data across multiple database partitions in a shared-nothing manner, in which each database partition “owns” a subset of the data. It enables massively parallel processing by transparently splitting the database across multiple partitions and using the power of multiple servers to satisfy requests for large amounts of information. This architecture allows databases to grow very large to support true enterprise data warehouses.
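The shared-nothing routing idea can be sketched simply: a hash of the distribution key decides which partition owns each row, so every row lands in exactly one partition and scans can proceed in parallel. Python’s built-in `hash` stands in for DB2’s internal hashing scheme, and the row data is hypothetical.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # illustrative partition count

def partition_for(key):
    """Route a distribution-key value to a partition (hash stands in
    for DB2's internal partitioning map)."""
    return hash(key) % NUM_PARTITIONS

def distribute(rows, key_index=0):
    """Assign each row to the partition that 'owns' its key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_for(row[key_index])].append(row)
    return partitions

rows = [("acct%d" % i, i * 10) for i in range(12)]
parts = distribute(rows)
print(sum(len(p) for p in parts.values()))  # each row owned once -> 12
```

Because rows with the same key always hash to the same partition, joins and aggregations on the distribution key can run locally on each partition without moving data between servers.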
Data Movement and Transformation
Continuous Data Ingest (CDI) allows business-critical data to be loaded continually into the warehouse without the latency associated with periodic batch loading, so the warehouse reflects the most up-to-date information and can help you make timely, accurate decisions. Consider, for example, a lost credit card report, a potential credit card fraud alert, arriving from the call center. Such an event is ingested into the warehouse immediately rather than waiting for the next batch load at a predefined interval. Using such contextual information along with account transaction data can help in real-time fraud detection.
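The contrast with batch loading can be sketched as follows. The event shape, the queue, and the in-memory “warehouse” are all illustrative assumptions; the point is only that an ingested event is queryable immediately rather than after the next batch window.

```python
import queue

warehouse = []          # stands in for a warehouse table
events = queue.Queue()  # stands in for the incoming feed from the call center

def ingest_available(events, warehouse):
    """Drain whatever has arrived and load it at once (no batch window)."""
    while not events.empty():
        warehouse.append(events.get())

# A lost-card report arrives and is ingested immediately.
events.put({"type": "lost_card", "account": "A1"})
ingest_available(events, warehouse)

# A fraud check issued right afterwards already sees the alert.
alerts = [e for e in warehouse if e["type"] == "lost_card"]
print(len(alerts))  # -> 1
```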
In fact, after experiencing just how beneficial the CDI feature is, some of our clients have renamed their Extract, Transform, and Load (ETL) processes Extract, Transform, and Ingest (ETI).
All these features are available in DB2 advanced editions and IBM PureData System for Operational Analytics to help you deliver near-real-time insights.
Now, are you meeting your service level agreements for performance while trying to prevent fraud in real time? Not sure? Why not give DB2 with BLU Acceleration or another IBM data management solution a try? Perhaps it can help you achieve your business objectives.
Yes, fraud detection is not so elementary. But with the right clues, that is, the right software and tools, it can be made elementary.
Follow Radha on Twitter @radgo1
Read the IBM Data Management for Banking whitepaper for more information on how IBM can help banks gain a competitive edge!