Turbo charging data quality
Andy Hayler By: Andy Hayler, CEO, The Information Difference
Published: 14th April 2009
Copyright The Information Difference © 2009
Logo for The Information Difference

Master data management initiatives are now being deployed with sizeable data volumes. A few years ago 10 million master data records was quite chunky, but we now see examples of applications with 100 million master data records. Simply processing such volumes raises challenges, and then you have to consider how you are going to keep your shiny new data in mint condition. You can put in a data quality "firewall" which, for example, will check for potential duplicate records about to be entered in, say, an order processing system. However, applying clever matching algorithms to large volumes of data while still expecting a sensible response time is problematic.
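To make the "firewall" idea concrete, here is a minimal sketch of a pre-insert duplicate check. It is purely illustrative (the record fields, the `firewall_check` function, and the normalisation rules are my own assumptions, not any vendor's API): incoming records are canonicalised and compared against the keys of records already stored, and a likely duplicate is rejected before it enters the system.

```python
import re

def normalise(value: str) -> str:
    """Canonical form: strip punctuation, collapse whitespace, lower-case.
    Real data quality tools apply far richer standardisation rules."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", value)).strip().lower()

def firewall_check(candidate: dict, existing: list[dict]) -> bool:
    """Return True if the candidate record may be inserted, False if it
    collides with an existing record after normalisation (likely duplicate)."""
    key = (normalise(candidate["name"]), normalise(candidate["address"]))
    seen = {(normalise(r["name"]), normalise(r["address"])) for r in existing}
    return key not in seen

# Example: a re-keyed variant of an existing customer is caught.
existing = [{"name": "ACME Corp.", "address": "1 Main St"}]
print(firewall_check({"name": "acme corp", "address": "1  Main St"}, existing))
```

The scaling problem the article describes is visible even in this toy: a naive scan compares every new record against every stored one, which is exactly the kind of workload that needs indexing or parallelism at 100 million records.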

This background makes the general availability of the DataRush engine from Pervasive Software, a company long established in the field of embeddable databases and data integration, potentially interesting. The DataRush technology uses highly parallel techniques to process large amounts of data very quickly. Common data quality algorithms such as "edit distance" are delivered with the engine, meaning that routine tasks such as name and address checking can be done quickly. Beta applications at companies such as TC3, which processes large volumes of health care claims, have seen some dramatic performance improvements over previous approaches. Another documented example is at PIERS, a company that collects bills of lading in the shipping world and analyses them to help companies understand trends in international trade. Extensive processing is needed to eliminate duplicate data before it can be turned into meaningful information.
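For readers unfamiliar with "edit distance": it counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another, which is why it is a workhorse for name and address matching. The sketch below is a textbook Levenshtein implementation plus a simple relative-threshold match test; the `likely_same` helper and its 20% threshold are my own illustrative assumptions and say nothing about how DataRush itself implements the algorithm.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping only
    the previous row of the DP table (O(min(len)) memory)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def likely_same(a: str, b: str, threshold: float = 0.2) -> bool:
    """Flag a pair as a probable match when the edit distance is small
    relative to the longer string (threshold chosen for illustration)."""
    longest = max(len(a), len(b)) or 1
    return edit_distance(a, b) / longest <= threshold

print(edit_distance("kitten", "sitting"))  # → 3
```

Each pairwise comparison is independent of the others, which is what makes the workload naturally data-parallel — the property an engine like DataRush exploits across cores.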

There is no shortage of business use cases where data quality processing has to be applied to large volumes of data (mortgage claims are another example), so there should be a substantial market for something that can make this go much faster. The technology could be picked up by data quality software providers, and perhaps by MDM vendors (MDM applications have a significant data quality component), to turbo-charge their own products. Given Pervasive's track record of producing reliable embeddable software, they will be taken seriously. The engine could in principle be used in other areas, such as analytics, but data quality is the obvious focus at present. In addition to software vendors, there are plenty of systems integrators that custom-build applications in specialist areas with data quality elements, and in some of these cases volume and processing time will be a major issue.

It is early days, but with growing interest in data quality (a market that grew 17% in 2008 according to our latest research) and increasing need to deal with high data volumes, DataRush could be in the right place at the right time.


Published by: IT Analysis Communications Ltd.
T: +44 (0)190 888 0760 | F: +44 (0)190 888 0761