

An Important Day for Product Data Quality
By: Andy Hayler, CEO, The Information Difference
Published: 20th April 2009
Copyright The Information Difference © 2009

Oracle's announcement of an OEM agreement for Silver Creek Systems' DataLens product (31st March) is interesting in a number of ways. Firstly, it helps to legitimize the product data quality problem. Most data quality vendors started out by tackling customer name and address data, a very common problem that almost every company recognises. Fortunately, addresses are well structured, and a number of algorithms are available to help detect partly misspelt or incomplete addresses and then fix the data records (and perhaps enrich them with additional information, such as which voting constituency the address is within, using one of numerous information providers).
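
To illustrate why address data is comparatively tractable, here is a minimal sketch of the kind of approximate matching that can catch a partly misspelt street name. It uses Python's standard difflib module; the reference list KNOWN_STREETS and the function correct_street are hypothetical names invented for this example, and commercial address-cleansing tools rely on far larger reference data and more sophisticated algorithms.

```python
import difflib

# Hypothetical reference list of valid street names (an assumption for
# illustration; real tools use postal reference files).
KNOWN_STREETS = ["High Street", "Station Road", "Church Lane", "Victoria Road"]

def correct_street(raw, cutoff=0.8):
    """Return the closest known street name, or None if nothing is similar enough."""
    # Normalise capitalisation, then ask difflib for the single best candidate
    # whose similarity ratio is at least `cutoff`.
    candidates = difflib.get_close_matches(
        raw.title(), KNOWN_STREETS, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None

print(correct_street("Hgh Stret"))
print(correct_street("Zzzz"))
```

The approach works because there is a finite, well-defined list of valid values to compare against, which is exactly what product descriptions tend to lack.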

However, product data is a different beastie altogether from address data. It is unstructured (there have been some attempts at industry standards like UNSPSC, but these are far from universal) and can be extremely complex, often with hundreds of attributes (price, dimensions, components, ...). In some industries the problem is immense; electronic components catalogs, for example, are typically vast. Product data for, say, a catalog is often drawn from multiple systems, which may hold partial or overlapping product data (a 2008 Information Difference survey found that the average large company has nine separate systems generating product data). Despite this, there has been little recognition of the problem in the market, with just a few vendors, such as Silver Creek, Inquera and Datactics, specialising in this product data quality niche. Silver Creek has been a pioneer in raising awareness of the issue.

Silver Creek tackles the quality of product data records by building up semantic rules that allow the software to recognise the elements of product information even from a partially constructed record. By working with a domain expert, the software learns the semantic rules that apply and gets steadily better at recognising input records. The key is that the rules it builds up are not hard-coded patterns, so far fewer rules need to be developed than with pattern-based approaches.
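
The difference between meaning-based recognition and fixed patterns can be sketched in a few lines. The example below is not Silver Creek's method, which is not described in detail here; it is a toy illustration in which a small, hypothetical vocabulary (ITEM_TYPES and UNITS, standing in for knowledge supplied by a domain expert) lets a parser label the parts of a description by what they mean rather than where they appear. The function name parse_product is likewise an assumption.

```python
import re

# Hypothetical vocabulary a domain expert might supply (illustrative only).
ITEM_TYPES = {"resistor": "resistor", "res": "resistor",
              "capacitor": "capacitor", "cap": "capacitor"}
UNITS = {"ohm": "resistance", "ohms": "resistance",
         "w": "power", "watt": "power",
         "uf": "capacitance", "v": "voltage"}

NUMBER = re.compile(r"\d+(\.\d+)?(/\d+)?")  # matches 10, 0.25 or 1/4

def parse_product(description):
    """Label the parts of a free-text description by meaning, not position."""
    text = description.lower().replace(",", " ")
    # Split a number glued to its unit, e.g. "10ohm" -> "10 ohm".
    text = re.sub(r"(\d)([a-z])", r"\1 \2", text)
    tokens = text.split()
    record = {}
    for i, token in enumerate(tokens):
        if token in ITEM_TYPES:
            record["item_type"] = ITEM_TYPES[token]
        elif token in UNITS and i > 0 and NUMBER.fullmatch(tokens[i - 1]):
            # A known unit preceded by a number is an attribute value.
            record[UNITS[token]] = f"{tokens[i - 1]} {token}"
    return record

print(parse_product("RES 10ohm 1/4W"))
print(parse_product("0.25 watt, 10 ohms resistor"))
```

Both descriptions yield an item type, a resistance and a power rating even though the word order, abbreviations and spacing differ. A pattern-based approach would typically need a separate pattern for each layout, which is why the number of rules grows so quickly.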

In addition to being a useful endorsement that the problem itself is real, Oracle's announcement further demonstrates that traditional approaches do not work well with product data, since Oracle already OEMs data quality products which tackle the customer name and address problem.

The acknowledgment by Oracle that product data is a thorny problem is welcome in itself, and will be helpful to its customers who experience pain in this area. It is a big feather in Silver Creek's cap, since Oracle is such a powerful player, and should lead to plenty of new opportunities for Silver Creek's software if the company can exploit the vast and well-oiled Oracle sales distribution channel.


Published by: IT Analysis Communications Ltd.