Oracle's announcement (31st March) that it will OEM Silver Creek Systems' DataLens product is interesting in a number of ways. Firstly, it helps to legitimize the product data quality problem. Most data quality vendors grew up tackling customer name and address data, a very common problem that almost every company recognises. Fortunately, addresses are well structured, and a number of algorithms are available to detect partly misspelt or incomplete addresses and then fix the affected records (and perhaps enrich them, by adding information such as which voting constituency the address is within, using one of numerous information providers).
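To make the address case concrete, here is a minimal sketch of the kind of fuzzy matching involved, using Python's standard `difflib` module. The reference list, the `correct_street` helper and the 0.8 similarity cutoff are all assumptions made up for illustration; commercial address tools rely on far larger postal reference files and more sophisticated matching.

```python
from difflib import get_close_matches

# Hypothetical reference list of known-good street names.
REFERENCE_STREETS = ["High Street", "Station Road", "Church Lane", "Victoria Avenue"]


def correct_street(raw, cutoff=0.8):
    """Return the closest reference street name, or None if nothing is similar enough."""
    # Compare in lower case, but return the properly cased reference value.
    lookup = {street.lower(): street for street in REFERENCE_STREETS}
    matches = get_close_matches(raw.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


print(correct_street("High Stret"))
print(correct_street("Statoin Road"))
print(correct_street("Zzzzz Qqq"))
```

This works tolerably well precisely because the set of valid street names is finite and known in advance, which is the luxury that product data generally lacks.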
However, product data is a different beastie altogether from address data. It is unstructured (there have been some attempts at industry standards such as UNSPSC, but these are far from universal) and can be extremely complex, often with hundreds of attributes (price, dimensions, components and so on). In some industries the problem is immense: electronic component catalogs, for example, are typically vast. Product data for, say, a catalog is often drawn from multiple systems, each of which may hold partial or overlapping product data (a 2008 Information Difference survey found that the average large company has nine separate systems generating product data). Despite this, there has been little recognition of the problem in the market, with just a few vendors such as Silver Creek (Inquera and Datactics are others) specialising in this product data quality niche. Silver Creek has been a pioneer in raising awareness of the issue.
Silver Creek tackles the quality of product data records by building up semantic rules that allow the software to recognise the elements of product information even from a partially constructed record. By working with a domain expert, the software learns the semantic rules which apply and gets steadily better at recognising input records. The key is that the rules it builds up are not hard-coded patterns, so far fewer rules need to be developed than with pattern-based approaches.
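As a rough illustration of why recognising elements by their meaning needs fewer rules than matching fixed layouts, consider the toy sketch below. It is emphatically not Silver Creek's method: the resistor vocabulary, the `UNIT_MEANING` table and the `parse_resistor` function are hypothetical, and real semantic rules are far richer. The point is only that a small vocabulary of units can identify attributes wherever they appear in a description, whereas a positional pattern would need a new rule for every ordering.

```python
import re

# Hypothetical vocabulary for one product category (resistors):
# each unit tells us what the preceding number means.
UNIT_MEANING = {"ohm": "resistance_ohm", "w": "power_w", "%": "tolerance_pct"}

# A number followed by one of the known units, in any position.
TOKEN = re.compile(r"(\d+(?:\.\d+)?)\s*(k?ohm|w|%)", re.IGNORECASE)


def parse_resistor(description):
    """Extract whichever attributes are present, regardless of their order."""
    attributes = {}
    for value, unit in TOKEN.findall(description):
        unit = unit.lower()
        number = float(value)
        if unit == "kohm":  # normalise kilo-ohms to ohms
            number *= 1000
            unit = "ohm"
        attributes[UNIT_MEANING[unit]] = number
    return attributes


# The same part described by two different source systems.
print(parse_resistor("RES 10kOhm 0.25W 5%"))
print(parse_resistor("Resistor 5% 0.25 W 10 kohm"))
# A partial record still yields the attributes that are present.
print(parse_resistor("RES 10kOhm"))
```

Both full descriptions resolve to the same set of attributes, and the partial one yields just the resistance, without any rule having been written for a specific field order.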
In addition to being a useful endorsement that the problem itself is real, Oracle's announcement further demonstrates that traditional approaches do not work well with product data, since Oracle already OEMs data quality products which tackle the customer name and address problem.
The acknowledgment by Oracle that product data is a thorny problem is welcome in itself, and will be helpful to its customers who experience pain in this area. It is a big feather in Silver Creek's cap, since Oracle is such a powerful player, and it should lead to plenty of new opportunities for Silver Creek's software if the company can exploit the vast and well-oiled Oracle sales distribution channel.