In a crowded software market it is difficult to make your marketing message stand out from the crowd, so I was intrigued to see a new data quality start-up with a genuinely eye-catching press release: "DataQualityFirst Makes Sense of Madoff Account Data". It turns out that during the various legal proceedings underway regarding the Madoff Ponzi scheme, a US judge placed a file of Madoff's unfortunate investors in the public domain, apparently in order to help claimants realise that they could register for a potential claim against the now bankrupt Madoff organisation.
The newspapers naturally had their prurient fun identifying famous names among the betrayed investors, but DataQualityFirst, a Massachusetts-based start-up, uses the file as a case study for its new software. The Madoff file is quite instructive, as it is exactly the kind of poorly structured file that we have all encountered in real-life data projects. The file has no metadata, just six columns and over 13,000 records. The PartyQualityInsight (PQI) product was used to detect "parties" (such as investors, custodians, intermediaries, estates) in the raw data and, after just four simple rules were defined (e.g. "first name found is the likely account holder"), produced a neat analysis of the data. Most of the records contained more than one "party", so PQI first worked out that there were more than 23,000 "party instances" to work with. It then automatically reduced these to 13,593 unique people and businesses and showed where there were relationships between them. A few other points surfaced, such as the clustering of Madoff customers at certain addresses, presumably because clients were recommending the Madoff scheme to their neighbours.
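To make the "party instances" idea concrete, here is a minimal Python sketch of the general technique: split each raw record into the parties it mentions, normalise them, then collapse duplicates and group by address. This is not how PQI works internally, which the press release does not describe. The sample rows, the separator list and every function name are my own assumptions, and the exact-match de-duplication is a deliberate simplification of the fuzzy matching a real data quality tool would use.

```python
import re
from collections import defaultdict

# Hypothetical sample rows (raw account name, street, zip code).
# Invented for illustration; not taken from the actual Madoff file.
ROWS = [
    ("JOHN Q SMITH & MARY SMITH", "12 Elm St", "02101"),
    ("Mary Smith", "12 Elm St.", "02101"),
    ("ACME TRUST C/O JOHN Q SMITH", "12 ELM ST", "02101"),
    ("Jane Doe", "9 Oak Ave", "10001"),
]

# Assumed separators between parties inside a single name field.
SPLIT = re.compile(r"\s*(?:&|\bAND\b|\bC/O\b)\s*", re.IGNORECASE)


def normalise(text):
    """Upper-case, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^A-Za-z0-9 ]", " ", text).upper().split())


def party_instances(rows):
    """Yield (name, address, position) for every party mentioned in a row."""
    for raw_name, street, zip_code in rows:
        address = normalise(street) + " " + zip_code
        for position, name in enumerate(SPLIT.split(raw_name)):
            if name.strip():
                yield normalise(name), address, position


def unique_parties(instances):
    """Collapse instances that share the same normalised name and address."""
    parties = defaultdict(list)
    for name, address, position in instances:
        parties[(name, address)].append(position)
    return parties


def address_clusters(parties):
    """Group unique party names by address to expose clustering."""
    clusters = defaultdict(set)
    for name, address in parties:
        clusters[address].add(name)
    return clusters


instances = list(party_instances(ROWS))
parties = unique_parties(instances)
clusters = address_clusters(parties)
# Position 0 applies a rule like "first name found is the likely account holder".
holders = sorted({name for name, _, position in instances if position == 0})
```

On these four invented rows the sketch finds six party instances, four unique parties and one address shared by three of them, which is the same shape of result as the analysis described above, only in miniature.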
The point underlying the demonstration is not the specifics of this case but how quickly the software could profile the data and make sense of the structure inherent in it. While many profiling tools could do something similar, I liked the very intuitive "data health" type scorecard display that the software has, which makes it easy to see at a glance the general state of the data. The company also makes the important but often-missed point that a person or company can have many roles. I can be a customer of a company and also a supplier to it, i.e. be different "parties".
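The distinction between an entity and its roles is easy to show in a few lines of Python. This is only a sketch of the general idea under my own assumptions: the `Entity` and `RoleRegistry` names are hypothetical, and nothing here reflects how PQI actually stores its data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A real-world person or business (hypothetical model)."""
    entity_id: int
    name: str


@dataclass
class RoleRegistry:
    """Maps each entity id to the set of (role, counterparty id) pairs it holds."""
    roles: dict = field(default_factory=dict)

    def assign(self, entity, role, counterparty):
        self.roles.setdefault(entity.entity_id, set()).add(
            (role, counterparty.entity_id)
        )

    def roles_of(self, entity):
        return {role for role, _ in self.roles.get(entity.entity_id, set())}


# Invented example: one person who both buys from and sells to the same firm.
me = Entity(1, "A. Reviewer")
acme = Entity(2, "Acme Ltd")
registry = RoleRegistry()
registry.assign(me, "customer", acme)
registry.assign(me, "supplier", acme)
```

Here `registry.roles_of(me)` contains both "customer" and "supplier" for a single underlying entity, which is exactly the situation that trips up systems that treat every role as a separate record.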
The company founders have a solid background with Vality, one of the pioneers of data quality, and have taken the unusual approach of building their application on top of the existing Vality tool set (now owned by IBM and sold as QualityStage). This technology has considerable history, so it is well proven and has quite powerful functionality, but it typically requires a fair bit of setup and configuration to apply in practice. By taking advantage of this existing technology, PartyQualityInsight, while limiting its market somewhat to the IBM world, at least does not have to worry about justifying whether its underlying technology works.
It is early days for the company but the idea of using the Madoff dataset as a case study was a very inventive piece of PR. If their software turns out to be as innovative as their marketing then they should do well.