IBM has just announced the general availability of DB2 10 for Linux, UNIX and Windows. As you might expect from a major release there is a lot in it, though the focus is more on improving existing capabilities than on introducing new ones, of which there are nevertheless some. The first thing to note is that this release re-unifies the code lines for pureScale and InfoSphere Warehouse. The code line was unified up to version 9.7, but a special version 9.8 was then introduced for pureScale; that split is now corrected. For pureScale specifically, the reunification means more flexible hardware support as well as support for geographically dispersed clusters.
The release has three focus areas: lower operational costs, ease of development, and reliability. In the first of these categories are improved performance, improved compression, a new fast loader and improved migration automation from Oracle Database. This is the sort of thing you expect from a new release: more and better of what you did before. Compression (now known as "adaptive compression") is implemented at both the table level and the page level, which should be more efficient than was previously the case. As far as Oracle migrations are concerned, IBM now claims 98%+ compatibility, up from 95-98% in the previous release.
In the case of performance, workload management has been improved and there is a new feature called a jump scan. The latter is interesting because it represents the reuse of technology acquired with Netezza. Those familiar with Netezza will know that it uses a technique known as a zone map, which tells the database optimiser where data is not, so that relevant queries do not need to read those blocks from disk, thereby improving performance. Jump scans use a similar approach: while not employing zone maps per se, they re-deploy some of the algorithmic work that Netezza has done in this area.
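To make the "where data is not" idea concrete, here is a minimal sketch of zone-map-style block skipping. The function names and block layout are my own illustration, not IBM's or Netezza's actual implementation: each zone records the min and max of a column for a block of rows, and a range query skips any zone that cannot possibly match.

```python
# Illustrative zone-map-style block skipping (a sketch, not DB2's or
# Netezza's actual implementation). Each "zone" records the min/max of
# a column for a block of rows; a range predicate can then skip any
# zone whose [min, max] interval cannot overlap the query range.

def build_zone_map(rows, block_size=4):
    """Split rows into blocks and record each block's min/max."""
    zones = []
    for i in range(0, len(rows), block_size):
        block = rows[i:i + block_size]
        zones.append((min(block), max(block), block))
    return zones

def query_range(zones, lo, hi):
    """Return matching values, scanning only zones that might contain them."""
    matches, scanned = [], 0
    for zmin, zmax, block in zones:
        if zmax < lo or zmin > hi:
            continue  # the zone map tells us where data is NOT
        scanned += 1
        matches.extend(v for v in block if lo <= v <= hi)
    return matches, scanned

zones = build_zone_map([1, 2, 3, 4, 10, 11, 12, 13, 50, 51, 52, 53])
result, scanned = query_range(zones, 10, 13)
# Only one of the three zones needs to be scanned for this query.
```

The payoff is that the cost of a selective query scales with the number of blocks that might contain matches rather than with the size of the table.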
Perhaps the two most interesting new features of the whole release are the ability to store object histories and support for triples (along with SPARQL, which is the query language for triples).
In terms of object histories, you can designate selected objects and store history for them, and you can then run point-in-time queries against that history: that is, queries based on the information known at some designated point in the past. This can be important for a variety of reasons, including compliance. Interestingly, you can also project to a point in the future and therefore use this facility as a sort of what-if capability.
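A minimal sketch of how such point-in-time ("as of") queries work, assuming a simple versioning scheme in which each update closes the previous version of a row and opens a new one. The class and method names here are my own illustration, not DB2's actual temporal SQL syntax:

```python
# A toy system-time versioning scheme: each key maps to a list of
# (valid_from, valid_to, value) versions, and an "as of" query returns
# the value whose validity interval contains the requested time.
# This is an illustrative sketch, not DB2's implementation.
from datetime import datetime

class TemporalTable:
    def __init__(self):
        self.versions = {}  # key -> list of (valid_from, valid_to, value)

    def put(self, key, value, at):
        rows = self.versions.setdefault(key, [])
        if rows:
            frm, _, old = rows.pop()
            rows.append((frm, at, old))           # close the previous version
        rows.append((at, datetime.max, value))    # open the new version

    def as_of(self, key, at):
        """Return the value that was current at time `at`, or None."""
        for frm, to, value in self.versions.get(key, []):
            if frm <= at < to:
                return value
        return None

t = TemporalTable()
t.put("price", 100, datetime(2011, 1, 1))
t.put("price", 120, datetime(2012, 1, 1))
# A point-in-time query sees the value as it was known at that time:
t.as_of("price", datetime(2011, 6, 1))   # the 2011 value, 100
t.as_of("price", datetime(2012, 6, 1))   # the current value, 120
```

Because no version is ever destroyed, the same mechanism serves both compliance queries ("what did we believe on that date?") and, with future-dated versions, the what-if projections mentioned above.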
However, it is the triple store support that is most interesting to me. I am not sure that many users are thinking about triple stores yet, but they will become increasingly significant. A triple is essentially something stored in subject-predicate-object (loosely, subject-verb-object) format, where each element may be a compound value: for example, the predicate might be "ownscar" and the object "2012BMW". Triples are important because this is the way that the semantic web (web 3.0) works, and the details therein can be captured and manipulated using triples in what is known variously as a triple store, graph store or graph database. Graphs are also especially useful for capturing and exploring relationship information such as "Fred" "is the cousin of" "Frank" or "Martha" "lives in the same house as" "Sven".
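The subject-predicate-object model is simple enough to sketch directly. The following toy in-memory triple store, using the examples from the text, shows the kind of pattern matching that a SPARQL query expresses; the `match` function is a hand-rolled stand-in of my own, not a real SPARQL engine:

```python
# A toy in-memory triple store: each fact is a (subject, predicate, object)
# tuple, and queries are patterns in which None acts as a wildcard.
# Illustrative only; a real triple store would index each position.
triples = {
    ("Fred", "isCousinOf", "Frank"),
    ("Martha", "livesInSameHouseAs", "Sven"),
    ("Fred", "ownscar", "2012BMW"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None matches anything."""
    return [(ts, tp, to) for ts, tp, to in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

match(s="Fred")                  # everything known about Fred (two triples)
match(p="livesInSameHouseAs")    # who lives with whom
```

The power of the model is that any position can be the query variable, which is what makes graph-style relationship exploration natural.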
DB2 does not implement a triple store per se but you can, in effect, tag tables so that relevant data appears to form a triple. Interestingly, IBM has benchmarked DB2 for triples against Jena TDB, which is the Apache open source project in this space, and it reckons it can show about three times better performance for some workloads. This sounds good but I would be more impressed if IBM had conducted its benchmark against one of the leading commercial graph databases such as Neo4j. Still, I am pleased to see IBM being innovative in this area and, as I have said, it is early days for triple stores, so we will have to await developments.