I'm an old database hand—and I think that using such
technologies effectively is still an aspect of "good IT
governance". I can remember times before RDBMSs (Relational
Database Management Systems) and deciding that the speed of IBM's
IMS Fast Path database outweighed its lack of flexibility, and
that the new Oracle thing had problems with its optimiser, some
20 or 30 years ago. How things have changed and RDBMSs are
now—almost—ubiquitous (in the public consciousness,
anyway; IMS is still a current product). But you can still make
me lose my rag by saying things like "RDBMS is all very well,
and very flexible but it is a bit slow". Speed is simply
nothing to do with the RDBMS specification (which is a logical
implementation of relational set theory)—if an RDBMS is
slow, it is simply because it has been implemented badly
(usually, because someone thinks that tables have to exist in
physical storage as well as in the logical view—you can
make such implementations fast, but it's a struggle).
There are a few counterexamples of fast RDBMS implementations
with complete abstraction between the logical view and the
physical storage.
Intersystems Caché is a fine example—on disk, the data
is in a sparse array of bits; logically it can be a fast RDBMS.
Or OODBMS. Or Mumps etc. Similarly, Sybase IQ looks like an RDBMS
accessed with SQL—but underneath it's entirely different
(and column, not row oriented).
Now, I've just been talking to Emma McGrattan (SVP Engineering at
Ingres) about VectorWise, which brings abstraction to Ingres
RDBMS.
Put simply, VectorWise provides a SQL (logical RDBMS) front end
to a physically column-oriented data store, with code that takes
full advantage of modern 8086 chip architectures. But they key
issue it addresses is that access to main memory, not disk, is
the performance bottleneck today and it is essential to use
on-chip cache effectively. What VectorWise means is that BI-type
queries run many times faster than they would in a traditional
RDBMS implementation, optimised for transaction processing. What
this means in turn is that the cheap commodity multicore hardware
can run your BI applications as fast as you hoped when you boot
the stuff, without applications having to be rewritten for the
new architectures. This will also go some way towards addressing
multicore issues.
That's good, obviously, especially as Ingres also comes with an
enterprise culture (probably from its time in CA) and attitude to
support; as well as with an Open Source culture. There are lots
of good things about Open Source software but if the support
isn't similar to what you'd get from IBM/DB2, few enterprises
will be interested.
What is really interesting to me, however, is the possibilities
going forward (and here I must stress that I'm not revealing an
Ingres roadmap; these are just my speculations).
For a start, Ingres currently addresses two kinds of processing
with a row-oriented and column-oriented store. Both look like an
RDBMS and use SQL but, underneath, you replicate row-oriented
transactional data into a column-oriented store for BI queries.
That's sensible (and pretty common). But, there's no fundamental
reason why you shouldn't have one store, with the optimiser
choosing to store data in a row-oriented or column-oriented form,
as best suits the current workload and chip architecture. The
advantage of this would be that near-real-time decision support
applications could access transactional data directly without the
latency associated with moving data into a different kind of
store.
Then again, this technology could revolutionise programming
tools. At the moment, most tools produce horrid SQL queries, from
programmers who don't understand query design anyway, and the
Ingres optimiser refactors these queries into sensible ones that
run efficiently on modern chips. That's good—but suppose
(we are in the Open Source world) someone takes the Ingres
optimiser and embeds it in a programmers IDE so that queries are
optimised at the start—and feedback from the optimiser can
educate programmers in query design? Might that be more efficient
overall?
And, of course, VectorWise isn't limited to Intel x86
architecture (Ingres is already looking at AMD and reports
similar application). If Ingres ever gets around to implementing
VectorWise for different platforms (mainframes etc.) you could
simple take an application optimised for x86 and simply recompile
it with VectorWise for any other platform. And then we get back
to the ideal of database portability SQL promised before Oracle
got its hands on it in the interest of "efficiency". The
difference with the way VectorWise optimises for particular
hardware platforms is that, unlike SQL extensions, the hardware
optimisation is entirely transparent to programmers. Abstraction
is a wonderful thing—if it is done completely and
absolutely, because it isolates business automation choices from
technology changes and vendor marketing decisions.