IT-Analysis.com
OSS Innovation - with Ingres VectorWise
By: David Norfolk, Practice Leader - Development, Bloor Research
Published: 7th October 2009
Copyright Bloor Research © 2009

It is now widely accepted that OSS (Open Source Software) can be industrial strength. Even the UK government has a policy which says that software should be acquired on the basis of its fitness for purpose and that OSS shouldn't be excluded (see here) - in fact, that if two products are equally fit for purpose, the OSS solution should be preferred, I think. An awful lot of websites already run on Apache, and JBoss is widely accepted as a valid choice of industrial-strength application server.

But can OSS really innovate at the fundamental level? Often, OSS projects seem to be playing catch-up. I'm writing this in Open Office, which I like and which does some things better than Microsoft Office, but which does seem to be following Microsoft's model. In particular, is fundamental, industrial-strength OSS innovation (not just neat research projects) possible with something as well established as relational database management system (RDBMS) technology? Relational databases have been around for years and SQL hasn't changed much - but you can still find popular OSS databases which don't really support referential integrity and transaction processing across their product ranges, which hardly seems terribly innovative (except in the sense that DB2 et al have supported this sort of thing for years, so that not doing so is rather a new thing).

Well, innovation is driven by need, and one need that is currently appearing is to take full advantage of modern chip developments. Most RDBMSs were developed in the days of Intel's 286 architecture and don't yet take full advantage of the latest developments such as pipelining, on-chip fine-grained parallel processing and multiple levels of on-chip cache - a level 1 cache miss can be very expensive (in the worst case, simply updating a pointer in main memory can take 300 chip cycles). This is an aspect of an issue I've highlighted before: increasing data volumes and an emphasis on near-real-time decision support, to say nothing of the increasing need for "green computing" which can use hardware technology more efficiently in terms of electrical power consumption, mean that applications must exploit the latest chip designs effectively. Many current applications can't do this well (programming for parallel processing is hard).

Intel, for example, has a new compiler feature that exploits its latest chip architecture and which markedly speeds up low-level operations that can exploit it. This is called "vectorization" [http://software.intel.com/en-us/articles/vectorization-with-the-intel-compilers-part-i/] but it is not widely exploited outside of the specialised games, graphics and "high performance computing" fields. In fact, you probably don't want general business application programmers playing with this stuff - they should be concentrating on what the business needs. One place to optimise chip utilisation is in the translation of high level SQL and XML code into low level C data access instructions, by, for example, the RDBMS - the exploitation of the physical chip architecture could then be transparent to business programmers. This is not the place to go into why rigorous hardware abstraction is a good thing for business programming - the continuing loyalty of IBM's iSeries customers and the rise of Virtual Machine architectures is a clue - but, trust me, it is.
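
To make the idea a little more concrete, here is a minimal sketch of the kind of loop a vectorizing compiler looks for. It is written in Python purely for readability: Python itself does not emit vector instructions, and the names used (`rows`, `prices`, `quantities`, `total_row_wise`, `total_column_wise`) are illustrative assumptions, not anything taken from Intel or Ingres. The point is the shape of the data. Same-typed values held contiguously, processed by a simple loop body with no branches, are what allow a compiler to apply one instruction to several values at once.

```python
from array import array

# Row-wise layout: each record is a separate object, so the values of one
# field are scattered through memory.
rows = [
    {"price": 2.5, "qty": 4},
    {"price": 1.0, "qty": 10},
    {"price": 4.0, "qty": 3},
]


def total_row_wise(records):
    """Visit one record at a time and pick the fields out of it."""
    total = 0.0
    for record in records:
        total += record["price"] * record["qty"]
    return total


# Column-wise layout: every value of a field sits in one contiguous,
# same-typed buffer (array("d") holds C doubles, array("l") holds C longs).
prices = array("d", (r["price"] for r in rows))
quantities = array("l", (r["qty"] for r in rows))


def total_column_wise(price_col, qty_col):
    """Same arithmetic, expressed as a simple loop over two parallel arrays."""
    total = 0.0
    for p, q in zip(price_col, qty_col):
        total += p * q
    return total
```

Both functions return the same total. The second form is simply the one that, written in C over plain arrays, a vectorizing compiler would have the best chance of turning into vector instructions.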

And which database is, according to Roger Burkhardt (CEO and President of Ingres), the first to exploit this new Intel capability effectively in a business processing context - a claim he attributes to Intel itself? Open-source Ingres, naturally, which seems to be on a bit of a roll at present (although I'm sure that Oracle and DB2 must be working on similar capabilities). The Ingres VectorWise project is a collaboration between Ingres and the VectorWise spin-out from the database research team at CWI in Amsterdam, which has the support and involvement of Intel.

Roger tells me that this innovation isn't the result of access to a huge community of OSS programmers working for nothing; rather, the OSS model gives him access to particular academic brains that he simply couldn't afford to hire or who wouldn't want to join a commercial company.

Ingres now incorporates "hardware conscious design" for its low level physical data access code and optimiser. The neat (absolutely essential, in fact) trick is that this allows programmers to write bog-standard SQL just as they've always done (with the caveat that accepted "good practice" database programming might be a good idea) while the Ingres DBMS generates "vectorwise" (vectorization-aware) code that can access data tens or hundreds of times faster than a conventional DBMS could manage. VectorWise simply takes effective advantage of the capabilities already built into the chips most new machines come with. You can find technical details of how this works on Ingres' site: http://www.ingres.com/vectorwise/. This works for more than just Intel chip technology; VectorWise exploits the individual chip compilers' knowledge of the internals of a whole range of different chips in order "to provide extraordinary results... we've seen this in Intel and AMD, and on both CISC and RISC architectures", Roger tells me.
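
As a rough illustration of what running an unchanged query "a vector at a time" could look like, the sketch below evaluates the equivalent of `SELECT SUM(price * qty) FROM sales WHERE qty > 2` over column data in fixed-size batches. This is a toy under stated assumptions, not Ingres or VectorWise code: the `sales` columns, the batch size and every function name are hypothetical, and in a real engine each per-batch step would be a tight compiled loop rather than Python.

```python
from array import array

# Hypothetical value; a real engine would size batches to fit the CPU cache.
BATCH_SIZE = 4

# Hypothetical "sales" table, stored as two columns.
price = array("d", [2.5, 1.0, 4.0, 3.0, 0.5, 6.0])
qty = array("l", [4, 10, 3, 1, 8, 2])


def batches(n_rows, size):
    """Yield (start, stop) index ranges covering n_rows in chunks of size."""
    for start in range(0, n_rows, size):
        yield start, min(start + size, n_rows)


def select_gt(column, start, stop, threshold):
    """Return the positions in [start, stop) whose value exceeds threshold."""
    return [i for i in range(start, stop) if column[i] > threshold]


def multiply_at(col_a, col_b, positions):
    """Multiply two columns at the selected positions only."""
    return [col_a[i] * col_b[i] for i in positions]


def sum_revenue(min_qty):
    """SUM(price * qty) WHERE qty > min_qty, computed one batch at a time."""
    total = 0.0
    for start, stop in batches(len(qty), BATCH_SIZE):
        selected = select_gt(qty, start, stop, min_qty)
        total += sum(multiply_at(price, qty, selected))
    return total
```

Calling `sum_revenue(2)` returns `36.0`. The caller only ever states what is wanted; how the rows are batched and scanned stays inside the engine, which is the separation the paragraph above describes.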

What interests me, apart from the fact that this innovation comes from an OSS database project (and I do realise that OSS does innovate anyway, it's just that "vectorization support" strikes me as a particularly fundamental and exciting incremental step forward in innovation) is that it implements the absolute abstraction between the logical RDBMS (SQL) view and the underlying hardware that I think is a vital part of implementing an RDBMS. Intersystems Caché is another example of this, using different technology - but Caché isn't OSS. Business programmers using Ingres write open standards SQL for Java application servers just as they do currently; the latest Ingres RDBMS writes the code needed to exploit the low-level design of modern Intel chips and make data access fast. The increased fine-grained parallelism and exploitation of on-chip cache is entirely transparent to the business programmers, as it should be.

So, what's not to like? Well, I'll have to investigate further but I've thought of nothing much that's specific to Ingres so far. There are possible generic issues. I don't know yet how robust both vectorisation and Ingres' optimiser are to poor quality code. Apart from the obvious issue of coding high level bottlenecks into a program (queuing up around a unique ID generator seed that makes a remote call to check something before returning wouldn't be a great idea), it is possible that "good practice" programming style is required for vectorisation and the optimiser to work at their best. You can train programmers properly (now, there's an interesting idea - how many companies recruit self-certified programming experts without first checking their claimed expertise?) but you can't mandate good practice with the aid of a coding standards manual. You need tools that can alert programmers to coding or design antipatterns before they commit to them in hard code - perhaps tools such as eoSense and others (see my blog here) could be modified to encourage "vectorisation-friendly" programming patterns (I must ask). Perhaps such tools could be built into the programming environment, but I don't think they widely are, as yet.

And, of course, there's the general issue that the performance of SQL calls isn't really important in itself. What really matters is the end-to-end user experience (of which SQL performance is but a part). The new breed of developer focuses first and foremost on the business service delivered (I hope): is it fast enough for what the business needs (which may be very fast indeed in the case of a financial services trading application in which algorithms interact with algorithms); is it "fit for purpose"? Only after deciding that should you worry about database access as a possible bottleneck.

Although, that said, using the electricity passing through the chip efficiently for calculations rather than for warming the room is increasingly important - perhaps if you can use the full capabilities of your chips, you'll need fewer of them and generate less heat from less (expensive) electricity. That might fit a business need many companies are now finding that they have.

Nevertheless, there is one specific question to ask of this technology - how does this vectorised Ingres code cope when the data has to move onto a platform which doesn't use the latest chips (remembering that VectorWise isn't Intel-specific)? Presumably, the high level business SQL doesn't change at all and the Ingres RDBMS on the older legacy platform simply generates code appropriate to it. Perhaps code that is optimised to work on the latest chips turns out to be as optimal as you can get for legacy chips too - which would be nice. However, since most enterprises can't afford to move their processing holus-bolus to the latest architectures, it's a question worth exploring. And, of course, perhaps some applications now possible with VectorWise will have performance constraints which mean they can only run on the latest chips, which is something organisations will have to manage.

Overall, however, I think that this Ingres development is an exciting innovation for business. It's necessary because so many automated processes generate data these days so there are lots of data available and lots of analytics programs being invented to exploit them. Whether it is a coup for OSS is really a side issue - we already know what the next generation of processors will look like, more or less, constrained by the real laws of physics. Using a database that can exploit their capabilities effectively is a no-brainer - whether it is OSS or proprietary software - as long as people remember that the effectiveness of the overall business process, not the efficient database processing it may exploit, is what really pays their wages....

Reader Comments

8th October 2009: 'Dale B. Ritter, B.A.' said:

The exciting analysis of chip application optimization is a refreshing, modern, highly informative note. Progress on chip loading and utilization is exponential for data density, and the processes of chip advancement and software sophistication are synergistic. That data density factor depends on the software design, which has a new mode in the picoyoctometric, 3D, interactive video atomic model imaging function named the GT integral. It has software research relevance by its data flow model, with algebraic command of manifolds and scales, and innovates computer science.
Images of the h-bar magnetic energy particle of ~175 picoyoctometers are online at http://www.symmecon.com with the complete atomic modeling manual titled The Crystalon Door.

9th October 2009: 'David Norfolk' said:

Yes, sounds good, if somewhat advertorial.

The trouble is, of course, that the limits to automated business are more in the business area - a lack of really innovative business ideas - than in the technology. But at least modern chip technology is helping technology keep out of the way.


Published by: IT Analysis Communications Ltd.