

SQL on Hadoop
By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 17th February 2014
Copyright Bloor Research © 2014

The good thing about running SQL on Hadoop is that SQL is a declarative language: you don’t need to know where the data is, you just ask for it and the database works out how to get the information you need. However, unless you have a database optimiser, the performance will suck.
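
To make the declarative point concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in; it is not Hadoop or Impala, and the table names and data are hypothetical. The query states only what is wanted, and EXPLAIN QUERY PLAN shows the access path the engine picked for itself.

```python
import sqlite3


def run_declarative_query():
    """Run a join without telling the engine how to execute it."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables and data, purely for illustration.
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER, amount REAL);
        CREATE INDEX idx_orders_customer ON orders(customer_id);
        INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
        INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
    """)
    query = """
        SELECT c.region, SUM(o.amount)
        FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.region
        ORDER BY c.region
    """
    # The plan is chosen by the engine; its exact wording varies by SQLite version.
    plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows, plan


if __name__ == "__main__":
    rows, plan = run_declarative_query()
    print(rows)
    for step in plan:
        print(step)
```

Nothing in the query says which table to scan first or whether to use the index. Those decisions belong to the optimiser, which is why its quality matters so much for performance.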

Now there are various SQL-on-Hadoop initiatives around, but probably the most advanced is Impala. In version 1.2, which was released at the end of December, Cloudera added facilities to optimise join order. While this is a step in the right direction, it hardly constitutes a full-blown optimiser.
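
As a rough illustration of why join order matters, the sketch below enumerates left-deep join orders under a toy cost model (the sum of intermediate result sizes) and picks the cheapest. This is an assumption-laden teaching example, not a description of how Impala or any other product chooses its join order; the function names, table names, row counts and selectivities are all hypothetical.

```python
from itertools import permutations


def plan_cost(order, rows, selectivity):
    """Toy cost: total rows produced by each step of a left-deep join."""
    joined = [order[0]]
    size = rows[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Combine every join predicate linking the new table to those
        # already joined; a missing predicate means a cross product (1.0).
        factor = 1.0
        for prev in joined:
            factor *= selectivity.get(frozenset((prev, table)), 1.0)
        size = size * rows[table] * factor
        cost += size
        joined.append(table)
    return cost


def best_join_order(rows, selectivity):
    """Exhaustively search all orders; only feasible for a few tables."""
    return min(permutations(rows),
               key=lambda order: plan_cost(order, rows, selectivity))


if __name__ == "__main__":
    # Hypothetical statistics for three tables.
    rows = {"sales": 1_000_000, "stores": 1_000, "regions": 10}
    selectivity = {
        frozenset(("sales", "stores")): 0.001,
        frozenset(("stores", "regions")): 0.1,
    }
    best = best_join_order(rows, selectivity)
    print(best, plan_cost(best, rows, selectivity))
```

With these made-up statistics, joining the two small tables first and the large table last roughly halves the intermediate rows compared with starting from the large table. A full optimiser goes well beyond this, weighing access paths, join algorithms and data distribution as well as order.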

However, a couple of related announcements have caught my eye this week. The first was that Calpont has changed its name to that of its product, InfiniDB; it has also raised another round of funding and announced version 4.5 of its database, with an Enterprise Management dashboard. None of this has much to do with Hadoop, except that it reminded me that Calpont (as it then was) announced the availability of InfiniDB running on Hadoop last year, along with an open source license. And, of course, InfiniDB has a grown-up optimiser.

Another product that has an adult optimiser is HP Vertica. And MapR has just announced an early access program (prior to general availability in March) for the HP Vertica Analytics Platform running on the MapR Hadoop distribution.

The truth is that you will get much better performance, orders of magnitude better, from either InfiniDB or Vertica than you will from Impala. So this poses three questions: firstly, will we see more vendors porting their warehouse products onto Hadoop (or HDFS); secondly, how quickly will Cloudera or Hortonworks (with its SQL implementation) be able to produce an optimiser that can compete reasonably well with these intruders into their market; and, thirdly, how much does this matter?

The answer to the first question is yes. I don’t know who or when, but this is the general trend, not just in data warehousing but across a variety of markets. The answer to the second question is not soon: it takes years to develop a good optimiser. It probably takes fewer years than it used to, because there is now plenty of experience out there, which was not the case historically, but it is still a significant period.

Thirdly, yes it matters. You may have to pay a license fee for HP Vertica (or not, in the case of InfiniDB) but the performance advantages you get from having a decent optimiser will mean that you need significantly less hardware in order to get comparable performance, and that should more than offset any such license fees. And that also explains why I expect more vendors to do the same thing as InfiniDB and Vertica, because there is a window of opportunity while Cloudera gets its optimiser up to speed.


Published by: IT Analysis Communications Ltd.
T: +44 (0)190 888 0760 | F: +44 (0)190 888 0761