Analysis

SQL and NoSQL
By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 21st March 2013
Copyright Bloor Research © 2013

It is apparent that there is significant demand for SQL in the NoSQL world. Whether it is HQL (the Hive version of SQL) or CQL (the Cassandra version), there are clearly people who prefer a declarative rather than a procedural approach to accessing big data.
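To make the distinction concrete, here is a minimal sketch of the same aggregation expressed both ways. The query, table name and sample data are invented for illustration, and the procedural version mimics the MapReduce pattern in plain Python rather than using any real Hadoop API.

```python
# Declarative: say *what* you want and let the engine decide how.
# (HiveQL and CQL differ in detail from ANSI SQL; this is illustrative.)
hiveql = """
SELECT region, COUNT(*) AS orders
FROM sales
WHERE amount > 100
GROUP BY region
"""

# Procedural (MapReduce-style): spell out *how* to compute it.
from collections import defaultdict

sales = [
    {"region": "EMEA", "amount": 250},
    {"region": "APAC", "amount": 80},
    {"region": "EMEA", "amount": 120},
]

def map_phase(rows):
    for row in rows:
        if row["amount"] > 100:       # the WHERE clause, by hand
            yield row["region"], 1    # emit (key, value) pairs

def reduce_phase(pairs):
    counts = defaultdict(int)
    for region, one in pairs:         # the GROUP BY / COUNT, by hand
        counts[region] += one
    return dict(counts)

print(reduce_phase(map_phase(sales)))  # {'EMEA': 2}
```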

However, there is a big problem here: the reason SQL works as well as it does in traditional, usually relational, environments is that the databases it accesses have optimisers. Those optimisers fundamentally do two things: first, they rewrite crappy SQL code into something that is as efficient as it can be. That's a big advantage: anybody can write poor code, whether in MapReduce or SQL, but with an optimiser that gets handled for you. Of course, there's a downside: it encourages sloppy programming, but that's another story.
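As a toy illustration of this first job, here is a sketch of one classic rewrite: pushing a filter below a join so that fewer rows reach the join at all. The plan representation is invented for the example; real optimisers apply dozens of such rules over a far richer algebra.

```python
# A minimal sketch of an optimiser rewrite rule: Filter-over-Join
# becomes Join-over-Filter where the predicate touches only one side.
from dataclasses import dataclass

@dataclass
class Scan:
    table: str

@dataclass
class Filter:
    table: str       # table the predicate refers to
    predicate: str
    child: object

@dataclass
class Join:
    left: object
    right: object

def push_down_filter(plan):
    """Push a filter below a join when it applies to one input only."""
    if isinstance(plan, Filter) and isinstance(plan.child, Join):
        join = plan.child
        if isinstance(join.left, Scan) and join.left.table == plan.table:
            return Join(Filter(plan.table, plan.predicate, join.left),
                        join.right)
        if isinstance(join.right, Scan) and join.right.table == plan.table:
            return Join(join.left,
                        Filter(plan.table, plan.predicate, join.right))
    return plan

# "Crappy" plan: the filter runs after the join has done all the work.
naive = Filter("orders", "amount > 100",
               Join(Scan("orders"), Scan("customers")))
print(push_down_filter(naive))
# Join(left=Filter(..., child=Scan('orders')), right=Scan('customers'))
```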

The second thing that a database optimiser does for you is to determine how best to perform joins and other SQL functions. Of course, a NoSQL database may not support joins at all, but that in itself is a restriction. In any case, the optimiser also optimises the performance of all the other SQL capabilities on offer, and at least some of these will be available in NoSQL databases.
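A sketch of this second job, with deliberately crude, invented cost formulas: a cost-based optimiser estimates the cost of each candidate join strategy and picks the cheapest, which is exactly the decision a bolted-on SQL layer with no optimiser cannot make.

```python
# Toy cost model for choosing a join strategy. The formulas and
# constants are invented for illustration only.

def nested_loop_cost(outer_rows, inner_rows):
    # Every outer row probes every inner row.
    return outer_rows * inner_rows

def hash_join_cost(build_rows, probe_rows):
    # Fixed setup cost, then build a hash table on the smaller side
    # and stream the other side past it.
    return 1_000 + 3 * build_rows + probe_rows

def choose_join(left_rows, right_rows):
    candidates = {
        "nested_loop": nested_loop_cost(left_rows, right_rows),
        "hash_join": hash_join_cost(min(left_rows, right_rows),
                                    max(left_rows, right_rows)),
    }
    return min(candidates, key=candidates.get), candidates

print(choose_join(10, 20))            # tiny inputs: nested loop wins
print(choose_join(1_000, 5_000_000))  # large inputs: hash join wins
```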

So the bottom line is that you aren't going to get much in the way of performance out of simply bolting SQL onto a NoSQL source unless you also design in an optimiser.

Interestingly, this is essentially what EMC has done with Hawq SQL in its Pivotal HD product, which is a re-working of the EMC Greenplum parallel engine to support Hadoop. Because Hawq comes from a conventional database background that already has an optimiser, extending or re-working that optimiser to support Hadoop environments is a great deal easier than creating a Hadoop optimiser from scratch, and that head start is precisely what Hawq offers.

Going a step further, it is worth remembering that Teradata's polymorphic storage for Aster Data gives you the ability to use either a native Aster storage engine or an HDFS storage engine (or both) under Aster Data. And, of course, to do this effectively you need to extend the optimiser so that it supports the whole environment.

See also my article "DB2: a relational epithet is no longer enough", in which I described how DB2 already has three storage engines, two of which are NoSQL engines (an XML store and a graph store, though you would typically use XQuery and SPARQL to access these), and how it is likely to add more such stores in the future. Again, one of the reasons why adding new storage engines is no trivial matter is precisely the need to extend the optimiser to understand the different storage. However, once you do so, you are likely to get much better performance than you would from a NoSQL database with SQL but no optimiser.
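The point generalises: once more than one storage engine sits under the same SQL front end, the optimiser has to cost the same operation differently per engine. Here is a hypothetical sketch; the engine names and cost numbers are invented and bear no relation to how Aster Data or DB2 actually cost their plans.

```python
# Toy per-engine costing for a table stored in two engines at once
# (polymorphic-storage style). All numbers are made up.
ENGINE_COSTS = {
    # engine: (cost per row scanned, fixed startup cost)
    "native_store": (2.0, 10),      # cheap to start, dearer per row
    "hdfs":         (0.5, 5_000),   # high startup, cheap at scale
}

def scan_cost(engine, rows):
    per_row, startup = ENGINE_COSTS[engine]
    return startup + per_row * rows

def choose_engine(engines, rows):
    """Pick the cheapest engine holding a copy of the table."""
    return min(engines, key=lambda e: scan_cost(e, rows))

print(choose_engine(["native_store", "hdfs"], rows=100))         # native_store
print(choose_engine(["native_store", "hdfs"], rows=50_000_000))  # hdfs
```

Without this kind of per-engine knowledge in the optimiser, every query against the new store runs at whatever speed the naive plan happens to deliver, which is the situation a SQL layer bolted onto a NoSQL database finds itself in.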
