IT-Analysis.com
Analysis

So, what is data virtualisation?
By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 27th January 2011
Copyright Bloor Research © 2011

Data virtualisation is the latest technology to enjoy its moment in the hypelight, and there has been considerable debate within the blogosphere about what it actually is and how it relates to data federation, data integration and EII (enterprise information integration).

Rather than start from scratch I thought I would go back through my files and see what I had written about this in the past (if anything). I found the following definition of an EII platform (that is, what you need to support EII, which is, after all, about information rather than mere data). What I wrote, some three years ago, was that an EII platform needs to do four things:

  1. “It virtualises your data – it makes all relevant data sources, including databases, application environments and other places where data may be sourced, appear as if they were in one place so that you can access that data as such.
  2. “It abstracts your data – that is to say, it conforms your data so that it is in a consistent format regardless of any native structure and syntax that may be in use in the underlying data sources.
  3. “It federates the data – it provides the connectivity that allows you to pull data together, from diverse, heterogeneous sources (which may contain either operational or historical data or both) so that it can be virtualised. It should also enable things like push-down optimisation so that query joins can be mastered in the optimal place.
  4. “It presents the data in a consistent format to the front-end application (typically, but not always, a BI tool) either through relational views (via SQL) or by means of web/data services, or both.”
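The four capabilities above can be sketched in miniature. This is a toy illustration only, using hypothetical sources and field names: two differently-shaped sources (a CSV feed and an in-application list) are made to appear as one database (point 1), conformed to a consistent relational format (point 2), joined across sources (point 3) and presented through a relational view via SQL (point 4).

```python
import csv
import io
import sqlite3

# Hypothetical heterogeneous sources: a CSV extract and an application's own records.
CSV_ORDERS = "order_id,customer_id,amount\n1,100,250.0\n2,101,75.5\n"
CRM_CUSTOMERS = [{"id": 100, "name": "Acme"}, {"id": 101, "name": "Globex"}]

def build_virtual_view():
    """Expose both sources as if they lived in one database, in one
    consistent relational format, queryable with plain SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INT, customer_id INT, amount REAL)")
    conn.execute("CREATE TABLE customers (id INT, name TEXT)")
    for row in csv.DictReader(io.StringIO(CSV_ORDERS)):
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                     (int(row["order_id"]), int(row["customer_id"]), float(row["amount"])))
    for c in CRM_CUSTOMERS:
        conn.execute("INSERT INTO customers VALUES (?, ?)", (c["id"], c["name"]))
    return conn

conn = build_virtual_view()
# A join across what were originally two unrelated sources.
rows = conn.execute(
    "SELECT c.name, o.amount FROM orders o JOIN customers c ON o.customer_id = c.id"
).fetchall()
```

A real product does this against live sources without copying the data; the in-memory copy here is purely to keep the sketch self-contained.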

Actually, I didn’t quite write that: I have updated it somewhat but the gist is the same.

Clearly, data federation is not the same as data virtualisation. Moreover, federation is not always necessary for virtualisation; it depends on why you are virtualising. If you want to link a number of data marts together so that you can query across them, then the query optimisation capabilities of a federation engine will clearly be necessary. On the other hand, if you want to create mashups or other applications with relatively lightweight access requirements, or you want to use virtualisation to support MDM-like capabilities, then such functions may not be needed: you can use data services instead. Data services may also be more appropriate in environments where less of the data is relational and more of it comes from a variety of unstructured sources or from the web. Indeed, there is a whole new discussion to be had about the distinctions between data virtualisation for unstructured versus structured data (or a combination of the two), but that is a subject for another day.
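To make the data-service alternative concrete, here is a minimal sketch (sources, field names and the service itself are all hypothetical): one function returns a single conformed customer record assembled from two sources, with no distributed query optimiser behind it. That lightweight, record-at-a-time access pattern is all a mashup typically needs.

```python
# Hypothetical sources: a legacy extract and a web API payload.
LEGACY_ROWS = [("100", "Acme Ltd"), ("101", "Globex plc")]
WEB_PROFILES = [{"customerId": 101, "segment": "enterprise"}]

def customer_service(customer_id):
    """A data service: return one conformed customer record, pulling
    fields from both sources. No federation or query planning involved."""
    record = {"id": customer_id, "name": None, "segment": None}
    for cid, name in LEGACY_ROWS:
        if int(cid) == customer_id:
            record["name"] = name
    for profile in WEB_PROFILES:
        if profile["customerId"] == customer_id:
            record["segment"] = profile["segment"]
    return record
```

Contrast this with the data-mart case: a cross-mart query needs an optimiser to decide where joins should run, which is exactly what a service like this does not provide.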

The other question that arises is whether parts 1, 2 and 4 are all actually parts of the same thing. I think 2 and 4 probably are or, at least, the differences are so slight that there is no point in making a distinction.

Parts 1 and 2 are another issue. Having a virtual data source does not necessarily mean that it is easy to work with. It is easy enough to imagine a huge hybrid database that contains relational and non-relational data, PDF documents and a whole bunch of other things, but that would not necessarily mean the data was all in a common format and therefore easy to work with. So I think both 1 and 2 are required, and that they are different. It certainly does not make much sense to implement data virtualisation without an abstraction layer, but that doesn't mean they are the same thing.
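The distinction can be shown in a few lines. In this sketch (source shapes and field names are invented for illustration) two records arrive through one access point but in quite different native shapes; the abstraction layer is the pair of mappings that conforms them to a single format a consumer can rely on.

```python
# Abstraction (point 2) as distinct from virtualisation (point 1):
# virtualisation puts both records behind one door; these mappings
# are what make them the same shape once you are through it.

def from_relational(row):
    # e.g. ("J. Smith", "19750312") from a SQL table with a compact date
    name, dob = row
    return {"name": name, "dob": f"{dob[:4]}-{dob[4:6]}-{dob[6:]}"}

def from_document(doc):
    # e.g. metadata pulled from a PDF or other document store
    return {"name": doc["author"], "dob": doc.get("birth_date")}

records = [from_relational(("J. Smith", "19750312")),
           from_document({"author": "A. Jones", "birth_date": "1980-07-01"})]
# Both records now share one schema, so a consumer can treat them uniformly.
```

Without the two `from_*` functions the consumer would still reach both records through one interface, but would have to understand each native shape itself, which is exactly the situation the abstraction layer exists to prevent.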

Finally, I haven’t talked about data integration at all. Well, the fact is that leading data integration products support data services so you should certainly be able to virtualise data sources even if you can’t federate them (they won’t typically have the sort of distributed query optimiser you would want from a data federation product). The question will be how easy it is to build the abstraction layer with a data integration tool. Of course, you can create all the transformations and mappings necessary for this purpose but what you would really like is something that automates a lot of this abstraction rather than requiring you to build it for yourself. It is in these two areas—federation and automated abstraction—that the pure players in the market, especially Composite Software and Denodo, have a significant advantage over the data integration vendors.

Published by: IT Analysis Communications Ltd.