

Big data integration
By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 5th September 2013
Copyright Bloor Research © 2013

In the previous articles in this series I have considered the need for trust in, context around and the security of big data. In each of these cases, governance capabilities are required that parallel those for conventional data even though, taken individually, these requirements are typically simpler than those for transactional data. Conversely, governing the variety of different types of data that may be analysed will typically require a more agile approach. Nevertheless, it is not really more complex or complicated. Unfortunately, this is not the case when it comes to integration.

Take smart meters as an example. You collect the data to feed your sales invoicing and your CRM system as well as to support capacity planning in your power stations (if you are an electricity generator). In addition, you will want reconciliations between the smart meter data and your billing systems to prevent leakage, you will want integration with fraud systems and you will need smart metering (error) data to feed into your service management applications. That's half a dozen different applications that you will want to feed from your smart meters, and there are probably some others that I haven't thought of (like loading the data into a data warehouse for analysis and subsequently archiving it).
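To make the reconciliation point concrete, here is a minimal sketch in Python. It assumes that "leakage" means consumption that was metered but never billed, and that both systems can be reduced to simple (meter, kWh) pairs. The function name `reconcile`, the tolerance and the sample figures are all hypothetical and purely illustrative, not a description of any real billing system.

```python
from collections import defaultdict


def reconcile(meter_readings, invoices, tolerance_kwh=0.5):
    """Return {meter_id: gap_kwh} for meters whose metered usage
    differs from billed usage by more than the tolerance.

    meter_readings: iterable of (meter_id, kwh) tuples (assumed shape)
    invoices:       iterable of (meter_id, billed_kwh) tuples (assumed shape)
    """
    # Total what the meters reported, per meter.
    metered = defaultdict(float)
    for meter_id, kwh in meter_readings:
        metered[meter_id] += kwh

    # Total what the billing system charged for, per meter.
    billed = defaultdict(float)
    for meter_id, kwh in invoices:
        billed[meter_id] += kwh

    # A positive gap means energy was metered but not billed.
    discrepancies = {}
    for meter_id in sorted(set(metered) | set(billed)):
        gap = metered.get(meter_id, 0.0) - billed.get(meter_id, 0.0)
        if abs(gap) > tolerance_kwh:
            discrepancies[meter_id] = round(gap, 3)
    return discrepancies


# Illustrative data only: M2 is under-billed and M3 has no invoice at all.
readings = [("M1", 10.0), ("M1", 12.5), ("M2", 8.0), ("M3", 4.0)]
invoices = [("M1", 22.5), ("M2", 6.0)]
print(reconcile(readings, invoices))
```

Even this toy version needs data from two systems in a comparable shape, which is exactly the integration work under discussion.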

That's an awful lot of integration to do, and that's before mentioning the platforms involved: a streaming platform if you want to do real-time analysis, Cassandra if you want real-time trending as well, and/or Hadoop if you don't want either of these, plus your data warehouse and archiving platforms - but that's an aside. Some of this integration is going to be hard-wired, but you're not going to hard-wire all of it, at least not to start with, so you're going to need ETL (extract, transform and load) or ELT, data federation and, quite possibly, data replication as well. And, of course, you need to manage all this, which means having the metadata to understand where and how these different integration techniques are used. You are also going to need lineage capabilities across this environment, which feeds into (if it is not actually a part of) your big data governance requirement.
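As a rough illustration of what "metadata to understand where and how these different integration techniques are used" might look like, the following Python sketch records each integration flow with its technique and walks the flows to answer a simple lineage question. The class `IntegrationCatalogue`, the system names and the techniques assigned to each flow are hypothetical, chosen only to mirror the smart meter example; a real metadata or lineage tool would capture far more.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    technique: str  # e.g. "ETL", "ELT", "federation", "replication"


class IntegrationCatalogue:
    """Toy metadata store: which system feeds which, and by what technique."""

    def __init__(self):
        self._outgoing = defaultdict(list)

    def register(self, source, target, technique):
        self._outgoing[source].append(Flow(source, target, technique))

    def downstream(self, system):
        """Breadth-first walk returning every flow reachable from `system`."""
        seen, queue, flows = {system}, deque([system]), []
        while queue:
            current = queue.popleft()
            for flow in self._outgoing.get(current, []):
                flows.append(flow)
                if flow.target not in seen:
                    seen.add(flow.target)
                    queue.append(flow.target)
        return flows

    def techniques_in_use(self):
        """Count how many registered flows use each technique."""
        counts = defaultdict(int)
        for flows in self._outgoing.values():
            for flow in flows:
                counts[flow.technique] += 1
        return dict(counts)


# Hypothetical flows loosely based on the smart meter example.
catalogue = IntegrationCatalogue()
catalogue.register("smart_meters", "billing", "ETL")
catalogue.register("smart_meters", "fraud_detection", "replication")
catalogue.register("smart_meters", "data_warehouse", "ELT")
catalogue.register("data_warehouse", "archive", "ETL")
catalogue.register("billing", "crm", "federation")

for flow in catalogue.downstream("smart_meters"):
    print(f"{flow.source} -> {flow.target} via {flow.technique}")
```

The `downstream` walk answers "what is affected if the meter feed changes?", while `techniques_in_use` gives a crude inventory of how the integration is being done.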

To be fair, this issue will not be quite as severe in other big data environments, where you are ingesting data purely for analytic purposes, as it is in environments where there are also transactional implications. For example, if you are analysing social media data then your integration requirements will be more limited (subject to the fact that there are many social media sites and you may be adding or changing the sites you derive data from on an ongoing basis) but, nevertheless, they will still be more complex than they were previously.

So, bearing in mind that I started off by saying that your governance environment needs to be more flexible for big data, the same applies, sometimes in spades, to integrating it with your conventional environment.


Published by: IT Analysis Communications Ltd.