At a recent analyst day at Hursley Labs, IBM outlined its current and future plans for Big Data. The company clearly sees a huge business opportunity in helping customers extract value from new, complex and voluminous data sources to uncover deeper and richer insights than have previously been possible. However, embarking on a Big Data journey is far from straightforward, not least because it requires organisations to consider new technology and architectural approaches to information management and analytics. While IBM’s extensive research expertise is being brought to bear successfully on the Big Data challenge through InfoSphere BigInsights and InfoSphere Streams, the early stage of market development means the move towards an integrated technology stack and common developer tooling environment is still a work in progress, one that IBM clearly sees as its next Big Data challenge.
Leveraging Big Data requires a different information management approach
Big Data is one of the hottest trends in IT industry circles. Although overused as a buzzword, it is generally characterised by large, complex and rapidly growing volumes of information that often remain untapped by existing analytical applications and data warehousing systems. Examples include web traffic, high-volume sensor data and social media information from websites such as Twitter and Facebook. Managing and exploiting this data, however, provides significant added value to organisations by enabling them to discover and act on new and deeper insights.
IBM is no stranger to managing data, having fleshed out an information management portfolio—mainly through acquisition—for data warehousing, ETL, data quality, data federation and master data management. However, Big Data poses a different set of information management challenges, as its volume, variety and complexity continue to push the performance and scalability boundaries of current data warehousing and BI architectures.
As a result, IBM has tailored two product offerings to address this Big Data challenge: InfoSphere BigInsights and InfoSphere Streams. The former is used to ingest, process and analyse vast volumes of data to solve large-scale business problems such as anti-money laundering, customer sentiment analysis and weather impact analysis. The platform is based on open source Apache Hadoop, a distributed computing framework that supports parallel processing of large-scale unstructured data, augmented with advanced text analytics and tools for provisioning, security, workflow and fault tolerance—designed to make BigInsights more of an enterprise-ready consideration. InfoSphere Streams, on the other hand, provides a development platform for building and deploying applications in the Streams Processing Language (SPL). These applications continually analyse massive volumes of streaming data at ultra-low latency, for uses such as traffic management, fraud prevention and financial trading systems.
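To make the Hadoop model underlying BigInsights concrete, the sketch below runs the map, shuffle and reduce phases of a word count locally in plain Python. On a real cluster Hadoop parallelises these phases across many nodes; the function names and sample data here are purely illustrative and are not part of any IBM product API.

```python
# Minimal local sketch of the map/shuffle/reduce pattern that Hadoop
# (the framework underlying BigInsights) parallelises across a cluster.
from collections import defaultdict

def map_phase(records):
    # Emit (word, 1) pairs, as a Hadoop mapper would for word counting.
    for line in records:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Group values by key; Hadoop performs this between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts per word, as a reducer would.
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    posts = ["big data insights", "streaming big data"]
    counts = reduce_phase(shuffle(map_phase(posts)))
    print(counts)  # {'big': 2, 'data': 2, 'insights': 1, 'streaming': 1}
```

Because each mapper works on its own slice of the input and each reducer on its own keys, the same logic scales from two sample lines to terabytes of social media or sensor data simply by adding nodes.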
No one technology or platform can solve the Big Data challenge
Organisations leveraging next-generation technology and languages such as Hadoop and SPL are effectively managing and exploiting Big Data today. However, their experience also demonstrates that—at least in IBM’s view—Big Data challenges cannot be solved by a single platform or engine, but instead require a variety of technologies, components and architectures. While IBM’s vast services organisation can help customers navigate this maze, the longer-term aim for the company is to bring more commonality and integration across its existing Big Data portfolio. BigInsights and Streams grew out of separate development projects and use different engines, UIs and tooling for Big Data applications. Bringing them together on a common platform would not only reduce the cost of ownership and streamline the development process, but also broaden the opportunities for mining all forms of Big Data.
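The difference between the two engines can be seen in miniature: where Hadoop batches work over stored data, a Streams application continuously evaluates each record as it arrives. The Python sketch below illustrates that style of windowed, in-flight analysis; it is not SPL, uses no IBM API, and the readings, window size and threshold are hypothetical.

```python
# Illustrative sketch of continuous windowed analysis of the kind
# Streams applications express in SPL. Values and threshold are
# hypothetical; this is not SPL or an IBM API.
from collections import deque

def rolling_alerts(readings, window=3, threshold=100.0):
    """Yield the moving average whenever the last `window` readings
    average above `threshold`, e.g. flagging unusual trading volume
    as each tick arrives rather than in a later batch job."""
    buf = deque(maxlen=window)
    for value in readings:
        buf.append(value)
        if len(buf) == window and sum(buf) / window > threshold:
            yield sum(buf) / window

if __name__ == "__main__":
    ticks = [90, 95, 100, 120, 130, 80]
    for avg in rolling_alerts(ticks):
        print(f"alert: moving average {avg:.1f}")
```

Because the state held is only the current window, each new value is processed in constant time, which is what makes the ultra-low-latency use cases mentioned above (traffic management, fraud prevention, trading) feasible.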
Ease of use is another factor governing uptake, so support for a visual development paradigm is one area IBM is working on to make its Big Data platforms attractive to a broader base of developers.
At the same time, Big Data needs to be framed in the context of an enterprise information architecture—one that not only provides a blueprint for bringing all this “stuff” together, but facilitates integration with other enterprise data stores and analytic tools to stop Big Data from becoming just another silo. While the company is building hooks to help bridge gaps between Big Data and existing information management environments—for example around Netezza and DB2—it still has a way to go. What is needed in the longer term is a deeper and more consistent approach that enables organisations to integrate their Big Data and traditional data warehouse environments in a non-disruptive way, and allows them to analyse that data (whether Big or otherwise) using standard BI and analytic tools such as Cognos and SPSS.
These characteristics and some early use cases point to an early phase of adoption in the market. Of those looking at Big Data, most are in experimentation mode, testing out the concepts, design and technology; the majority, however, are prepared to adopt a wait-and-see approach.