Of all the conversations and discussions held at Salesforce’s customer day in Egham this week, one in particular struck a chord with me. Cloudapps, an ISV and partner of Salesforce, spoke briefly about the challenges of capturing, storing and analysing Big Data, in particular telecom mast sensor data, on the Force.com platform. Although there wasn’t time to drill into the detail of Force.com’s Big Data support, it got me thinking more generally about the wider options for exploiting vast amounts of digital data within the cloud.
As a lot of the Big Data that people are interested in analysing (such as sensor data, social media data or weblog file data) already lives in the cloud, it makes sense to use the cloud as the platform for hosting and analysing this data too, rather than the often less efficient method of pushing it to an on-premise enterprise data warehouse, for example. At the same time one set of technologies in particular, Hadoop, is becoming synonymous with a lower cost approach to storing and processing these large-scale datasets in the cloud.
Given the proliferation of digital data and the desire to harness it for better business effect, it’s not surprising that Big Data and the Cloud are on a natural collision course. With its elastic processing capacity, lower cost and lower risk approach, the cloud provides a powerful platform for storing, processing and crunching this data, whether using Hadoop or not. Similarly, the emergence of cloud based offerings that mix analytics and Big Data in the public cloud is also helping to circumvent some of the skills shortage issues relating to advanced analytics techniques and Big Data technologies.
However, that’s not to say that the cloud should be seen as the answer to all your Big Data needs. On-premise data warehouses that employ technologies such as MPP analytic databases, in-memory computing, columnar databases or packaged appliances provide equally valid alternatives. In fact, in many cases these Big Data approaches should be seen as complementary to each other, as each brings different strengths to the table. The challenge, however, for organisations that haven’t put all their eggs in one particular Big Data basket is how to mesh these approaches together, and equally how to do this across the on-premise and cloud divide. After all, the benefits of integrating data to support a more consolidated, complete and accurate view of your business are well known.
As we outlined in our recent report on Analytics in the Cloud, a hybrid cloud is one plausible approach to using both public clouds and private or on-premise IT to deliver a more integrated Big Data analytic system. This can provide a more pragmatic and blended approach to balancing the strengths and pitfalls of both cloud and on-premise implementations, but it also comes with its own set of challenges. Apart from managing the environment, there are also factors relating to the immaturity of certain Big Data technologies, the lack of best practice and the lack of interoperability across platforms. For example, of those looking at Big Data Hadoop projects, a significant proportion are still in experimentation (rather than production) mode, testing out the concepts, design and technology, although we do expect this to change over the next 12-18 months as the market evolves.
Given the great many opportunities for leveraging Big Data in the cloud, it’s surprising to see that, aside from its social media monitoring platform Radian6, Salesforce doesn’t have a stronger message or story about its Big Data hosting capabilities. It appears from the conversations at the customer event that the company still has something to prove when it comes to supporting and helping ISVs and developers work with Big Data, especially on the Force.com platform. But given that the company is used to pioneering cloud based offerings, we don’t expect this to be the situation for very long.