We would have been processing Big Data long ago had it been possible; many factors contributed to making it so. I'll list the technical ones in the order in which they come to mind:
- Scale-out database technology: Scale-out architectures have been around for quite a while, so there's more to this than just the appearance of column-store databases, but they're part of it. The point is that the old relational database architecture, which depended on row-oriented storage of data, didn't scale out well. When column-store databases (Netezza, Vertica, etc.) appeared, it became possible to scale out to large data volumes and still deliver performance: larger volumes of data could be accessed quickly.
- Hadoop: First, Hadoop was Open Source and free; second, it had scale-out parallelism built in. Naturally companies began to experiment with it. With the advent of UNIX we had lost the general availability of key-value stores, which were always useful; with Hadoop, a key-value store, the capability returned. Hadoop also became an ecosystem. It is not particularly fast for most of what it does, but it is very versatile for anything that smacks of Big Data.
- The Cloud: It took a while for cloud providers to think of Big Data as an opportunity, but they cottoned on. The point was that you could assemble a grid of servers much more quickly than if you bought physical servers, and it might be cheaper too. There was still the problem of getting large volumes of data into the cloud, but it wasn't an insuperable barrier. The cloud enables Big Data applications, but not the biggest.
- Hardware: The fall in hardware costs (per unit of power) continues apace, which means smaller grids of servers for Big Data of any given volume. It's not just multicore processors; it's also falling memory costs, more configurable memory, flash storage, hybrid flash arrays, and software that enables the use of grids. See also Big Data and In-Memory: Are They Related? for a discussion of the relevance of in-memory techniques to Big Data.
- Data Analytics evolution: Most Big Data applications are analytics applications. There has been an explosion in software in this area, including the advent of Mahout and KNIME as Open Source capabilities and the rapid spread of the R language. All of these work with Hadoop.
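The column-store point above can be sketched in a few lines. This is illustrative only: it shows why an aggregate over one column is cheaper when data is laid out by column rather than by row, not how engines like Netezza or Vertica (which add compression and vectorized execution) actually work. The table contents are made up for the example.

```python
# Illustrative sketch: row-oriented vs column-oriented layout.
# A row store keeps each record together; a column store keeps each
# attribute together, so an aggregate over one column touches only
# that column's values instead of whole records.

rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 75.5},
    {"id": 3, "region": "EU", "amount": 33.25},
]

# Row-oriented: summing one attribute still scans every full record.
row_total = sum(r["amount"] for r in rows)

# Column-oriented: the same table pivoted into one list per column.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [120.0, 75.5, 33.25],
}
col_total = sum(columns["amount"])  # reads only the 'amount' column

assert row_total == col_total == 228.75
```

The same answer comes out either way; the difference is how much data has to be read to get it, which is what dominates at large volumes.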
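The key-value parallelism that Hadoop built in follows the map/reduce pattern, which can be shown single-machine. This is a minimal sketch of the pattern, not Hadoop's API: Hadoop's contribution was running the same two phases across a grid, with the shuffle between them handled for you. The input strings are made up for the example.

```python
# Minimal single-machine sketch of the map/reduce pattern
# (word count, the classic example). Hadoop parallelizes exactly
# these two phases across many machines.
from collections import defaultdict

def map_phase(lines):
    """Emit (key, value) pairs -- here, (word, 1) for each word."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Group pairs by key and combine the values -- here, sum counts."""
    grouped = defaultdict(int)
    for key, value in pairs:
        grouped[key] += value
    return dict(grouped)

counts = reduce_phase(map_phase(["big data big grids", "data everywhere"]))
assert counts == {"big": 2, "data": 2, "grids": 1, "everywhere": 1}
```

Because the map phase is independent per input line and the reduce phase is independent per key, both spread naturally across a grid of cheap servers, which is what made Hadoop a fit for Big Data.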
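As a flavor of the analytics software mentioned above, here is a minimal single-machine sketch of k-means clustering, one of the algorithms Mahout became known for scaling on Hadoop. It is a toy version on 1-D points, not Mahout's implementation; real use would run Mahout on a cluster or an equivalent R package.

```python
# Minimal sketch of k-means (Lloyd's algorithm) on 1-D data:
# repeatedly assign each point to its nearest centroid, then move
# each centroid to the mean of its assigned points.

def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign the point to the nearest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Re-center each centroid on its cluster's mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], centroids=[0.0, 10.0])
assert sorted(centers) == [1.0, 9.0]
```

The point of projects like Mahout was that each iteration here is itself a map (assign points) followed by a reduce (average per cluster), so the algorithm maps directly onto Hadoop's execution model.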
Those are probably the major factors, and together they have created a vibrant technology ecosystem; many products have piled in on top of them to amplify the noise around Big Data and its capabilities.
Oh, and there's usually business advantage to be gained in mining those terabytes and petabytes, but there was always business advantage in data mining.
29th March 2013: 'Joe Clabby' said:
Nicely done Robin. I've been stating for a few years now that the rise of Big Data was largely due to declining hardware costs as well as application simplification. You take this even further with your analysis of Hadoop, the move to columnar stores, and to the cloud.
Well reasoned -- nice job.