We all know that Big Data – in all its forms – presents a potential treasure trove of information that organisations can use to extract business value and insight. At the same time it has become apparent that it also poses several technical challenges, not least the ability to crunch and analyse all of this data in a timely manner.
One vendor espousing the benefits of a performant Big Data infrastructure is SAS, through its High-Performance Analytics (HPA) initiative. As described on its website, HPA is ‘an optimal way to gain insights from big data in shorter reporting windows’ or, in other words, ‘getting to the relevant data quicker’. This obviously raises the question: “well, what exactly is behind HPA?” Drilling a little deeper, we can see that it combines a number of different technology pieces, each designed to target a particular Big Data pain point, whether that is speeding up and improving the analytic modelling process, scaling to larger data volumes, or analysing a full set of data as opposed to working with just a sample.
In essence, HPA comprises three core components:
- In-memory analytics. Like many of its rivals, such as SAP with HANA and Oracle with Exalytics, SAS now has an in-memory component for its analytics portfolio. Having said that, the company is keen to differentiate itself from other, more SQL-centric in-memory approaches, as it applies its technology across the analytic lifecycle: in particular model development and deployment, but also areas such as end-user visual exploration. It works by providing a distributed in-memory computing environment that divides analytically intensive tasks, such as text and data mining operations, into manageable pieces and distributes these computations in parallel across a dedicated set of blades. The technology also works as a performance accelerator for some of SAS’s industry-specific analytic applications in areas such as risk management and retail markdown optimisation.
- Next up there’s Grid Computing. Introduced as part of the SAS9 BI platform around six years ago using technology from partner Platform Computing, SAS’s grid computing offering provides workload balancing, high availability and parallel job execution across multiple servers, with shared physical storage, to process large volumes of data and analytics programs. It is primarily targeted at helping IT organisations build and manage a more flexible SAS infrastructure that can scale effectively.
- And finally there’s In-database processing. This initiative, started around four years ago, targets IT departments with big database footprints, and works by pushing analytically intensive SAS tasks, such as data preparation, model development or scoring, inside the database. This not only enables the SAS environment to leverage the processing power of the database platform but also helps minimise data movement. To date SAS has worked with Teradata, IBM, Greenplum and Oracle to translate its SAS functions and procedures into SQL for execution in the database.
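To make the in-database idea a little more concrete, here is a minimal sketch of what "translating a scoring step into SQL" can look like. It is not SAS code and makes no claim about how SAS implements this: the `loans` table, its columns, the coefficient values and every function name are hypothetical, and SQLite simply stands in for an enterprise database such as Teradata or Oracle. The point is only to contrast scoring inside the database with pulling every row out and scoring it on the client.

```python
import sqlite3

# Hypothetical logistic-regression coefficients (illustrative values only).
COEFFS = {"intercept": -3.0, "balance": 0.002, "missed_payments": 0.8}


def scoring_sql(table, coeffs):
    """Render the model's linear predictor (logit) as a SQL query over `table`."""
    terms = [str(coeffs["intercept"])]
    terms += [f"{w} * {col}" for col, w in coeffs.items() if col != "intercept"]
    return f"SELECT id, {' + '.join(terms)} AS logit FROM {table}"


def score_in_database(conn, table, coeffs):
    """Push the scoring arithmetic into the database; only id and score come back."""
    return dict(conn.execute(scoring_sql(table, coeffs)))


def score_client_side(conn, table, coeffs):
    """Baseline: move every input column out of the database, then score locally."""
    query = f"SELECT id, balance, missed_payments FROM {table}"
    return {
        row_id: coeffs["intercept"]
        + coeffs["balance"] * balance
        + coeffs["missed_payments"] * missed
        for row_id, balance, missed in conn.execute(query)
    }


# Tiny in-memory table standing in for a large consumer-lending portfolio.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (id INTEGER PRIMARY KEY, balance REAL, missed_payments INTEGER)"
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [(1, 500.0, 0), (2, 1500.0, 3), (3, 1000.0, 1)],
)

in_db = score_in_database(conn, "loans", COEFFS)
client = score_client_side(conn, "loans", COEFFS)

# A logit above zero corresponds to a predicted default probability above 0.5.
likely_defaults = sorted(row_id for row_id, logit in in_db.items() if logit > 0)
```

Both routes produce the same scores; the difference is where the arithmetic runs and how much data has to travel. With three rows that is irrelevant, but with a large table the in-database route returns one number per row rather than every input column.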
The real challenge
In isolation, these three technology components do help boost the performance of a SAS environment. However, having the technology is one thing; understanding which scenarios it applies to, and which parts of the HPA portfolio you should use for each scenario, is the real challenge for most organisations. There appears to be no clear-cut answer, especially as performance requirements are often unique to each organisation. That said, we believe the sweet spot, for in-memory and in-database at least, centres on speeding up and improving the iterative process of data acquisition, data analysis, variable selection, modelling and model assessment.
And there are good reasons why you might want to do this. Speeding up the model deployment process, for example, can allow a bank to determine the level of credit exposure in its consumer-lending portfolio on a more timely basis. Similarly, it can calculate the probability of a loan default in a much shorter time frame, or evaluate alternative scenarios and build recommendations when market conditions suddenly change. In a business environment where time always equals money, all these improvements can significantly impact the bottom or top line. Likewise, by accelerating the model development process, business analysts or statisticians can handle more variables, employ more complex modelling techniques, perform more model iterations and build more accurate and finely tuned models.
The bottom line for enterprises
While this all sounds well and good, organisations also need to be mindful of the fact that applying HPA to analytic processes doesn’t necessarily make them any easier, just a hell-of-a-lot quicker. They do, for example, still need to apply the same best practice and rigour to the modelling process to ensure that models are fit-for-purpose and that there’s an on-going process of model improvement and management. If you don’t get the basics right then HPA is only going to speed up sub-optimal processes or, at worst, make the wrong things happen faster.
By the same token, we also believe SAS needs to make it a lot clearer how organisations should plan for HPA. It needs to explain the scenarios in which you would use one HPA technology piece over another, or equally when you might use them in combination. At the moment it feels like HPA purely serves as an umbrella term for bringing together some disparate technologies that are unified only by their purpose of speeding up performance. Having a more modular and flexible approach that allows organisations to pick and choose the level of performance they require, and to move up and down a gear (to use an analogy) as needs change, should, we believe, be a longer-term aim for SAS.
But tell me, what do you think?