A few weeks ago I was fortunate enough to attend a Data Scientist series event in London sponsored by EMC/Greenplum. Its aim was to bring like-minded people together to explore the topic of creating competitive advantage with Big Data analytics. As a very well-attended event, it also provided a perfect opportunity to discuss the role of data scientists and how, given the current shortage of advanced analytics skills, companies should go about sourcing this next generation of analyst-cum-scientist.
Part of my discussions at the event focused on pinning down a description of a data scientist, a task that proved harder than you might think. The overall consensus, however, seemed to be that data scientists broadly fall into two camps. First, there are those who represent the purist view, where a data ‘scientist’ is responsible for experimenting with, discovering and publishing research on new algorithms and analytic techniques, activities prevalent at academic institutions and data-driven organisations such as Google and Yahoo. Second, there are the data scientists who could best be described as more mainstream: those responsible for experimenting, creating new knowledge and developing insights for real-world commercial business problems. My first impression from the event was that many of the attendees naturally fell into this second category.
So what makes this type of data scientist? To begin with, it’s a role that needs to mix old and new disciplines, tools and techniques, and combine them with a healthy dose of analytical scepticism and curiosity! It’s also a role that draws on advanced analytic skills, such as those evident in projects for churn modelling, fraud detection, customer segmentation, risk mitigation or sentiment analysis: a set of disciplines that requires deep knowledge of, and proficiency in, areas such as predictive modelling, text analytics, machine learning and data mining.
In my experience, any project involving these more specialised skills requires a person who is comfortable preparing and exploring data, has an aptitude for applying analytic techniques and algorithms to data, and is highly competent at discovering and identifying meaningful patterns in that data. In many cases this person is a statistician or mathematician, someone who is numerically sophisticated and data-savvy, but who is also (affectionately) known, thanks to their clinical approach to data analysis, as ‘the guy or gal in a white coat’, perhaps a passing reference to their laboratory-style existence.
While these traits provide a solid grounding for the data scientist, other qualities, skills, experience and abilities need to be brought to bear if a data scientist is to become more than just a re-badged statistician. Above all, the data scientist’s role needs to become more critical and central to the end-to-end analytics process, something that can be enabled by:
- Building expertise in handling large volumes and different styles of digital data, such as unstructured and real-time data.
- Becoming conversant in new analytic technologies, architectures and languages – where necessary – for storing, processing and manipulating this type of data.
- Developing a flair for the exploratory and experimental side of the role, which is required to tease out interesting and previously unknown insights from vast pools of data. The best data scientists tend to be both intensely curious and competent communicators.
- Enhancing their ability to participate in the analytic process. Rather than working in isolation, a data scientist’s success also hinges on working in unison with other members of the analytics team, whether data engineers, DBAs, programmers or business analysts.
- Bringing a high degree of data domain experience or business know-how to the table so that analytical insights can be effectively applied to real world business problems or opportunities.
It’s perhaps the latter three skills in this list that most clearly differentiate the work of a data scientist from previous incarnations. In many ways, they also illustrate a more subtle, generational difference between old and new styles of data analysis, where modern technologies and languages such as Hadoop, R, Perl and Python are increasingly being put to work on Big Data analytics problems.
Of course, the big question surrounding data scientists is how you source, train and retain these highly valuable individuals. Given that data science is an evolving discipline, it’s no surprise that many practitioners move from adjacent areas such as mathematics, software engineering or computer science. The problem is exacerbated by the limited number of graduate and postgraduate courses dedicated to advanced analytics and/or data science, although this is changing. Many organisations looking for alternative strategies are starting by targeting their efforts internally and investing in home-grown talent. While there’s no silver bullet for the skills shortage, casting your net wider could yield valuable results, whether through recruiting different types of graduates, internal training or sourcing talent through a third party. And for those of you lucky enough to have these talented individuals on board already, my advice would be: hold on to them, nurture them and, above all, don’t let them go!