What’s in a name: Statistics vs Data Science

Whilst the Select team are very much missing working together in the office, there have been positives in the move to working more remotely. One of these is that many meetings and gatherings have moved online, meaning that they are open to a wider pool of participants who may not normally have been able to attend in person. Last week one of our senior consultants, Jo, virtually joined a Royal Statistical Society (RSS) meeting hosted by the Glasgow local group discussing the differences between statistics and data science. Surprisingly, Jo wasn’t the furthest afield, as one participant hailed from Perth, Australia!

The discipline of data science has boomed in the past decade along with the increasing volumes of data being generated and stored. Those who have the skills to manipulate, analyse and visualise big data to gain insights are in demand. But a question that is asked repeatedly among statisticians, data scientists and others is whether statistics and data science truly are different, or whether, underneath the buzzwords of “machine learning” and “AI”, data science is a rebranding of statistics. This is not a new question; it has been the subject of many discussions, for example in one of our previous blogs, in the media and at other RSS events.

This meeting explored the differences and the commonalities of statistics and data science.

There are certainly a lot of techniques in common (for example classification and regression trees, or clustering), and the two fields often have different names or terminology for the same techniques (for example Gaussian process regression and kriging, or hypothesis testing and A/B testing). Both roles also require good communication skills, programming skills and the ability to extract usable insights from data.
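To make the shared-technique point concrete, here is a minimal sketch of a typical "A/B test" written out as the classical two-sample test of proportions that a statistician would recognise. The function name and the conversion counts are made up purely for illustration; they are assumptions of this sketch, not something presented at the meeting.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for equality of two proportions (pooled variance)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled proportion under the null hypothesis of no difference
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: conversions out of visitors for variants A and B
z, p_value = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Whether the result is reported as "variant B wins" or as "the null hypothesis is rejected at the chosen significance level", the calculation underneath is the same.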

Some subtle potential differences, or differences in emphasis, were also discussed. While statisticians have programming skills, data scientists may use a wider range of programming languages. Statisticians are possibly seen as being more focussed on assumptions than data scientists. Many of the traditional statistical techniques are interpretable, whereas more modern machine learning techniques such as random forests or neural networks are ‘black boxes’, producing predictions rather than interpretations. But all that said, the two disciplines have more in common than there are differences. Both focus on applying analytical techniques to solve real-world problems with data. And while data science is seen as new and fashionable, a lot of the machine learning techniques applied by data scientists are statistical and commonly applied by statisticians in their work.