Data acquisition and analysis
The rise and rise of data science
What is this thing called data science, where has it come from, and why is it so popular?
Institute of Physics Fellow, Dr Tamara Clelford, explores the journey of data science from its roots in physics and astronomy to its place today in business and finance.
Back in 2001, the term ‘data science’ was first used in a publication by William Cleveland. Fast forward to today and every business wants to employ data scientists. What is this thing called data science, where has it come from, and why is it so popular? Read on and I will reveal all.
Scope of data science
A one-line definition of what a data scientist does is a difficult thing to write. The job has grown to encompass:
- analysis of trends
- machine learning
- artificial intelligence
- numerical analysis
- text analysis
- business analytics
- analysis of clicks on a website
Extracting knowledge and insights
Data science can mean different things in different roles. Instead of a one-line definition, think more in terms of a short discussion.
Data science extracts knowledge and insights from data, often information that would not be available from just one set of data.
The roots of data science are in scientific methods and algorithms, which are applied rigorously. The output has to be verified as plausible.
Most data scientists spend the majority of their time cleaning data before it can be used. This removes incorrect and incomplete records.
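As a rough illustration of what cleaning can look like, here is a minimal Python sketch. The records, field names and plausibility limits are all made up for the example; a real project would choose its own rules for what counts as incorrect or incomplete.

```python
# Hypothetical raw records: some are incorrect (impossible values)
# and some are partial (missing fields). None of this is real data.
raw_records = [
    {"name": "A", "age": 34, "height_cm": 172.0},
    {"name": "B", "age": None, "height_cm": 165.5},  # partial: age missing
    {"name": "C", "age": -5, "height_cm": 180.2},    # incorrect: impossible age
    {"name": "D", "age": 51, "height_cm": None},     # partial: height missing
    {"name": "E", "age": 28, "height_cm": 158.9},
]


def is_valid(record):
    """Return True only if every field is present and plausible."""
    age, height = record["age"], record["height_cm"]
    if age is None or height is None:
        return False
    # Assumed plausibility limits, chosen only for this sketch.
    return 0 <= age <= 120 and 50.0 <= height <= 250.0


def clean(records):
    """Keep only the records that pass the validity check."""
    return [r for r in records if is_valid(r)]


cleaned = clean(raw_records)
print(f"kept {len(cleaned)} of {len(raw_records)} records")
```

Dropping bad records is only one option. Depending on the question being asked, it can be better to repair or fill in a value than to throw the whole record away.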
It is multi-disciplinary and often collaborative, being part:
- computer science
- big data
To be a good data scientist you have to understand all these aspects.
Roots of data science
The roots of data science date back to the areas of physics and astronomy where large data sets were first collected and scientifically analysed. There are many good examples of data science in the world of physics.
One is the work of Tycho Brahe and Johannes Kepler. Tycho took superbly accurate astronomical observations by eye. He lived in the 16th century, before telescopes. His measurements were five times more precise than those of his contemporaries, and there were a lot of them. He had created an accurate, clean data set.
Tycho collaborated with Kepler and, using numerical rigour, the observations were turned into scientific discoveries. Kepler used Tycho’s observations, scientific methods and algorithms to develop his laws of planetary motion.
Measuring the distance to the Sun
Another good example is measuring the distance from the Earth to the Sun. Today, this is a relatively simple task using radar or lasers. This was not the case when measurements were first starting to be made by Archimedes in the 3rd century BC.
We continued to try to measure this distance in earnest, gaining a wide range of different results, until Simon Newcomb came along in 1895. You guessed it. He applied data science techniques to get an incredibly accurate answer with quite limited resources.
The main difference in Newcomb's approach was to use multiple data sets. He did not only use the really popular transit of Venus to get an answer. He also used aberration, the speed of light and the Gaussian gravitational constant.
Newcomb made sure the data sets he was working with were clean and he collaborated with other scientists to ensure proper scientific rigour in his work. This all paid off with a highly accurate calculated value of 0.9994 AU. It would take another 50 years for us to get closer to the correct answer.
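To give a feel for why several independent data sets can beat a single one, here is a small Python sketch of one standard way to combine them, the inverse-variance weighted mean. This is not a reconstruction of the historical calculation. The numbers and their uncertainties are invented for illustration, and the function name combine is my own.

```python
import math


def combine(measurements):
    """Inverse-variance weighted mean of (value, uncertainty) pairs.

    More precise measurements (smaller uncertainty) get more weight.
    Returns the combined value and its uncertainty.
    """
    weights = [1.0 / sigma ** 2 for _, sigma in measurements]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, measurements)) / total
    return mean, math.sqrt(1.0 / total)


# Made-up estimates of the same quantity from three independent methods,
# in arbitrary units: (value, one-sigma uncertainty).
estimates = [(10.2, 0.4), (9.9, 0.2), (10.05, 0.1)]

mean, sigma = combine(estimates)
print(f"combined estimate: {mean:.3f} +/- {sigma:.3f}")
```

The combined uncertainty comes out smaller than that of the best single measurement, which is the payoff for bringing more than one data set to the problem. The method assumes the measurements are independent and their uncertainties are honest.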
More recent examples include the way large particle physics projects analyse their data. Places like the Conseil Européen pour la Recherche Nucléaire (CERN) generate huge amounts of data and have to handle the work as a consortium.
These analytics skills, often honed in physics or astrophysics degrees, are becoming applicable to the world outside academic physics. Not only applicable, but highly desirable and sought after.
If you come along to one of my talks, you will experience the thrill of a live coding session, where I write a simple machine learning algorithm to predict if a person would have survived the voyage of the Titanic. You will see first-hand the kind of work that data scientists do. Writing in Python and using open source data from Kaggle, I will take you through:
- the initial look at the data
- how to clean it
- how to build a machine learning algorithm
- how to interpret the results
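The four steps above can be sketched in a few dozen lines of plain Python. This is not the code from the talk. The passenger rows are invented stand-ins rather than the Kaggle data, and the choice of model (a small logistic regression trained by gradient descent), the feature scaling and the learning rate are all assumptions made to keep the example self-contained.

```python
import math
from statistics import median

# Invented rows in the spirit of the Titanic data (NOT the real data set):
# (sex, age, passenger class, survived)
rows = [
    ("female", 29.0, 1, 1), ("male", 40.0, 3, 0), ("female", None, 2, 1),
    ("male", 22.0, 3, 0), ("female", 35.0, 1, 1), ("male", 54.0, 1, 0),
    ("male", None, 3, 0), ("female", 18.0, 3, 1), ("male", 8.0, 2, 1),
    ("female", 47.0, 3, 0), ("male", 31.0, 2, 0), ("female", 24.0, 2, 1),
]

# 1. Initial look at the data: how big is it, and what is missing?
missing_age = sum(1 for row in rows if row[1] is None)
print(f"{len(rows)} rows, {missing_age} with a missing age")

# 2. Clean it: fill missing ages with the median and turn fields into numbers.
fill_age = median(row[1] for row in rows if row[1] is not None)


def features(sex, age, pclass):
    """Return [bias, is_female, scaled age, scaled class] for one passenger."""
    age = fill_age if age is None else age
    return [1.0, 1.0 if sex == "female" else 0.0, age / 100.0, (pclass - 1) / 2.0]


X = [features(sex, age, pclass) for sex, age, pclass, _ in rows]
y = [label for *_, label in rows]


# 3. Build the model: logistic regression fitted by batch gradient descent.
def predict_proba(weights, x):
    """Predicted probability of survival for one feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


weights = [0.0] * 4
learning_rate = 0.5  # assumed value, not tuned
for _ in range(2000):
    gradient = [0.0] * 4
    for x, target in zip(X, y):
        error = predict_proba(weights, x) - target
        for j in range(4):
            gradient[j] += error * x[j]
    weights = [w - learning_rate * g / len(X) for w, g in zip(weights, gradient)]

# 4. Interpret the results: a positive weight pushes towards "survived".
for name, w in zip(["bias", "is_female", "age/100", "class (scaled)"], weights):
    print(f"{name:>15}: {w:+.2f}")

predictions = [1 if predict_proba(weights, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(f"accuracy on the training rows: {accuracy:.2f}")
```

Accuracy measured on the same rows used for training flatters the model. In practice you would hold some data back for testing, and you would most likely reach for libraries such as pandas and scikit-learn rather than writing the fitting loop by hand.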
Business and data science
Because data science identifies and predicts trends, it has become a staple in the financial world. It is quickly spreading across the business world, building tools such as:
- recommender engines
- business analytics
- online advertising
People in different types of analytics roles are being re-branded as data scientists to keep up with the demand. Exactly where this sudden demand and response will take the world of data science will be interesting to see and will unfold over the next few years.
Dr Tamara Clelford is the founder of Swamphen Enterprises and works as a consultant in data science, antennas and radio frequency engineering. Alongside technical consulting Tamara is a regular speaker and teaches adults about coding and data science.
Read blogs by Dr Clelford.
The author would like to thank the Institute for collaborating with Swamphen Enterprises on this piece.