Data science has been touted as “the sexiest job of the 21st century,”[1] and its definition has been the subject of many articles. At Parrot Analytics, we prefer to define what data scientists do.
Generally, data scientists are responsible for extracting knowledge from data, which for us and many others usually means Big Data. To succeed, data scientists need three general skills, famously defined by Drew Conway[2]: hacking skills, math and statistics knowledge, and substantive expertise.
Having these skills is necessary but not sufficient for doing data science. They must be applied in a certain way; to see how, we consider the second and most important word[3] in the phrase: science.
Science is, above all else, a process by which we learn more about the world around us. A physicist may use data about atoms and quarks while a sociologist may look at data on certain groups of people, but both seek to understand more about their respective areas of research through what is commonly called the scientific method. Similarly, data scientists use whatever data is relevant to their field or company to gain new and valuable information about their area. This area can be an established research field, like medicine or politics, or it can be a new, industry-specific domain, like Parrot Analytics’ work in the realm of television. Regardless of a data scientist’s field, the scientific method generally looks like this:
- Define the question
As Drew Conway states in his original diagram: “Questions first, then data.” To be valuable, this research question must be driven by the data scientist’s specific domain rather than by the tools, data or algorithms available.
- Define, then acquire data set
Even if all the data is already available, identifying exactly what is needed and extracting only that focuses and streamlines the analysis. Getting the data often requires computer science knowledge of APIs, large databases, etc.
- Process the data
Up to 80% of a data scientist’s time[4] can be spent wrangling the data into a specific form so that it is useful in the analysis. “Hacking” skills (knowing enough computer science or programming to get the job done) can help at this stage.
- Exploratory data analysis
After the data is in the proper format, it is important for the data scientist to spend some time understanding what the data contains: variables, patterns, outliers, basic statistics, etc. This step can also lead to refinements in the research question.
- Statistical prediction/modelling
Armed with a specific research question and a good understanding of the data, the data scientist uses their mathematical and statistical knowledge to apply the appropriate algorithms to the data and get the results needed to answer the question.
- Interpret results
This step is the crux of the data science process: based on these results, what can be learned about this particular domain? Such an analysis is impossible without a deep knowledge of the data acquired, the mathematics used, and the field the data scientist works in.
- Challenge results
Reviewing one’s work is important in any creative process, and data science is no exception. What other conclusions could be drawn from this statistical test? Do these results and their interpretation make sense in a wider context?
- Communicate results
Booz Allen Hamilton defines data science as “the art of turning data into actions.”[5] To turn the results of the scientific process into business actions, data scientists must effectively communicate these results to many other people in the company, both technical and non-technical.
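As a rough illustration of the middle steps of this process (process the data, explore it, fit a model), here is a minimal Python sketch. Everything in it is an assumption made for the example: the tiny data set is invented, the field names `episodes` and `demand` are hypothetical, and a straight-line least-squares fit stands in for whatever model a real question would call for. It is not a description of how Parrot Analytics works.

```python
from statistics import mean, median

# Hypothetical raw records, invented purely for illustration. Values arrive
# as strings, and one row is missing its measurement.
RAW_ROWS = [
    {"episodes": "1", "demand": "2.0"},
    {"episodes": "2", "demand": "4.1"},
    {"episodes": "3", "demand": ""},  # incomplete row
    {"episodes": "4", "demand": "7.9"},
    {"episodes": "5", "demand": "10.0"},
]


def process(rows):
    """Process the data: drop incomplete rows, convert strings to floats."""
    cleaned = []
    for row in rows:
        if row["episodes"] and row["demand"]:
            cleaned.append((float(row["episodes"]), float(row["demand"])))
    return cleaned


def explore(pairs):
    """Exploratory analysis: basic statistics of the outcome variable."""
    ys = [y for _, y in pairs]
    return {
        "n": len(ys),
        "mean": mean(ys),
        "median": median(ys),
        "min": min(ys),
        "max": max(ys),
    }


def fit_line(pairs):
    """Modelling: ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


if __name__ == "__main__":
    data = process(RAW_ROWS)
    print(explore(data))
    slope, intercept = fit_line(data)
    print(f"demand ~ {slope:.2f} * episodes + {intercept:.2f}")
```

The code covers only the mechanical steps. Defining the question, interpreting and challenging the fitted line, and communicating what it means still depend on the data scientist’s knowledge of the domain.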
This scientific process is what data scientists do; who they are is harder to quantify. At Parrot Analytics, we find that data scientists are curious, creative and tenacious people. They are the type who will not only find an innovative solution to the problem, but will fight to make sure it has a meaningful impact on their organization and its domain area, in our case measuring global demand for TV content. In our opinion, these qualities and skills make data science the sexiest job around.
– Kayla Hegedus, Data Scientist
[1] https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
[2] https://s3.amazonaws.com/aws.drewconway.com/viz/venn_diagram/data_science.html
[3] http://simplystatistics.org/2013/12/12/the-key-word-in-data-science-is-not-data-it-is-science/
[4] http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
[5] http://www.boozallen.com/insights/2013/11/data-science-field-guide