
What Is Data Science, Anyway?

10 September, 2015

Data science has been touted as “the sexiest job of the 21st century,”[1] and its definition has been the subject of many articles. At Parrot Analytics, we prefer to define it by what data scientists do.

Generally, data scientists are responsible for extracting knowledge from data, which for us and many others usually means Big Data. To succeed, data scientists need three general skills, famously defined by Drew Conway[2]:

  • Hacking skills
  • Math and statistics knowledge
  • Substantive expertise

Having these skills is necessary but not sufficient for doing data science. The skills must be used in a particular way; to see how, we consider the second and most important word[3] in the phrase: science.

Science is, above all else, a process by which we learn more about the world around us. A physicist may use data about atoms and quarks while a sociologist may look at data on certain groups of people, but both seek to understand more about their respective areas of research through what is commonly called the scientific method. Similarly, data scientists use whatever data is relevant to their field or company to gain new and valuable information about their area. This area can be an established research field, like medicine or politics, or a new, industry-specific domain, like Parrot Analytics’ work in television. Whatever the field, the scientific method in data science generally looks like this:

  1. Define the question
    As Drew Conway states in his original diagram: “Questions first, then data.” To be valuable, this research question must be driven by the data scientist’s specific domain rather than by the tools, data or algorithms available.
  2. Define, then acquire the data set
    Even if all the data is already available, the analysis is more focused and streamlined when the data scientist identifies exactly what is needed and extracts only that. Getting the data often requires computer science knowledge of APIs, large databases and similar tools.
  3. Process the data
    Up to 80% of a data scientist’s time[4] can be spent wrangling the data into a form that is useful for the analysis. “Hacking” skills (knowing enough computer science or programming to get the job done) help at this stage.
  4. Exploratory data analysis
    After the data is in the proper format, it is important for the data scientist to spend some time understanding what the data contains: variables, patterns, outliers, basic statistics, etc. This step can also lead to refinements in the research question.
  5. Statistical prediction/modelling
    Armed with a specific research question and a good understanding of the data, the data scientist uses their mathematical and statistical knowledge to apply the appropriate algorithms to the data in order to get the results they need to answer the question.
  6. Interpret results
    This step is the crux of the data science process: based on these results, what can be learned about this particular domain? Such an analysis is impossible without a deep knowledge of the data acquired, the mathematics used, and the field the data scientist works in.
  7. Challenge results
    Reviewing one’s work is important in any creative process, and data science is no exception. What other conclusions could be drawn from this statistical test? Do these results and their interpretation make sense in a wider context?
  8. Communicate results
    Booz Allen Hamilton defines data science as “the art of turning data into actions.”[5] To turn the results of the scientific process into business actions, data scientists must communicate these results effectively to many other people in the company, both technical and non-technical.
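To make the steps above more concrete, here is a minimal end-to-end sketch in Python. It is a hypothetical illustration, not Parrot Analytics’ method: the research question, the show names and every number in RAW_ROWS are made up, and a simple least-squares line stands in for whatever model a real project would need. Interpreting, challenging and communicating the results (steps 6 to 8) remain human work; the code only produces the numbers a data scientist would then question.

```python
import statistics

# Step 1 (hypothetical question): does a show's weekly count of social-media
# mentions help predict its weekly streaming views?

# Step 2: acquire only the fields the question needs. These rows are made-up
# placeholders standing in for an API or database extract.
RAW_ROWS = [
    "show_a, 120, 3400",
    "show_b, 80, 2500",
    "show_c, n/a, 1900",  # a missing value, as real extracts often contain
    "show_d, 200, 5200",
    "show_e, 150, 4100",
    "show_f, 60, 2000",
]


def process(rows):
    """Step 3: parse raw text rows into (mentions, views) pairs, dropping bad rows."""
    clean = []
    for row in rows:
        try:
            _name, mentions, views = (field.strip() for field in row.split(","))
            clean.append((float(mentions), float(views)))
        except ValueError:
            continue  # wrong field count or unparseable number: skip the row
    return clean


def explore(pairs):
    """Step 4: basic summary statistics for each variable."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return {
        "n": len(pairs),
        "mean_mentions": statistics.mean(xs),
        "mean_views": statistics.mean(ys),
        "stdev_views": statistics.stdev(ys),
    }


def fit_line(pairs):
    """Step 5: ordinary least-squares fit of views = intercept + slope * mentions."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(pairs, intercept, slope):
    """Step 7 aid: share of the variation in views that the fitted line explains."""
    ys = [y for _, y in pairs]
    mean_y = statistics.mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    data = process(RAW_ROWS)
    summary = explore(data)
    intercept, slope = fit_line(data)
    fit = r_squared(data, intercept, slope)
    # Steps 6 and 8: the printout is raw material for interpretation and
    # communication, not a conclusion in itself.
    print(summary)
    print(f"views ~ {intercept:.0f} + {slope:.1f} * mentions (R^2 = {fit:.2f})")
```

Note how little of the code is the model itself: most of it is getting the data into shape and checking what it contains, which mirrors the time split described in step 3. A high R² on five invented rows would prove nothing; step 7 is where a data scientist would ask whether more data, other variables or a different model tell the same story.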

This scientific process is what data scientists do; who they are is harder to quantify. At Parrot Analytics, we find that data scientists are curious, creative and tenacious people. They are the type who will not only find an innovative solution to the problem, but also fight to make sure it has a meaningful impact on their organization and its domain area, in our case measuring global demand for TV content. In our opinion, these qualities and skills make data science the sexiest job around.

– Kayla Hegedus, Data Scientist

[1] https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/
[2] https://s3.amazonaws.com/aws.drewconway.com/viz/venn_diagram/data_science.html
[3] http://simplystatistics.org/2013/12/12/the-key-word-in-data-science-is-not-data-it-is-science/
[4] http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
[5] http://www.boozallen.com/insights/2013/11/data-science-field-guide
