Data Science Roles

This blog post covers the different roles found in the data science world and the skills each one needs. The roles overlap, and you could surely come up with more, but the ones below give a good overview.

No matter how you look at it, “Data Science is a team sport”, as DJ Patil stated about a year ago in this article.


The Data Scientist

Roles

  • Design experiments: design and execute the different experiments.
  • Get and clean data: retrieve the data from the data sources, clean up the data sets and make sure they are useful for further processing.
  • Analyze data: also called exploring the data. This step is important for spotting patterns or trends early.
  • Communicate the results: after the experiments, communicate the results simply and clearly, using different techniques and tools.
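
To make the get, clean, analyze and communicate steps above a bit more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the CSV content, the column names (customer, age, monthly_spend) and the function names are hypothetical, and a real project would more likely rely on a dedicated data analysis library.

```python
import csv
import io
import statistics

# Hypothetical raw data: the columns and values are invented for illustration.
RAW_CSV = """customer,age,monthly_spend
alice,34,120.50
bob,,80.00
carol,29,not available
dave,41,200.25
"""


def get_and_clean(raw_text):
    """Parse the CSV text and keep only rows whose numeric fields are valid."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        try:
            rows.append({
                "customer": record["customer"],
                "age": int(record["age"]),
                "monthly_spend": float(record["monthly_spend"]),
            })
        except ValueError:
            # Skip rows with missing or non-numeric values.
            continue
    return rows


def analyze(rows):
    """Compute a few summary statistics as a first look at the data."""
    spend = [row["monthly_spend"] for row in rows]
    return {
        "rows": len(rows),
        "mean_spend": statistics.mean(spend),
        "max_spend": max(spend),
    }


def communicate(summary):
    """Turn the summary into one plain sentence for a non-technical audience."""
    return (
        f"Based on {summary['rows']} valid records, customers spend "
        f"{summary['mean_spend']:.2f} per month on average "
        f"(maximum {summary['max_spend']:.2f})."
    )


clean_rows = get_and_clean(RAW_CSV)
summary = analyze(clean_rows)
print(communicate(summary))
```

Dropping invalid rows is only one possible cleaning choice. Depending on the question being asked, filling in missing values or going back to the data source may be more appropriate.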

Skills

  • Statistics (inference)
  • Machine learning (supervised and unsupervised algorithms)
  • Data analysis
  • Data communication
  • Software engineering and programming (mostly R or Python)

Spirit

  • Willing to find answers
  • Not intimidated by new data
  • Willing to say “I don’t know”

The Data Engineer

Roles

  • Build data infrastructure: create, set up and configure the different server environments, on-premises or in the cloud (for example Azure).
  • Manage data storage and use: keep an eye on the data sources and make sure they stay in good condition and can be accessed with good performance.
  • Implement production tools: install and configure the different tools used for data science.

Skills

  • Hardware knowledge
  • Databases
  • Data processing at scale (streaming)
  • Software engineering

Spirit

  • Willing to find answers
  • Knows a bit of data science
  • Works well under pressure

The Data Manager

Roles

  • Build a data team: the data manager is responsible for the team and manages the data scientists and data engineers.
  • Set goals and priorities for the team.
  • Manage the data science process: define how the data science process is carried out.
  • Interact with other groups: communicate with the stakeholders and business people to make sure the questions and answers are clear and unambiguous.

Skills

  • Knowledge of software/hardware
  • Knowledge of roles
  • Knows what can and can’t be achieved
  • Strong communication
  • Data science
  • People management

Spirit

  • Knowledgeable
  • Supportive
  • Thick-skinned

TJA