Should you hire a data engineer instead of a data scientist?

Published on 01 November 2017, in #data-engineering, #collaboration

In this post, we'll look at

which aspects are required to develop a data-driven software product,
how data scientists and data engineers fit these aspects,
how to detect if your team needs a data engineer, and
how to find a data engineer for your team, if you need one.

Three aspects for any data-driven product #

Developing a data-driven software product is not only about analytics. In fact, there are three aspects required for a product to succeed: Consulting, Analytics, and Automation.

Let’s dive into each of them individually.

Aspect #1: Consulting #

Consulting is about

translating the business problem into analytical terms,
collaborating on a business case, and
validating outcomes with (internal) business clients.

It requires business acumen and a desire to understand the domain.

Aspect #2: Analytics #

Analytics is about extracting insights from data to solve a given business problem. It requires in-depth knowledge in:

preparing and cleaning the data/features,
simple tools like Excel and data exploration/visualization,
complex models to reach for if needed, such as machine learning, statistics, and simulations.

The focus here is to demonstrate via a proof of concept that the analytical solution adds value to the business.

Aspect #3: Automation #

Automation is about optimising and preparing the product for long-term use. The proof of concept is turned into production-level code, using software-engineering best practices.

Underestimating this step might result in:

low confidence in the product from end-users, due to many unexpected bugs,
an unexpected loss of production data,
unexpected long-term service unavailability,
inability to promptly add a feature/fix an issue.

Automation requires a software-engineering skill set tailored for data-driven products.
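To make "production-level code" a bit more concrete, here is a minimal sketch of one small step in that direction: a cleaning snippet from a proof of concept rewritten as a pure function with explicit handling of bad input, plus a test. The function name `clean_prices` and its rules (drop blanks, non-numeric values, negatives, and non-finite numbers) are hypothetical and chosen only for illustration.

```python
import math


def clean_prices(raw_prices):
    """Parse raw price strings into floats, dropping values that are unusable.

    Hypothetical rules for this sketch: blanks, non-numeric strings,
    negative numbers, and NaN/infinity are all dropped.
    """
    cleaned = []
    for raw in raw_prices:
        text = str(raw).strip()
        if not text:
            continue  # skip blank entries
        try:
            value = float(text)
        except ValueError:
            continue  # skip entries that are not numbers, e.g. "n/a"
        if math.isfinite(value) and value >= 0:
            cleaned.append(value)
    return cleaned


def test_clean_prices_drops_invalid_values():
    # A test like this can run on every change (e.g. with pytest),
    # so regressions are caught before they reach end-users.
    raw = ["10.5", " 3 ", "", "n/a", "-1", "nan"]
    assert clean_prices(raw) == [10.5, 3.0]
```

The point is not the cleaning logic itself but the shape: a small, testable unit whose behaviour on bad input is written down, rather than a notebook cell that silently assumes clean data.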

Data Scientists vs. Data Engineers #

Data Scientists usually come from a mathematics, statistics, operations-research, or machine-learning background. When they also have business acumen, they are strong in both analytics and consulting. They live and breathe data insights and applying models to solve a business problem.

Data Engineers, on the other hand, come from a software-engineering background. They live and breathe automation. They understand how to ship high-quality production-level code, which covers code readability, testability, architecture, DevOps, automated deployment, robust ETL, etc.

Data engineers can also speed up delivery of the analytical parts by providing technical support for data scientists. For example:

create a discovery platform for data scientists (e.g. IPython notebooks automatically hooked to a cleaned data warehouse representing a single source of truth),
provide infrastructure improvements for data scientists (e.g. a common Docker image with all data-science tooling installed),
create shareable libraries, boilerplates, etc.

Skills mapping: Summary #

Here's a summary of the expected expertise in the three discussed aspects, by role.

Expected expertise	Data Scientist	Data Engineer
Consulting	:medal: Strong	Basic
Analytics	:medal: Strong	Medium
Automation/Engineering	Basic	:medal: Strong

NB: As with any role, the more a data engineer knows about analytics and consulting, the better. And vice versa, the more a data scientist knows about automation and engineering, the better.

Do you need a Data Engineer? #

This question should be simple to answer: Ask your team which activities they spend most of their time on.

If the answer is mostly manual deployments, getting access to data, re-cleaning the data, code refactoring, application monitoring, DevOps, or fighting Spark/Hadoop/Kafka/YARN issues, then you probably need an additional Data Engineer.

If the answer is modeling, feature engineering, visualizations, or communicating with an internal customer, you are probably not in need of an additional Data Engineer.
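If you'd rather put rough numbers on the answer, the check can be sketched as a small tally. This is only an illustration under stated assumptions: the activity labels, the survey hours, and the 50% threshold are all hypothetical, and the two buckets simply mirror the two lists above.

```python
from collections import Counter

# Hypothetical activity labels, grouped after the two lists above.
ENGINEERING = {
    "manual deployments", "data access", "re-cleaning data",
    "code refactoring", "application monitoring", "devops", "cluster issues",
}
ANALYTICS = {
    "modeling", "feature engineering", "visualizations",
    "customer communication",
}


def engineering_share(hours_by_activity):
    """Return the fraction of recognized hours spent on engineering work."""
    totals = Counter()
    for activity, hours in hours_by_activity.items():
        if activity in ENGINEERING:
            totals["engineering"] += hours
        elif activity in ANALYTICS:
            totals["analytics"] += hours
        # Activities in neither bucket are ignored in this sketch.
    tracked = totals["engineering"] + totals["analytics"]
    if tracked == 0:
        raise ValueError("no recognized activities in the survey")
    return totals["engineering"] / tracked


def needs_data_engineer(hours_by_activity, threshold=0.5):
    """Assumed rule of thumb: engineering work dominates the team's time."""
    return engineering_share(hours_by_activity) > threshold


# Made-up weekly hours for a small team.
survey = {
    "manual deployments": 10,
    "re-cleaning data": 8,
    "cluster issues": 7,
    "modeling": 6,
    "visualizations": 4,
}
print(needs_data_engineer(survey))
```

The threshold is a judgment call, not a rule; the useful part is asking the question per activity instead of relying on a general impression.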

How to find Data Engineers #

In the current job market, the demand for data engineers exceeds supply. In this context, there seem to be two viable options for getting an additional data engineer onto the team:

Option #1: Become an attractive workplace #

Become an attractive workplace so that data engineers come to you: start open-source initiatives and analytical blogs, strengthen your conference presence, and start organizing local meetups.

This option is the harder one, but it pays off in the long term.

Option #2: Turn generalists into specialists #

Alternatively, hire software engineers who are generalists. A strong generalist (e.g. a Python/Scala developer) would grasp the required stack fast and would be a great engineering complement to your existing team of data scientists.

This blog is written by Marcel Krcah, an independent consultant for product-oriented software engineering.