Marcel Krčah

Should you hire a data engineer instead of a data scientist?


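In this post, we'll look at the three aspects of a data-driven product, how data scientists and data engineers differ in covering them, and how to tell whether your team needs a data engineer.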

Three aspects for any data-driven product

Developing a data-driven software product is not only about analytics. In fact, there are three aspects required for a product to succeed: Consulting, Analytics, and Automation.

Let’s dive into each of them individually.

Aspect #1: Consulting

Consulting is about understanding the business domain and the problem the product is meant to solve.

It requires business acumen and a desire to understand the domain.

Aspect #2: Analytics

Analytics is about extracting insights from data to solve a given business problem. It requires in-depth knowledge of fields such as statistics, machine learning, and operations research.

The focus here is to demonstrate via a proof of concept that the analytical solution adds value to the business.

Aspect #3: Automation

Automation is about optimising and preparing the product for long-term use. The proof of concept is turned into production-level code, using software-engineering best practices.

Underestimating this step might result in a product that is costly to maintain and unreliable in long-term use.

Automation requires a software-engineering skill set tailored to data-driven products.
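As a rough illustration of what moving from a proof of concept to production-level code can look like, the sketch below wraps a data-cleaning step in a small, typed, testable function. The function name `clean_ages` and the validity range are hypothetical, chosen only for this example.

```python
from typing import Iterable, List, Optional


def clean_ages(
    raw_ages: Iterable[Optional[str]],
    min_age: int = 0,    # assumed lower bound, for illustration only
    max_age: int = 120,  # assumed upper bound, for illustration only
) -> List[int]:
    """Parse raw age strings, dropping missing, malformed, or out-of-range values."""
    cleaned: List[int] = []
    for raw in raw_ages:
        if raw is None:
            continue  # missing value
        try:
            age = int(raw.strip())
        except ValueError:
            continue  # malformed value such as "n/a"
        if min_age <= age <= max_age:
            cleaned.append(age)
    return cleaned


def test_clean_ages() -> None:
    # A unit test like this can run automatically before every deployment.
    assert clean_ages(["34", " 27 ", None, "n/a", "-5", "200"]) == [34, 27]


test_clean_ages()
```

Compared with an ad-hoc notebook cell, the behaviour for bad input is explicit, and the test documents it for whoever maintains the code next.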

Data Scientists vs. Data Engineers

Data scientists usually come from a mathematics, statistics, operations-research, or machine-learning background. With their business acumen, they are strong in both analytics and consulting. They live and breathe data insights and applying models to solve business problems.

Data engineers, on the other hand, come from a software-engineering background. They live and breathe automation. They understand how to ship high-quality, production-level code, which covers code readability, testability, architecture, DevOps, automated deployment, robust ETL, and more.

Data engineers can also speed up delivery of the analytical parts by providing technical support to data scientists.

Skills mapping: Summary

Here's a summary of the expected expertise in the three aspects, by role.

| Expected expertise | Data Scientist | Data Engineer |
| --- | --- | --- |
| Consulting | :medal: Strong | Basic |
| Analytics | :medal: Strong | Medium |
| Automation/Engineering | Basic | :medal: Strong |

NB: As with any role, the more a data engineer knows about analytics and consulting, the better. And vice versa: the more a data scientist knows about automation and engineering, the better.

Do you need a Data Engineer?

This question should be simple to answer: ask your team which activities they spend most of their time on.

If the answer is mostly manual deployments, getting access to data, re-cleaning the data, code refactoring, application monitoring, DevOps, or fighting Spark/Hadoop/Kafka/YARN issues, then you probably need an additional data engineer.

If the answer is modeling, feature engineering, visualisations, or communicating with an internal customer, you probably don't need an additional data engineer.
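The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the activity labels mirror the list above, while the function names and the 50% threshold (standing in for "mostly") are assumptions, not part of any established method.

```python
from typing import Dict

# Engineering-type activities, taken from the list above (labels are illustrative).
ENGINEERING_ACTIVITIES = {
    "manual deployments",
    "getting access to data",
    "re-cleaning data",
    "code refactoring",
    "application monitoring",
    "devops",
    "cluster issues",
}


def engineering_share(hours_by_activity: Dict[str, float]) -> float:
    """Return the fraction of reported hours spent on engineering activities."""
    total = sum(hours_by_activity.values())
    if total == 0:
        return 0.0
    engineering = sum(
        hours
        for activity, hours in hours_by_activity.items()
        if activity in ENGINEERING_ACTIVITIES
    )
    return engineering / total


def probably_needs_data_engineer(
    hours_by_activity: Dict[str, float],
    threshold: float = 0.5,  # assumed cut-off for "mostly"
) -> bool:
    """Apply the heuristic: mostly engineering work suggests hiring a data engineer."""
    return engineering_share(hours_by_activity) > threshold


# Hypothetical weekly hours reported by a team.
week = {"manual deployments": 12, "re-cleaning data": 10, "modeling": 8, "visualisations": 2}
print(probably_needs_data_engineer(week))
```

In practice the input would come from a quick team survey or time-tracking data, and the result is a conversation starter rather than a verdict.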

How to find Data Engineers

In the current job market, the demand for data engineers exceeds supply. In this context, there seem to be two viable options for adding a data engineer to the team:

Option #1: Become an attractive workplace

Become an attractive workplace so that data engineers come to you: start open-source initiatives and analytical blogs, strengthen your conference presence, and organize local meetups.

This option is the harder one, but it pays off in the long term.

Option #2: Turn generalists into specialists

Alternatively, hire software engineers who are generalists. A strong generalist (e.g. a Python/Scala developer) would pick up the required stack quickly and would be a great engineering complement to your existing team of data scientists.

More resources

This blog is written by Marcel Krcah, an independent consultant for product-oriented software engineering.