In this post, we'll look at
Developing a data-driven software product is not only about analytics. In fact, there are three aspects required for a product to succeed: Consulting, Analytics, and Automation.
Let’s dive into each of them individually.
Consulting is about
It requires business acumen and a desire to understand the domain.
Analytics is about extracting insights from data to solve a given business problem. It requires in-depth knowledge in:
The focus here is to demonstrate via a proof of concept that the analytical solution adds value to the business.
Automation is about optimising and preparing the product for long-term use. The proof of concept is upgraded to production-level code using software-engineering best practices.
Underestimating this step might result in:
Automation requires a software-engineering skill set tailored to data-driven products.
Data Scientists usually come from a mathematics, statistics, operations-research, or machine-learning background. With their business acumen, they are strong in both analytics and consulting. They live and breathe data insights and applying models to solve business problems.
Data Engineers, on the other hand, come from a software-engineering background. They live and breathe automation. They understand how to ship high-quality, production-level code, which covers code readability, testability, architecture, DevOps, automated deployment, robust ETL, etc.
Data engineers can also speed up delivery of the analytical parts by providing technical support for data scientists. For example:
Here's a summary of the expected expertise in the three aspects discussed, by role.
| Expected expertise | Data Scientist | Data Engineer |
NB: As with any role, the more a data engineer knows about analytics and consulting, the better. And vice versa: the more a data scientist knows about automation and engineering, the better.
This question should be simple to answer: ask your team which activities they spend most of their time on.
If the answer is mostly manual deployments, getting access to data, re-cleaning data, code refactoring, application monitoring, DevOps, and fighting Spark/Hadoop/Kafka/YARN issues, then you probably need an additional Data Engineer.
If the answer is modeling, feature engineering, visualisations, and communicating with an internal customer, then you probably do not need an additional Data Engineer.
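If you prefer numbers to gut feeling, the same check can be run on a simple time survey. The snippet below is only a rough sketch: the activity buckets, the function names, the sample hours, and the 50% threshold are all hypothetical and should be adapted to your own team.

```python
from collections import defaultdict

# Hypothetical buckets; the activity names mirror the examples in the text.
ENGINEERING = {
    "manual deployments", "getting access to data", "re-cleaning data",
    "code refactoring", "application monitoring", "devops", "cluster issues",
}
ANALYTICS = {
    "modeling", "feature engineering", "visualisations",
    "customer communication",
}


def engineering_share(hours_by_activity):
    """Return the fraction of categorised hours spent on engineering work."""
    totals = defaultdict(float)
    for activity, hours in hours_by_activity.items():
        key = activity.strip().lower()
        if key in ENGINEERING:
            totals["engineering"] += hours
        elif key in ANALYTICS:
            totals["analytics"] += hours
        # Activities that fit neither bucket are ignored.
    categorised = totals["engineering"] + totals["analytics"]
    if categorised == 0:
        raise ValueError("no recognised activities in the survey")
    return totals["engineering"] / categorised


def needs_data_engineer(hours_by_activity, threshold=0.5):
    """Assumed rule of thumb: flag the team if engineering work dominates."""
    return engineering_share(hours_by_activity) > threshold


# Made-up weekly hours for a small team, for illustration only.
survey = {
    "manual deployments": 10,
    "re-cleaning data": 8,
    "code refactoring": 6,
    "modeling": 9,
    "feature engineering": 7,
}

print(round(engineering_share(survey), 2))  # 24 of 40 hours -> 0.6
print(needs_data_engineer(survey))          # True
```

The threshold is a judgment call rather than a rule; the point is simply to make the team's answer visible as a ratio before deciding whether to hire.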
In the current job market, the demand for data engineers exceeds supply. In this context, there seem to be two viable ways to add a data engineer to the team:
Become an attractive workplace so that data engineers come to you: start open-source initiatives and analytical blogs, strengthen your conference presence, and organize local meetups.
This option is the harder one, but it pays off in the long term.
Alternatively, hire software engineers who are generalists. A strong generalist (e.g. a Python/Scala developer) would pick up the required stack quickly and would be a great engineering complement to your existing team of data scientists. If needed, training is available to help with the transition.