Today, who needs a Data Engineer when everyone else wants to hire a Data Scientist?
Let me start with a real-life situation: an enthusiastic new data scientist joins a firm. He knows how to analyse data, how to build models around it and how to create data stories. The business wants him to work on a use-case, so he studies the use-case and starts looking around for data to work on. And he keeps on waiting, because no ready-made data is available; it is scattered across various data stores. Now the data scientist needs help, and here the data engineer comes to his rescue.
“A Data Engineer is responsible for the creation, processing and maintenance of data pipelines that deliver processed data, enabling data scientists to work on their use-cases.”
So I would like to call ‘data science’ ‘data science & engineering’ instead, which gives a better idea of the engineering skills required in this field.
But not all organizations realize that they require both roles, and data scientists often end up spending most of their time on data engineering tasks.
Skills of a Data Engineer
An article from DataQuest lists the following skills that a data engineer should have:
- Architecting distributed systems
- Creating reliable pipelines
- Combining data sources
- Architecting data stores
- Collaborating with data science teams and building the right solutions for them
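To make 'creating reliable pipelines' and 'combining data sources' a little more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not taken from the DataQuest article: the two inputs (`CUSTOMERS_CSV`, `ORDERS_JSON`), the table name `customer_orders` and all function names are hypothetical, and an in-memory SQLite database stands in for a real data store.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs standing in for two separate data stores.
CUSTOMERS_CSV = "customer_id,name\n1,Asha\n2,Ben\n"
ORDERS_JSON = (
    '[{"customer_id": 1, "amount": "20.5"},'
    ' {"customer_id": 1, "amount": "4.5"},'
    ' {"customer_id": 3, "amount": "9.0"}]'
)


def extract_customers(raw_csv):
    """Parse the CSV export into a dict keyed by integer customer id."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return {int(row["customer_id"]): row["name"] for row in reader}


def extract_orders(raw_json):
    """Parse the JSON export into (customer_id, amount) tuples."""
    return [(int(o["customer_id"]), float(o["amount"])) for o in json.loads(raw_json)]


def combine(customers, orders):
    """Join orders to customers; set aside orders whose customer is unknown."""
    joined, rejected = [], []
    for customer_id, amount in orders:
        if customer_id in customers:
            joined.append((customer_id, customers[customer_id], amount))
        else:
            rejected.append((customer_id, amount))
    return joined, rejected


def load(rows):
    """Write the combined rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_orders (customer_id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customer_orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def run_pipeline():
    """Extract from both sources, combine them, and load the clean rows."""
    customers = extract_customers(CUSTOMERS_CSV)
    orders = extract_orders(ORDERS_JSON)
    joined, rejected = combine(customers, orders)
    return load(joined), rejected


if __name__ == "__main__":
    conn, rejected = run_pipeline()
    query = "SELECT name, SUM(amount) FROM customer_orders GROUP BY name"
    print(conn.execute(query).fetchall())
    print("rejected orders:", rejected)
```

The point of the sketch is the shape of the work: the data scientist queries one clean table, while the rows that could not be matched are kept aside for inspection instead of being silently dropped.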
Panoply has published a decent article on ‘How to Become A Data Engineer’, which also highlights the skills required for the role.
Data Scientists Vs Data Engineers:
In general, data scientists are great at advanced analytics, while data engineers are stronger on the programming front.
An article by DataCamp explains the differences between data engineers and data scientists in terms of responsibilities, tools, languages, job outlook, salary, etc.
An article on O’Reilly coins the term ‘Machine Learning Engineer’ for a role that fills the gap between a Data Scientist & a Data Engineer.
Ratios of data engineers to data scientists
Even when an organization or department realizes that it needs both roles, a common problem is figuring out the right ratio of data engineers to data scientists. Since building data pipelines requires more effort, a common starting point is 2–3 data engineers for every data scientist.
Ankit Rathi is an AI architect, published author & well-known speaker. His interest lies primarily in building end-to-end AI applications/products following best practices of Data Engineering and Architecture.