Evolution of the Data Engineering Field

How has the data engineering field evolved over the years?

Initially, there were databases (DBs) to support day-to-day business operations.

To understand how the business was performing,

leaders got business intelligence (BI) reports via ad-hoc queries.

But this was difficult to maintain and manage,

so the data warehouse (DWH) emerged to streamline reporting.

Moving data from an ER schema to a star or snowflake schema required a process called extract, transform, and load (ETL).

That’s how DB developers, BI engineers, ETL developers, and DWH engineers emerged in the data ecosystem.

As the internet went mainstream, vendors emerged to tackle web applications and backend systems,

but their infrastructure remained expensive, monolithic, and heavily licensed.

As data grew, it pushed the limits of traditional DBs and DWHs,

and it became evident that data systems needed to be cost-effective, scalable, available, and reliable.

At the same time, commodity hardware became cheap,

Google introduced GFS, Yahoo developed Hadoop, and Amazon launched its public cloud.

Data started being called big data, and various tools emerged around the Hadoop ecosystem to meet specific data needs.

Big data gained immense hype, and every business started looking for big data engineers,

but the data ecosystem became complicated, with high administrative overhead.

Recently, open-source projects and cloud vendors (AWS/Azure/GCP) started looking for ways to simplify all of this,

which made big data processing so accessible that "big" started to mean nothing,

and big data engineers are now simply called data engineers.

Data engineering is still evolving rapidly, with the trend moving from monolithic frameworks to decentralized ones.

If you like this post, follow me and stay tuned for interesting stuff on cloud data technology.

#dataengineering #datascience #bigdata #dataanalytics #etl #dwh #bi #cloud #aws #gcp #azure #database


Ankit Rathi is a Cloud Data Technologist, published author & well-known speaker. His interest lies primarily in building end-to-end data/AI applications/products following best practices of Data Engineering and Architecture.
