Data Lakes in Modern Data Architecture

Data Lake (DL): a storage repository that holds a vast amount of raw data in its native format until it is needed.

DWH Vs DL

Data Lakes will be central to the modern data architecture because of these features:

  • Agility: ability to convert data >> information >> action
  • Insight: ability to give business insights
  • Scalability: ability to accommodate data growth

All data is welcome:

  • Stores all types of data: structured, semi-structured & unstructured
  • Stores raw data in its original form for an extended period of time
  • Uses various tools to correlate, enrich & query the data for insights
  • Provides democratized access via a single unified view across the enterprise

Traditional Data Architecture

Sources >> ETL >> EDW >> Data Discovery/Analytics/BI

Modern Data Architecture

Streaming/Unstructured/Various Sources >> Data Lake (Derived/Discovery Sandbox) >> EDW >> Data Science/Data Discovery/Analytics/BI
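The difference between the two flows can be illustrated with a small sketch. It is only an illustration under stated assumptions: the sample events, the `EDW_COLUMNS` schema and the function names are hypothetical and do not come from any particular product. The traditional flow shapes data before loading it, so records that do not fit the warehouse schema are lost; the lake keeps every record and applies structure when the data is read.

```python
import json

# Hypothetical raw events as they might arrive from different sources.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": "5.00"}',
    '{"user": "c3", "clicks": 7}',
]

# Assumed fixed warehouse schema, for illustration only.
EDW_COLUMNS = ("user", "amount", "channel")


def etl_to_edw(raw_lines):
    """Traditional flow: transform first, load only rows that fit the schema."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        if all(col in record for col in EDW_COLUMNS):
            table.append({
                "user": record["user"],
                "amount": float(record["amount"]),
                "channel": record["channel"],
            })
    return table


def land_in_lake(raw_lines):
    """Modern flow: keep every record untouched; structure is applied later."""
    return list(raw_lines)


def read_with_schema(lake, fields):
    """Schema-on-read: project the requested fields at query time."""
    rows = []
    for line in lake:
        record = json.loads(line)
        # Missing fields become None instead of causing the record to be dropped.
        rows.append({name: record.get(name) for name in fields})
    return rows
```

With these sample events, `etl_to_edw(RAW_EVENTS)` keeps only the one record that matches the warehouse schema, while `land_in_lake(RAW_EVENTS)` keeps all three and `read_with_schema` can still answer a later question about `clicks`.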

Data Lake Challenges & Complications

In Building:

  • Rate of change in data sources
  • Skill gap in the industry
  • Complexity involved in accommodating different data sources

In Managing:

  • Ingestion of different data sources
  • Lack of visibility for future requirements
  • Privacy & compliance requirements

In Delivering:

  • Data quality issues
  • Reliance on IT
  • Reusability of data

Approach for Data Lakes

Enable the Data Lake

  • Ingest the data
  • Organize the data
  • Register in Catalog
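The three enablement steps above can be sketched in a few lines. This is a minimal sketch, not a reference implementation: the `raw/<source>/ingest_date=<date>` directory layout, the JSON-lines catalog file and the function names `ingest` and `register` are all assumptions made for illustration. A real lake would typically sit on object storage and use a dedicated catalog service.

```python
import hashlib
import json
from datetime import date
from pathlib import Path


def ingest(lake_root, source, payload, ingest_date):
    """Ingest: write the raw bytes unchanged and return the path and checksum."""
    # Organize: one directory per source and ingestion date (assumed layout).
    partition = (
        Path(lake_root) / "raw" / source / f"ingest_date={ingest_date.isoformat()}"
    )
    partition.mkdir(parents=True, exist_ok=True)
    checksum = hashlib.sha256(payload).hexdigest()
    # Name the file after its content hash so re-ingesting the same bytes is idempotent.
    target = partition / f"{checksum[:12]}.bin"
    target.write_bytes(payload)
    return target, checksum


def register(catalog_path, source, path, checksum, tags):
    """Register: append one entry describing the file to a JSON-lines catalog."""
    entry = {
        "source": source,
        "path": str(path),
        "sha256": checksum,
        "tags": sorted(tags),
    }
    with open(catalog_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

A call such as `ingest("/tmp/lake", "crm", b'{"id": 1}', date(2020, 1, 15))` followed by `register(...)` leaves the original bytes untouched on disk and makes the file findable through the catalog, which is the point of the three steps.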

Govern the data in the lake

  • Cleanse the data
  • Secure the data
  • Operationalize the data
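As a rough illustration of the cleanse and secure steps above, the sketch below works on flat records held in memory. The rules it applies (trim strings, drop records missing required fields, drop exact duplicates, replace sensitive values with a truncated salted hash) are assumptions chosen for the example, and the function names are hypothetical. Real governance policies are usually richer and enforced by the platform rather than by ad hoc code.

```python
import hashlib


def cleanse(records, required):
    """Trim strings, drop records missing required fields, remove exact duplicates."""
    seen = set()
    clean = []
    for record in records:
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        if any(normalized.get(name) in (None, "") for name in required):
            continue
        # Assumes flat records with hashable values (strings, numbers).
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        clean.append(normalized)
    return clean


def mask(records, sensitive, salt):
    """Replace sensitive values with a truncated salted SHA-256 digest."""
    masked = []
    for record in records:
        copy = dict(record)  # leave the input records unchanged
        for name in sensitive:
            if name in copy:
                raw = (salt + str(copy[name])).encode("utf-8")
                copy[name] = hashlib.sha256(raw).hexdigest()[:16]
        masked.append(copy)
    return masked
```

Because the same input and salt always produce the same digest, masked values can still be used to join or count records without exposing the original value.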

Engage with business

  • Discover the data
  • Enrich the platform
  • Provision the data sources
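Discovery and enrichment can be pictured with a tiny in-memory catalog. This is a sketch under assumptions: the `Dataset` fields and the `discover` and `enrich` helpers are hypothetical names, and a real catalog would add search indexing, lineage and access control.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    owner: str
    tags: set = field(default_factory=set)
    description: str = ""


def discover(catalog, keyword):
    """Return names of datasets whose name, description, or tags mention the keyword."""
    needle = keyword.lower()
    hits = []
    for dataset in catalog:
        haystack = [dataset.name, dataset.description, *dataset.tags]
        if any(needle in text.lower() for text in haystack):
            hits.append(dataset.name)
    return sorted(hits)


def enrich(dataset, *new_tags):
    """Let business users add tags so that later searches find the dataset."""
    dataset.tags.update(tag.lower() for tag in new_tags)
    return dataset
```

For example, a dataset tagged only `events` is invisible to a search for `churn` until someone who knows the business context enriches it with that tag.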

Data Lake Reference Architecture

Data Lake Management Platform

  • Unified Data Management
  • Managed Ingestion
  • Data Reliability
  • Data Visibility
  • Data Privacy & Security

Getting Started



Ankit Rathi is an AI architect, published author & well-known speaker. His interest lies primarily in building end-to-end AI applications/products following best practices of Data Engineering and Architecture.
