Data Lake (DL): a storage repository that holds a vast amount of raw data in its native format until it is needed

Data Warehouse (DWH) vs Data Lake (DL)

Data Lakes will be central to modern data architecture because of these features:

  • Agility: ability to convert data >> information >> action
  • Insight: ability to derive business insights from the data
  • Scalability: ability to accommodate data growth

All data is welcome:

  • Stores all types of data: structured, semi-structured & unstructured
  • Stores raw data in its original form for an extended period of time
  • Uses various tools to correlate, enrich & query the data for insights
  • Provides democratized access via a single unified view across the enterprise
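
A minimal Python sketch of what "raw data in its native format" can mean in practice, assuming a local temporary directory stands in for the lake's storage: three payloads of different types (a CSV extract, a JSON event and a few binary bytes) are written byte-for-byte into a `raw/<source>/` folder, and structure is applied only when a consumer reads a file (often called schema-on-read). The folder layout and the helper names `land_raw` and `read_csv_records` are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Write a payload unchanged into raw/<source>/<name>; no parsing, no schema."""
    target = lake_root / "raw" / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_csv_records(path: Path) -> list[dict]:
    """Schema-on-read: structure is applied only when a consumer reads the file."""
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


# A temporary directory plays the role of the lake (hypothetical layout).
lake = Path(tempfile.mkdtemp())
csv_path = land_raw(lake, "crm", "customers.csv", b"id,name\n1,Ada\n2,Grace\n")
json_path = land_raw(lake, "web", "event.json", json.dumps({"user": 1, "click": "buy"}).encode())
bin_path = land_raw(lake, "cams", "frame.bin", bytes([0x89, 0x50, 0x4E, 0x47]))

# Each consumer decides how to interpret the raw bytes it reads.
customers = read_csv_records(csv_path)
event = json.loads(json_path.read_text())
print(customers[1]["name"], event["click"], bin_path.stat().st_size)  # Grace buy 4
```

Nothing is transformed on the way in, so a different consumer could later read the same files with a different schema.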

Traditional Data Architecture

Sources >> ETL >> EDW >> Data Discovery/Analytics/BI

Modern Data Architecture

Streaming/Unstructured/Various Sources >> Data Lake (Derived/Discovery Sandbox) >> EDW >> Data Science/Data Discovery/Analytics/BI
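
The difference between the two pipelines is often summarized as schema-on-write versus schema-on-read. The in-memory sketch below illustrates that idea under simplifying assumptions: the traditional path transforms and validates records before loading them into a fixed warehouse schema, so anything that does not fit is dropped, while the modern path lands every record in the lake first and derives a warehouse-ready view afterwards. All names (`SaleRow`, `etl_load`, `lake_then_derive`) and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SaleRow:
    """Fixed warehouse schema: every row must have a typed order_id and amount."""
    order_id: int
    amount: float


def to_row(record: dict) -> Optional[SaleRow]:
    """Map a source record onto the warehouse schema, or return None if it does not fit."""
    try:
        return SaleRow(int(record["order_id"]), float(record["amount"]))
    except (KeyError, TypeError, ValueError):
        return None


def etl_load(source: list[dict]) -> list[SaleRow]:
    """Traditional path: transform first and load only the records that fit the schema."""
    return [row for r in source if (row := to_row(r)) is not None]


def lake_then_derive(source: list[dict]) -> tuple[list[dict], list[SaleRow]]:
    """Modern path: keep every raw record, then derive a schema-conforming view from it."""
    lake = [dict(r) for r in source]  # raw copies, nothing discarded
    derived = [row for r in lake if (row := to_row(r)) is not None]
    return lake, derived


source = [
    {"order_id": "1", "amount": "9.99"},
    {"order_id": "2", "amount": "n/a", "note": "refund pending"},  # does not fit the schema
    {"order_id": "3", "amount": 4.5, "clickstream": ["home", "cart"]},
]
warehouse = etl_load(source)
lake, derived = lake_then_derive(source)
print(len(warehouse), len(lake), len(derived))  # 2 3 2
```

Both paths end up with the same two warehouse rows, but only the lake still holds the "refund pending" record and the clickstream field for later discovery work.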

Data Lake Challenges & Complications

In Building:

  • Rate of change in data sources
  • Skill gap in the industry
  • Complexity involved in accommodating different data sources

In Managing:

  • Ingestion of different data sources
  • Lack of visibility for future requirements
  • Privacy & compliance requirements

In Delivering:

  • Data quality issues
  • Reliance on IT
  • Reusability of data

Approach for Data Lakes

Enable the Data Lake

  • Ingest the data
  • Organize the data
  • Register in Catalog
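
The three enablement steps can be sketched together in a few lines of Python. This is an illustrative, in-memory model under stated assumptions: "organize" is taken to mean a zone/source/date partitioned path (a common convention, not one prescribed here), a plain dictionary stands in for object storage, and another dictionary stands in for a real metadata catalog. `ingest`, `Catalog` and `CatalogEntry` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    """Metadata recorded for each object landed in the lake."""
    path: str
    source: str
    fmt: str
    size_bytes: int
    checksum: str


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for a metadata catalog."""
    entries: dict[str, CatalogEntry] = field(default_factory=dict)

    def register(self, entry: CatalogEntry) -> None:
        self.entries[entry.path] = entry


def ingest(storage: dict[str, bytes], catalog: Catalog, source: str, name: str,
           fmt: str, payload: bytes, day: date) -> str:
    # Organize: place the object under raw/<source>/<yyyy>/<mm>/<dd>/<name>
    path = f"raw/{source}/{day:%Y/%m/%d}/{name}"
    # Ingest: store the bytes unchanged
    storage[path] = payload
    # Register: record where the data lives, what it is, and a checksum
    catalog.register(CatalogEntry(path, source, fmt, len(payload),
                                  hashlib.sha256(payload).hexdigest()))
    return path


storage: dict[str, bytes] = {}
catalog = Catalog()
p = ingest(storage, catalog, "erp", "orders.csv", "csv", b"id,total\n1,10\n", date(2024, 1, 5))
print(p)  # raw/erp/2024/01/05/orders.csv
```

Keeping registration inside the ingest step is one way to avoid data landing in the lake without anyone being able to find it later.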

Govern the data in the lake

  • Cleanse the data
  • Secure the data
  • Operationalize the data
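
Governance is largely organizational, but the cleansing and securing steps can be illustrated in code. The sketch below assumes records arrive as dictionaries, treats trimming whitespace, normalizing email case and dropping duplicates as "cleansing", and uses keyed HMAC-SHA256 pseudonymization of the email column as one possible "securing" technique (masking, tokenization or encryption are alternatives). Operationalizing is modeled only as chaining the steps into one repeatable function. The function names and the demo key are hypothetical.

```python
import hashlib
import hmac


def cleanse(records: list[dict]) -> list[dict]:
    """Trim whitespace, lower-case emails, drop rows without an email, and de-duplicate."""
    seen: set[str] = set()
    out = []
    for r in records:
        email = (r.get("email") or "").strip().lower()
        if not email or email in seen:
            continue
        seen.add(email)
        out.append({**r, "name": (r.get("name") or "").strip(), "email": email})
    return out


def pseudonymize(records: list[dict], column: str, key: bytes) -> list[dict]:
    """Replace a PII column with a keyed HMAC-SHA256 digest; equal inputs give equal tokens."""
    return [
        {**r, column: hmac.new(key, r[column].encode(), hashlib.sha256).hexdigest()}
        for r in records
    ]


def govern(records: list[dict], key: bytes) -> list[dict]:
    """Operationalized pipeline: the same steps in the same order on every run."""
    return pseudonymize(cleanse(records), "email", key)


raw = [
    {"name": " Ada ", "email": "ADA@example.com"},
    {"name": "Ada", "email": "ada@example.com "},  # duplicate after cleansing
    {"name": "Bob", "email": ""},                  # missing email, dropped
]
# Demo key only; a real deployment would load the key from a secrets store.
curated = govern(raw, key=b"demo-key")
print(len(curated), curated[0]["name"])  # 1 Ada
```

Because the digest is deterministic for a given key, pseudonymized datasets can still be joined on the email column without exposing the address itself.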

Engage with business

  • Discover the data
  • Enrich the platform
  • Provision the data sources
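
For business users, "discover the data" usually means searching catalog metadata rather than browsing storage paths. A minimal sketch, assuming each dataset is described by a name, a free-text description and a set of tags (a hypothetical metadata model), is a keyword search that returns datasets matching every query term:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical catalog record describing one dataset in the lake."""
    name: str
    description: str
    tags: frozenset[str]


def discover(catalog: list[Dataset], query: str) -> list[str]:
    """Return names of datasets whose name, description or tags contain every query term."""
    terms = query.lower().split()
    hits = []
    for ds in catalog:
        text = " ".join([ds.name, ds.description, *ds.tags]).lower()
        if all(t in text for t in terms):
            hits.append(ds.name)
    return sorted(hits)


catalog = [
    Dataset("crm_customers", "Customer master data from CRM", frozenset({"customer", "pii"})),
    Dataset("web_clickstream", "Raw click events from the website", frozenset({"customer", "events"})),
    Dataset("erp_orders", "Daily order extracts", frozenset({"finance"})),
]
print(discover(catalog, "customer"))         # ['crm_customers', 'web_clickstream']
print(discover(catalog, "customer events"))  # ['web_clickstream']
```

Enriching the platform then amounts to adding better descriptions and tags, which directly improves what this kind of search can find.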

Data Lake Reference Architecture

Data Lake Management Platform

  • Unified Data Management
  • Managed Ingestion
  • Data Reliability
  • Data Visibility
  • Data Privacy & Security
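
Several of these capabilities, notably managed ingestion, data reliability and data visibility, come down to running every load through the same checks and recording the outcome. The sketch below is a hypothetical illustration, not a description of any specific platform: each batch is validated against a set of required columns and a maximum null ratio before it is accepted, and every run, accepted or rejected, is appended to an audit log that can be inspected later. `RunRecord`, `validate` and `managed_ingest` are made-up names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RunRecord:
    """One audit-log entry: what was loaded, when, and whether it passed."""
    dataset: str
    rows: int
    accepted: bool
    problems: list[str]
    at: datetime


def validate(batch: list[dict], required: set[str], max_null_ratio: float) -> list[str]:
    """Return a list of problems; an empty list means the batch passed."""
    if not batch:
        return ["empty batch"]
    problems = []
    for col in sorted(required):
        missing = sum(1 for row in batch if row.get(col) is None)
        if missing / len(batch) > max_null_ratio:
            problems.append(f"{col}: {missing}/{len(batch)} null")
    return problems


def managed_ingest(dataset: str, batch: list[dict], required: set[str],
                   audit_log: list[RunRecord], max_null_ratio: float = 0.1) -> bool:
    """Validate the batch, then log the outcome whether or not it is accepted."""
    problems = validate(batch, required, max_null_ratio)
    accepted = not problems
    audit_log.append(RunRecord(dataset, len(batch), accepted, problems,
                               datetime.now(timezone.utc)))
    return accepted


log: list[RunRecord] = []
good = [{"id": 1, "amount": 5.0}, {"id": 2, "amount": 7.5}]
bad = [{"id": 1, "amount": None}, {"id": 2}, {"id": 3, "amount": 1.0}]
print(managed_ingest("orders", good, {"id", "amount"}, log))  # True
print(managed_ingest("orders", bad, {"id", "amount"}, log))   # False
print(log[1].problems)  # ['amount: 2/3 null']
```

Logging rejected runs as well as accepted ones is what gives visibility: the log shows not only what is in the lake but also what was kept out and why.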

Getting Started

Ankit Rathi is an AI architect, published author & well-known speaker. His interest lies primarily in building end-to-end AI applications/products following best practices of Data Engineering and Architecture.