Multi-hop architecture on Azure

Multi-hop architecture is a design approach for organizing data in a Delta Lake based lakehouse. It is also known as the medallion architecture data design pattern. The main goal of this architecture is to incrementally and progressively improve the structure and quality of data as it moves from one layer to the next.

The architecture consists of three main layers: Bronze, Silver, and Gold. A pre-landing layer can also be added if the data first needs to be copied from the client system into the Delta platform.

Figure: Multi-hop architecture

This approach is quite similar to the traditional ETL approach, in which data travels through layers such as pre-landing, staging, and landing.

Pre-Landing Layer:

The purpose of this layer is to fetch the data in raw form. Data is collected from different sources, such as CSV files and databases, and written to Azure Data Lake Storage Gen2 using Azure Data Factory (ADF) or Databricks workflows.

Data from the source is extracted in Parquet format. The folder structure can be designed based on the source name. Parquet is a columnar format, which offers several advantages: efficient compression, higher read throughput, and better query performance when only some columns are needed.
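As a rough illustration of a source-based folder layout, the sketch below builds one pre-landing path per source, table, and load date. The `prelanding` root folder, the `load_date=` folder name, and the function `prelanding_path` are all assumptions made for this example; they are not part of any Azure or Databricks API.

```python
from datetime import date
from pathlib import PurePosixPath


def prelanding_path(source: str, table: str, load_date: date) -> str:
    """Build a hypothetical pre-landing folder path grouped by source name."""
    # Normalise names so folders are predictable: lowercase, no spaces.
    source_dir = source.strip().lower().replace(" ", "_")
    table_dir = table.strip().lower().replace(" ", "_")
    # Assumed layout: <root>/<source>/<table>/load_date=<YYYY-MM-DD>
    path = (
        PurePosixPath("prelanding")
        / source_dir
        / table_dir
        / f"load_date={load_date.isoformat()}"
    )
    return str(path)


print(prelanding_path("Sales DB", "Orders", date(2024, 1, 15)))
```

Keeping the source name as the first folder level makes it easy to reload or purge a single source without touching the others.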

This layer is optional, as data from the source can be loaded directly into the bronze layer.

Bronze Layer (raw data):

The bronze layer is where we land all the data from external source systems. The data in this layer is truncated and loaded on each run, so it holds only the latest data fetched from the source, whether that fetch is an incremental or a full load.

Transformations such as the Julian check, null check, and deduplication are performed in this layer.
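The null check and deduplication can be sketched in plain Python on a list of row dictionaries. This is only an illustration of the logic, not Spark or Delta code: the function `clean_bronze`, the column names, and the "last row wins" rule for duplicates are assumptions. The Julian check is left out because its exact rule is not described here.

```python
from typing import Any

Row = dict[str, Any]


def clean_bronze(rows: list[Row], key: str, required: list[str]) -> list[Row]:
    """Drop rows with nulls in required columns, then keep one row per key."""
    latest: dict[Any, Row] = {}
    for row in rows:
        # Null check: skip rows missing the key or any required value.
        if any(row.get(col) is None for col in [key, *required]):
            continue
        # Deduplication (assumed rule): a later row replaces an earlier
        # row that has the same key.
        latest[row[key]] = row
    return list(latest.values())


raw = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": None},  # fails the null check
    {"id": 1, "amount": 15},    # duplicate key, replaces the first row
]
print(clean_bronze(raw, key="id", required=["amount"]))
```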

Silver Layer (cleaned and conformed data):

The silver layer consolidates the data received from the bronze layer. Whether the load type is full or incremental, the data from the bronze layer is merged directly into the silver layer using MERGE queries.
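The effect of such a merge is an upsert: rows whose key already exists in silver are updated, and rows with a new key are inserted. The sketch below mimics that behaviour in plain Python. It is not Delta Lake code; `merge_into_silver` and the `id`/`status` columns are hypothetical names used only for illustration.

```python
from typing import Any

Row = dict[str, Any]


def merge_into_silver(silver: list[Row], bronze: list[Row], key: str) -> list[Row]:
    """Upsert bronze rows into silver, matching rows on the key column."""
    merged: dict[Any, Row] = {row[key]: row for row in silver}
    for row in bronze:
        if row[key] in merged:
            # Matched: overwrite the existing columns with the bronze values.
            merged[row[key]] = {**merged[row[key]], **row}
        else:
            # Not matched: insert the bronze row as a new silver row.
            merged[row[key]] = row
    return list(merged.values())


silver_rows = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
bronze_rows = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]
print(merge_into_silver(silver_rows, bronze_rows, key="id"))
```

Because the same upsert handles both new and changed rows, the silver load does not need separate logic for full and incremental runs.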

Gold Layer (curated business-level tables):

The gold layer acts as the data fabrication layer. It provides all the aggregations and extra calculations that specific source requirements call for.

This layer is meant for reporting and uses de-normalized, read-optimized data models with fewer joins. The final set of data transformations and data quality rules is applied here.
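A gold table is often just a grouped summary of silver data. The following plain-Python sketch shows that kind of aggregation; `build_gold_summary` and the `region`/`amount` columns are assumptions chosen for the example, and a real implementation would typically run as a Spark or SQL aggregation instead.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]


def build_gold_summary(silver: list[Row], group_col: str, value_col: str) -> list[Row]:
    """Aggregate silver rows into one read-optimized row per group."""
    totals: dict[Any, float] = defaultdict(float)
    counts: dict[Any, int] = defaultdict(int)
    for row in silver:
        totals[row[group_col]] += row[value_col]
        counts[row[group_col]] += 1
    # One de-normalized row per group, sorted for stable output.
    return [
        {group_col: group, "total": totals[group], "row_count": counts[group]}
        for group in sorted(totals)
    ]


silver_sales = [
    {"region": "EU", "amount": 10.0},
    {"region": "US", "amount": 5.0},
    {"region": "EU", "amount": 2.5},
]
print(build_gold_summary(silver_sales, group_col="region", value_col="amount"))
```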

Conclusion:

Adding an extra layer such as pre-landing at the start gives more flexibility to implement business logic. However, many scenarios do not need this layer at all. Whether to include it depends entirely on the requirement.
