
Building the Data Warehouse – Part 2

1 Dimensions
Building the Master Dimensions – SCD Type 1

2 Slowly Changing Dimensions
Building the Product Dimension – SCD Type 2

3 Facts
Building the Fact Table

4 Review
Review of the Data in the Data Warehouse

MODULE 11
SECTION 1
Dimensions
Dimensions
Overview of Dimensions
Dimensions provide descriptive context for the quantitative data in the
fact table in a data warehouse.

Descriptive
• Provide descriptive information about the business
• Use surrogate keys to join and describe data in the fact table
• Allow slicing the quantitative data over different dimensions
• Time dimension provides historical context for the data in the fact table

Denormalized
• Dimensions are denormalized tables
• Hierarchical attributes in a dimension are flattened
• Prevents snowflaking and reduces expensive joins

Hierarchical
• Attributes in a dimension are organized hierarchically for analytics
• Use hierarchies to aggregate or drill down quantitative data in the fact table
Dimensions
Overview of Slowly Changing Dimensions
Dimensions connected to a fact table are also affected by the passage of
time

Type 0: Changes Ignored
• Data cannot be changed
• No history preservation

OLD
Key   ID   Name  City
S123  123  Mike  Rome

NEW
Key   ID   Name  City
S123  123  Mike  Rome

Type 1: Changes Updated (No History)
• Data can be changed
• Overwrite old data with new data
• No history preservation

OLD
Key   ID   Name  City
S123  123  Mike  Rome

NEW
Key   ID   Name  City
S123  123  Mike  Milan

Type 2: Changes Inserted (History Preserved)
• Create new record by tuple versioning
• Historical record is made inactive
• New record is made active

OLD
Key   ID   Name  City   From        To           Active
S123  123  Mike  Rome   Jan 1 2022  -            Yes

NEW
Key   ID   Name  City   From        To           Active
S123  123  Mike  Rome   Jan 1 2022  Dec 31 2022  No
S124  123  Mike  Milan  Jan 1 2023  -            Yes
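The three behaviours can be sketched in a few lines of Python. This is only an illustration of the idea, not how the course implements it: the `Row` class and the `apply_type0`, `apply_type1` and `apply_type2` helpers are hypothetical names, and the data is the Mike / Rome / Milan example from the slide.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Row:
    key: str                           # surrogate key, e.g. "S123"
    id: int                            # natural (business) key
    name: str
    city: str
    valid_from: Optional[date] = None
    valid_to: Optional[date] = None    # None means the version is still open
    active: bool = True


def apply_type0(rows: List[Row], id_: int, new_city: str) -> List[Row]:
    """Type 0: the change is ignored, the rows stay as they are."""
    return list(rows)


def apply_type1(rows: List[Row], id_: int, new_city: str) -> List[Row]:
    """Type 1: overwrite the old value in place, keeping no history."""
    return [replace(r, city=new_city) if r.id == id_ else r for r in rows]


def apply_type2(rows: List[Row], id_: int, new_city: str,
                new_key: str, change_date: date) -> List[Row]:
    """Type 2: close the active version and insert a new active version."""
    out: List[Row] = []
    for r in rows:
        if r.id == id_ and r.active:
            # The historical record ends the day before the change and is made inactive.
            out.append(replace(r, valid_to=change_date - timedelta(days=1), active=False))
            # The new record gets a new surrogate key and becomes the active one.
            out.append(Row(new_key, r.id, r.name, new_city, change_date, None, True))
        else:
            out.append(r)
    return out


old = [Row("S123", 123, "Mike", "Rome", date(2022, 1, 1))]
type0 = apply_type0(old, 123, "Milan")
type1 = apply_type1(old, 123, "Milan")
type2 = apply_type2(old, 123, "Milan", "S124", date(2023, 1, 1))
```

With this data, `type0` still shows Rome, `type1` shows a single row with Milan, and `type2` holds two rows: the closed Rome version and the active Milan version under the new key S124.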
Dimensions
Master Dimensions and their type
Star schema: FACT Sales joined to Dim Date, Dim Store, Dim Territory, Dim Currency and Dim Product.

Type 0
• None

Type 1
• Dim Date
• Dim Currency
• Dim Store
• Dim Territory
Changes to these dimensions are rare but updates are possible.
History preservation is not required.

Type 2
• Dim Product
Changes and history are required.

Important to define the type of dimension as part of the data warehouse design
Dimensions
Building the SCD Type 1 Dimensions

Main steps in building the SCD Type 1 Dimensions

Read Source
• Read Stage Table

Compare Target
• Use a conditional split to compare Source and Target
• Compare on natural key of the source and target

Load Target
• Update existing record in Target if it already exists
• Insert new record in Target if it doesn’t exist
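The read, compare and load steps amount to an upsert keyed on the natural key. A minimal Python sketch follows, assuming the target dimension is held in memory as a dictionary keyed by natural key; the `load_type1` function and the `CurrencyCode` / `CurrencyName` columns are hypothetical and only stand in for the conditional split and the stage and target tables.

```python
from typing import Dict, List


def load_type1(stage_rows: List[dict], target: Dict[str, dict], natural_key: str) -> Dict[str, int]:
    """Upsert stage rows into the target dimension (SCD Type 1, no history)."""
    counts = {"inserted": 0, "updated": 0, "unchanged": 0}
    for src in stage_rows:
        existing = target.get(src[natural_key])      # compare on the natural key
        if existing is None:
            target[src[natural_key]] = dict(src)     # insert: record does not exist yet
            counts["inserted"] += 1
        elif any(existing.get(col) != val for col, val in src.items()):
            existing.update(src)                     # update: overwrite old data with new data
            counts["updated"] += 1
        else:
            counts["unchanged"] += 1
    return counts


# Hypothetical stage and target data for a currency dimension.
stage = [
    {"CurrencyCode": "EUR", "CurrencyName": "Euro"},
    {"CurrencyCode": "USD", "CurrencyName": "US Dollar"},
    {"CurrencyCode": "GBP", "CurrencyName": "Pound Sterling"},
]
target = {
    "EUR": {"CurrencyCode": "EUR", "CurrencyName": "Euro (old)"},
    "GBP": {"CurrencyCode": "GBP", "CurrencyName": "Pound Sterling"},
}
result = load_type1(stage, target, "CurrencyCode")
```

Running the load a second time with the same stage data changes nothing, which is the property that makes this pattern safe to re-run.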
DIMENSIONS
Building the Master Dimensions – Type 1

Building a Type 1 dimension


Using Mapping Data Flows

Using Stored Procedures


SECTION 2
Slowly Changing Dimensions
Slowly Changing Dimensions
Product Dimension – SCD Type 2
Product dimension has attributes that can change and where history
preservation is necessary

Title
• The title of the wine typically doesn’t change

Vintage
• Vintage of the wine doesn’t change, since it is the year of the wine

Score
• The score of the wine can change since it depends on reviewers
• Essential to preserve history

Type 2: Changes Inserted (History Preserved)

OLD
ID  No   Title     Vintage  Score  From      To          Active
1   123  Nebbiolo  2015     95     1/1/2022  -           Yes

NEW
ID  No   Title     Vintage  Score  From      To          Active
1   123  Nebbiolo  2015     95     1/1/2022  31/12/2022  No
2   123  Nebbiolo  2015     93     1/1/2023  -           Yes


Dimensions
Building the SCD Type 2 Dimension

Main steps in building SCD Type 2 Dimensions

Read Source
• Read Stage Table

Compare Target
• Use a Lookup to compare Source and Target
• Compare source with matched records from lookup

Load Target
• Add new record with new surrogate key if records differ on Type 2 attribute
• Add new record with new surrogate key if record doesn’t exist in target
• Use derived transformation to add effective start and end dates and active flag
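As a rough Python sketch of these steps, assuming the dimension is a list of dictionaries that uses the column names from the product example (ID, No, Score, From, To, Active): the `load_type2` function is hypothetical, the surrogate key is simply the next integer, and the second stage row (a Barolo) is made-up data added to show the insert path.

```python
from datetime import date, timedelta
from typing import List


def load_type2(stage_rows: List[dict], dim_rows: List[dict], natural_key: str,
               type2_cols: List[str], load_date: date, surrogate_key: str = "ID") -> List[dict]:
    """Apply stage rows to a Type 2 dimension, preserving history."""
    # Lookup: the currently active version for each natural key.
    active = {r[natural_key]: r for r in dim_rows if r["Active"]}
    next_key = max((r[surrogate_key] for r in dim_rows), default=0) + 1
    for src in stage_rows:
        current = active.get(src[natural_key])
        if current is not None and all(current[c] == src[c] for c in type2_cols):
            continue                                   # no change on a Type 2 attribute
        if current is not None:
            current["To"] = load_date - timedelta(days=1)   # close the historical record
            current["Active"] = False
        # New version: new surrogate key, effective dates and active flag.
        dim_rows.append({surrogate_key: next_key, **src,
                         "From": load_date, "To": None, "Active": True})
        next_key += 1
    return dim_rows


dim_product = [{"ID": 1, "No": 123, "Title": "Nebbiolo", "Vintage": 2015, "Score": 95,
                "From": date(2022, 1, 1), "To": None, "Active": True}]
stage_product = [
    {"No": 123, "Title": "Nebbiolo", "Vintage": 2015, "Score": 93},   # score changed
    {"No": 456, "Title": "Barolo", "Vintage": 2018, "Score": 90},     # hypothetical new product
]
load_type2(stage_product, dim_product, "No", ["Score"], date(2023, 1, 1))
```

After the load, product 123 has a closed version with score 95 and an active version with score 93 under surrogate key 2, matching the OLD and NEW tables in the product example.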
Slowly Changing Dimensions
Building the Product Dimension – Type 2 dimension

Building a Type 2 dimension


Using Mapping Data Flows

Using Stored Procedures


Dimensions
Assignment – Build remaining Type 1 Dimensions

Assignment
Build all Type 1 dimensions using Stored Procedures and invoke
them from ADF
Build one of the Type 1 dimensions using Mapping Data Flows

Test the implementations and review the dimension data


Slowly Changing Dimensions
Building the remaining dimensions

Building remaining dimensions


Invoke Stored Procedures
SECTION 3
Facts
Facts
Building the Fact Table
Fact tables are generally built after dimensions. This enables the
assignment of the dimension surrogate keys.

Star schema: FACT Sales joined to Dim Date, Dim Store, Dim Territory, Dim Currency and Dim Product.

Fact Table
Column Name   Role
StoreId       key
TerritoryId   key
DateId        key
CurrencyId    key
ProductId     key
SalesQty      measure
SalesAmount   measure
CostAmount    measure
MarginAmount  measure

• Derive dimension surrogate keys by joining the stage sales data with the dimensions
• Derive fact measures by applying the appropriate calculations
Facts
Building the Fact Table

Main steps in building Fact Tables

Read Source
• Read Stage Sales Table

Compare Target
• Lookup relevant dimension tables
• Retrieve the dimension surrogate keys
• Calculate or derive required measures
• Use derived transformation to add dimension surrogate keys

Load Target
• Use merge transformation to merge fact table with dimension table using the surrogate keys
• Load merged data into the fact table
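The lookup and derive steps can be sketched in Python as follows. Everything here is an assumption made for illustration: the stage column names (`StoreCode`, `ProductNo`, `Qty`, `UnitPrice`, `UnitCost`), the helper functions, and the measure formulas, including margin as sales amount minus cost amount. Only two of the five dimensions are shown to keep the example short.

```python
from typing import Dict, List, Tuple


def build_lookup(dim_rows: List[dict], natural_key: str, surrogate_key: str) -> Dict:
    """Map each natural key to its dimension surrogate key."""
    return {r[natural_key]: r[surrogate_key] for r in dim_rows}


def build_fact_rows(stage_sales: List[dict],
                    lookups: Dict[str, Tuple[str, Dict]]) -> List[dict]:
    """Replace natural keys with surrogate keys and derive the measures."""
    facts = []
    for s in stage_sales:
        # A missing dimension member raises KeyError here; a real load would need a policy for that.
        row = {fact_col: lookup[s[stage_col]]
               for fact_col, (stage_col, lookup) in lookups.items()}
        row["SalesQty"] = s["Qty"]
        row["SalesAmount"] = s["Qty"] * s["UnitPrice"]               # assumed formula
        row["CostAmount"] = s["Qty"] * s["UnitCost"]                 # assumed formula
        row["MarginAmount"] = row["SalesAmount"] - row["CostAmount"] # assumed formula
        facts.append(row)
    return facts


# Hypothetical dimension and stage data.
dim_store = [{"StoreId": 10, "StoreCode": "ROM01"}]
dim_product = [
    {"ProductId": 1, "ProductNo": 123, "Active": False},
    {"ProductId": 2, "ProductNo": 123, "Active": True},
]
stage_sales = [{"StoreCode": "ROM01", "ProductNo": 123, "Qty": 3, "UnitPrice": 20, "UnitCost": 12}]

lookups = {
    "StoreId": ("StoreCode", build_lookup(dim_store, "StoreCode", "StoreId")),
    # For the Type 2 product dimension, only the active version is used in this sketch.
    "ProductId": ("ProductNo", build_lookup(
        [r for r in dim_product if r["Active"]], "ProductNo", "ProductId")),
}
fact_sales = build_fact_rows(stage_sales, lookups)
```

Looking up only the active product version is a simplification. Loads that process historical sales would typically match on the version whose effective date range covers the sale date.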
Facts
Building the Fact Table

Building the Fact Table


Loading Fact data from Stage Sales Transactions

Deriving Dimension Keys


SECTION 4
Review
Review
Review of the Data Warehouse

What did we build?


Dimensions – Type 1 and Type 2

Facts – Dimension Keys and Measures


Review
Review of the Data Warehouse

Review the Data Warehouse


Review the Data Warehouse with canned queries
Module Summary
In this module we learnt

Overview
• We got an overview of dimensions and their benefits.
• We learnt about different types of slowly changing dimensions.

Integration
• We learnt the concepts of loading Type 1 and Type 2 dimensions.
• We learnt about the concept of loading a Fact table and the different ways to handle delta loads.

Hands-On
• We learnt how to build the Type 1 and Type 2 dimension patterns.
• We then built the Fact table.
• We then reviewed our data warehouse with various queries to analyze the data.
References
Surrogate Keys
• Surrogate Keys | James Serra's Blog

Populating a Data Warehouse
• Methods for populating a data warehouse | James Serra's Blog

Alter Row Transform
• https://learn.microsoft.com/en-us/azure/data-factory/data-flow-alter-row
