Rapidly changing dimension are
dimension where the attribute values of the dimension changes frequently
causing the dimension grow rapidly if we have designed the dimension to capture
the changes as Type 2 dimension. The rapid growth of dimension will impact
maintenance and preformation of this dimension.
Due to this reason we cannot use the SCD type. To overcome with this issue we are using
rapidly changing Dimension approach.
In this approach we will split a
dimension into 3 different tables. One table is having the non-changed
attributes and other dimension table having the changed attribute with SCD type
2 implementation and 3rd table is the bridge of table one and two.
For example
Suppose we have a Patient
dimension where we have 1k rows in it. This dimension has below columns.
Ø
Patient_id
Ø
Name
Ø
Gender
Ø
Marital_status
Ø
Weight
Ø
BMI
Patient name gender and marital
status are not changing very frequently. It is changing very rarely. But weight
and BMI both are frequently change. Every visit it will update. Think if a
patient visits 20 times in a year for that we need to create 20 records for the
single patient (19 inactive and 1 active). If we take average 15 then every
year we are creating 1000*15 =15000 records.
It is impacting our performance.
To overcome with this issue we
will split this dimension into 3 tables as we discuss above.
Ø DimPatient
(Updating the records using SCD type -1 approach)
Ø Patient_mini_dim(
Updating/inserting using SCD type -2 approach)
Ø Patient_Junk(Incremental
load)
Let’s see how to implement it in
SQL Server
Below are the tables.
CREATE TABLE dimpatient_stg
(
dimpatient_stg_id INT IDENTITY(1, 1) PRIMARY KEY,
patient_id INT,
p_nm VARCHAR(50),
gender VARCHAR(10),
marital_status VARCHAR(20),
p_weight FLOAT,
p_bmi FLOAT
)
CREATE TABLE dimpatient
(
dimpatient_id INT IDENTITY(1, 1) PRIMARY KEY,
patient_id INT,
p_nm VARCHAR(50),
gender VARCHAR(10),
marital_status VARCHAR(20)
)
CREATE TABLE patient_mini_dim
(
patient_mini_dim INT IDENTITY(1, 1) PRIMARY KEY,
patient_id INT,
start_dt DATE,
end_date DATE
)
CREATE TABLE patient_junk
(
patient_mini_dim INT,
p_weight FLOAT,
p_bmi FLOAT
)
|
Inserting some records in these tables
INSERT INTO dimpatient_stg
(patient_id,
p_nm,
gender,
marital_status,
p_weight,
p_bmi)
VALUES (1,
'Bagesh Kumar Singh',
'Male',
'Marirred',
85,
33);
INSERT INTO dimpatient_stg
(patient_id,
p_nm,
gender,
marital_status,
p_weight,
p_bmi)
VALUES (2,
'Rajesh',
'Male',
'Marirred',
75,
31);
INSERT INTO dimpatient_stg
(patient_id,
p_nm,
gender,
marital_status,
p_weight,
p_bmi)
VALUES (3,
'Mohan',
'Male',
'Unmarirred',
55,
25);
INSERT INTO dimpatient
(patient_id,
p_nm,
gender,
marital_status)
VALUES (1,
'Bagesh Kumar Singh',
'Male',
'Marirred');
INSERT INTO dimpatient
(patient_id,
p_nm,
gender,
marital_status)
VALUES (2,
'Rajesh',
'Male',
'Marirred');
INSERT INTO patient_mini_dim
(patient_id,
start_dt)
VALUES (1,
'01-12-2022');
INSERT INTO patient_mini_dim
(patient_id,
start_dt)
VALUES (2,
'01-12-2022');
INSERT INTO patient_junk
(patient_mini_dim,
p_weight,
p_bmi)
VALUES (1,
80,
32);
INSERT INTO patient_junk
(patient_mini_dim,
p_weight,
p_bmi)
VALUES (2,
70,
30);
|
See the records in the tables.
There some records in the stage
table.
Using below script we will the
Stage table into the dimension table.
--Inserting new records in dimpatient and updating existing record (SCD type -1)
MERGE dimpatient T
using dimpatient_stg S
ON T.patient_id = s.patient_id
WHEN matched AND T.p_nm <> S.p_nm OR T.gender <> S.gender OR T.marital_status<>
s.gender THEN
UPDATE SET T.p_nm = S.p_nm,
T.gender = S.gender,
T.marital_status = s.marital_status
WHEN NOT matched BY target THEN
INSERT
VALUES ( S.patient_id,
S.p_nm,
S.gender,
s.marital_status);
-- updating the end dat of the patient( in patient_mini_dim) which are in stage comes for the load
UPDATE m
SET m.end_date = Getdate()
FROM patient_mini_dim m
JOIN dimpatient_stg stg
ON stg.patient_id = m.patient_id;
--Inserting the new records of the patient( in patient_mini_dim) which are in stage comes for the load with start date and end date as null
INSERT INTO patient_mini_dim
(patient_id,
start_dt)
SELECT DISTINCT stg.patient_id,
Getdate() AS start_dt
FROM dimpatient_stg stg
LEFT JOIN patient_mini_dim mdim
ON stg.patient_id = mdim.patient_id
WHERE end_date IS NULL;
-----------Populating the Patient_Junk table
INSERT INTO patient_junk
(patient_mini_dim,
p_weight,
p_bmi)
SELECT mini.patient_mini_dim,
stg.p_weight,
stg.p_bmi
FROM dimpatient_stg stg
JOIN patient_mini_dim mini
ON mini.patient_id = stg.patient_id
WHERE mini.end_date IS NULL;
|
Running this script.
Now see the records in the
tables.
Now if we are getting below
records in the stage to load.
INSERT INTO dimpatient_stg
(patient_id,
p_nm,
gender,
marital_status,
p_weight,
p_bmi)
VALUES (1,
'Bagesh Kumar Bagi',
'Male',
'Marirred',
90,
35);
INSERT INTO dimpatient_stg
(patient_id,
p_nm,
gender,
marital_status,
p_weight,
p_bmi)
VALUES (3,
'Mohan',
'Male',
'Marirred',
55,
25);
INSERT INTO dimpatient_stg
(patient_id,
p_nm,
gender,
marital_status,
p_weight,
p_bmi)
VALUES (4,
'Rohan',
'Male',
'Marirred',
75,
31);
|
Now running the script.
Data is loaded successfully.