Sunday, 22 May 2022

Rapidly changing Dimension in DWH

Rapidly changing dimension are dimension where the attribute values of the dimension changes frequently causing the dimension grow rapidly if we have designed the dimension to capture the changes as Type 2 dimension. The rapid growth of dimension will impact maintenance and preformation of this dimension.  Due to this reason we cannot use the SCD type.  To overcome with this issue we are using rapidly changing Dimension approach.

In this approach we will split a dimension into 3 different tables. One table is having the non-changed attributes and other dimension table having the changed attribute with SCD type 2 implementation and 3rd table is the bridge of table one and two.

 For example

Suppose we have a Patient dimension where we have 1k rows in it. This dimension has below columns.

Ø  Patient_id

Ø  Name

Ø  Gender

Ø  Marital_status

Ø  Weight

Ø  BMI

Patient name gender and marital status are not changing very frequently. It is changing very rarely. But weight and BMI both are frequently change. Every visit it will update. Think if a patient visits 20 times in a year for that we need to create 20 records for the single patient (19 inactive and 1 active). If we take average 15 then every year we are creating 1000*15 =15000 records.

It is impacting our performance.

To overcome with this issue we will split this dimension into 3 tables as we discuss above.

Ø  DimPatient (Updating the records using SCD type -1 approach)

Ø  Patient_mini_dim( Updating/inserting using SCD type -2 approach)

Ø  Patient_Junk(Incremental load)

Let’s see how to implement it in SQL Server

Below are the tables.

CREATE TABLE dimpatient_stg
  
(
     
dimpatient_stg_id INT IDENTITY(1, 1) PRIMARY KEY,
     
patient_id        INT,
     
p_nm              VARCHAR(50),
     
gender            VARCHAR(10),
     
marital_status    VARCHAR(20),
     
p_weight          FLOAT,
     
p_bmi             FLOAT
  
)

CREATE TABLE dimpatient
  
(
     
dimpatient_id  INT IDENTITY(1, 1) PRIMARY KEY,
     
patient_id     INT,
     
p_nm           VARCHAR(50),
     
gender         VARCHAR(10),
     
marital_status VARCHAR(20)
  
)

CREATE TABLE patient_mini_dim
  
(
     
patient_mini_dim INT IDENTITY(1, 1) PRIMARY KEY,
     
patient_id       INT,
     
start_dt         DATE,
     
end_date         DATE
  
)

CREATE TABLE patient_junk
  
(
     
patient_mini_dim INT,
     
p_weight         FLOAT,
     
p_bmi            FLOAT
  
) 

 Inserting some records in these tables

INSERT INTO dimpatient_stg
            
(patient_id,
             
p_nm,
             
gender,
             
marital_status,
             
p_weight,
             
p_bmi)
VALUES      (1,
             
'Bagesh Kumar Singh',
             
'Male',
             
'Marirred',
             85
,
             33
);

INSERT INTO dimpatient_stg
            
(patient_id,
             
p_nm,
             
gender,
             
marital_status,
             
p_weight,
             
p_bmi)
VALUES      (2,
             
'Rajesh',
             
'Male',
             
'Marirred',
             75
,
             31
);

INSERT INTO dimpatient_stg
            
(patient_id,
             
p_nm,
             
gender,
             
marital_status,
             
p_weight,
             
p_bmi)
VALUES      (3,
             
'Mohan',
             
'Male',
             
'Unmarirred',
             55
,
             25
);

INSERT INTO dimpatient
            
(patient_id,
             
p_nm,
             
gender,
             
marital_status)
VALUES      (1,
             
'Bagesh Kumar Singh',
             
'Male',
             
'Marirred');

INSERT INTO dimpatient
            
(patient_id,
             
p_nm,
             
gender,
             
marital_status)
VALUES      (2,
             
'Rajesh',
             
'Male',
             
'Marirred');

INSERT INTO patient_mini_dim
            
(patient_id,
             
start_dt)
VALUES      (1,
             
'01-12-2022');

INSERT INTO patient_mini_dim
            
(patient_id,
             
start_dt)
VALUES      (2,
             
'01-12-2022');

INSERT INTO patient_junk
            
(patient_mini_dim,
             
p_weight,
             
p_bmi)
VALUES      (1,
             80
,
             32
);

INSERT INTO patient_junk
            
(patient_mini_dim,
             
p_weight,
             
p_bmi)
VALUES      (2,
             70
,
             30
); 

See the records in the tables. 

There some records in the stage table.

   

Using below script we will the Stage table into the dimension table.

--Inserting new records in dimpatient and updating existing record (SCD type -1)
MERGE dimpatient T
using dimpatient_stg S
ON T.patient_id = s.patient_id
WHEN matched AND T.p_nm <> S.p_nm OR T.gender <> S.gender OR T.marital_status<>
s.gender THEN
  
UPDATE SET T.p_nm = S.p_nm,
             
T.gender = S.gender,
             
T.marital_status = s.marital_status
WHEN NOT matched BY target THEN
  
INSERT
  
VALUES ( S.patient_id,
           
S.p_nm,
           
S.gender,
           
s.marital_status);

-- updating the end dat of the patient( in patient_mini_dim) which are in stage comes for the load 
UPDATE m
SET    m.end_date = Getdate()
FROM   patient_mini_dim m
       
JOIN dimpatient_stg stg
         
ON stg.patient_id = m.patient_id;

--Inserting the new records of the patient( in patient_mini_dim) which are in stage comes for the load with start date and end date as null
INSERT INTO patient_mini_dim
            
(patient_id,
             
start_dt)

SELECT DISTINCT stg.patient_id,
                
Getdate() AS start_dt
FROM   dimpatient_stg stg
       
LEFT JOIN patient_mini_dim mdim
              
ON stg.patient_id = mdim.patient_id
WHERE  end_date IS NULL; 

-----------Populating the Patient_Junk table
INSERT INTO patient_junk
            
(patient_mini_dim,
             
p_weight,
             
p_bmi)
SELECT mini.patient_mini_dim,
       
stg.p_weight,
       
stg.p_bmi
FROM   dimpatient_stg stg
       
JOIN patient_mini_dim mini
         
ON mini.patient_id = stg.patient_id
WHERE  mini.end_date IS NULL; 

Running this script.

Now see the records in the tables.  

Now if we are getting below records in the stage to load.

INSERT INTO dimpatient_stg
            
(patient_id,
             
p_nm,
             
gender,
             
marital_status,
             
p_weight,
             
p_bmi)
VALUES      (1,
             
'Bagesh Kumar Bagi',
             
'Male',
             
'Marirred',
             90
,
             35
);

INSERT INTO dimpatient_stg
            
(patient_id,
             
p_nm,
             
gender,
             
marital_status,
             
p_weight,
             
p_bmi)
VALUES      (3,
             
'Mohan',
             
'Male',
             
'Marirred',
             55
,
             25
);

INSERT INTO dimpatient_stg
            
(patient_id,
             
p_nm,
             
gender,
             
marital_status,
             
p_weight,
             
p_bmi)
VALUES      (4,
             
'Rohan',
             
'Male',
             
'Marirred',
             75
,
             31
); 

Now running the script.

 


Data is loaded successfully.

3 comments:

If you have any doubt, please let me know.

Popular Posts