Sunday, 22 May 2022

Implementation of slowly changing Dimension Type-2 Creating a new additional record to maintain the history (Using Start Date and End Date) using SQL Merge statement

Using this approach we are maintaining the history of the changing attributes. We are maintaining the history of the records by different ways.

Ø  Using start date and end date

Ø  Using IsActive flag (True & False)

Ø  Using Yes or No values

Let’s see the example

In this demo we will see SCD type using Start date and End date

Maintaining the history using start date and end date

In the customer dimension below are the records

ID

CUS_ID

Name

Year

Address

Start_Date

End_date

1

1

Bagesh Kumar Singh

2018

Pune

4/20/2018

Null

2

2

Mahesh

2018

Chennai

4/20/2018

Null

Now are getting below records in the sating table and we are updating the customer dimension

CUS_ID

Name

Year

Address

1

Bagesh Kumar Singh

2019

Greater Noida

3

Suresh

2020

Mumbai

Now our Customer dimension will be

ID

CUS_ID

Name

Year

Address

Start_date

End_date

1

1

Bagesh Kumar Singh

2018

Pune

4/20/2018

06/27/2019

2

2

Mahesh

2018

Chennai

4/20/2018

Null

3

3

Suresh

2020

Mumbai

06/27/2019

Null

4

1

Bagesh Kumar Singh

2019

Greater Noida

06/27/2019

Null

When a new record is created in the dimension table in that case we are storing the Start_date as Today date and Keeping End_date as Null or any highest default date (12/31/9999). Here we are using the End date as null.

If the existing records comes with the updated value in that case first we are updated the Today date into the end_date column and then inserting the new record in the dimension table with start_date as Today date and End_date as null.

Those records are having End_date as null it means that records are the Active records.

Below are the tables

CREATE TABLE dimcustomer_stg_scd_2

  (

     dimcustomer_stg_scd_2_id INT IDENTITY(1, 1) PRIMARY KEY,

     cus_id                   INT,

     cus_nm                   VARCHAR(50),

     cus_year                 VARCHAR(10),

     cus_add                  VARCHAR(500)

  )

 

CREATE TABLE dimcustomer_scd_2

  (

     dimcustomer_scd_2_id INT IDENTITY(1, 1) PRIMARY KEY,

     cus_id               INT,

     cus_nm               VARCHAR(50),

     cus_year             VARCHAR(10),

     cus_add              VARCHAR(500),

     start_date           DATE,

     end_date             DATE

  )

 

Inserting some records in to the stage and DimCustomer table.

INSERT INTO dimcustomer_scd_2
            
(cus_id,
             
cus_nm,
             
cus_year,
             
cus_add,
             
start_date)
VALUES      (1,
             
'Bagesh Kumar Singh',
             
'2018',
             
'Pune',
             
'4/20/2018');

INSERT INTO dimcustomer_scd_2
            
(cus_id,
             
cus_nm,
             
cus_year,
             
cus_add,
             
start_date)
VALUES      (2,
             
'Mahesh',
             
'2018',
             
'Chennai',
             
'4/20/2018');

INSERT INTO dimcustomer_stg_scd_2
            
(cus_id,
             
cus_nm,
             
cus_year,
             
cus_add)
VALUES      (1,
             
'Bagesh Kumar Singh',
             
'2022',
             
'Greater Noida');

INSERT INTO dimcustomer_stg_scd_2
            
(cus_id,
             
cus_nm,
             
cus_year,
             
cus_add)
VALUES      (3,
             
'Suresh',
             
'2022',
             
'Mumbai'); 

Now see the records in both tables.  

Using the below script we are loading the data from stage to main table.

-- This inserts another record to the dimension for SCD Type changes

INSERT INTO dimcustomer_scd_2

            (

                        cus_id,

                        cus_nm,

                        cus_year,

                        cus_add,

                        start_date,

                        end_date

            )

SELECT cus_id,

       cus_nm,

       cus_year,

       cus_add,

       start_date,

       end_date

FROM   (MERGE dimcustomer_scd_2 TT

                        using            dimcustomer_stg_scd_2 ST

                        ON (

                                                          tt.cus_id= st.cus_id)

                                         -- This inserts new records in the dimension table

                        WHEN NOT matched THEN

                        INSERT

                                     (

                                                  cus_id,

                                                  cus_nm,

                                                  cus_year,

                                                  cus_add,

                                                  start_date,

                                                  end_date

                                     )

                                     VALUES

                                     (

                                                  cus_id,

                                                  cus_nm,

                                                  cus_year,

                                                  cus_add,

                                                  Getdate(),

                                                  NULL

                                     )

                                     -- This marks the older record to be outdated for SCD Type 2

                        WHEN matched

                        AND          tt.cus_add <> st.cus_add

                        AND          tt.end_date IS NULL THEN

                        UPDATE

                        SET    tt.end_date = Getdate() output $action Action_Taken,

                               st.cus_id,

                               st.cus_nm,

                               st.cus_year,

                               st.cus_add,

                               Getdate() AS start_date,

                               NULL      AS end_date ) AS merge_out

WHERE  merge_out.action_taken = 'UPDATE';

Now running this script.  

Cus_id 1 is updated with new value and old record end date with updated with current date.

Let’s see again cus_id= 1 is coming with updated address then current records will become the historical records and stage record will be the current record.  

Running the above script. 

Get the expected result.

No comments:

Post a Comment

If you have any doubt, please let me know.

Popular Posts