Sunday 22 May 2022

Implementation of slowly changing Dimension Type-4 Using Historical or archival table using SQL Script

In this demo we are using the Archival table.

In this approach we are having two table main tables and the historical or Archive table to maintain the history. Basically we are using the logic of SCD type 2 (Maintaining the history using start date and end date) only the difference between them in type two we are storing the historical record in to the same dimension table but in the SCD Type 4 Approach we are storing the current active record into the main table and the historical record into the another Historical table or Archive table.

 See the example.

We have DimCustomer_scd_4 (main table), DimCustomer_scd_4_Archive (Archive table) and the DimCustomer_STG_scd_4 (Stage table).

 In the customer dimension below are the records

Table : DimCustomer_scd_4

ID

CUS_ID

Name

Year

Address

Start_Date

End_date

1

1

Bagesh Kumar Singh

2018

Pune

4/20/2018

Null

2

2

Mahesh

2018

Chennai

4/20/2018

Null

Table : DimCustomer_scd_4_Archive

ID

CUS_ID

Name

Year

Address

Start_Date

End_date

 

 

 

 

 

 

 

Now are getting below records in the sating table and we are updating the customer dimension

Table : DimCustomer_STG_scd_4

CUS_ID

Name

Year

Address

1

Bagesh Kumar Singh

2019

Greater Noida

3

Suresh

2020

Mumbai

Now our Customer dimension will be

Table : DimCustomer_scd_4

ID

CUS_ID

Name

Year

Address

Start_date

End_date

2

2

Mahesh

2018

Chennai

4/20/2018

Null

3

3

Suresh

2020

Mumbai

06/27/2019

Null

4

1

Bagesh Kumar Singh

2019

Greater Noida

06/27/2019

Null

Table : DimCustomer_scd_4_Archive

ID

CUS_ID

Name

Year

Address

Start_date

End_date

1

1

Bagesh Kumar Singh

2018

Greater Noida

4/20/2018

06/27/2019

When a new record is created in the dimension table in that case we are storing the Start_date as Today date and Keeping End_date as Null or any highest default date (12/31/9999). Here we are using the End date as null.

If the existing records comes with the updated value in that case first we are we are inserting the record into the archive table with filling End_date as today date and then deleting the record from the Main table. In the main table (DimCustomer_scd_4) we always get latest records.

Let’s see the demo.

Here we are creating 3 tables.

CREATE TABLE dimcustomer_stg_scd_4

  (

     dimcustomer_stg_scd_2_id INT IDENTITY(1, 1) PRIMARY KEY,

     cus_id                   INT,

     cus_nm                   VARCHAR(50),

     cus_year                 VARCHAR(10),

     cus_add                  VARCHAR(500)

  )

 

CREATE TABLE dimcustomer_scd_4

  (

     dimcustomer_scd_4_id INT IDENTITY(1, 1) PRIMARY KEY,

     cus_id               INT,

     cus_nm               VARCHAR(50),

     cus_year             VARCHAR(10),

     cus_add              VARCHAR(500),

     start_date           DATE,

     end_date             DATE

  )

 

CREATE TABLE dimcustomer_scd_4_archive

  (

     dimcustomer_scd_4_archive_id INT IDENTITY(1, 1) PRIMARY KEY,

     cus_id                       INT,

     cus_nm                       VARCHAR(50),

     cus_year                     VARCHAR(10),

     cus_add                      VARCHAR(500),

     start_date                   DATE,

     end_date                     DATE

  )

 

Inserting few records into the Stgaing and main table.

INSERT INTO dimcustomer_scd_4

            (cus_id,

             cus_nm,

             cus_year,

             cus_add,

             start_date)

VALUES      (1,

             'Bagesh Kumar Singh',

             '2018',

             'Pune',

             '4/20/2018');

 

INSERT INTO dimcustomer_scd_4

            (cus_id,

             cus_nm,

             cus_year,

             cus_add,

             start_date)

VALUES      (2,

             'Mahesh',

             '2018',

             'Chennai',

             '4/20/2018');

 

INSERT INTO dimcustomer_stg_scd_4

            (cus_id,

             cus_nm,

             cus_year,

             cus_add)

VALUES      (1,

             'Bagesh Kumar Singh',

             '2022',

             'Greater Noida');

 

INSERT INTO dimcustomer_stg_scd_4

            (cus_id,

             cus_nm,

             cus_year,

             cus_add)

VALUES      (3,

             'Suresh',

             '2022',

             'Mumbai');

See the records into the table. 

Below script is used to implement the SCD type : 4

 

--insert the updated record into archive and delete it from Main table

DELETE main

output deleted.cus_id,

       deleted.cus_nm,

       deleted.cus_year,

       deleted.cus_add,

       deleted.Start_date,

       Getdate()

INTO dimcustomer_scd_4_archive

FROM   dimcustomer_scd_4 main

       JOIN dimcustomer_stg_scd_4 stg

         ON stg.cus_id = main.cus_id

WHERE  stg.cus_add <> main.cus_add

       AND main.end_date IS NULL;

 

-- Insert the records into the main table from stage table.

INSERT INTO dimcustomer_scd_4

            (cus_id,

             cus_nm,

             cus_year,

             cus_add,

             start_date)

SELECT stg.cus_id,

       stg.cus_nm,

       stg.cus_year,

       stg.cus_add,

       Getdate()

FROM   dimcustomer_stg_scd_4 stg

       LEFT JOIN dimcustomer_scd_4 main

              ON stg.cus_id = main.cus_id

WHERE  main.dimcustomer_scd_4_id IS NULL;

Now running this query.After run this query sees the records into the table.

  

For the cus_id =1 we are getting the updated value in this case previous record we are inserting this record into the archive table and deleting that record into the main table and inserting the latest record into the main table.

No comments:

Post a Comment

If you have any doubt, please let me know.

Popular Posts