Using this approach we are updating the existing records with updated value without maintaining any history. Insert a new record into the dimension table if that record does not exist in the dimension table. It is very easy and saving huge amount of space because we are not maintaining any history.
For example
In the customer dimension below are the records
ID |
Name |
PAN_Number |
Aadhar_Number |
1 |
Bagesh Kumar Singh |
12345 |
44444 |
2 |
Mahesh |
22222 |
66666 |
Now are
getting below records in the sating table to update the PAN and Aadhar number in the customer dimension
ID |
Name |
PAN_Number |
Aadhar_Number |
1 |
Bagesh Kumar Singh |
55555 |
33333 |
3 |
Suresh |
77777 |
88888 |
Now our
Customer dimension will be like below. PAN and Aadhar number are not updated
for the customer ID 1.
ID |
Name |
PAN_Number |
Aadhar_Number |
1 |
Bagesh Kumar Singh |
55555 |
33333 |
2 |
Mahesh |
22222 |
66666 |
3 |
Suresh |
77777 |
88888 |
In the above
load we get one existing ID with updated value and one new record. Here is not
mainlining any history. If anyone want to know, what was the Pan or Aadhar
number of Bagesh (ID: 1) previously. We can’t get it. In this approach only we
are getting latest value in the dimension table.
We are using
such approach where we know that no historical information is required in the
Data ware house and reports.
Advantage of Type 1
·
This is the easiest way to handle the SCD
problem.
·
It is saving Hug amount of spaces in the
database because we are not saving the Historical information of the dimension.
Disadvantage of Type 1
·
We are not having the historical information. In
case someone updating the dimension records we may not able to recover the
updated information.
Ø
Using Merge statement
Ø
Using Inner and Left join
o
Inner join to update the existing records
o
Left join to identify the new records and
inserting these records in the dimension table.
Using Merge statement
Update the
record if cus_id is exist in destination table and any changes in PAN or Aadhar
Number and insert the destination if no record is exits.
MERGE dimcustomer T using dimcustomer_stg S ON T.cus_id = s.cus_id WHEN matched AND ( T.pan_num <> S.pan_num OR T.aadhar_num<>s.aadhar_num) THEN UPDATE SET T.pan_num = S.pan_num, T.aadhar_num = s.aadhar_num WHEN NOT matched BY target THEN INSERT VALUES (S.cus_id, S.cus_nm, S.pan_num, S.aadhar_num); |
Records in
the in Dimcustomer and staging table.
In stage we are getting one record to update and one new record. Now we are running this script load the data into the designation table.
Two rows are affected. See the record of DimCustomer table.
Pan and Aadhar both are updated successfully.
Using Inner and Left join
Ø
Using Inner and Left join
o
Inner join to update the existing records
o
Left join to identify the new records and
inserting these records in the dimension table.
Query to
update the record
UPDATE dim SET dim.pan_num = stg.pan_num, dim.aadhar_num = stg.aadhar_num FROM dimcustomer
dim JOIN dimcustomer_stg stg ON dim.cus_id = stg.cus_id WHERE dim.pan_num <> stg.pan_num OR dim.aadhar_num <> stg.aadhar_num; |
Using Left
join to identify the new record to insert the new record into the destination
table.
INSERT INTO dimcustomer (cus_id, cus_nm, pan_num, aadhar_num) SELECT stg.cus_id, stg.cus_nm, stg.pan_num, stg.aadhar_num FROM
dimcustomer_stg stg LEFT JOIN dimcustomer dim ON dim.cus_id = stg.cus_id WHERE dim.dimcustomer_id IS NULL; |
Before
running the script records in the table.
Now we are
running the script
See the
record in the destination table.
Get the
expected result.
No comments:
Post a Comment
If you have any doubt, please let me know.