Showing posts with label DWH. Show all posts
Showing posts with label DWH. Show all posts

Sunday 22 May 2022

Early Arriving Facts or Late Arriving Dimensions

Late arriving dimensions also called as Early Arriving Facts, which occurs when dimension data arriving in the data warehouse later than the fact data that references that dimension record.

Early Arriving Fact or Late Arriving Dimension Scenario

In the common scenario for a data warehouse, ETL (extract, transform, and load) processes are designed in a manner that dimensions are fully processed first. Once the dimension processing has been completed, the facts are loaded, with the assumption that all required dimension data is already in place.

Late arriving dimensions break this pattern, because the fact data is processed first, before the associated dimension information. The ETL processes can encounter fact records that are related to new dimension members i.e. members that have yet to be processed.

To Handling such type of scenario we are using below approach.

Ø  Never process the fact table

Ø  Park and retry the records

Ø  Insert a dummy value

Ø  Insert a row in the dimension and update it letter

Never process the fact table

If the fact record to be loaded is of no value without a reference to the late arriving dimension, and there is no expectation that the late arriving dimension will ever show up, then there is no reason to load the fact. Simply omit the record from the fact load. This is rarely the case.

Park and retry the records

If the fact records to be loaded are of no value without a reference to the late arriving dimension, and the late arriving dimension is expected to show up eventually, then we can queue the record until such time the dimension record is loaded. The queued record must be flagged such that it is retried at a future time and loaded once a related dimension record is loaded. It is typically a good idea to create a queued record report to monitor this process.

Insert a dummy value

If the fact record does have value without a reference to a related late arriving dimension, and we either do not expect or do not care if a dimension record shows up at a later time, then we can simply load the fact record with a foreign key reference to the unknown record in the related dimension; typically -1 surrogate key value. Note that we must pay particular attention to the fact grain if we are going to take the unknown member approach to failed foreign key lookups. Most fact table primary keys are set as a composite across foreign keys and degenerate dimensions. If allowing a foreign key to be set to -1 cause a possible primary key violation, then we need to look for other options such as adding a degenerate dimension.

Insert a row in the dimension and update it letter

If the fact record has value without a reference to a related late arriving dimension, and we expect the late arriving dimension to show up at a later time, then we can use the natural key in the record bound for the fact table to seed a record (inferred member) in the related dimensions. Once the dimension record shows up, it will have the same natural key and the dimension record’s attributes can then be updated. Particular attention should be paid to the inferred member’s record time in cases where dimension history is tracked such as SCD 2 situations. 

Inferred Dimension in DWH

While loading fact records, a dimension record may not yet be ready. One solution is to generate a surrogate key with null for all the other attributes. This should technically be called an inferred member, but is often called an inferred dimension. It is also known Early coming Fact or Late coming dimension. These all are different names of same scenario.

These are the dimension table type to handle late data, in a scenario where a fact table is loaded before the references are available in the dimension table. In such a case, to respect referential integrity, a new record having NULL for all dimension attributes can be created in the dimension table which the fact table can infer. This new row can be updated when dimension data is available.

Junk Dimension in DWH

A junk dimension is a single table with a combination of different and unrelated attributes to avoid having a large number of foreign key in the fact table. In other word we can say that junk dimension is created to manage the foreign key.

Let’s see this Example.

We have a product fact table. This table has the high and low cardinality. This table is attached the n numbers of Small – small Dimension.  Below are the dimension tables.

Dim_Product_Type

Dim_Product_Type_ID

Product_Type

1

Sales

2

Purchase

 

Dim_Product_Quality

Dim_Product_Quality _ID

Product_Quality

1

High

2

Medium

3

Low

Dim_Expiry

Dim_Expiry_ID

IsExpired

1

Yes

2

No

3

NA

Dim_Tax_Information

Dim_Tax_Information_ID

Tax_Info

1

Tax Free

2

28 %

3

18%

4

12%

5

5%

Dim_Delivery_Mode

Dim_Delivery_Mode_ID

Delivery_mode

1

Dealer

2

Agent

3

Direct

These are some dimensions. If we don’t have the Junk dimension in that case we will store this information in the fact table as below

Fact_product

Product_ID

Dim_Product_Type_ID

Dim_Product_Quality _ID

Dim_Expiry_ID

Dim_Tax_Information_ID

Dim_Delivery_Mode_ID

1

1

1

1

1

1

2

1

1

1

1

2

3

1

1

1

2

2

4

1

1

2

2

2

5

1

2

2

2

2

To over coming with this we will create the Junk Dimension and here we will keep all primary of the small dimensions.

Our Junk Dimension looks like below.

Dim_Product_Junk

Dim_Product_Junk_ID

Product_Type

Product_Quality

IsExpiried

Tax_Information

Delivery_Mode

1

Sales

High

Yes

Tax free

Dealer

2

Sales

High

Yes

Tax free

Agent

3

Sales

High

Yes

28 %

Agent

4

Sales

High

No

28%

Agent

5

Sales

Medium

No

28%

Agent

Now Our Fact table look like below

Fact_product

Product_ID

Dim_Product_Junk_ID

1

1

2

2

3

3

4

4

5

5

In the fact Dim_Product_Junk_ID is show the Information product dimension for example Dim_Product_Junk_ID = 1 means

Ø  Product_type : sales

Ø  Product_Quality : High

Ø  IsExpired : Yes

Ø  Tax_Information : Tax Free

Ø  Delivery_mode : Dealer

Junk dimension has (Dim_Product_Type* Dim_Product_Quality * Dim_Expiry* Dim_Tax_Information* Dim_Delivery_Mode) Records in the table.

So total rows in this junk dimension is 2*3*3*5*3 =270 records are in our Junk dimension table.

Popular Posts