Sunday 22 May 2022

Early Arriving Facts or Late Arriving Dimensions

Late arriving dimensions also called as Early Arriving Facts, which occurs when dimension data arriving in the data warehouse later than the fact data that references that dimension record.

Early Arriving Fact or Late Arriving Dimension Scenario

In the common scenario for a data warehouse, ETL (extract, transform, and load) processes are designed in a manner that dimensions are fully processed first. Once the dimension processing has been completed, the facts are loaded, with the assumption that all required dimension data is already in place.

Late arriving dimensions break this pattern, because the fact data is processed first, before the associated dimension information. The ETL processes can encounter fact records that are related to new dimension members i.e. members that have yet to be processed.

To Handling such type of scenario we are using below approach.

Ø  Never process the fact table

Ø  Park and retry the records

Ø  Insert a dummy value

Ø  Insert a row in the dimension and update it letter

Never process the fact table

If the fact record to be loaded is of no value without a reference to the late arriving dimension, and there is no expectation that the late arriving dimension will ever show up, then there is no reason to load the fact. Simply omit the record from the fact load. This is rarely the case.

Park and retry the records

If the fact records to be loaded are of no value without a reference to the late arriving dimension, and the late arriving dimension is expected to show up eventually, then we can queue the record until such time the dimension record is loaded. The queued record must be flagged such that it is retried at a future time and loaded once a related dimension record is loaded. It is typically a good idea to create a queued record report to monitor this process.

Insert a dummy value

If the fact record does have value without a reference to a related late arriving dimension, and we either do not expect or do not care if a dimension record shows up at a later time, then we can simply load the fact record with a foreign key reference to the unknown record in the related dimension; typically -1 surrogate key value. Note that we must pay particular attention to the fact grain if we are going to take the unknown member approach to failed foreign key lookups. Most fact table primary keys are set as a composite across foreign keys and degenerate dimensions. If allowing a foreign key to be set to -1 cause a possible primary key violation, then we need to look for other options such as adding a degenerate dimension.

Insert a row in the dimension and update it letter

If the fact record has value without a reference to a related late arriving dimension, and we expect the late arriving dimension to show up at a later time, then we can use the natural key in the record bound for the fact table to seed a record (inferred member) in the related dimensions. Once the dimension record shows up, it will have the same natural key and the dimension record’s attributes can then be updated. Particular attention should be paid to the inferred member’s record time in cases where dimension history is tracked such as SCD 2 situations. 

Popular Posts