Late arriving dimensions also
called as Early Arriving Facts, which occurs when dimension data arriving in
the data warehouse later than the fact data that references that dimension
record.
Early Arriving Fact or Late Arriving Dimension Scenario
In the common scenario for a data
warehouse, ETL (extract, transform, and load) processes are designed in a
manner that dimensions are fully processed first. Once the dimension processing
has been completed, the facts are loaded, with the assumption that all required
dimension data is already in place.
Late arriving dimensions break
this pattern, because the fact data is processed first, before the associated
dimension information. The ETL processes can encounter fact records that are
related to new dimension members i.e. members that have yet to be processed.
To Handling such type of scenario we are using below approach.
Ø
Never process the fact table
Ø
Park and retry the records
Ø
Insert a dummy value
Ø
Insert a row in the dimension and update it
letter
Never process the fact table
If the fact record to be loaded
is of no value without a reference to the late arriving dimension, and there is
no expectation that the late arriving dimension will ever show up, then there
is no reason to load the fact. Simply omit the record from the fact load. This
is rarely the case.
Park and retry the records
If the fact records to be loaded
are of no value without a reference to the late arriving dimension, and the
late arriving dimension is expected to show up eventually, then we can queue
the record until such time the dimension record is loaded. The queued record
must be flagged such that it is retried at a future time and loaded once a
related dimension record is loaded. It is typically a good idea to create a
queued record report to monitor this process.
Insert a dummy value
If the fact record does have
value without a reference to a related late arriving dimension, and we either
do not expect or do not care if a dimension record shows up at a later time,
then we can simply load the fact record with a foreign key reference to the
unknown record in the related dimension; typically -1 surrogate key value. Note
that we must pay particular attention to the fact grain if we are going to take
the unknown member approach to failed foreign key lookups. Most fact table
primary keys are set as a composite across foreign keys and degenerate
dimensions. If allowing a foreign key to be set to -1 cause a possible primary
key violation, then we need to look for other options such as adding a
degenerate dimension.
Insert a row in the dimension and update it letter
If the fact record has value without a reference to a related late arriving dimension, and we expect the late arriving dimension to show up at a later time, then we can use the natural key in the record bound for the fact table to seed a record (inferred member) in the related dimensions. Once the dimension record shows up, it will have the same natural key and the dimension record’s attributes can then be updated. Particular attention should be paid to the inferred member’s record time in cases where dimension history is tracked such as SCD 2 situations.