SLOWLY CHANGING DIMENSION, is an SSIS transformation which can solve problems for dimensions whose attributes are changing by the time, and needs updates in future.
Normally once the data is loaded into DWH , we need cleaning during the process. For this purpose we can use inbuilt transformation also, i.e. lookup, fuzzy lookup etc. as well as one can make self built logic using SQL queries or may be by using other built in SSIS transformations.
In same way to perform SCD logic three pipelines are needed:
1) Do lookup in order to find the new rows from the source.
2) Make logic using timestamp or hash byte values to recognize the updates in any row, based on the matching record values.
3) And for inferred member rows, need to have new pipeline, checking for inferred rows in dimension table, and according get updates from source.
Moreover, making separate logics and then simply save the package for future references will be a better option as compared to directly using SCD.
SCD, can be configured with the wizard, and after completion of wizard normally two new pipelines are created, in these pipelines we see OLEDB command transformation which works row by row basis, and not good based on performance point of view. And additional modifications if needed in the query then again need to rewrite the whole logic again, so instead of doing it, one can separately make a new logic and use it.
Why SCD logics needed?
mainly in dimension tables , there are very few dimensions which show changing behavior with time, for such dimensions, incremental load logic need to be written with SCD logic also based on the column behavior.
For example: The address of the customer could vary by time, but the row of customer will not be needed to increased. In such cases we need to add multiple address column in the dimension table and add that new address record.
Another solution for the same is, flag column can be used and then the rows for that customer will keep increasing with each changing attribute. Based in business discussion one of the method need to be used, but in both the cases SCD logic must be implemented.
Comments
Post a Comment