
SLOWLY CHANGING DIMENSION SSIS

 

Slowly Changing Dimension (SCD) is an SSIS transformation that handles dimensions whose attributes change over time and therefore need updates after the initial load.

Normally, data needs cleaning while it is being loaded into the data warehouse (DWH). For this purpose we can use built-in transformations such as Lookup and Fuzzy Lookup, or build our own logic with SQL queries or other built-in SSIS transformations.

In the same way, SCD logic can be built by hand, and it needs three pipelines:

1) Do a lookup to find the new rows coming from the source.

2) Use a timestamp or hash values (such as those from HASHBYTES) to recognize updates to a row by comparing it with the matching record in the dimension.

3) For inferred member rows, add a separate pipeline that checks for inferred rows in the dimension table and updates them from the source accordingly.
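
The three pipelines above can be sketched outside SSIS in a few lines of Python. This is only a minimal illustration, not the SSIS implementation: the column names (`customer_id`, `name`, `address`, `is_inferred`) and the helper functions are assumptions made up for the example, and SHA-256 stands in for whatever hash the real package would use.

```python
import hashlib

def row_hash(row, columns):
    """Hash the tracked attribute values of a row into one comparable string."""
    joined = "|".join(str(row.get(col, "")) for col in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def classify_rows(source_rows, dimension_rows, key, tracked_columns):
    """Split source rows into new rows, changed rows, and inferred-member updates."""
    dim_by_key = {row[key]: row for row in dimension_rows}
    new_rows, changed_rows, inferred_updates = [], [], []
    for src in source_rows:
        dim = dim_by_key.get(src[key])
        if dim is None:
            new_rows.append(src)          # pipeline 1: lookup found no match
        elif dim.get("is_inferred"):
            inferred_updates.append(src)  # pipeline 3: fill in the placeholder row
        elif row_hash(src, tracked_columns) != row_hash(dim, tracked_columns):
            changed_rows.append(src)      # pipeline 2: hash mismatch means an update
    return new_rows, changed_rows, inferred_updates

# Hypothetical sample data for illustration only.
source = [
    {"customer_id": 1, "name": "Alex", "address": "12 Oak St"},
    {"customer_id": 2, "name": "Bina", "address": "9 Elm Rd"},
    {"customer_id": 3, "name": "Chen", "address": "4 Pine Ave"},
    {"customer_id": 4, "name": "Dev", "address": "7 Lake Dr"},
]
dimension = [
    {"customer_id": 1, "name": "Alex", "address": "12 Oak St", "is_inferred": False},
    {"customer_id": 2, "name": "Bina", "address": "1 Old Ln", "is_inferred": False},
    {"customer_id": 3, "name": None, "address": None, "is_inferred": True},
]

new_rows, changed_rows, inferred_updates = classify_rows(
    source, dimension, "customer_id", ["name", "address"]
)
```

Here customer 4 lands in `new_rows`, customer 2 in `changed_rows`, and customer 3 in `inferred_updates`, while the unchanged customer 1 is skipped.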


Moreover, building this logic separately and saving the package for future reference is a better option than using the SCD transformation directly.

The SCD transformation is configured through a wizard. When the wizard completes, it normally creates two new pipelines. These pipelines contain OLE DB Command transformations, which work row by row and are therefore poor from a performance point of view. In addition, if the queries need modification later, the whole logic has to be rewritten, so it is easier to build separate logic and use that instead.
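
One common alternative to row-by-row commands is to land the changed rows in a staging table and apply them with a single set-based statement. The sketch below shows the idea with Python's built-in `sqlite3` module. It is an assumption about what "separate logic" could look like, not the logic generated by the wizard, and the table names `dim_customer` and `stg_customer` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, address TEXT)")
conn.execute("CREATE TABLE stg_customer (customer_id INTEGER PRIMARY KEY, address TEXT)")

# Existing dimension rows and the changed rows detected in the data flow.
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(1, "12 Oak St"), (2, "1 Old Ln")])
conn.executemany("INSERT INTO stg_customer VALUES (?, ?)",
                 [(2, "9 Elm Rd")])

# One set-based UPDATE replaces one command call per changed row.
conn.execute("""
    UPDATE dim_customer
    SET address = (SELECT s.address
                   FROM stg_customer AS s
                   WHERE s.customer_id = dim_customer.customer_id)
    WHERE customer_id IN (SELECT customer_id FROM stg_customer)
""")

rows = conn.execute(
    "SELECT customer_id, address FROM dim_customer ORDER BY customer_id"
).fetchall()
```

In an SSIS package the same pattern would typically mean writing changed rows to a staging table in the data flow and running the update afterwards as one SQL statement.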

Why is SCD logic needed?

Among dimension tables, only a few have attributes that change over time. For such dimensions, the incremental load logic must also include SCD logic, chosen according to how each column behaves.

For example, a customer's address can change over time, but the number of rows for that customer does not need to increase. In such cases we add extra address columns to the dimension table and store the new address there.

Another solution is to use a flag column, in which case the number of rows for that customer keeps growing with each attribute change. Which method to use should be decided in discussion with the business, but in both cases SCD logic must be implemented.
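
The two approaches above are commonly called Type 3 (extra columns) and Type 2 (flag column with a new row per change). A minimal Python sketch of both follows. The column names (`current_address`, `previous_address`, `is_current`, `start_date`, `end_date`) and the function names are assumptions chosen for illustration.

```python
from datetime import date

def apply_type3(dim_row, new_address):
    """Keep one row per customer; move the current address into a 'previous' column."""
    updated = dict(dim_row)
    updated["previous_address"] = dim_row["current_address"]
    updated["current_address"] = new_address
    return updated

def apply_type2(dim_rows, customer_id, new_address, effective):
    """Expire the current row through the flag column and append a new current row."""
    result = []
    for row in dim_rows:
        row = dict(row)  # copy so the input list is left untouched
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = effective
        result.append(row)
    result.append({
        "customer_id": customer_id,
        "address": new_address,
        "start_date": effective,
        "end_date": None,
        "is_current": True,
    })
    return result

# Hypothetical sample rows for illustration only.
type3_row = {"customer_id": 1, "current_address": "1 Old Ln", "previous_address": None}
type3_result = apply_type3(type3_row, "9 Elm Rd")

type2_rows = [{"customer_id": 1, "address": "1 Old Ln",
               "start_date": date(2020, 1, 1), "end_date": None, "is_current": True}]
type2_result = apply_type2(type2_rows, 1, "9 Elm Rd", date(2024, 6, 1))
```

With extra columns the customer still has a single row, but only as much history as there are columns. With the flag column every change adds a row, so the full history is kept at the cost of a growing table.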






