Monday 13 December 2021

Load unique records from files using SSIS transformations (Union All and Sort Transformation)

 We are getting data from 4 source CSV files. Sometimes the same records appear in more than one file. The business wants only unique records loaded into the main table, so basically we need to remove the duplicate data and load the unique records.

Files in folder.  


Below is the data in the files. 

Below is the table.

       CREATE TABLE Employees(

                EmployeeID int NOT NULL PRIMARY KEY,

                LastName varchar(50) NOT NULL,

                FirstName varchar(50) NOT NULL,

                Title varchar(50) NULL

                )

 We can load this data in two ways.

1.       Read all the files, merge the data using the Union All transformation, then use the Sort transformation to remove the duplicates and load the unique records into the destination table.

2.       Load all the files into a staging table using a Foreach Loop container, then use an Execute SQL Task with a SQL script that filters out the duplicate records and loads the unique records into the destination table.

In this demo we will see the first way to load the data.

Let’s see.

First we add a Data Flow Task.  

In the Data Flow Task we add 4 Flat File Sources, configure each one, and do the column mapping.  

Doing the mapping. 

Now we add the Union All transformation.  


This will merge the data into a single dataset.
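The Union All transformation behaves like SQL's UNION ALL: it stacks rows from all inputs without removing duplicates. For comparison, the same merge in T-SQL would look like the sketch below (the four staging table names are hypothetical, just standing in for the four flat file sources):

```sql
-- T-SQL analogue of the Union All transformation: rows from all
-- four sources are stacked together; duplicates are NOT removed
-- at this step.
SELECT EmployeeID, LastName, FirstName, Title FROM dbo.File1Stage
UNION ALL
SELECT EmployeeID, LastName, FirstName, Title FROM dbo.File2Stage
UNION ALL
SELECT EmployeeID, LastName, FirstName, Title FROM dbo.File3Stage
UNION ALL
SELECT EmployeeID, LastName, FirstName, Title FROM dbo.File4Stage;
```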

Read more about: Union all Transformation

https://bageshkumarbagi-msbi.blogspot.com/2017/09/union-all-transformation-in-ssis.html

 Now we use the Sort transformation to remove the duplicate data.

Select the sort key column, and check the "Remove rows with duplicate sort values" option to delete the duplicate records.
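In SQL terms, checking that option with EmployeeID as the sort key is roughly equivalent to keeping one row per key value. A sketch of the idea, assuming a hypothetical table holding the merged input rows:

```sql
-- Rough T-SQL analogue of the Sort transformation with
-- "Remove rows with duplicate sort values" on EmployeeID:
-- exactly one row is kept per sort-key value.
-- Which duplicate survives is arbitrary in the transformation;
-- MIN() simply picks one deterministically here.
SELECT EmployeeID,
       MIN(LastName)  AS LastName,
       MIN(FirstName) AS FirstName,
       MIN(Title)     AS Title
FROM dbo.MergedInput
GROUP BY EmployeeID
ORDER BY EmployeeID;
```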

Read More about: Sort Transformation

https://bageshkumarbagi-msbi.blogspot.com/2015/02/sort-transformation.html

Now we add an OLE DB Destination and do the mapping.  

Below is the complete data flow. 

Now running the package.

Data in the table before the run.  

Running package now. 

Package executed successfully.

See the records in the table.  


Unique data loaded successfully.

The drawback of this approach: if we want to add a new file, we must modify the package; otherwise the new file will not be loaded.

 Adding a new file. 

Now we are running this package. 

It loads only the original 4 files. To load the 5th file we must make changes to the package; only then can its data be loaded.

Now we remove one file from the source folder.  

Now running this package.

The package fails with the below error.  

To load the remaining files we must again make changes to the package before the data can be loaded into the table.

In my opinion, the 2nd approach is the better way to handle such scenarios.
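As a sketch of that 2nd approach: after the Foreach Loop container loads every file into a staging table, an Execute SQL Task could run something like the following (the staging table name EmployeesStage is an assumption, not part of the package shown above):

```sql
-- Hypothetical staging table loaded by the Foreach Loop container.
-- Keep one row per EmployeeID and insert only records that are
-- not already present in the destination table.
WITH UniqueRows AS (
    SELECT EmployeeID, LastName, FirstName, Title,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID
                              ORDER BY EmployeeID) AS rn
    FROM dbo.EmployeesStage
)
INSERT INTO dbo.Employees (EmployeeID, LastName, FirstName, Title)
SELECT u.EmployeeID, u.LastName, u.FirstName, u.Title
FROM UniqueRows u
WHERE u.rn = 1
  AND NOT EXISTS (SELECT 1
                  FROM dbo.Employees e
                  WHERE e.EmployeeID = u.EmployeeID);

TRUNCATE TABLE dbo.EmployeesStage;  -- clear staging for the next run
```

Because the file list is driven by the Foreach Loop container's folder enumeration, adding or removing a file needs no package change, which avoids the drawback described above.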
