Monday, 13 December 2021

Load the unique records from files using SSIS (using a staging table)

We are getting data from four source CSV files. Sometimes the same records arrive in more than one file. The business wants only unique records loaded into the main table, so basically we need to remove the duplicate data and load the unique records.

Files in the folder:

Below is the data in the files.

Below are the table definitions.

Staging table

CREATE TABLE Employees_Stg(
                EmployeeID int,
                LastName varchar(50) NOT NULL,
                FirstName varchar(50) NOT NULL,
                Title varchar(50) NULL
                )

 Main table

CREATE TABLE Employees(
                EmployeeID int NOT NULL primary key,
                LastName varchar(50) NOT NULL,
                FirstName varchar(50) NOT NULL,
                Title varchar(50) NULL
                )

We can load this data in two ways.

1.       We read all the files and merge the data using a Union All transformation; then we use a Sort transformation (with its remove duplicates option) to drop the duplicate rows and load only the unique records into the destination table.

2.       The other way is to load all the files into a staging table using a Foreach Loop container, then use an Execute SQL Task with a SQL script that filters out the duplicate records and loads only the unique records into the destination table.

In this demo we will use the second approach to load the data.

Take a variable to store the file name.

Take a Foreach Loop container to iterate over the files and load each file's data into the staging table.

Store the file name in the variable.
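A rough sketch of the Foreach Loop settings assumed here (the folder path and variable name are illustrative, not taken from the actual package):

                Enumerator: Foreach File Enumerator
                Folder: C:\SSIS\SourceFiles      -- hypothetical folder holding the CSV files
                Files: *.csv
                Retrieve file name: Fully qualified
                Variable Mappings: User::FileName mapped to index 0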

Inside the Foreach Loop container we add a Data Flow Task. The source is a Flat File source, and we make its connection string dynamic with an expression.
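Assuming the Foreach Loop stores the fully qualified file name in a variable named User::FileName (an illustrative name, not necessarily the one used in this package), the flat file connection manager's ConnectionString property can be driven by an expression such as:

                @[User::FileName]

If the enumerator returns only the file name, the expression would instead concatenate a folder variable, for example @[User::FolderPath] + @[User::FileName].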

Set the DelayValidation property to True.

Read more about the DelayValidation property in SSIS:

https://bageshkumarbagi-msbi.blogspot.com/2016/09/delay-validation-properties-in-ssis.html

Load the data into the staging table.

After loading the data into the staging table, we add an Execute SQL Task and write the script below to load the data into the main table.

Below is the SQL script.

 

insert into Employees
(
    EmployeeID,
    LastName,
    FirstName,
    Title
)
select
    EmployeeID,
    LastName,
    FirstName,
    Title
from (
    select *,
        ROW_NUMBER() over (PARTITION by EmployeeID order by EmployeeID) as RowNumber
    from Employees_Stg
) x
where RowNumber = 1
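ROW_NUMBER() numbers the rows within each EmployeeID group, so when the same EmployeeID arrives from more than one file only the row with RowNumber = 1 is inserted. If the package might be re-run without clearing the tables first, the insert can also skip EmployeeIDs that already exist in the main table; the script below is only a possible variation (an assumption, not the script used in this demo) that adds a NOT EXISTS check:

insert into Employees (EmployeeID, LastName, FirstName, Title)
select EmployeeID, LastName, FirstName, Title
from (
    select *,
        ROW_NUMBER() over (PARTITION by EmployeeID order by EmployeeID) as RowNumber
    from Employees_Stg
) x
where RowNumber = 1
  -- variation (assumption): skip EmployeeIDs already present in Employees
  and not exists (select 1 from Employees e where e.EmployeeID = x.EmployeeID)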

Now our package is ready to run.

Read more about how to delete duplicate records from a table:

https://bageshkumarbagi-msbi.blogspot.com/2015/11/deleting-duplicate-records-from-table.html

Data in the table before the run:

Now running the package.  

See the records in the table.  

Records in the staging and main tables:

Six records are duplicates.

The benefit of this approach is that if we want to add or remove a source file, we do not need to modify the package.

Adding a new file to the source folder.

Data in the file:

Now running the package. We are not making any changes to it.

Data loaded successfully. 

Now we remove one file.

Executing the package. 

Data in the table.  


Data loaded successfully for all three files without any changes in the package.
