We are getting the data from 4 sources CSV Files. Some time we are getting the same records in more than one file. Our business wants to load the unique records in the main table. Basically we need to remove the duplicate data and load the unique records.
Files in folder.
Below are the files data.
Below is the table.
Staging table
CREATE TABLE Employees_Stg( EmployeeID int , LastName varchar(50) NOT NULL, FirstName varchar(50) NOT NULL, Title varchar(50) NULL, ) |
CREATE TABLE Employees( EmployeeID int NOT NULL primary key, LastName varchar(50) NOT NULL, FirstName varchar(50) NOT NULL, Title varchar(50) NULL, ) |
1.
We will read all files and merging the data
using union all transformation after that we will use sort transformation and
remove the data and load the unique records into the destination table.
2.
Other way like we load the all files in the
stage table using the Foreach each loop container after that we will take the execute SQL Task
and write the sql script to filter the duplicate records and loading the unique
records in the destination table.
In this demo we will see the second way to load the data.
Taking one variable to store the file name.
Taking the Foreach loop container to iterate the files and loading these files data into the staging table.
Storing file name in the variable.
Inside the Foreach loop container we are taking data flow task. Taking Source file as the Flat file and creating the connection string as dynamic.
Setting the Delay validation property as True.
Read more
about Delay validation property in SSIS.
https://bageshkumarbagi-msbi.blogspot.com/2016/09/delay-validation-properties-in-ssis.html
Loading the data into the staging table.
After loading the data into the stage we are taking the Execute Sql task and wring the below script to load the data into the main table.
Below is the sql script.
insert into Employees ( EmployeeID, LastName, FirstName, Title ) select EmployeeID, LastName, FirstName, Title from ( select *,ROW_NUMBER() over (PARTITION by EmployeeID order by EmployeeID) as RowNumber from Employees_Stg ) x where RowNumber=1 |
Now our package is ready to run.
Read more about : how to delete the duplicate records fromthe table.
https://bageshkumarbagi-msbi.blogspot.com/2015/11/deleting-duplicate-records-from-table.html
Data in the table before run
Now running the package.
See the records in the table.
Records staging and main table.
Six records
are duplicates.
Benefit of this approach is like suppose we want add any new file or
removing then we not need to modify the
package.
Data in file
Now running the package. We are not making any changes in this package.
Data loaded successfully.
Now we are remove one file.
Executing the package.
Data in the table.
Data loaded successfully for all three files without any changes in the package.
No comments:
Post a Comment
If you have any doubt, please let me know.