Friday, 28 December 2018

Fuzzy Grouping Transformation in SSIS

We are getting the data from a different source from the different vendor. We need to load that data into the repo table. Before loading the data we need to clean that data because sometimes we are getting some missing character or missing word. We need to load the correct data into the repo table.
The Fuzzy Grouping Transformation in SSIS is used to replace the wrongly typed words with correct words. This transformation is the same as the Fuzzy Lookup Transformation but Fuzzy Grouping Transformation does not require any reference table to correct the data. It will use the grouping technique to check for the wrongly typed words (type mistakes) and correct them.
Let’s learn how to implement Fuzzy Grouping Transformation in our package.
I have a customer table which is going to use in this demo.
CREATE TABLE CUSTOMER
(C_ID int identity(1,1),
C_Name VArchar(50),
C_Add Varchar(100)
)


Below flat file, we are going to load in this table.
   

Now I am taking the data flow task.
  
 
Taking source as a flat file.

  
And configuring the flat file configuration.

 
Double click on this transformation and configure the oledb connection.
  
Now go to the columns tab
 

In the column table, we will see below the extra column
·         Match type
·         Minimum similarity
·         Similarity output alias
·         Numerals
·         Comparison flags
Go to the Advance table
  

Input key column name: Provide the Unique name for Input Key Column. key_inis the default name.
Output key column name: Provide the Unique name for Output Key Column. key_Outis the default name.
Similarity score column name: Provide the Unique name for the similarity score column. _score is a value between 0 and 1. This will indicate the similarity of the input row to the canonical row. Here I am setting .6 (60%)
Now taking OLEDB destination doing the OLEDB configuration and mapping the destination table column.  
Click ok.
Before running the package I am enabling the data viewer to view the result before loading this value to the table.
  
See all wrong name are corrected and loaded it on the database.

 


At runtime, the Fuzzy Grouping transformation creates temporary objects such as tables and indexes, potentially of significant size, in the SQL Server database that the transformation connects to. The size of the tables and indexes are proportional to the number of rows in the transformation input and the number of tokens created by the Fuzzy Grouping transformation.
The transformation also queries the temporary tables. We should, therefore, consider connecting the Fuzzy Grouping transformation to a non-production instance of SQL Server, especially if the production server has limited disk space available.


No comments:

Post a Comment

If you have any doubt, please let me know.

Popular Posts