Tuesday 14 June 2016

Row Sampling transformation in SSIS

The Row Sampling transformation is used to obtain a randomly selected subset of an input dataSet. We can specify the exact size of the output sample, and specify a seed for the random number generator. For example just I want to select the 100 rows from a table for sampling for analysis then we use Row sampling transformation.

Let’s see the example of Row sampling

Open SSDT and take Data flow task

Double click on the Data flow task and the Data flow pane takes source as OLEDB

Now configure the required configuration of OLEDB Source


Click ok.
Now I am taking the Row sampling transformation

Now double click on the Row sampling transformation. We will get the Row Sampling Transformation Editor.
  
Select the number of rows which we want for sampling.
We can give the Sample output name as well as unselected rows (not sample).
Leave the Random seed check is unchecked. If we checked it then always it will show the same records.
Now click on the Columns for selecting the columns.
  
Click ok.
Row sampling transformation configuration is done.
Now I am taking Multicast transformation to see the Row sampling.

Select the Output as sampling selected Output.
If we want to see the row of unselected rows. we take another multicast transformation to see the result.

Click ok.
Now package is ready to run
  
To seeing the result I am using Data viewer

See the result for selected Row sampling the output will be the 50 rows

Package executed successfully.

Note

Row Sampling is full blocking transformation. If your source is OLEDB then it will be better to use sql and get the random rows from the backend .it will save the time. Performance will be increase.
It will good when you are using source as Flat files, CSV files.


1 comment:

If you have any doubt, please let me know.

Popular Posts