Sunday 10 September 2017

Difference between Merge and Union All Transformation in SSIS

Below are the major difference between Merge and Union All transformation.
Ø  Merge Transformation always accepts sorted input data. For union all no need to sort input.
Ø  Merge Transformation only accepts two inputs.  But Union all accepts multiple inputs.
Ø  Merge transformation will produce sorted output but union all give unsorted data.
Let’s see the example.

Merge Transformation

 Taking Data flow task.

Double click on the data flow task.
In Data flow, I am taking two input source and configuring both sources.

Now I am taking Marge Transformation.
Now I am taking Merge Transformation and merging both sources into merge transformation

It is throwing an error.
It asks for sorting.
Now I am using sorting transformation. Let’s see.


Configure the sort transformation. And also configure the Merge transformation.

I am taking Multicast transformation to see the result.
Now I am executing this package.

See the data. It is in sorted order.
Let’s see if we use more than two inputs in Merge Transformation.
If both sources are the databases it will be better to sort the data in the database itself. It will be faster because all we know that Sort transformation is full blocking transformation. When we short data in the database also we getting the same error to sort data before Merge transformation. To overcome that issue we need to enable the IsSorted properties in the oledb source.



When we adding 3rd input in Merge transformation we are getting below error.
We can use only two inputs for the Merge transformation.
  

Union All Transformation

It will take multiple inputs. It accepts unsorted inputs.
Taking Data flow task

Double click on the data flow task. Configuring the inputs source.

No need to sort input data.
Now I am executing this packing. See the output.


Not sorted output.

Difference between For Loop and For Each Loop

For Loop

A for loop will execute the tasks a specified number of times, in other words, 10 times, or 25 times, and the number of times is specified in the definition of the container. You can use a variable to specify what that count is.
For Loop is used for looping through a number of tasks for a set number of times.
Simply :-> initialize, verify condition, increment\assign

For Each Loop

A for each loop will execute once for each item in the collection of items that it is looking at. A good example would be if users are putting an Excel file into a directory for import into the DB. We cannot tell ahead of time how many will be in the directory because a user might be late, or there might be more than one file from a given user. When we define the Foreach container, we would tell it to execute for each *.xls in the directory and it will then loop through, importing each one individually, regardless of how many files are actually there.
Foreach Loop loops through various things such as files, objects, ADO connections, etc.
Above screen shot show we have task configured for “Foreach file enumerator” which can be used to move\copy\delete file (of given file) in a given folder
Enumerator configuration:
Folder: Specify the folder path from where we need to process the files
Flies: File name (format) \ file extension
Fully qualified: File name along with location is returned.Eg:  C:\Example.txt
Name and extension: The file name with its extension is returned. Eg: Example.txt
Name only: The file name without its extension is returned. Eg: Example
Variable Mapping
The result of the for loop will be assigned to the variable and its value can be used further.
Foreach File Enumerator: Enumerates files in a folder
Foreach Item Enumeration: Enumerates items in a collection, such as the executables specified in an Execute Process task.
Foreach ADO Enumerator: Enumerates rows in a table, such as the rows in an ADO recordset.
Foreach ADO.NET Schema Rowset Enumerator: Enumerates schema information about a data source.
Foreach from Variable Enumerator: Enumerates a list of objects in a variable, such as an array or ADO.NET DataTable.
Foreach NodeList Enumeration: Enumerates the result set of an XML Path Language (XPath) expression.

Foreach SMO Enumerator: Enumerates a list of SQL Server Management Objects (SMO) objects, such as a list of views in a database.

Different types of Control Flow components in SSIS

The different types of Control Flow components are:
Ø  Data Flow Tasks
Ø  SQL Server Tasks
Ø  Data Preparation Tasks
Ø  Work flow Tasks
Ø  Scripting Tasks
Ø  Analysis Services Tasks
Ø  Maintenance Tasks
Ø  Containers

Data Flow Tasks

 The Data Flow task encapsulates the data flow engine that moves data between sources and destinations and lets the user transform, clean, and modify data as it is moved. Addition of a Data Flow task to a package control flow makes it possible for the package to extract, transform, and load data.

SQL Server Tasks

Bulk Insert Task

This task offers an efficient way to copy large volumes of data.


   

Execute SQL Task

This task allows the execution of a SQL statement. If the statement returns results, they can be stored in a variable.

 

Transfer Database Task

 This task will copy or move a SQL Server database between two instances of SQL Server. It can even be used to make a copy of a database on the same server. Databases can be copied either online or offline.


Transfer Error Messages Task

 This task will transfer a single or multiple SQL Server user defined error messages between SQL Server instances. It can be setup to transfer specific user messages or all error messages.


Transfer Jobs Task

This task will transfer a single or multiple SQL Server Agent jobs between SQL Server instances.


Transfer Logins Task

 This task will transfer a single or multiple SQL Server logins between SQL Server instances.


Transfer Master Stored Procedures Task

 This task will transfer a single or multiple SQL Server Master Database stored procedures between SQL Server instances.


Transfer SQL Server Objects Task

This task will transfer a single or multiple SQL Server database objects between SQL Server instances. Most of SQL Servers DDL objects can be copied with this task.


Data Preparation Tasks

File System Task

This task allows the user to copy/move/delete files and directories on a file system.


FTP Task

This task allows the user to copy/move/delete files and directories over FTP.


Web Service Task

This task allows the user to execute a Web service method and store the results


XML Task

 This task is used to work with XML data. XSLT can be used along with XPath to validate, compare and merge documents. The results of this can then be stored.

Data Profiling Task

This task can be used for checking and validating data quality. Profiles can be set up and checked for various data quality issues such as column length issues, column patterns, column statics, etc.
     

Work flow Tasks

  • Execute Package Task: This task will run other SQL Server Integration Services packages.
  • Execute Process Task: This task will execute an application or batch file.
  • Message Queue Task: This task allows you to send and receive messages between SSIS packages, or to send messages to an application via an application queue. This task uses Message Queuing (MSMQ)
  • Send Mail Task: This task allows for email messages to be created and sent using an SMTP server.
  • WMI Data Reader Task: This task allows a Package to query, using WQL, computer systems (local and remote) for information regarding that computer.
  • WMI Event Watcher Task: This task watches for WMI events that have occurred on a computer system, and allows the package to take an action if certain criteria are met.

Scripting Tasks

  • Script Task: This task can be used to program functions that are not available in the standard SSIS tasks or transformations. In SSIS 2005 this task can be programmed in VB.NET. In SSIS 2008 VB .NET and C# can be used to program a Script Task.

SQL Server Maintenance Tasks

  • Back Up Database Task: This task will allow you to backup a one or many SQL Server databases.
  • Check Database Integrity Task: This task will allow you to check the integrity of all the objects in one or many SQL Server databases.
  • Execute SQL Server Agent Job Task: This task allows for the execution of a SQL Server Agent job.
  • Execute T-SQL Statement Task: This task is similar to the Execute SQL Task, however, it only supports Transact SQL Statements. It should be used for SQL Server specific SQL statements.
  • History Cleanup Task: This task allows for the cleanup of historical activity data. It will clean up the history for database maintenance plans, backup activities, restore activities and SQL Server agent jobs.
  • Maintenance Cleanup Task: This task allows for the cleanup of backup files, and the reports of maintenance plans.
  • Notify Operator Task: This task allows SSIS to notify SQL Server Agent operators. They can be notified by email, pager, or net send.
  • Rebuild Index Task: This task will rebuild an index or indexes on one or many databases.
  • Reorganize Index Task: This task will reorganize an index or indexes on one or many databases.
  • Shrink Database Task: This task will shrink the size of the SQL Server database data and database log files.
  • Update Statistics Task: This task will update the statistics for one of many tables in one or many databases.

Analysis Services Tasks

  • Analysis Services Execute DDL Task: This task will run data definition language statements on Analysis Services. This allows for the creation, drop, alter of cubes, dimensions and mining models.
  • Analysis Services Processing Task: This task will process Analysis Services Cubes, Dimensions, and Mining Models.
  • Data Mining Query Task: This task will run a DMX (Data Mining Extensions) query that creates a prediction based on new data that is run against an Analysis Services data mining model.
  • Containers
  • Others


Load XML file in SSIS

Today we will learn how to load the XML data in SQL Server using SSIS.
Open the SSDT.
Take data flow task

Double click on data flow task.
In data flow, I am taking the source as XML Source

Double click on the XML Source

XML Source Editor will be open
In Connection Manager, we need to select the Data Access Mode.

Here I am select the XML file from the location.
This is my XML File

Now browse the file

If we don’t have the XSD file we need to generate the file.
Click on the Generate XSD button.

It will generate the XSD file. We need to save this file.

Save it.
Below XSD file generated

Now we need to browse this file

Now go to the Column table

When we select Column table we will get the below warning

Click ok.
The above warning stems from having the XSD file define string based columns which do not have a minimum and maximum length noted for that particular element within the XSD. This issue can be alleviated by adding text similar to the below notation to your string columns. To avoid this warning we need to add the min and max length in XSD file.

All column is mapped

Click ok.
Now we need to configure the Error output option.

Click on Ok.
XML source is configured

Now I am taking destination as OLEDB destination.


Configuring the OLEDB destination.

Now mapping the columns.

Now the package is ready to run.
No data in table

Now I am running this package.
Package executed successfully.

Now see the result in the database.


Merge transformation in SSIS

SSIS merge is one of the components of SSIS, available in the toolbox. SSIS merge works similar way to a SQL join it merges the two or more different sources (Sources can be of same type or different type / heterogeneous) into a single output. These two sorted data-sets can be anything like Flat file/Excel workbook/Relational table etc. SSIS is all about collecting data from different sources, if we want to merge the collected data from different sources then we can use merge component. Merge component accepts only 2 sorted (compulsory) inputs. If there are more than 2 inputs then it best to use Union All transformation component. Also, Merge transformation has only 1 output and does not have any Error output.

Taking Data flow task.

Double click on the data flow task.
In Data flow, I am taking two input source and configuring both sources.

Now I am taking Marge Transformation.
Now I am taking Merge Transformation and merging both sources into merge transformation

It is throwing an error.
It asks for sorting.
Now I am using sorting transformation. Let’s see.


Configure the sort transformation. And also configure the Merge transformation.
  
I am taking Multicast transformation to see the result.
Now I am executing this package.

See the data. It is in sorted order.
Let’s see if we use more than two inputs in Merge Transformation.
If both sources are the databases it will be better to sort the data in the database itself. It will be faster because all we know that Sort transformation is full blocking transformation. When we short data in the database also we getting the same error to sort data before Merge transformation. To overcome that issue we need to enable the IsSorted properties in the oledb source.

  
When we adding 3rd input in Merge transformation we are getting below error.

We can use only two inputs for the Merge transformation.


Popular Posts