A buffer is a delimited memory zone used by SSIS to hold the data that it is dealing with during the execution of a package. Its metadata is invariable once the design is complete. In other words, its metadata (fields, data types, etc) cannot be changed at execution time. However, its content can be. The process of manipulating the data inside the buffers is the job of SSIS transformations.
Since SSIS buffer addresses are fixed in memory, using them is extremely fast. There is no need to either copy information between buffers or to re-allocate the buffers themselves.
Since SSIS buffer addresses are fixed in memory, using them is extremely fast. There is no need to either copy information between buffers or to re-allocate the buffers themselves.
Use of buffers by SSIS transformation type
Row-by-row transformations
Rows are processed as they enter the component, thus, there is no need to accumulate data. Because it is able to use buffers previously created (by preceding components/precedents), it’s not necessary to create new ones and copy data into them. Examples: Data Conversion, Lookup, Derived Column, etc.
Partially blocking transformations
Partially blocking transformations
These are usually used to combine data sets. Since there is more than one data entry, it is possible to have huge amounts of rows waiting, stored in memory, for the other data set to reach the component. In these cases, the component’s data output is copied to new buffers and new execution threads may be created. Examples: Union All, Merge Join, etc.
Fully blocking transformations
Some transformations need the complete data set before they start running. Therefore, these are the ones that impact on performance the most. In these cases, as well, new buffers and new
execution threads are created. Examples: Aggregate, Sort. SSIS reuses previously used buffers as much as possible, in order to increase performance. Row-by-row transformations are known as synchronous. Each input row produces one output row. On the other hand, in partially-blocking and fully-blocking transformations, known as asynchronous, there is no need to have the same number of input rows as output rows (they need no output rows at all).
execution threads are created. Examples: Aggregate, Sort. SSIS reuses previously used buffers as much as possible, in order to increase performance. Row-by-row transformations are known as synchronous. Each input row produces one output row. On the other hand, in partially-blocking and fully-blocking transformations, known as asynchronous, there is no need to have the same number of input rows as output rows (they need no output rows at all).
Buffer sizes
Buffer size is a property, and it is defined by the SSIS engine, which in all cases will try to increase performance, though this will depend on the values of certain configurable properties and on its internal limits:
DefaultMaxBufferRows:
This is a dataflow property. It is configurable, although SSIS defines it by default at 10,000. SSIS multiplies Estimated Row Size times DefaultMaxBufferRows, to estimate how big a buffer would be.
DefaultMaxBufferSize:
Again, a dataflow property. It is fixed to 10MB by default. It is possible to increase it trying to improve performance, but there’s an upper threshold of 100MB, known as MaxBufferSize, which cannot be changed.