Professional Documents
Culture Documents
SSIS Transformations
SSIS Transformations
Most transformations have one input, one regular output, and one error output. Some transformations
have multiple inputs to consolidate data from separate pipelines, and some transformations have
multiple outputs to separate data into separate pipelines. Not all transformations have error output.
The types of transformations in the pipeline affect the data flow engine’s management of memory
buffers. Row transformations, such as Data Conversion or Derived Column, are very efficient at
reusing existing buffers. Multiple rows are received into the buffer from an upstream component in
batches, data in the buffer is manipulated on a row-by-row basis, and then these rows in the buffer
are operated on by the next component. This process continues until all rows from the source have
been transformed.
Other transformation types must create new buffers to store output before allowing the next
component access to the data and, consequently, place the greatest demands on system resources.
Transformations operating on rowsets, such as Aggregate or Sort, must read all input before creating
any output. Transformations that combine input, such as Merge or Union All, are somewhere in
between the other transformation types, reading some input before creating output, but they do copy
that output to a different buffer and thus have greater resource requirements. The following section
describes the transformations that SSIS includes.
Row Transformations
The following transformations update column values and create new columns. The transformation is
applied to each row in the transformation input.
Character Map This transformation maps characters in a string into different characters—for
example, a lowercase-character mapping changes ASCII 64 (A) into ASCII 97 (a). It
overwrites the column with the new string or adds a column to the output row.
o Lowercase
o Uppercase
o Apply linguistic casing (simple case mapping for Turkic and other locales)
Copy Column This transformation creates new columns in the output row from existing
columns in the input row for subsequent transformation.
Data Conversion This transformation applies data conversion functions to change data type,
column length, or precision and scale to an input column and adds a column with the new data
to the output row.
Derived Column This transformation applies expressions to an input column to create a new
value for a new column in the output row or as a replacement value for the existing column.
Script Component This transformation runs custom code against data in the pipeline.
OLE DB Command This transformation runs a SQL statement (optionally parameterized) for
each row in the pipeline.
Rowset Transformations
The following transformations create new rowsets. The rowset can include aggregate and sorted
values, sample rowsets, or pivoted and unpivoted rowsets.
Aggregate This transformation applies an aggregate function (Sum, Minimum, Maximum, and
so on) to grouped records and produces new output records from aggregated results. Output
records contain only the columns used for grouping and the aggregated values.
Sort This transformation applies an ascending or descending sort to one or more columns and
optionally removes rows with duplicate values in the sort columns.
Percentage Sampling This transformation creates a random sample set of output rows by
selecting a specified percentage of input rows, commonly used for data mining.
Row Sampling This transformation creates a random sample set of output rows by selecting
a specified number of input rows, primarily used for testing packages with a subset of
representative data.
Pivot This transformation converts rows into columns. For example, suppose your input rows
contain CustomerName, Product, and Quantity. If you pivot on Product, the resulting output
will have CustomerName and one column for each product. The output product columns will
contain the values that were in the input Quantity column.
Unpivot This transformation converts columns into rows. For example, suppose your input
has a CustomerName column and a column for each product your company sells. The product
columns contain the quantity that the customer purchased. If you unpivot on the product
columns, your output row will have CustomerName, Product, and Quantity rows. The Product
column will contain the names of the product columns in the input rows.
The following transformations distribute rows to different outputs, create copies of the transformation
inputs, join multiple inputs into one output, and perform lookup operations.
Conditional Split This transformation separates input rows into separate output pipelines
based on a Boolean expression configured for each output. A single input row is passed to the
first output row for which the condition is true or to a default output defined for rows that
meet no conditions.
Multicast This transformation copies all input rows to two or more outputs.
Merge In this transformation, input rows are interleaved. Merges combine two sorted datasets
with the same column structure into a single output. The output will be sorted in the same
order as the two inputs.
Merge Join In this transformation, input rows are combined side by side. Merge Join
combines two sorted datasets, using a FULL, LEFT, or INNER join.
Union All In this transformation, input rows are stacked. Union All combines two or more
datasets with the same column structure into a single output.
Lookup This transformation joins input rows with columns in a reference dataset (from a
table, view, or SQL statement) to add one or more columns to the output row.
The following transformations perform data quality operations such as standardizing values and
looking up values in a reference table.
Fuzzy Lookup This transformation finds close or exact matches between two or more
columns (DT_WSTR and DT_STR data types only) in the input row and a row in a reference
table and adds selected columns from the matched row and columns for fuzzy matching
metrics.
Fuzzy Grouping This transformation finds close or exact matches between input rows based
on one or more columns and adds columns to the output, identifying matches and similarity
scores.
Data-Mining Transformations
Data-Mining Query Uses data from the input rows to execute prediction queries on a SQL
Server Analysis Services data-mining model and creates output rows from the query result
set.
Term Extraction Extracts nouns or noun phrases (or both) from an input column containing
English text, places the extracted terms in a column in the output row, and adds another
column for the score. Multiple output rows per input row are created when multiple terms are
extracted from an input column.
Term Lookup Matches text in an input column with a reference table, counts the number of
occurrences of the term in the dataset, and adds a term and frequency column to the output
row. As with Term Extraction, multiple output rows per input row can be created.
Other Transformations
SSIS includes the following transformations to export and import data, add audit information, count
rows, and work slowly with changing dimensions.
Export Column This transformation reads a file name from an input column and inserts data
from another column in the same row into the specified file.
Import Column This transformation reads a file name from an input column and inserts data
from the specified file into a new output column for the same row.
Audit This transformation adds the value of a system variable, such as MachineName or
ExecutionInstanceGUID, to a new output column.
Row Count This transformation counts the number of rows currently in the data flow and
stores the value in a variable.
Slowly Changing Dimension This transformation manages inserts and updates in a data
warehouse for Type 1 and Type 2 changes.
In SSIS, expressions can be used to implement a decision-based condition, create and update values
in data columns, assign values to variables, define constraints in precedence constraints, and provide
the expressions used by the For Loop container. This section describes the elements in SSIS that can
use expressions.
Conditional Split Transformation
The Conditional Split transformation implements a decision structure based on expressions to direct
data rows to different destinations. Expressions used in a Conditional Split transformation must
evaluate to true or false. For example, rows that meet the condition in the expression “Column1 >
Column2” can be routed to a separate output.
Variables
Variables can be referenced in expressions. A SSIS package supports the following two types of
variables:
System variables These variables contain package, task, and transformation metadata such
as name, ID, start time, version, creation date, machine name, user name, and so on.
User variables These variables are created by the package designer and can contain any
desired data stored in a variety of data types.
Precedence Constraints
Precedence constraints can use expressions to specify the conditions that determine whether the
constrained task or container in a package runs. Expressions used in a precedence constraint must
evaluate to true or false. For example, the expression @[User::A] > @[User::B] compares two user-
defined variables to determine whether the constrained task runs.
The second expression determines whether the set of tasks should be repeated.
The third expression determines how the variable should be changed each time the set of
tasks is repeated.
Expressions Elements
Expressions are based on an expression language, which is similar to C or C#. This language includes
the following three elements:
The expression grammar Defines expression syntax and the operators, functions, and data
types available for use in expressions
The expression evaluator Parses expressions and determines whether expressions adhere
to the rules of the expression grammar
The expression builder Provides a list of available data columns and system and user
variables and a list of functions and operators that is used to build expressions
More Info
You can learn more about the language on which expressions are based in the SQL Server Books
Online article, “Integration Services Expression Concepts” at the following URL:
http://msdn2.microsoft.com/en-us/library/ms141827.aspx.
Building Expressions
The expression builder is a graphical tool for building expressions and is available in the dialog boxes
for the Conditional Split and Derived Column transformations. To build expressions, you can drag and
then drop items from either list onto the expression column. The expression builder automatically adds
needed syntax elements, such as the @ prefix, on variable names.