Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 6

SSIS Transformations

Most transformations have one input, one regular output, and one error output. Some transformations
have multiple inputs to consolidate data from separate pipelines, and some transformations have
multiple outputs to separate data into separate pipelines. Not all transformations have error output.

The types of transformations in the pipeline affect the data flow engine’s management of memory
buffers. Row transformations, such as Data Conversion or Derived Column, are very efficient at
reusing existing buffers. Multiple rows are received into the buffer from an upstream component in
batches, data in the buffer is manipulated on a row-by-row basis, and then these rows in the buffer
are operated on by the next component. This process continues until all rows from the source have
been transformed.

Other transformation types must create new buffers to store output before allowing the next
component access to the data and, consequently, place the greatest demands on system resources.
Transformations operating on rowsets, such as Aggregate or Sort, must read all input before creating
any output. Transformations that combine input, such as Merge or Union All, are somewhere in
between the other transformation types, reading some input before creating output, but they do copy
that output to a different buffer and thus have greater resource requirements. The following section
describes the transformations that SSIS includes.

Row Transformations

The following transformations update column values and create new columns. The transformation is
applied to each row in the transformation input.

 Character Map This transformation maps characters in a string into different characters—for
example, a lowercase-character mapping changes ASCII 64 (A) into ASCII 97 (a). It
overwrites the column with the new string or adds a column to the output row.

o Full-width <-> half-width characters

o Hiragana <-> Katakana characters

o Simplified Chinese <-> Traditional Chinese

o Lowercase

o Uppercase

o Reverse byte order (order of the bytes in each character reversed)

o Apply linguistic casing (simple case mapping for Turkic and other locales)

 Copy Column This transformation creates new columns in the output row from existing
columns in the input row for subsequent transformation.

 Data Conversion This transformation applies data conversion functions to change data type,
column length, or precision and scale to an input column and adds a column with the new data
to the output row.
 Derived Column This transformation applies expressions to an input column to create a new
value for a new column in the output row or as a replacement value for the existing column.

 Script Component This transformation runs custom code against data in the pipeline.

 OLE DB Command This transformation runs a SQL statement (optionally parameterized) for
each row in the pipeline.

Rowset Transformations

The following transformations create new rowsets. The rowset can include aggregate and sorted
values, sample rowsets, or pivoted and unpivoted rowsets.

 Aggregate This transformation applies an aggregate function (Sum, Minimum, Maximum, and
so on) to grouped records and produces new output records from aggregated results. Output
records contain only the columns used for grouping and the aggregated values.

 Sort This transformation applies an ascending or descending sort to one or more columns and
optionally removes rows with duplicate values in the sort columns.

 Percentage Sampling This transformation creates a random sample set of output rows by
selecting a specified percentage of input rows, commonly used for data mining.

 Row Sampling This transformation creates a random sample set of output rows by selecting
a specified number of input rows, primarily used for testing packages with a subset of
representative data.

 Pivot This transformation converts rows into columns. For example, suppose your input rows
contain CustomerName, Product, and Quantity. If you pivot on Product, the resulting output
will have CustomerName and one column for each product. The output product columns will
contain the values that were in the input Quantity column.

 Unpivot This transformation converts columns into rows. For example, suppose your input
has a CustomerName column and a column for each product your company sells. The product
columns contain the quantity that the customer purchased. If you unpivot on the product
columns, your output row will have CustomerName, Product, and Quantity rows. The Product
column will contain the names of the product columns in the input rows.

Split and Join Transformations

The following transformations distribute rows to different outputs, create copies of the transformation
inputs, join multiple inputs into one output, and perform lookup operations.

 Conditional Split This transformation separates input rows into separate output pipelines
based on a Boolean expression configured for each output. A single input row is passed to the
first output row for which the condition is true or to a default output defined for rows that
meet no conditions.

 Multicast This transformation copies all input rows to two or more outputs.
 Merge In this transformation, input rows are interleaved. Merges combine two sorted datasets
with the same column structure into a single output. The output will be sorted in the same
order as the two inputs.

 Merge Join In this transformation, input rows are combined side by side. Merge Join
combines two sorted datasets, using a FULL, LEFT, or INNER join.

 Union All In this transformation, input rows are stacked. Union All combines two or more
datasets with the same column structure into a single output.

 Lookup This transformation joins input rows with columns in a reference dataset (from a
table, view, or SQL statement) to add one or more columns to the output row.

Data Quality Transformations

The following transformations perform data quality operations such as standardizing values and
looking up values in a reference table.

 Fuzzy Lookup This transformation finds close or exact matches between two or more
columns (DT_WSTR and DT_STR data types only) in the input row and a row in a reference
table and adds selected columns from the matched row and columns for fuzzy matching
metrics.

 Fuzzy Grouping This transformation finds close or exact matches between input rows based
on one or more columns and adds columns to the output, identifying matches and similarity
scores.

Data-Mining Transformations

The following transformations perform data-mining operations:

 Data-Mining Query Uses data from the input rows to execute prediction queries on a SQL
Server Analysis Services data-mining model and creates output rows from the query result
set.

 Term Extraction Extracts nouns or noun phrases (or both) from an input column containing
English text, places the extracted terms in a column in the output row, and adds another
column for the score. Multiple output rows per input row are created when multiple terms are
extracted from an input column.

 Term Lookup Matches text in an input column with a reference table, counts the number of
occurrences of the term in the dataset, and adds a term and frequency column to the output
row. As with Term Extraction, multiple output rows per input row can be created.

Other Transformations

SSIS includes the following transformations to export and import data, add audit information, count
rows, and work slowly with changing dimensions.
 Export Column This transformation reads a file name from an input column and inserts data
from another column in the same row into the specified file.

 Import Column This transformation reads a file name from an input column and inserts data
from the specified file into a new output column for the same row.

 Audit This transformation adds the value of a system variable, such as MachineName or
ExecutionInstanceGUID, to a new output column.

 Row Count This transformation counts the number of rows currently in the data flow and
stores the value in a variable.

 Slowly Changing Dimension This transformation manages inserts and updates in a data
warehouse for Type 1 and Type 2 changes.

Synchronous and Asynchronous Transformations

The following transformations determine how rows are processed.

 Synchronous transformations These transformations process data row by row. They


manipulate data that already exists in a memory buffer. Transformations that use data already
contained in the row are the fastest (for instance, Derived Column). Transformations that
process data row by row but must interact with an external data source will be slower
(Lookup, Object Linking and Embedding (OLE) DB Command).

 Asynchronous transformations These transformations cause SSIS to copy and manipulate


data from one memory buffer to another. Asynchronous transformations potentially have more
or fewer output rows than input rows.

 Partially blocking transformations These transformations join datasets (Merge) or create a


dataset that is dissimilar to the source dataset. (Pivot output has different columns from its
input.) Data flows through these transformations more slowly than it flows through non-
blocking transformations.

 Blocking transformations These transformations cannot send a row on to the next


transformation until all of the input rows have been read.

Using Expressions in Packages


An expression is a combination of symbols—identifiers, literals, functions, and operators—that yields a
single data value. Simple expressions can be a single constant, variable, or function. More frequently,
expressions are complex, using multiple operators and functions and referencing multiple columns and
variables. The key information to remember is that expressions always return a single value.

Expression Usage in SSIS

In SSIS, expressions can be used to implement a decision-based condition, create and update values
in data columns, assign values to variables, define constraints in precedence constraints, and provide
the expressions used by the For Loop container. This section describes the elements in SSIS that can
use expressions.
Conditional Split Transformation
The Conditional Split transformation implements a decision structure based on expressions to direct
data rows to different destinations. Expressions used in a Conditional Split transformation must
evaluate to true or false. For example, rows that meet the condition in the expression “Column1 >
Column2” can be routed to a separate output.

Derived Column Transformation


The Derived Column transformation uses values created by using expressions either to populate new
columns in a data flow or to update existing columns. For example, the expression Column1 + “ABC”
can be used to update a value or to create a new value with the concatenated string.

Variables
Variables can be referenced in expressions. A SSIS package supports the following two types of
variables:

 System variables These variables contain package, task, and transformation metadata such
as name, ID, start time, version, creation date, machine name, user name, and so on.

 User variables These variables are created by the package designer and can contain any
desired data stored in a variety of data types.

Expressions reference variables using @[System::VariableName] or @[User::VariableName] syntax.

Precedence Constraints
Precedence constraints can use expressions to specify the conditions that determine whether the
constrained task or container in a package runs. Expressions used in a precedence constraint must
evaluate to true or false. For example, the expression @[User::A] > @[User::B] compares two user-
defined variables to determine whether the constrained task runs.

The For Loop Container


The For Loop container repeats a set of tasks until an expression evaluates to false. The For Loop
container uses the following three expressions:

 The first expression initializes the variable.

 The second expression determines whether the set of tasks should be repeated.

 The third expression determines how the variable should be changed each time the set of
tasks is repeated.

Expressions Elements

Expressions are based on an expression language, which is similar to C or C#. This language includes
the following three elements:

 The expression grammar Defines expression syntax and the operators, functions, and data
types available for use in expressions
 The expression evaluator Parses expressions and determines whether expressions adhere
to the rules of the expression grammar

 The expression builder Provides a list of available data columns and system and user
variables and a list of functions and operators that is used to build expressions

More Info

You can learn more about the language on which expressions are based in the SQL Server Books
Online article, “Integration Services Expression Concepts” at the following URL:
http://msdn2.microsoft.com/en-us/library/ms141827.aspx.

Building Expressions

The expression builder is a graphical tool for building expressions and is available in the dialog boxes
for the Conditional Split and Derived Column transformations. To build expressions, you can drag and
then drop items from either list onto the expression column. The expression builder automatically adds
needed syntax elements, such as the @ prefix, on variable names.

You might also like