One of the most valuable control flow tasks is the Data Flow task. It works with three types of objects.
1.) Data Flow Source Adapters: ADO.NET Source (provides connections to tables or queries through an ADO.NET provider), Excel Source, Flat File Source, OLE DB Source (connects to installed OLE DB providers, such as SQL Server, Access, SSAS, and Oracle), Raw File Source, XML Source.
You can also define more than one data source.
2.) Data Flow Transformations: Transformations are used to modify the data so that it can be standardized.
SQL Server Integration Services provides a wide variety of transformations to meet an organization’s requirements.
Each transformation contains different properties to control how the data is changed.
3.) Data Flow Destination Adapters: You then define the data destinations into which the transformed data is loaded.
Most Data Flow sources have matching destination adapters. Apart from those, other destination adapters are Data Mining Model Training, DataReader Destination, Dimension Processing, Partition Processing, Recordset Destination, SQL Server Compact Destination, and SQL Server Destination.
To use the destination optimization technique that bulk loads the data rather than inserting one row at a time into the destination table, set the data access mode of the OLE DB destination to Table or View - Fast Load.
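As a rough T-SQL analogy (the staging and destination table names here are hypothetical), the default data access mode behaves like one INSERT per pipeline row, while fast load behaves like a single set-based insert:

-- Default mode: roughly one INSERT statement per pipeline row.
INSERT INTO dbo.DimCustomer (CustomerKey, CustomerName) VALUES (1, 'Alice');
INSERT INTO dbo.DimCustomer (CustomerKey, CustomerName) VALUES (2, 'Bob');

-- Table or View - Fast Load: rows are sent as a bulk, set-based load,
-- roughly equivalent to a single batched insert.
INSERT INTO dbo.DimCustomer (CustomerKey, CustomerName)
SELECT CustomerKey, CustomerName
FROM dbo.StagingCustomer;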
DATA FLOW PATHS:
Data Flow paths can simply be used to connect a data source directly to a data destination. More typically, you use Data Flow paths to determine the order in which transformations take place, specifying the path that is taken should a transformation succeed or fail.
There are 2 paths:
1.) Success Data Flow path: green arrow
2.) Failure Data Flow path: red arrow. You are not required to configure error output Data Flow paths for every Data Flow component.
DATA VIEWERS:
A data viewer is a useful debugging tool that enables you to view the data as it passes through the data flow between two data flow components.
The different types of viewers are Grid, Histogram, Scatter Plot, and Column Chart.
Data viewers can be added by right-clicking the data flow path.
You can use a data viewer only when running a package in the debug environment. If you have a data viewer configured in the package, it will be ignored when you run the package programmatically or from the command line.
WORKING WITH DATA TRANSFORMATIONS:
Transformations give you the ability to manipulate the data in the data flow. They allow you to change the data as it is being moved from a source connection to a destination connection, such as from a text file to a table within a database.
1.) ROW LEVEL TRANSFORMATIONS:
These transformations perform operations on rows without needing other rows from the source.
Uses of these transformations include performing mathematical calculations, converting text values to numeric values, and replacing null values (see the sketch after the note below).
The transformations in this category are:
Character Map, Copy Column, Data Conversion, Derived Column, Row Count, etc. (Sort and Aggregate, with operations such as sum and group by, need the full set of rows at once, so strictly speaking they are rowset rather than row-level transformations.)
Note:The Cache transformation can improve the performance of the Lookup transformation.
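As a rough T-SQL analogy for the per-row operations mentioned above (the table and column names are hypothetical), a Derived Column transformation performs work like this:

-- Mathematical calculation, text-to-numeric conversion, and null replacement,
-- one row at a time, with no reference to any other row.
SELECT
    UnitPrice * Quantity              AS LineTotal,
    CAST(AmountText AS decimal(10,2)) AS Amount,
    ISNULL(MiddleName, '')            AS MiddleName
FROM dbo.StagingOrders;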
2.) MULTI INPUT OR MULTI OUTPUT TRANSFORMATIONS:
These transformations can work with more than one data input or generate more than one data output.
They include:
Conditional Split:
The Conditional Split transformation takes a single data flow input and creates multiple data flow outputs based on multiple conditional expressions defined within the transformation. The order of the conditional expressions is important. If a record satisfies the first condition, the data is moved based on that condition even if it also meets the second expression's condition; the record is then no longer available to be evaluated against the second condition. An expression can combine functions and operators to define a condition (see the CASE analogy below).
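This first-match-wins ordering behaves much like a T-SQL CASE expression, which also evaluates its conditions in order and routes a row on the first test it passes; the column names here are hypothetical:

-- A row with Country = 'UK' goes to the UK output even if it also
-- satisfies the Europe test further down.
SELECT *,
    CASE
        WHEN Country = 'UK'     THEN 'UK Output'
        WHEN Region  = 'Europe' THEN 'Europe Output'
        ELSE 'Default Output'
    END AS OutputName
FROM dbo.StagingCustomer;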
Lookup:
Allows matching of pipeline column values against columns in external database tables.
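Conceptually, a Lookup behaves like a join from the pipeline rows to a reference table; a minimal T-SQL sketch with hypothetical table and column names:

-- Pipeline rows are matched against a reference table to retrieve a key;
-- rows with no match are sent to the Lookup's error (or no-match) output.
SELECT s.*, p.ProductKey
FROM dbo.StagingSales AS s
JOIN dbo.DimProduct   AS p ON p.ProductCode = s.ProductCode;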
Merge Join:
The Merge Join transformation is similar to the Merge transformation. However, you can determine how the data is merged using the equivalent of the Transact-SQL FULL, LEFT, or INNER join clauses. Like the Merge transformation, the inputs must be sorted and the columns that are joined must have compatible data types.
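For example, a Merge Join configured as a LEFT join is roughly equivalent to the following T-SQL, with both inputs arriving sorted on the join key (the tables here are hypothetical):

-- Both dbo.Customers and dbo.Orders would be pre-sorted on CustomerID.
SELECT c.CustomerID, c.CustomerName, o.OrderID, o.OrderTotal
FROM dbo.Customers AS c
LEFT JOIN dbo.Orders AS o ON o.CustomerID = c.CustomerID;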
Merge:
The Merge transformation takes multiple inputs and merges the data together from the separate inputs. A prerequisite for the merge to work successfully is that the input columns are sorted. Furthermore, the columns that are merged must be of compatible data types. For example, you cannot merge an input that has a character data type with a second input that has a numeric data type.
Multicast:
The Multicast transformation allows you to output multiple copies of the same data flow input to different data flow outputs.
Union All:
The Union All transformation is very similar to the Merge transformation. The key difference is that the Union All transformation does not require the input columns to be sorted. However, the columns that are mapped must still have compatible data types.
Note: Merge and Merge Join require their inputs to be sorted beforehand.
The Merge transformation is similar to the Union All transformation. Use the Union All transformation instead of the Merge transformation in the following situations (see the T-SQL sketch after this list):
The transformation inputs are not sorted.
The combined output does not need to be sorted.
The transformation has more than two inputs.
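In T-SQL terms, Union All simply concatenates its inputs, while Merge is the same concatenation with the sorted order preserved; a sketch with hypothetical tables:

-- Union All transformation: concatenates inputs, no sorting required.
SELECT CustomerID, CustomerName FROM dbo.Customers_UK
UNION ALL
SELECT CustomerID, CustomerName FROM dbo.Customers_US;

-- Merge transformation: both inputs are pre-sorted and the output
-- stays sorted on the merge key.
SELECT CustomerID, CustomerName FROM dbo.Customers_UK
UNION ALL
SELECT CustomerID, CustomerName FROM dbo.Customers_US
ORDER BY CustomerID;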
3.) ADVANCED DATA PREPARATION TRANSFORMATIONS: These let you perform advanced operations on the rows in the data flow pipeline.
OLE DB Command transformation:
Performs database operations such as updates and deletes, one row at a time, based on parameters mapped from input rows (see the sketch below).
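The transformation runs a parameterized statement once per input row; each ? placeholder is mapped to a pipeline column in the transformation's editor. The table and columns here are hypothetical:

-- Executed once for every row entering the OLE DB Command; the first ?
-- maps to an input column such as Email, the second to CustomerID.
UPDATE dbo.Customers
SET Email = ?
WHERE CustomerID = ?;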
Slowly Changing Dimensions:
Processes dimension changes, including tracking dimension history and updating dimension values. It handles common dimension change types such as Historical Attributes (two rows come to represent one record), Fixed Attributes, and Changing Attributes (the row's data is replaced).
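A minimal T-SQL sketch of the two update styles; the dimension table and its StartDate/EndDate/CurrentFlag columns are hypothetical, and the wizard generates equivalent logic for you:

DECLARE @CustomerAltKey int = 42, @NewCity nvarchar(50) = N'Leeds', @Today date = GETDATE();

-- Changing attribute: the value is replaced in place.
UPDATE dbo.DimCustomer
SET City = @NewCity
WHERE CustomerAltKey = @CustomerAltKey;

-- Historical attribute: expire the current row, then insert a new one,
-- so two rows end up describing the one source record.
UPDATE dbo.DimCustomer
SET EndDate = @Today, CurrentFlag = 0
WHERE CustomerAltKey = @CustomerAltKey AND CurrentFlag = 1;

INSERT INTO dbo.DimCustomer (CustomerAltKey, City, StartDate, CurrentFlag)
VALUES (@CustomerAltKey, @NewCity, @Today, 1);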
Data Mining Query
Applies input rows against a data mining model for prediction.
Fuzzy Grouping
The Fuzzy Grouping transformation allows you to standardize and cleanse data by selecting likely duplicate data and comparing it to an alias row of data that is used to standardize the input data. As a result, a connection to SQL Server is required, as the Fuzzy Grouping transformation needs a temporary table to perform its work.
Fuzzy Lookup
The Fuzzy Lookup transformation performs data cleansing tasks such as standardizing data, correcting data and providing missing values.
Using the fuzziness capability that is available to the Fuzzy Grouping transformation, this logic can be applied to lookup operations so that data can be returned from a dataset that closely matches, rather than exactly matches, the lookup value. This is what separates the Fuzzy Lookup transformation from the Lookup transformation, which requires an exact match (see the loose analogy below). Note that the connection to SQL Server must resolve to a user who has permission to create tables in the database.
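Fuzzy Lookup uses its own token-based similarity scoring, but as a loose T-SQL analogy (the tables here are hypothetical), the difference from a regular Lookup is that near matches are accepted rather than only exact ones:

-- Regular Lookup: exact match only.
SELECT s.*, r.CustomerKey
FROM dbo.StagingCustomer AS s
JOIN dbo.DimCustomer     AS r ON r.CustomerName = s.CustomerName;

-- Loose fuzzy analogy: accept near matches as well (DIFFERENCE returns
-- 0-4; Fuzzy Lookup's real similarity algorithm is more sophisticated).
SELECT s.*, r.CustomerKey
FROM dbo.StagingCustomer AS s
JOIN dbo.DimCustomer     AS r ON DIFFERENCE(r.CustomerName, s.CustomerName) >= 3;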
Script Component
Provides VB.NET scripting capabilities against rows, columns, inputs, and outputs.
Term Extraction
Analyzes text input for English nouns and noun phrases.
Term Lookup
Analyzes the text input column against a user-defined set of words for association.