1.) Define attribute hierarchies and attribute relationships to optimize cube design and facilitate data analytics.
2.) Enrich the reporting experience by building an end-user layer consisting of KPIs, actions, translations and perspectives.
3.) Use MDX expressions to define important business metrics that are not available in fact tables and require custom expressions.
A.) Defining user hierarchies and dimension relationships:
Attribute Relationship:
Notes:
1.) If a dimension is based on a star schema, its source is a single dimension table. Snowflake schemas have multiple dimension tables.
2.) Unlike a relational database, which is a two-dimensional structure of rows and columns, a UDM is multidimensional and supports data hierarchies (eg Year, Quarter and Month levels) and the relationships that exist between these attributes.
All dimension attributes are directly or indirectly related to the dimension key attribute with 1:1 or M:1 logical relationships.
By default the UDM is not aware of the logical relationships among attributes. It defines an M:1 relationship between every attribute and the dimension key, except for snowflake dimensions, where it automatically creates separate M:1 relationships among attributes from different tables in the Attribute Relationships tab of the dimension.
One must spend time to understand and appropriately define attribute relationships.
3.) Setting up attribute relationships provides the following benefits:
-- The server can optimize the storage of the attribute hierarchies.
-- SSAS can effectively retrieve and aggregate data.
If there is no attribute relationship, a query that groups data by product category has to scan the fact data on the fly to summarize the individual products; if the attributes are related, the server can optimize the query and use the subcategory subtotals.
Attribute Relationship Properties:
1.) Attribute- Specifies the related attribute name
2.) Cardinality- Defines the cardinality of the relationship
3.) Name- Specifies the relationship name
4.) RelationshipType- Flexible (the relationship can change over time, as in a slowly changing dimension) or Rigid (the relationship does not change over time)
5.) Visible- If False the corresponding member property will not show to the end user.
Creating a New attribute Relationship or creating a new Dimension hierarchy:
1.) Attribute Relationship: In the Attribute Relationships tab of a particular dimension, right-click the source attribute in the Attributes pane (the attribute on the many side of the relationship). Make sure the Source Attribute shows the attribute on the many side and the Related Attribute shows the attribute on the one side of the relationship (eg Territory Country -> Territory Group). Alternatively, drag and drop the many-side attribute onto the one-side attribute in the Attribute Relationships pane.
2.) Dimension Hierarchy: Right-click the attribute you want to be at the top level and select 'Start New Hierarchy'. Alternatively, drag and drop the top-level attribute onto the Hierarchies pane in the Dimension Structure tab.
Important
Note:
1.) Although you can set up user-defined hierarchies without defining the attribute relationships, this should be avoided because it reflects poor design.
2.) Consider hiding attributes that are used in user hierarchies by setting their AttributeHierarchyVisible property to False to prevent redundancy in the user experience.
Associating a Dimension with a Measure Group:
When a dimension is added, the cube designer determines which measure group the dimension is related to by using the table relationship metadata within the DSV on which the cube is based.
The Dimension Usage tab within the cube designer defines the relationships that should exist between each measure group and dimension within the cube.
Relationship Types:
1.) Regular Dimension- is joined directly to the fact table through a granularity attribute
2.) Fact- The dimension is based on the fact table used to define the measure group.
Eg To see all sales order line items for a given sales order, you build a dimension directly from the fact table and then join it to the related measure group through a fact relationship.
3.) Referenced- The dimension is related to the fact table through an intermediate dimension.
4.) Many-to-Many- The dimension is related to a given measure group through an intermediate measure group.
5.) Data Mining- Defines the relationship between a dimension based on a data mining model and a given measure group.
B.) Creating KPIs, Actions, Translations and Perspectives:
B1.) KPI:
Organisations use KPIs to gauge their business performance, eg sales profit, revenue growth, and growth in customer base. KPIs show the health of a company.
SSAS is suitable for KPIs because:
1.) Automatic calculation of KPIs- The server automatically calculates the KPI properties as the user slices the cube data, eg a user can drill down the time dimension and the cube calculates the KPI across the year, quarter and month levels of the hierarchy.
2.) Time effective- Since dashboards include different chart and report views that show historical trends, the server receives many queries involving large data sets, yet the page is rendered within seconds.
Expression Based Properties of KPI:
1.)Value
Current Value of the KPI
2.)Goal
What the KPI value should be in a perfect world
The following expression sets it to 40% more than the Reseller Sales Amount from the previous date period:
1.40*([Date].[Calendar].PrevMember, [Measures].[Reseller Sales Amount])
The PrevMember function returns the previous member in the hierarchy, eg if the level is Year it returns the previous year.
3.) Status-
Indicates how the KPI value compares to the Goal (see the sketch after these rules):
1 - Good performance -->
When KpiValue("Reseller Sales") / KpiGoal("Reseller Sales") >= 1
0 - Acceptable performance -->
When KpiValue("Reseller Sales") / KpiGoal("Reseller Sales") < 1 and KpiValue("Reseller Sales") / KpiGoal("Reseller Sales") >= 0.85
-1 - Underperformance -->
When neither of the above conditions holds
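A minimal MDX sketch that combines the three rules above into a single Status expression, assuming the KPI is named "Reseller Sales" as in the examples:
Case
    When KpiValue("Reseller Sales") / KpiGoal("Reseller Sales") >= 1
        Then 1   // good performance
    When KpiValue("Reseller Sales") / KpiGoal("Reseller Sales") >= 0.85
        Then 0   // acceptable performance
    Else -1      // underperformance
End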
SSAS provides several standard KPI related functions:
--KPIValue(to retrieve KPI values)
--KPIGoal(to retrieve KPI Goal)
--KPIStatus(to retrieve KPI Status)
--KPITrend(to retrieve KPI Trend)
4.) Trend-
Indicates how the KPI is doing over time. Returns a value between -1 and 1. It is defined by the following expression:
(
[Measures].[Reseller Sales Amount] - ([Date].[Calendar].PrevMember, [Measures].[Reseller Sales Amount])
) /
[Measures].[Reseller Sales Amount]
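The expression above assumes the current Reseller Sales Amount is never zero; a hedged variant of the same Trend calculation that guards the denominator with IIF:
IIF([Measures].[Reseller Sales Amount] = 0, 0,
    ( [Measures].[Reseller Sales Amount]
      - ([Date].[Calendar].PrevMember, [Measures].[Reseller Sales Amount]) )
    / [Measures].[Reseller Sales Amount])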
Additional KPI Properties:
Associated Measure Group
Display folder
Parent KPI- Used to indicate the parent KPI of the current KPI. Useful if the set of KPIs are shown together as they might be in a scorecard.
Current Time Member
Weight
Status Indicator and Trend Indicator- used to indicate the images the client should use to display the values graphically.
Note:
You can use KPI-aware client applications such as Excel and SSRS to display KPIs on a report. You can also write MDX statements to directly query the KPI properties by using KPIValue(), KPIGoal(), KPIStatus() and KPITrend().
Eg the statement below retrieves the properties of the Reseller Sales KPI implemented in the cube, sliced by Product Category for the year 2003:
SELECT { KPIValue("Reseller Sales"),
KPIGoal("Reseller Sales"),
KPIStatus("Reseller Sales"),
KPITrend("Reseller Sales") } ON COLUMNS,
[Dim Product].[Product Category].Members ON ROWS
FROM [Adventure Works]
WHERE [Date].[Calendar].[Calendar Year].&[2003]
B2.) Implementing Actions:
Actions provide another mechanism by which users can gather information and take next steps based on what they find in cubes.
Actions extend the cube in a versatile way. Eg suppose a user has drilled down to the lowest level of a dimension (eg the product dimension) and wants to see the individual sales and orders placed for that product. If this information is not stored in the cube, you can implement a reporting action that displays a report with the required data.
Note:
Actions are defined in a cube but interpreted and initiated by a client application. Not all clients support actions, eg Excel 2007 supports actions but SSRS does not. An action relates to an object in the cube (eg a dimension, a level or a particular cell).
Action Types:
Define action types to integrate the cube with client applications. They inform the client applications how to interpret the action.
1.)Regular Action Type
--Dataset- Action content is an MDX statement.
--Proprietary- Action is client specific; the client is responsible for interpreting the semantic meaning of the action.
--Rowset- Action content is a command statement. Unlike a dataset action, a rowset action can target any OLE DB-compliant data source, including a relational database.
--Statement- Action content is an OLE DB command. The statement should not yield any result except success or failure.
--URL (default)- Action content is a URL (see the sketch after this list). Typical uses:
a.) Construct a URL that displays a Reporting Services report.
b.) Build a URL that references a specific page on the corporate website related to the product.
2.)Drillthrough Actions:
Lets the client request details behind aggregated cell values
3.)Reporting Actions- Can be used to request SSRS reports.
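For the URL action type, the action expression is an MDX expression that evaluates to a string. A hypothetical sketch that builds a product page link from the current member; the site address and the use of the [Product].[Product] attribute as the target are assumptions, not part of the original example:
// Target type: Attribute Members, target object: [Product].[Product]
"http://www.adventure-works.com/products/" + [Product].[Product].CurrentMember.Name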
Properties:
1.)Name
2.)Target-
1.) Target type (eg Attribute Members)- If you select Cube, the action relates to the entire cube and you do not need to provide further details. In the client the action is available for all cube objects: every member, every cell, every dimension and every level.
Limit the action by selecting Dimension Members; in this case the action is available for all members of the dimension specified in the target object.
Select Hierarchy or Level to limit the availability of the action to the members of the specified hierarchy or level.
Another option is to select Cells as the target type; when a user clicks a cell in a client application, the action passes the value of the cell as a parameter.
2.) Target object (eg an attribute hierarchy)- specifies the object the action applies to, eg a particular attribute hierarchy.
3.)Condition
Note:
For configuring a drillthrough action, you need to specify the list of drillthrough columns that the end user will see when the action is executed.
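Behind the scenes, clients typically issue a DRILLTHROUGH statement when such an action fires. A hedged sketch of that statement form only; the measure group name ([Reseller Sales]) and returned columns are assumptions:
DRILLTHROUGH MAXROWS 10
SELECT ([Measures].[Reseller Sales Amount]) ON 0
FROM [Adventure Works]
WHERE ([Date].[Calendar].[Calendar Year].&[2003])
RETURN [$Product].[Product], [Reseller Sales].[Reseller Sales Amount]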
B3.) Localisation of Cube through Translations:
Translations support International Users.
To translate the cube metadata, including captions of cubes, cube dimensions, measure groups, measures, KPIs, actions and named sets, browse to the Translations tab in the cube designer.
B4.) Implementing Cube Perspective:
Represents a subset of the cube.
Main purpose is to reduce the perceived complexity of the large cube by exposing only a subset of the cube objects.
eg if the members of a project are interested in browsing just the reseller sales, a perspective can be created that includes only the reseller sales measure group and the associated dimensions.
By default the cube has a single perspective that exposes the entire cube content.
Perspectives cannot be used to enforce restricted access to portions of the cube; they are not a security mechanism.
C.)Creating Calculations and Queries by using MDX
MDX gives the programming power to implement custom solutions.
MDX Syntax:
MDX is used in 2 main ways:
1.)To query multidimensional cubes
2.)To extend multidimensional cubes, eg to implement business metrics such as KPIs
Tuples:
Multidimensional coordinate that identifies a single cell in the cube space.
eg the tuple ([Product].[Product].[Road Bike], [Date].[Year].[2004], [Measures].[Internet Sales]) references the cell found at the intersection of the product Road Bike, the year 2004 and the measure Internet Sales.
Sets:
Collection of tuples with the same dimensionality (ie the same attribute hierarchies), delimited by braces {}.
eg the following set contains two tuples:
{ ([Sales Territories].[Countries].[All Countries], [Date].[Year].[2004], [Product].[Product].[Mountain-100]),
([Sales Territories].[Countries].[All Countries], [Date].[Year].[2004], [Product].[Product].[Road-200]) }
Notes:
1.) The cube measures are treated as members of a special Measures dimension.
2.) When you reference a dimension member in a tuple you can use the member name, eg [Product].[Product].[Mountain-100]. This syntax works only if there is a single product with that name.
You can resolve the member uniquely by using its key, prefixed with &, eg [Product].[Product].&[10] resolves the member uniquely even if more than one member has the same name.
3.) The cube space consists of many attribute hierarchies, and enumerating all of them in a tuple is tedious, so any attribute hierarchy that is omitted falls back to its default member, which is usually the All member.
4.) You can use CurrentMember for most dimensions, but you do not need to reference it explicitly in the tuple; by default the current member of each attribute hierarchy is implicitly referenced in a tuple.
5.) Difference between 'Members' and 'All' (see the query sketch after these notes)-
Eg [Product].[Category].[All]- references the All member of the hierarchy, which aggregates all of its members, ie it returns just one row with the overall total.
Eg [Product].[Category].Members- Members lists the members of the Category attribute hierarchy of the Product dimension.
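A minimal pair of query sketches contrasting the two (run separately); the hierarchy names follow the examples above. The first returns a single All row, the second returns one row per category:
-- All member: one row with the grand total
SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
{ [Product].[Category].[All] } ON ROWS
FROM [Adventure Works]
-- Members: one row per member of the Category attribute hierarchy
SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
[Product].[Category].Members ON ROWS
FROM [Adventure Works]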
Basic MDX query:
SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
[Product].[Category].Members ON ROWS
FROM [Adventure Works]
WHERE [Date].[Calendar Year].&[2003]
Notes:
1.) A query can have up to 128 axes (numbered 0 to 127).
The first five are named COLUMNS, ROWS, PAGES, SECTIONS and CHAPTERS.
2.) Axes cannot be skipped, eg you cannot request ROWS without also requesting COLUMNS.
*Members-returns all members of the hierarchy
MDX Functions for Navigating Hierarchies:
1.) CurrentMember- eg the tuple ([Order Date].[Calendar].CurrentMember, [Measures].[Reseller Sales Amount]) returns the Reseller Sales Amount for the member of the Calendar hierarchy that the current cell references.
2.) PrevMember- eg [Order Date].[Calendar].PrevMember returns the previous member with respect to the current member in the Calendar hierarchy.
Imp: Used to define KPI Trend and KPI Goal expressions.
eg KPI Goal = 1.40*([Date].[Calendar].PrevMember, [Measures].[Reseller Sales Amount])
3.) Children- eg [Order Date].[Calendar].[2003].Children returns all quarters of the year 2003, assuming that the level below the calendar year in the Calendar hierarchy is quarters (see the query sketch after this list).
4.) Parent- Returns the member's parent in a given hierarchy.
eg [Product].[Category].[Product].[Road Bike].Parent returns the Bikes member of the Product Category hierarchy.
5.) Members- Returns all the members of a hierarchy, eg [Product].[Product].[Product].Members returns all products of the Product dimension's Product attribute hierarchy, excluding the All member.
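A minimal query sketch using Children to break a year down into its quarters, reusing the [Order Date].[Calendar] hierarchy from the examples above:
SELECT [Measures].[Reseller Sales Amount] ON COLUMNS,
[Order Date].[Calendar].[2003].Children ON ROWS
FROM [Adventure Works]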
Functions for Navigating Time: These functions help navigate time dimension.
1.) Lag
eg [Order Date].[Calendar].CurrentMember.Lag(1) returns the member one position before the current member at the same level.
So if the current member is at the quarter level, Lag(1) gives the previous quarter.
2.) ParallelPeriod- Returns the member from a prior period in the same relative position as the specified member, with an optional offset (index).
If no index is specified, 1 is assumed; if no member is specified, the current member of the Time hierarchy is used.
eg 1, with the default offset of 1 at the year level:
ParallelPeriod([Date].[Calendar].[Calendar Year], 1, [Date].[Calendar].[Month Name].[October 2003]) returns October 2002.
Syntax:
ParallelPeriod( [ Level_Expression [ ,Index [ , Member_Expression ] ] ] )
eg 2 with offset:
SELECT ParallelPeriod ([Date].[Calendar].[Calendar Quarter]
, 3
, [Date].[Calendar].[Month].[October 2003])
ON 0
FROM [Adventure Works]
This returns the parallel period for the month of October 2003 with a lag of three quarters, which is January 2003.
3.) PeriodsToDate- Returns a set of sibling members from the same level as the given member, starting with the first sibling (within the period defined by the specified level) and ending with the given member.
Syntax:PeriodsToDate( [ Level_Expression [ ,Member_Expression ] ] )
Arguments:
Level_Expression
A valid Multidimensional Expressions (MDX) expression that returns a level.
Member_Expression
A valid Multidimensional Expressions (MDX) expression that returns a member.
eg the following query returns Internet sales for the first 10 months of 2003:
Select [Measures].[Internet Sales Amount] on 0,
PeriodsToDate([Date].[Calendar].[Calendar Year],
[Date].[Calendar].[Month Name].&[10].&[2003]) on 1
from [Adventure Works]
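YTD() is a common shorthand for PeriodsToDate at the year level; a hedged equivalent of the query above, assuming the same member key format and that the year attribute is typed as Years:
SELECT [Measures].[Internet Sales Amount] ON 0,
YTD([Date].[Calendar].[Month Name].&[10].&[2003]) ON 1
FROM [Adventure Works]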
Creating a Calculated Member:
A calculated member is a dimension member whose value is calculated at run time. Imp: cube measures are treated as members of the special Measures dimension, so a calculated measure is a calculated member on the Measures dimension.
The definitions of cube calculated members and named sets become part of the cube MDX script, which can be accessed from the Calculations tab.
Calculated Member Syntax:
CREATE MEMBER CURRENTCUBE.[Measures].[Sales Amount]
AS [Measures].[Internet Sales Amount] + [Measures].[Reseller Sales Amount],
FORMAT_STRING = "Currency",
VISIBLE = 1;
Calculated Members:
A query can define calculated members inline by using a WITH clause before SELECT.
eg
WITH MEMBER [Measures].[Gross Profit] AS
'[Measures].[Internet Sales Amount] + [Measures].[Reseller Sales Amount]'
SELECT {[Measures].[Internet Sales Amount], [Measures].[Gross Profit]} ON COLUMNS,
[Product].[Category].Members ON ROWS
FROM [Adventure Works]
WHERE [Date].[Calendar Year].&[2003]
When to use Calculated Members:
1.) When a measure is needed that does not belong to a particular measure group.
2.) When you need a non-additive measure that operates uniformly across dimension levels.
eg a calculated member that calculates the average reseller profit amount (Reseller Sales Amount / Reseller Sales Count) returns the expected result no matter how the user slices the data (see the sketch below).
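A minimal sketch of such a calculated member for the cube MDX script, assuming the measure names [Reseller Sales Amount] and [Reseller Sales Count] used above:
CREATE MEMBER CURRENTCUBE.[Measures].[Avg Reseller Profit Amount]
AS IIF([Measures].[Reseller Sales Count] = 0, NULL,
       [Measures].[Reseller Sales Amount] / [Measures].[Reseller Sales Count]),
FORMAT_STRING = "Currency",
VISIBLE = 1;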
Note:
Calculated Member does not increase the cube size since the server stores only its definition, not the data.
Defining Named Sets:
A named set is an MDX construct that returns a set of dimension members, eg the MDX below returns the top 50 most profitable customers:
CREATE SET CURRENTCUBE.[Top 50 Most Profitable Customers]
AS
TopCount(
(Existing [Customer].[Customer].[Customer].Members), 50, [Measures].[Internet Sales Amount]);
Note:
1.) The Members function returns all members of the Customer attribute hierarchy in the Customer dimension. Use the three-part syntax (Dimension.Attribute.Attribute) to exclude the All member of the hierarchy.
2.) SSAS provides 2 types of named sets:
Static- evaluated once by the server.
Dynamic- evaluated for each query.
By default BIDS creates dynamic sets (see the sketch below).
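A hedged sketch showing the DYNAMIC keyword applied to the set defined above; the set and measure names are taken from that example:
CREATE DYNAMIC SET CURRENTCUBE.[Top 50 Most Profitable Customers]
AS TopCount(
    (Existing [Customer].[Customer].[Customer].Members), 50, [Measures].[Internet Sales Amount]);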
Sunday, August 28, 2011
DEVELOPING SSAS CUBES
Dimensional modeling is the methodology used to architect schemas required for reporting by reducing the number of tables and denormalizing the data. The schema contains fact tables (containing the measures) and dimension tables (the subject areas by which the data is sliced).
For a BI solution the OLAP methodology is used: the data is extracted from the underlying transactional database and loaded into a data mart. The data mart is designed using dimensional modeling. On top of the data mart a UDM layer is built, which provides exceptional performance for large data volumes.
SSAS understands the semantics of the data, so it helps extend the cube with important business calculations in the form of MDX (Multidimensional Expressions), eg KPIs that measure company performance. Many third-party tools support SSAS and the MDX query language, and dashboards can be developed using SharePoint or other tools.
UDM consists of the below components:
1.) Data Source
2.) Data Source View- helps us visualize the database schema and augment it with named calculations and named queries
3.) Dimensional Model
4.) Calculations (optional)- augment the cube with custom MDX logic such as QTD, YTD etc
5.) End User Model (optional)- additional features built to give richer semantics, including KPIs, actions, perspectives and translations
6.) Management Settings- the admin configures the cube to meet various operational requirements, including availability, latency and security
Defining a Data Source:
Note:
1.)Choose native OLE DB provider because SSAS is implemented in native code.
2.)Authentication option:
Standard security- less secure than Windows authentication, since you need to manage the passwords.
Windows Authentication- if the user is a system admin then all the operations will succeed.
3.)Impersonation information:
-Use a specific Windows user name and password
-Use the service account- appropriate if the SSAS service account is a domain or local account and has read access to the underlying data source.
-Use the credentials of the current user- seldom used, since it applies only to certain data retrieval scenarios involving querying data mining models.
-Inherit- lets the server decide the mode. The server uses the service account for cube processing and the current user for querying data mining models.
Creating DSV:
If the primary keys and table relationships are defined in the underlying database, the wizard detects them automatically. If not, you need to add them manually: in the DSV designer, right-click the key column(s) and select Set Logical Primary Key; for a relationship, drag the foreign key of a table onto the related primary key in another table and the relationship is created automatically.
Note: The DSV designer does not validate any logical primary keys or table relationships that are created manually, so if the underlying tables break any relationship you will not find out until the cube is fully built and processed.
Defining DSV Named Calculations and Named Queries:
Named Calculations: A specific table might need columns that are derived from other columns in the same table or from columns of another table; a named calculation adds such a computed column to the DSV.
Named Queries: Help create a new logical table based on a query.
Note: As the requirements evolve it might be necessary to replace a table with a new one; in that case a named query can be used to write a query that replaces the old table.
Creating and Modifying Cubes:
The next task is to create the dimensional layer of the UDM. This includes designing dimensions, cubes, measure groups and measures. The cube wizard reads metadata from the DSV and the underlying relational database to deduce the structure of the cube and its related dimensions.
Notes:
Use Existing Tables is the most common choice for cube creation, but the option Generate Tables In The Data Source can be used to design the UDM from the top down in the absence of a data source.
A measure group in the UDM is the equivalent of a fact table in dimensional modeling. It contains numeric measures that change over time.
Measures and dimensions can be renamed, or removed by clearing their check boxes.
Cube designer tabs:
1.) Cube Structure- used to edit the cube structure
2.) Dimension usage- Defines the relationship between dimension and measures
3.) Calculations- Contains MDX formulas
4.) KPIs- Defines KPIs
5.) Actions- used to create actions eg reporting or launching URLs available to the end user while browsing the cube
6.) Partitions-storage structure is defined
7.) Aggregations- Creates custom aggregation designs
8.) Perspectives-used to create subsets of cube content for easier end user browsing
9.) Translations-defines language translations for cube metadata
10.) Browser- used to browse the cube
Measure/ Dimension Properties:
Measure pane lists the measure groups and measure defined in the cube. The property window describes the properties related to the selected object.
Name: How the measure is called in the end-user tool.
FormatString: How the measure should be formatted eg currency or percentage.
Note: It is important to note that any properties you change on a dimension in the cube do not affect the design of the dimension outside the cube. A dimension included in the cube is an instance of the dimension (called a cube dimension) that is defined within the Dimensions folder of the project.
Creating and Modifying Dimensions:
A dimension gives the user the ability to isolate, drill down, roll up, categorize, filter, summarize and perform other actions on data. It gives the user the ability to analyze data by subject areas.
Creating a dimension:
Notes:
1.) While creating a dimension in the Dimension Wizard, the option Generate A Time Table In The Data Source lets you create a time dimension table in the data source and populate it with a range of dates. The option Generate A Time Table On The Server is useful when the fact table contains columns of a datetime data type and resolving these columns to a designated time table is not an option.
2.) Best practice is to use a designated time dimension table in the data source. This lets you enhance the time table with additional attributes, such as a holiday flag.
3.) In the Dimension Wizard, on the Select Dimension Attributes page, each attribute by default forms a hierarchy called the attribute hierarchy and lets the user analyze data by the members of this hierarchy. Eg the Color attribute contains the distinct values of the Color column. This lets the user see the aggregated measure values for each member of the attribute hierarchy, eg the user can browse the Sales Amount measure by product color.
Furthermore you can improve the performance of browsing dimension data through user defined hierarchies by defining attribute relationships between attributes within the hierarchies.
You can clear the enable browsing checkbox to hide the attribute when a user browses a cube.
4.) SSAS supports assigning a predefined attribute type to an attribute. This is important in a time dimension because it tells the server that the attribute represents time periods (Year, Quarter, Month etc), which matters when you write related calculations such as YTD, MTD and QTD.
5.)The properties are set both on the dimension object and the attributes of the dimension.
6.)Configuring Column Bindings:
Three properties used for column bindings: KeyColumn, NameColumn and ValueColumn.
By default the KeyColumns property of the English Month Name attribute is bound to the EnglishMonthName column; if you process the dimension and view it in the browser, the English Month Name attribute hierarchy shows the months in alphabetical order.
To change this, first point the KeyColumns property to MonthNumberOfYear; when the dimension is processed the members sort 1 to 12 in ascending order, but show numbers instead of names. This is where the NameColumn property becomes useful: map it to EnglishMonthName, and processing the dimension now shows January through December in the correct order.
Modifying a Dimension:
You can add/delete an attribute by removing and adding in the attribute pane.
Note:
1.)For the dimension the following 2 properties define the storage mode:
--StorageMode- defines whether the dimension is stored using MOLAP or ROLAP. MOLAP is the most common deployment.
--ProactiveCaching- determines whether SSAS processes updates automatically. Most solutions do not deploy proactive caching.
Attribute Properties:
1.) Name
2.) Usage- setting of key means that it’s a key column which other attributes relate to. A setting of parent means that it is a part of parent child relationship.
3.) KeyColumns - the columns that form the attribute key
4.) NameColumn - optional; specifies the column that provides the name values
5.) AttributeHierarchyEnabled - determines whether an attribute hierarchy is generated for the attribute; if False, the attribute can be used only as a member property
6.) AttributeHierarchyVisible - determines whether the attribute hierarchy can be seen and selected by the users
7.) IsAggregatable - specifies whether the values of the attribute members can be aggregated
8.) OrderBy - specifies whether the members of the attribute are ordered by its key or name column
9.) OrderByAttribute - provides the name of the attribute that provides the ordering
10.) Type- specifies the type of info contained in the attribute
Note:
Dimensions that are created after the cube has been created must be added to the cube explicitly; they can be added by clicking the Add Cube Dimension button on the toolbar.
One dimension can be added more than once, eg a time dimension might have to be added more than once to support analysis by different dates such as due date, order date and ship date. A dimension that is added multiple times to the same cube is called a role-playing dimension.
A role-playing dimension is a dimension that is used in a cube more than one time, each time for a different purpose. For example, you might add a Time dimension to a cube three times to track the times that products are ordered, shipped, and received. Each role-playing dimension is joined to a fact table on a different foreign key.
Dimension usage tab can be used to assign meaningful names to the multiple used dimensions with the appropriate measure groups.
Creating Measure Groups and Measures:
2 scenarios when new measure groups are required:
1.)When new fact table has been added to the database:
Just as a relational table is a container of columns, a cube measure group contains related measures. As new columns are added to a fact table, you need to add corresponding measures to the measure group.
Use the Usage drop-down list to select a measure's aggregate behavior. Based on how they aggregate, measures can be classified as additive (can be summed across all dimensions, eg sales amount), semi-additive (can be summed across all dimensions except time, eg account balance) or non-additive (cannot be summed, eg interest rates).
2.) When you need a distinct count measure that calculates the number of fact rows with unique values, eg the number of unique customers that have placed orders. Because of the server overhead required to store distinct count measures, a measure group can have only 1 measure with the distinct count aggregation function.
The New Measure Group dialog box does not let you select a fact table that is already being used by another measure group, and a distinct count measure must be placed in a separate measure group. As a workaround, the cube designer automatically creates a new measure group when you try to add a measure with the DistinctCount aggregate function to an existing measure group.
Make sure the underlying fact table is included.
Best Practices:
The end user can browse data in a measure group only by dimensions that are joined to the measure group. If a dimension is not joined to a measure group, the IgnoreUnrelatedDimensions measure group property controls the result when a user browses the measure group by that dimension. When a new measure group is added, the cube designer examines the relationships between the fact table and the cube dimensions and sets the dimension usage accordingly. Go to the Dimension Usage tab to review the settings.
Notes:
A measure group must contain at least 1 measure.
The cube designer adds a system-generated measure that uses the Count aggregate function. This is useful if you need the server to automatically calculate the number of fact table rows.
Eg if the Internet Sales measure group represents customer orders and the user slices the cube by year, the Internet Sales Count measure returns the number of orders placed in that year (see the query sketch below). You can remove the count measure after the cube designer has generated the measure group.
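A minimal query sketch using such a system-generated count measure; the measure name [Internet Sales Count] and the [Date].[Calendar Year] hierarchy follow the example above and are assumptions about the cube:
SELECT [Measures].[Internet Sales Count] ON COLUMNS,
[Date].[Calendar Year].Members ON ROWS
FROM [Adventure Works]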
Measure Group Properties:
ErrorConfiguration-Specifies default or custom error config
EstimatedRows- the number of fact table rows, if already known; this helps the server choose the optimum data structures.
IgnoreUnrelatedDimensions- Defines the server behavior when the user browses the measure by unrelated dimensions to the measure group.
ProactiveCaching- proactive caching settings (not used by default).
ProcessingMode-Regular (users cannot access the measure group until processing is complete), LazyAggregations (data is accessible as soon as it is processed but processing takes longer)
StorageMode- MOLAP, ROLAP, HOLAP
Type- information about the contents of the measure group.
Measure Properties:
AggregateFunction:
1.) Additive (additive across all dimensions), eg sales amount, number of participants etc; SemiAdditive (additive across all dimensions except the time dimension), eg account balance; and NonAdditive (cannot be added across dimensions), eg rates.
2.)Display folders: Groups measures into logical folders
3.)DataType: Needs to be explicitly set for count and distinct count measures. Inherit gets the data type from the underlying DSV.
4.)FormatString : Defines the measure display format
5.)MeasureExpression: Defines the measure expressions.
6.)Source: Defines the measure bindings. A count measure can be bound to a row if it needs to count rows.
7.)Visible: When set to false the measure is hidden from the end users though can be used in MDX expressions.
Thursday, April 7, 2011
Developing SSAS Cubes: Creating Data Sources and Data Source Views
CREATING DATA SOURCES AND DATA SOURCE VIEWS:
SSAS works best with database schemas that are designed to support data analytics and reporting. The methodology you use to architect such schemas is called dimensional modeling.
The fact table and its dimensions form the star pattern typical of OLAP. With a star schema, each dimension is contained in a single table.
After the dimension tables are in place you can build the UDM (Unified Dimensional Model), or cube, on top of them.
Options for developing Analysis Services:
1.)Project mode:
You can either select the Analysis Services Project template or the Import Analysis Services 9.0 Database Wizard in the Business Intelligence template of the Add New Project dialog box.
Working in the project mode helps you to develop Analysis Services solutions without the need for an active connection to the database. Furthermore, projects created in Business Intelligence Development Studio can work with source control technologies such as Microsoft Visual SourceSafe, allowing multiple developers to work with the same Analysis Services objects without overwriting each other’s work.
The project mode also provides the flexibility to deploy the solution to a test environment before deploying the solution to a production environment. It is best practice to develop Analysis Services solutions in the project mode.
2.)Online mode:
In Business Intelligence Development Studio, you can connect directly to cubes in an Analysis Services database and make modifications to a live cube. The changes that are made occur immediately, and no Analysis Services project is created.
To connect to Analysis Services 2008 in online mode, perform the following steps:
Open Business Intelligence Development Studio.
On the File menu, point to Open, and then click Analysis Services Database.
Select Connect to existing database.
Specify the server name and the database name.
Click OK.
The UDM consists of several components:
1.)Data Source: represents a connection to the database where the data is stored.
2.)Data Source Views: Abstracts the underlying db schema. Can add calculated columns to a DSV when the security prevents you from changing the schema.
3.)Dimensional Model: After the DSV is created, the next step is to create the cube's dimensional model with dimension and fact tables.
4.)Calculations: Only in rare cases will the dimensional model alone meet all the needs. Custom logic can be built with MDX expressions such as quarter-to-date (QTD) and year-to-date (YTD) calculations.
5.)End User Model: Defines the additional features you build on top of the dimensional layer to provide richer data semantics. They include KPIs, actions, perspectives and translations.
6.)Management Settings: Finally, a savvy admin configures the cube to meet operational requirements including availability, latency and security.
1.)Defining a new data source:
Setting the Connection Information:
When setting the connection information you need to decide the authentication option the server will use to connect to the data source to read data.
1.)Windows Integrated Security- This option connects to the data source under the identity of a Windows account. Data source operations initiated in BIDS are performed under the identity of the interactive user, ie you.
2.)Standard Authentication- User name and password.
This is selected by choosing the SQL Server Authentication option. The same credentials are used by both BIDS and SSAS connections. This authentication is less secure than Windows authentication.
Impersonation Information:
1.)Specify Windows Username and Password-
Lets you connect to the data source through a predefined windows account
2.)The service Account-
It performs the data operations under the identity of the SSAS windows service. This option is appropriate if the SSAS service account is domain or local account and has read access to the underlying data source data.
3.)Use the credentials of the current user-
It impersonates the identity of the interactive user. This option is rarely used, since it applies only to certain data retrieval scenarios involving querying data mining models.
4.)Inherit-
Lets the server decide which impersonation mode is appropriate. For cube processing the server uses the service account, and for querying data mining models it uses the credentials of the current user.
Note:
In general, use the Use The Service Account option.
2.)DATA SOURCE VIEWS(DSV):
Data Source Views provide a logical data model of related tables, views and queries from one or more data sources. As a result, you can define a subset of data in the data source view. This is typically done when you want to expose a data mart from a data warehouse that will be used as a basis for creating a cube. The data source view will hold metadata about the underlying tables and relationships from the source data.
As the data source view is a logical model, changes can be made to the data source view without changing the underlying schema in the data source.
DSV Keys and Table Relationships:
One purpose of the DSV is to define the primary key for each dimension table and the table relationships between dimension and fact tables. If primary keys and relationships are defined within the underlying tables, the DSV detects them automatically and adds them. If they are not defined, eg when using views instead of tables, you must add them manually as logical keys and relationships in the DSV schema.
This can be done by:
1.)Within the DSV designer, select one or more columns that make up the key for the table by holding down the Ctrl key and clicking the relevant columns.
2.)Right-click the selected columns and select "Set Logical Primary Key".
To define a logical relationship, drag the foreign key from a fact table to its related primary key within a dimension table. The resulting relationship appears as an arrow pointing from the foreign key table to the primary key table.
Note: Make sure you understand the underlying relational structure and the data before you create logical primary keys and table relationships. The DSV designer does not validate any logical primary keys or table relationships that are created manually. If data issues in the underlying tables break any logical definitions, you will only find out once the cubes are designed and processed using the DSV.
Named calculations and Named Queries:
Named Calculation: A column based on an expression; it is a computed column that is part of a DSV.
eg A named calculation can reference another table. Suppose there are two tables, Product and ProductCategory, and both have a ProductCategoryID column; the ProductCategory table also has a CategoryName column. You need a named calculation in the Product table that shows the product category by referencing the CategoryName column from the ProductCategory table. The following SQL expression is used:
(SELECT CategoryName FROM ProductCategory
WHERE ProductCategoryID = Product.ProductCategoryID)
Named Queries:
A named query is created with a full SELECT statement returning one or more columns.
3.)CREATING AND MODIFYING DATA CUBES:
After creating a data source and a DSV that model the underlying data structures, the next task is to create the multidimensional layer of the UDM.
The Cube Wizard offers three ways to generate the cube definition:
1.)Use Existing Tables:
Generates the cube using a bottom-up approach based on the DSV you created.
2.)Create empty Cube:
Need to design the dimensional model from scratch.
3.)Generate tables in Data Source:
This is a top-down approach. Select this option when you prefer to design the UDM from top to bottom in the absence of a data source. After the UDM is in place you can auto-generate the supporting schema.
Note: Common to use the first option.
Modifying the Cube and the Cube Designer:
BIDS opens the resulting cube in the cube designer. The cube designer interface has 10 design tabs across the top. You can use these tabs to modify the various aspects of the cube.
Cube Designer Tabs:
The tab names and their functions are as below:
1.)Cube Structure:
Used to edit the basic structure of a cube.
2.)Dimension Usage:
Defines relationships between dimension and measure groups in a cube.
3.)Calculations:
Contains MDX formulas and scripts to define calculated members.
4.)KPI's:
Defines KPI's within a cube including formulas for KPI status and trends.
5.)Actions:
Used to create actions, such as running reports or launching URLs, that will be available to the end users as they browse the cube.
6.)Partitions:
Defines the storage structures used for each measure group within a cube.
7.)Aggregations:
Used to create custom Aggregation designs.
8.)Perspectives:
Used to create subsets of the cube content for easier end user browsing.
9.) Translations:
Defines the language translations for cube metadata, eg measure and dimension names.
10.)Browser:
Used to browse the cube data
Note: When you modify certain properties of a dimension to change its behavior in a cube, the physical design of the dimension is still inherited from the dimension at the database level.
CREATING AND MODIFYING DIMENSIONS:
Note: Creating dimensions first is preferable. In this way, you control the attributes that exist in the dimension and define any hierarchies that you deem appropriate for your business users. By taking this approach, you can run the Cube Wizard to create the cube and then add the existing dimensions within the Cube Wizard, so that after the wizard completes, the dimensions have already been defined without the need to edit them further.
When the Cube Wizard generates a set of dimensions it adds only primary key columns as dimension attributes. As the requirements evolve, you might need to add new dimensions to the UDM dimensional model or modify existing ones.
Creating a new Dimension:
Attributes can be organized into user-defined hierarchies that provide navigational paths to assist users when browsing the data in a cube. Instead of browsing a single flat hierarchy of attributes, you can improve the user's experience of reading the dimension data by creating user-defined hierarchies. For example, a Time dimension consists of a user-defined hierarchy named Calendar Year that contains the Year, Quarter and Month levels within the hierarchy for users to navigate through the time dimension data.
Furthermore, you can improve the performance of browsing dimension data through user-defined hierarchies by defining attribute relationships between attributes within the hierarchy. Although attribute relationships are not visible to the users, they can have an impact on performance by creating more effective aggregates within the cube that users may query.
Dimensions, attributes, user-defined hierarchies and attribute relationships are created using the Dimension Wizard.
SSAS works best with database schema's that are designed to support data analytic s and reporting. The methodology you use to architect such schemes is called dimensional Modeling.
The fact table and its dimensions make a star pattern typical of OLAP. With star schema the dimension table is contained in a single table.
After the dimension table is in place you can build the UDM (Unified Dimensional model) or cube on top of it.
Options for developing Analysis Services:
1.)Project mode:
You can either select the Analysis Services Project template or the Import Analysis Services 9.0 Database Wizard in the Business Intelligence template of the Add New Project dialog box.
Working in the project mode helps you to develop Analysis Services solutions without the need for an active connection to the database. Furthermore, projects created in Business Intelligence Development Studio can work with source control technologies such as Microsoft Visual SourceSafe, allowing multiple developers to work with the same Analysis Services objects without overwriting each other’s work.
The project mode also provides the flexibility to deploy the solution to a test environment before deploying the solution to a production environment. It is best practice to develop Analysis Services solutions in the project mode.
2.)Online mode:
In Business Intelligence Development Studio, you can connect directly to cubes in an Analysis Services database and make modifications to a live cube. The changes that are made occur immediately, and no Analysis Services project is created.
To connect to Analysis Services 2008 in online mode, perform the following steps:
Open Business Intelligence Development Studio.
On the File menu, point to Open, and then click Analysis Services Database.
Select Connect to existing database.
Specify the server name and the database name.
Click OK.
The UDM consists of several components:
1.)Data Source: Represents a connection to the database where the data is stored.
2.)Data Source Views: Abstracts the underlying database schema. You can add calculated columns to a DSV when security prevents you from changing the underlying schema.
3.)Dimensional Model: After the DSV is created, the next step is to create the cube's dimensional model with dimension and fact tables.
4.)Calculations: Only in rare cases will the dimensional model alone meet all needs. Custom logic, such as MDX expressions for quarter-to-date (QTD) and year-to-date (YTD) measures, can be built as calculations.
5.)End User Model: Defines the additional features you build on top of the dimensional layer to provide richer data semantics. They include KPI's, actions, perspectives and translations.
6.)Management Settings: Finally, the administrator configures the cube to meet operational requirements, including availability, latency and security.
1.)Defining a new data source:
Setting the Connection Information:
When setting the connection information you need to decide the authentication option the server will use to connect to the data source to read data.
1.)Windows Integrated Security- This option connects to the data source under the identity of a Windows account. Data source operations initiated in BIDS are performed under the identity of the interactive user, i.e. you.
2.)Standard Authentication- Uses a user name and password, and is chosen by selecting the SQL Server authentication option. The same credentials are used by both BIDS and SSAS connections. This authentication is less secure than Windows authentication.
Impersonation Information:
1.)Specify Windows Username and Password-
Lets you connect to the data source through a predefined windows account
2.)The service Account-
It performs the data operations under the identity of the SSAS Windows service. This option is appropriate if the SSAS service account is a domain or local account that has read access to the underlying source data.
3.)Use the credentials of the current user-
It impersonates the identity of the interactive user. This option is rarely used, except for certain data retrieval scenarios involving data mining models.
4.)Inherit-
Lets the server decide which impersonation mode is appropriate. For cube processing the server uses the service account, and for querying data mining models it uses the credentials of the current user.
Note:
In general, use the Service Account option.
2.)DATA SOURCE VIEWS(DSV):
Data Source Views provide a logical data model of related tables, views and queries from one or more data sources. As a result, you can define a subset of data in the data source view. This is typically done when you want to expose a data mart from a data warehouse that will be used as a basis for creating a cube. The data source view will hold metadata about the underlying tables and relationships from the source data.
As the data source view is a logical model, changes can be made to the data source view without changing the underlying schema in the data source.
DSV Keys and Table Relationships:
One purpose of the DSV is to define the primary key for each dimension table and the table relationships between dimension and fact tables. If primary keys and relationships are defined in the underlying tables, the DSV detects them automatically and adds them to the DSV. If they are not defined, for example when views are used instead of tables, you must add them logically in the DSV schema.
This can be done by:
1.)Within the DSV designer select one or more columns that make up the key for the table by holding down the control key and then clicking the relevant columns.
2.)Right click the selected columns and select "Set logical Primary Key".
To define a logical relationship, drag the foreign key from a fact table to its related primary key within a dimension table. The resulting relationship appears as an arrow pointing from the foreign key table to the primary key table.
Note: Make sure you understand the underlying relational structure and the data before you create a logical primary key or table relationships. The DSV designer does not validate any logical primary keys or table relationships that are created manually. If data issues in the underlying tables break any logical definitions, you will only discover this once cubes are designed and processed using the DSV.
Named calculations and Named Queries:
Named Calculation: It is a column based on an expression. It is a computed column which is a part of a DSV.
For example, a named calculation can reference another table. Suppose there are two tables, Product and ProductCategory, and both have a ProductCategoryID column. The ProductCategory table has a CategoryName column. You need a named calculation in the Product table that shows the product category by referencing the CategoryName column from the ProductCategory table. The following SQL expression is used:
(Select CategoryName from ProductCategory
where ProductCategoryID = Product.ProductCategoryID)
Named Queries:
A named query is created with a full SELECT statement that returns one or more columns, and behaves like a view defined inside the DSV.
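As a sketch, assuming the same hypothetical Product and ProductCategory tables used above (plus ProductID and ProductName columns, which are assumptions here), a named query can flatten the snowflake into a single logical table:
SELECT p.ProductID,
       p.ProductName,
       p.ProductCategoryID,
       pc.CategoryName          -- denormalized from the related table
FROM Product AS p
JOIN ProductCategory AS pc
    ON pc.ProductCategoryID = p.ProductCategoryID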
3.)CREATING AND MODIFYING DATA CUBES:
After creating a data source and a DSV that model the underlying data structures, the next task is to create the multidimensional layer of the UDM.
The Cube Wizard offers three ways of generating the cube definition:
1.)Use Existing Tables:
Generates the cube using a bottom-up approach based on the DSV you created.
2.)Create Empty Cube:
You design the dimensional model from scratch.
3.)Generate Tables in the Data Source:
This is a top-down approach. Select this option when you prefer to design the UDM top-down in the absence of data. After the UDM is in place, you can auto-generate the supporting schema.
Note: The first option is the most commonly used.
Modifying the Cube and the Cube Designer:
BIDS opens the resulting cube in the cube designer. The cube designer interface has 10 design tabs across the top. You can use these tabs to modify the various aspects of the cube.
Cube Designer Tabs:
The tab names and their functions are as below:
1.)Cube Structure:
Used to edit the basic structure of a cube.
2.)Dimension Usage:
Defines relationships between dimension and measure groups in a cube.
3.)Calculations:
Contains MDX formulas and scripts to define calculated members.
4.)KPI's:
Defines KPI's within a cube including formulas for KPI status and trends.
5.)Actions:
Used to create actions, such as running reports or launching URLs, that will be available to end users as they browse the cube.
6.)Partitions:
Defines the storage structures used for each measure group within a cube.
7.)Aggregations:
Used to create custom Aggregation designs.
8.)Perspectives:
Used to create subsets of the cube content for easier end user browsing.
9.) Translations:
Defines translations of cube metadata, such as measure and dimension names, into other languages.
10.)Browser:
Used to browse the cube data.
Note: When you modify certain dimension properties to change the dimension's behavior in a cube, the physical design of the dimension is still inherited from the dimension defined at the database level.
CREATING AND MODIFYING DIMENSIONS:
Note: Creating the dimensions first is preferable. In this way, you control the attributes that exist in the dimension and define any hierarchies that you deem appropriate for your business users. By taking this approach, you can run the Cube Wizard to create the cube and add the existing dimensions within the wizard, so that after the wizard completes, the dimensions have already been defined without the need to edit them further.
When the Cube Wizard generates a set of dimensions, it adds only the primary key columns as dimension attributes. As requirements evolve, you might need to add new dimensions or modify existing dimensions in the UDM dimensional model.
Creating a new Dimension:
Attributes can be organized into user-defined hierarchies that provide navigational paths to assist users when browsing the data in a cube. Instead of browsing a single flat hierarchy of attributes, you can improve the user's experience of reading the dimension data by creating user-defined hierarchies. For example, a Time dimension consists of a user-defined hierarchy named Calendar Year that contains the Year, Quarter and Month levels within the hierarchy for users to navigate through the time dimension data.
Furthermore, you can improve the performance of browsing dimension data through user-defined hierarchies by defining attribute relationships between attributes within the hierarchy. Although attribute relationships are not visible to the users, they can have an impact on performance by creating more effective aggregates within the cube that users may query.
Dimensions, attributes, user-defined hierarchies and attribute relationships are created using the Dimension Wizard.
Monday, April 4, 2011
Executing And Scheduling packages
Using DTExecUI to configure Package Execution:
The command-line executable DTExec is a utility with broad parameterization capabilities. You can build the command line manually by applying the various parameters, but to save time and avoid errors you can use the command-line builder utility named DTExecUI.
DTExecUI is a visual tool that naturally groups the command parameter options.
Using DTExec for Package Execution:
To execute an SSIS package saved to SQL Server with Windows authentication: dtexec.exe /SQL MyPackage /SERVER SQLProd
To execute an SSIS package saved to the file system: dtexec.exe /f "c:\MyPackage.dtsx"
To execute an SSIS package saved to the file system, passing a configuration file: dtexec.exe /f "c:\MyPackage.dtsx" /conf "c:\pkgOneConfig.cfg"
Executing Packages in SSMS with SSIS Service:
When you right click and run a package in SSMS, SSMS opens the DTExecUI utility. It will be preconfigured with Package location info and you can configure the other property pages.
Creating SQL Server Agent Jobs to Execute SSIS Package:
SSMS-->Connect to the SQL Server Database Engine-->SQL Server Agent-->Jobs-->New Job
Following list defines the scheduling options for SQL Server Agent job:
1.)Recurring
2.)One time
3.)Start Automatically when SQL Server Agent Starts
4.)Start whenever the CPU becomes Idle
You can set up notifications, e.g. sending an email whenever a package fails, succeeds or completes. A T-SQL sketch of creating such a job follows.
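This is a minimal sketch using the msdb stored procedures; the job name, package name and server name are placeholders, and the /SQL reference assumes the package is stored in msdb.
USE msdb;
-- Create the job.
EXEC dbo.sp_add_job @job_name = N'Run MyPackage Nightly';
-- Add a step that runs the package through the SSIS subsystem (equivalent to a dtexec call).
EXEC dbo.sp_add_jobstep
    @job_name  = N'Run MyPackage Nightly',
    @step_name = N'Execute MyPackage',
    @subsystem = N'SSIS',
    @command   = N'/SQL "\MyPackage" /SERVER SQLProd';
-- Recurring schedule: every day at 02:00:00 (HHMMSS).
EXEC dbo.sp_add_jobschedule
    @job_name          = N'Run MyPackage Nightly',
    @name              = N'Nightly 2 AM',
    @freq_type         = 4,        -- daily
    @freq_interval     = 1,
    @active_start_time = 20000;
-- Target the local server.
EXEC dbo.sp_add_jobserver @job_name = N'Run MyPackage Nightly', @server_name = N'(local)';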
Sunday, April 3, 2011
Managing SSIS Service and Configuring Package Security
MANAGING SSIS SERVICE:
The SSIS Service is the Windows Service named 'SQL Server Integration Services'. You can configure the startup requirements and the service account by using SQL Server Config Manager.
After you connect to the SSIS Service,the object explorer window in SSMS lets you manage,secure and execute packages deployed to the server.
The Running Packages Folder:
You can connect to the SSIS service in SSMS and refresh the Running Packages folder to see which packages are currently running. You can stop package execution by right-clicking the package; execution terminates after the currently running task completes.
The Stored Packages Folder:
It has 2 subfolders: File System and MSDB. The package store is located by default in the %Program Files%\Microsoft SQL Server\100\DTS\Packages folder.
The service uses a configuration file at the location:
C:\Program Files\Microsoft SQL Server\100\DTS\Bin\MsDtsSrvr.ini.xml
You can edit this XML file and change the StorePath element value to a full path of your choice.
The MSDB folder displays packages that have been deployed to SQL Server. These packages are stored in the msdb database in a table named [dbo].[sysssispackages]. You can create virtual folders under the MSDB folder and import packages directly into those subfolders.
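As a quick check after deployment, you can list the packages stored in msdb; this is a minimal sketch against the SQL Server 2008 version of the table:
SELECT name,
       description,
       createdate,
       vermajor, verminor, verbuild   -- package version stamped by BIDS
FROM msdb.dbo.sysssispackages
ORDER BY name;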
Adding SSIS Package Security:
Securing SSIS is important for 2 main reasons:
1.)Connection Information:
Although packages do not contain any data from sources or destinations there is connection information in those packages.
2.)Schema information:
The package itself does not give direct access to the sources or destinations, but with an unencrypted package it is possible to see what the source and destination schemas look like.
There are 3 ways to limit the access to packages:
1.)You can apply SQL Server security roles and define which SQL logins, Windows groups or users have read and write access to a package.
2.)You can employ security encryption by using the package protection level in which either a part(sensitive data) or all of the package is encrypted.
3.)You can define file level security in which you can apply access rights to the file or folder.
Assigning Roles and Securing Packages stored in msdb:
Role permissions fall into two types:
1.)Read actions focus on viewing and executing packages.
2.)Write actions apply to moving packages in and out of msdb without executing the packages.
How to use Security Roles:
1.)Expand stored package folder and then the MSDB folder.
2.)Right click the package for which you need to apply security roles and then select Package Roles to open the Package Roles dialog box.
Types of Roles:
1.)db_ssisadmin- Read (view, execute and export) and write access (delete, change and import).
2.)db_ssisltduser- Read and write access to the user's own packages.
3.)db_ssisoperator- Read-only access.
4.)Windows administrators- Can view execution details of currently running packages and stop currently running packages. (A T-SQL sketch of granting role membership follows this list.)
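These are ordinary msdb database roles, so membership can be granted with the usual role procedures. In this sketch the Windows login DOMAIN\ReportUser is hypothetical and is assumed to already exist on the server; it only needs to run packages.
USE msdb;
-- Map the login to a user in msdb (skip if the user already exists).
CREATE USER [DOMAIN\ReportUser] FOR LOGIN [DOMAIN\ReportUser];
-- Read-only access: view and execute packages, but not change them.
EXEC sp_addrolemember N'db_ssisoperator', N'DOMAIN\ReportUser';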
Package Protection Level:
The ProtectionLevel Property Encrypts package definition information that resides in the underlying XML file.
The PackagePassword property is used when the protection level requires encrypting package metadata with a password.
By default the ProtectionLevel is set to EncryptSensitiveWithUserKey. This means that if you create a package with default settings and the package contains sensitive data (such as a connection with a password), the connection information will be encrypted and can be viewed only by you, the package author, when the package is opened in BIDS.
Options for Package Level Security:
1.)DontSaveSensitive: Sensitive information is not saved and every time a package is opened the information has to be reentered.
This is helpful if only Windows authentication is used for connections exclusively.
2.)EncryptAllWithPassword: The entire package is encrypted using a password used in the PackagePassword property. The package will not be executed without the correct password in BIDS.
3.)EncryptSensitiveWithPassword: The sensitive data is encrypted based on the password entered in the PackagePassword property. If a password is not provided, the package will open but the sensitive data will be replaced with blanks. If a user attempts to execute the package without the password, the execution fails.
4.)EncryptSensitiveWithUserKey: The sensitive data is encrypted based on the current user and computer. If the package is opened by a different user, it will open but the sensitive data will be replaced with blanks.
5.)EncryptAllWithUserKey: The entire package is encrypted based on the current user and computer. The package cannot be executed unless it is opened by the same user who created it in BIDS.
6.)ServerStorage: This setting can be used only if the package is stored in a SQL Server database. Nothing in the package definition is encrypted; instead, the entire package is protected by the SQL Server database roles.
Note:
If you are deploying a package with the EncryptSensitiveWithUserKey ProtectionLevel setting to a new server the connection passwords will not be decrypted and the package execution fails.
Key Points to consider while deciding on the package ProtectionLevel Security:
1.)If all connections use Windows authentication and do not require passwords, you do not have sensitive information in your package because users are authenticated by Windows. In this case use DontSaveSensitive.
2.)If there is sensitive information such as a connection password, you have to decrypt it either by using the package password or by running the package on the same computer as the same user.
Managing Package Security with DTUtil:
The following example encrypts MyPackage.dtsx with the password EncPawd. The /encrypt parameter requires the destination package location, the encryption level and a package password. (0-DontSaveSensitive, 1-EncryptSensitiveWithUserKey, 2-EncryptSensitiveWithPassword, 3-EncryptAllWithPassword, 4-EncryptAllWithUserKey, 5-SQLServerStorage)
dtutil.exe /file MyPackage.dtsx /encrypt file;MyPackage.dtsx;3;EncPawd
Monday, March 28, 2011
Configuring and Deploying a SSIS Package
PACKAGE CONFIGURATION:
When you work in BIDS, all the values that you set up during development are stored as XML in the package's .dtsx file, along with the package definition itself. Whenever a package is executed, the first action the package takes is to look at its configurations and overwrite its current settings with the settings from those configurations.
Package configurations are optional but they provide certain benefits like:
1.)Letting you update package settings without opening the package in BIDS.
2.)They also give you a central location for settings that can be shared between packages.
The common elements that are configured using the config file are:
Connection properties
Package Properties
Task and Container Properties
Package Variable Properties
Note:Data Flow properties cannot be configured by using package configurations. You can only configure the Data Flow Container itself.
ENABLING SSIS PACKAGE CONFIGURATION:
Open the package-->SSIS menu-->Package Configurations
Config Type:
Most commonly used configuration types are XML Configurations file and SQL Server configurations.
Creating an XML File Configuration:
There are 2 ways of creating an XML file configuration:
1.)Hard-coding the configuration file path by specifying a name in the box provided.
2.)Using an environment variable that stores the location of the file, created as a system environment variable in the computer's system properties.
This is called an indirect configuration and is useful when the XML file location or file name might change.
Control Panel-->System and Security-->System-->Advanced System Settings-->Advanced-->Environment Variables.
Creating a SQL Server Configuration:
Packages can share SQL Server configurations even if they execute on different servers, since the SQL Server configurations do not have to be stored on the local server.
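When you choose the SQL Server configuration type, SSIS creates (or reuses) a configuration table, by default named [SSIS Configurations], with ConfigurationFilter, ConfiguredValue, PackagePath and ConfiguredValueType columns. The following is a sketch of inspecting and changing a stored value directly; the filter name, connection manager name and connection string are hypothetical.
-- Inspect the entries that belong to one configuration filter.
SELECT ConfigurationFilter, PackagePath, ConfiguredValue, ConfiguredValueType
FROM dbo.[SSIS Configurations]
WHERE ConfigurationFilter = N'ProdConnections';   -- hypothetical filter name
-- Point a connection manager at a different server without opening the package in BIDS.
UPDATE dbo.[SSIS Configurations]
SET ConfiguredValue = N'Data Source=SQLProd;Initial Catalog=DW;Integrated Security=SSPI;'
WHERE ConfigurationFilter = N'ProdConnections'
  AND PackagePath = N'\Package.Connections[DW].Properties[ConnectionString]';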
Note: Which Config type to use:
1.)Use SQL Server config if there are multiple packages that need to share a config and the packages are executing on different computers.
2.)If packages are executed together, i.e. one package executes another, which executes another one, then use the parent package variable configuration.
3.)If you plan to move packages from a development server to a test server and then to production, and each environment needs to use a different location, then use an XML file configuration.
4.)If more than one configuration entry needs to be put into a single SSIS configuration, either an XML file or a SQL Server configuration can be used.
Adding Properties to Your Configurations:
If entries already exist in the configuration you selected, SSIS prompts you to either reuse the settings or overwrite them (which will clear the existing configurations).
If configuration entries do not exist, or if you choose to overwrite, you see the Select Properties To Export page.
The Select Properties To Export Page lets you select the properties for the SSIS configurations that are selected. Properties are grouped by the following folder:
Variables
Connection Managers
Log Providers
Properties
Executables
Sharing and Ordering Configurations:
If there are several configurations, they are applied in the order in which they are listed in the Package Configurations Organizer.
LIST OF PLACES THAT CAN USE EXPRESSIONS IN THE SSIS PACKAGE:
Derived Column Transformation
Conditional Split
For Loop
Variable Expression
Property Expressions, which include package properties at the control flow level, or container and task properties accessed through editors and the Properties window.
PROPERTY EXPRESSIONS:
Property expressions let you update the properties of the control flow during package execution.
This is done by first exposing the property as an expression through the Properties window and then setting the expression itself.
DEPLOYING A SSIS PACKAGE:
After the configurations and expressions are set in a package it is ready to move to a new environment. The deployment process requires 2 steps:
1.)Creating a package installer set by using package deployment utility
2.)Deploying the installer set to a new destination.
PACKAGE DEPLOYMENT:
When packages are moved from one environment to another there are 2 options for package storage.
SSIS Deployment to the File System:
Packages can be placed on file shares or file folders from which they could be executed.
SSIS Deployment to SQL Server:
Packages can be imported into SQL Servers from which they can be executed.
When you deploy a package to SQL Server the package is stored in a table named sysssispackages.
Deployment can be manual or driven through a utility.
1.)Manual package Deployment:
Connect to the SSIS service in SSMS, expand the Stored Packages folder, right-click the File System or MSDB folder, and select Import Package.
2.)Deployment Utility:
It is a built-in tool of SSIS that helps you deploy an entire project at the same time to either the file system or SQL Server. It first generates a set of files that are packaged and ready to be deployed, and then you run the package installer to perform the deployment.
Creating an Installer Kit by Using the Deployment Utility:
Each SSIS Project has a property for enabling the deployment utility. When this property is enabled each time the project is built or debugged, the following items are copied to a single file location:
Packages
Misc files
Project Deployment Manifest(the installer kit config file)
To turn on the property follow the following steps:
1.)Right click SSIS project and click properties
2.)Set the CreateDeploymentUtility property to True.
3.)Set the location in which the deployment files will be stored.
4.)The AllowConfigurationChanges property is set to True by default, but it can be set to False if you need to prevent configuration changes during deployment.
5.)Click OK.
The next step is to build the SSIS project to generate the deployment set. Right-click the SSIS project and select Build.
Note: Turn off CreateDeploymentUtility when you are finished, because any time any package in the project is debugged, a new deployment set will be created. This will overwrite the existing deployment set, so set the property to False after creating a deployment set.
Deploying Packages:
After the installer kit is created, the next task is to take the set of files and deploy them to a destination by using the Package Installation Wizard. The following steps show how it is done:
1.)Move the installer kit to the destination computer. If the installer kit is on a network share that the destination computer can access, the deployment files can remain in place.
2.)Open the [ProjectName].SSISDeploymentManifest file. When you run the wizard, you have the option to deploy the packages to either a SQL Server database or the file system.
3.)On the Deploy SSIS Packages page, select either a file system or a SQL Server deployment.
4.)Based on what is selected, you will be able to define the specific storage location.
5.)If the project includes configuration files, the wizard now lets you update and change the values in those configuration files.
Note: When deploying, the Package Installation Wizard copies the dependent files to the folder you specify. This includes XML configuration files used by any package in the project (but only where the XML configuration path is hard coded and does not use an environment variable). The wizard will update the XML configuration file references in the packages and point them to the new location.
Using SSIS DTUtil command line Utility:
SSIS comes with DTUtil which lets you perform package management tasks like deployment through command line or batch file processes.
The Package storage location references use /SQL, /File, /DTS parameters for sources and destinations.
Commands:
/COPY
/MOVE
/DELETE
/EXISTS
/ENCRYPT
/DECRYPT
/SIGN:Digitally signs a package
/FCREATE: Creates and manages folders in the SSIS package store and in the SQL Server msdb database.
eg1: The following example copies a file from a local folder to the same local folder under a new file name:
dtutil.exe /FILE c:\MyPackage.dtsx /COPY FILE;c:\MyCopiedPackage.dtsx
eg2: To delete the MyPackage package stored in msdb:
dtutil.exe /SQL MyPackage /DELETE
eg3: To check whether the MyPackage package exists in msdb:
dtutil.exe /SQL MyPackage /EXISTS
eg4: To deploy a package from the file system to SQL Server:
dtutil.exe /FILE c:\MyCopiedPackage.dtsx /COPY SQL;MyCopiedPackage
When DTUtil is executed through an automated command line, use the following exit codes to interpret the results:
0 Success
1 Failure
4 Package not found
5 Package cannot load
6 Bad Syntax
Friday, March 25, 2011
Handling data flow errors and Debugging
I.)CONTROL FLOW/CONTROL FLOW ERRORS:
1.)Checkpoints and Transactions:
Within SSIS packages, you can implement checkpoints to reduce the time it takes to rerun a failed package from the point of failure. In other words, you might not want successful tasks to run again if you restart the package.
Here is what happens when you run a package that has checkpoints enabled:
1.)The package checks whether the checkpoint file exists.
If the file does not exist (i.e. it has been deleted), the package begins at the first task.
If the file exists, the package reads it to find out where to start (including restoring the values of variables and connections as they were at the time of the last failure).
After you enable checkpoints in a package (by setting the package-level CheckpointFileName, SaveCheckpoints and CheckpointUsage properties), the final step is to set checkpoints at the various tasks within your package. To do this, set the FailPackageOnFailure property of each relevant task or container to True.
If a series of tasks must be completed as a single unit, in which either all the tasks succeed and are committed or an error occurs and none of the tasks are committed, place the tasks within a Sequence container and set the container's TransactionOption property to Required.
To ensure transactional consistency of the data that is affected by the SSIS package, you can make use of transaction settings on control flow executable.
This will ensure that if the executable succeeds then the data is committed, if it should fail it will roll back the work performed by the executable to ensure data integrity.
The IsolationLevel property defines the degree of isolation for the transaction used by the task. Chaos is the lowest isolation level (allowing the most concurrency) and Serializable is the highest (allowing the least).
2.)Identifying Package Status:
The MaximumErrorCount control flow property lets a package continue to run to completion even after errors have occurred. The value specifies the number of errors that can occur before the package stops executing and reports failure.
Whenever a package is executed outside the debug environment, the Execution Results tab lists the error and warning descriptions, execution times, final destination row counts and other execution information.
3.)Configuring Execution Logging:
Logging is used to understand what aspect of the package caused the package failure so that you can identify problems quickly.
There are 2 types of logging:
Standard logging (the built-in logging feature provided by SSIS, which logs information about standard events such as OnError, OnQueryCancel and OnPreExecute) and custom logging (where you set up event handlers and query system and user-defined variables to log specific information to a destination of your choice).
Implementing Standard Logging:
There are 3 steps involved in enabling logging in a package.
1.)Choosing the container to log
2.)Setting the log provider type
The most common log providers are the text file and SQL Server providers, but you can also log to the Windows event log, SQL Server Profiler or an XML file.
3.)Selecting the log events:
The most common are OnError, OnPostExecute, OnProgress and OnTaskFailed.
If you select SQL Server as the log provider, the log details are stored in a system table named dbo.sysssislog.
To view the log entries, execute:
select *
from dbo.sysssislog
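A slightly more targeted sketch, assuming the log table lives in the database that the SQL Server log provider's connection points to, pulls only error events in reverse chronological order:
SELECT starttime, source, event, message
FROM dbo.sysssislog
WHERE event = N'OnError'
ORDER BY starttime DESC;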
Custom Logging:
Event Handlers For Control Flow :
Note: In the data flow, data viewers give you the ability to debug problems while data is being processed. The control flow is different: taking advantage of the visual capabilities of Visual Studio, the control flow supports visual debugging and breakpoint features.
First look at event handlers, and then explore the debugging capabilities of the control flow.
If an error occurs, an event handler can send an alert or fix a data problem. You can provide zero to many event handlers for a package. Event handlers are a component of SQL Server Integration Services that execute event handler executables only when a defined event is raised on a given control flow container or task within an SSIS package.
You can turn off the event handlers for any task or container by setting its DisableEventHandlers property to True.
The most common event handler used is an OnError event.
The MaximumErrorCount property on a control flow executable determines the number of allowable errors to occur.
If the threshold of the MaximumErrorCount property is met, the control flow executable throws an error.
If an OnError event handler is configured for the executable that threw the error, the event handler will execute event handler executables that are defined against the control flow executable.
Using the OnError event handler is the equivalent of using an outcome-based precedence constraint set to failure.
The OnQueryCancel event would be raised if you were to stop a package while the package was executing.
The event handler executables that are available for you to use include the same containers and tasks that are available within a Control Flow component of SSIS.
Each package contains a set of system variables that are updated at the various levels of package execution. With event handlers you can capture these variables and their values, which provide information such as ErrorCode, ErrorDescription and SourceName.
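A common custom-logging pattern is an Execute SQL Task inside the OnError event handler that writes those system variables to your own table. The sketch below is illustrative: the PackageErrorLog table is hypothetical, and the ? parameter markers would be mapped to System::PackageName, System::SourceName, System::ErrorCode and System::ErrorDescription on the task's Parameter Mapping page.
-- Hypothetical custom log table (create once in the logging database).
CREATE TABLE dbo.PackageErrorLog (
    LogID            int IDENTITY(1,1) PRIMARY KEY,
    PackageName      nvarchar(200)  NOT NULL,
    SourceName       nvarchar(200)  NOT NULL,
    ErrorCode        int            NOT NULL,
    ErrorDescription nvarchar(4000) NULL,
    LoggedAt         datetime       NOT NULL DEFAULT GETDATE()
);
-- Statement placed in the Execute SQL Task (OLE DB connection uses ? parameter markers).
INSERT INTO dbo.PackageErrorLog (PackageName, SourceName, ErrorCode, ErrorDescription)
VALUES (?, ?, ?, ?);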
You can create a template with the desired error handling settings and place it in:
C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\ProjectItems\DataTransformationProject\DataTransformationItems. The template can then be used by opening the Project menu on the Visual Studio menu bar and selecting Add New Item.
4.)Connecting Control Flow Objects with Precedence:
Precedence Constraints are used only in control flow and not in the data flow. The connectors between objects in control flow are Precedence constraints and connectors in data flow are data paths.
Precedence Constraints:
These connectors are green (indicating success), red (indicating failure) or blue (indicating completion). Even if a task fails it is considered completed, and therefore blue (completion) constraints are still followed to the next task or container.
Constraints Features:
1.)Solid Lines represent AND precedence
2.)Dashed lines are logical OR statements
3.)Functional symbols show that an SSIS expression has been embedded in the constraint.
Evaluation Operation Property:
1.)Constraint: Evaluates the execution status of the preceding task (completion, failure or success).
2.)Expression: An advanced Boolean evaluator (meaning that it must evaluate to true or false).
Evaluation operation drop down list includes the following options:
1.)Constraint
2.)Expression
3.)Expression and Constraint
4.)Expression or Constraint
II.)DATA FLOW ERRORS:
1.)Data Paths and Error Paths:
Data Paths are the green connectors that go from one component to another. These paths carry the rows that have successfully gone through a component, as well as the error rows when the error output is set to Ignore Failure.
Error Paths: The red connectors between data flow components. They contain the data rows that failed in a component when the error output is set to Redirect Row.
Not all components in the data flow use error paths, e.g. the Multicast component has no error output, since it does not perform any operations on the data itself.
There are 3 options for configuring error output:
1.)Fail Transformation:
Causes the data flow to fail if an error is encountered
2.)Ignore Failure Option:
Lets the row continue out the normal green data path, but the value that caused the error is changed to NULL.
3.)Redirect Rows
Sends the error rows out the red error path. This is the way to handle errors with separate components.
Uses of Error Path output:
a.) When text file source data does not match the data type specified in the data flow.
b.)When the lookup transformation does not find a match.
2.)Data Viewers:
They help to identify data flow issues.
There are 4 types of data viewers: grid, histogram, scatter plot and column chart.
Data viewers do not work when a package is run from the command line.
III.)DEBUGGING
The Progress (Execution Results) window shows the steps that the SSIS package runs through and colour-codes the tasks as they execute, providing a visual indication of which tasks are in progress, which have succeeded and which have failed.
You can also use the built-in row counts within the Data Flow component to indicate the number of data rows passing through the data flow.
You can make use of data viewers to stop the execution of a package and view the state of data as it passes through the data flow.
Furthermore, you can use the Data Profiling task for more in-depth analysis of the data to ensure that the data is being transformed and loaded as expected.
Breakpoints can be applied to both SSIS package components and within the Script task to pause the execution of a package under specific conditions so you can debug the SSIS package at the point the breakpoint causes the package to suspend.
When you debug a package that contains multiple Script tasks, the debugger hits breakpoints in only one Script task and will ignore breakpoints in the other Script tasks. If a Script task is part of a Foreach Loop or For Loop container, the debugger ignores breakpoints in the Script task after the first iteration of the loop.
Debugging the Control Flow with Breakpoints:
Breakpoints function in the control flow and not in the data flow. For scripting, that means you can set breakpoints only in the control flow Script task and not in the data flow Script Component transformation.
In the Debug menu, open Windows --> Locals. This window shows the values of variables when the package is paused at a breakpoint.
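When a breakpoint or the Locals window is not available (for example inside the data flow Script Component), one simple workaround is to surface values from a Script task yourself. A minimal C# sketch, assuming the generated Script task project has a reference to System.Windows.Forms and the variables shown are listed in ReadOnlyVariables:

public void Main()
{
    // Quick inspection of variable values while the package runs in the designer
    System.Windows.Forms.MessageBox.Show(
        "PackageName = " + Dts.Variables["System::PackageName"].Value +
        ", StartTime = " + Dts.Variables["System::StartTime"].Value);

    Dts.TaskResult = (int)ScriptResults.Success;
}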
Tuesday, March 22, 2011
DATA FLOW TASKS
One of the most valuable control flow tasks is the Data Flow task. It works with 3 types of objects.
1.)Data Flow Source Adapters: ADO.NET source (provides connections to tables or queries through an ADO.NET provider), Excel source, Flat File source, OLE DB source (connects to installed OLE DB providers, such as SQL Server, Access, SSAS and Oracle), Raw File source, XML source.
You can also define more than one data source.
2.)Data Flow Transformations:Transformations are used to modify the data so that it can be standardized.
SQL Server Integration Services provides a wide variety of transformations to meet an organization’s requirements.
Each transformation contains different properties to control how the data is changed
3.)Data Flow Destination Adapters: You then define data destinations in which the transformed data is loaded into.
These are similar to the data flow sources, and most sources have matching destination adapters. Apart from those, other destination adapters are Data Mining Model Training, DataReader Destination, Dimension Processing, Partition Processing, Recordset Destination, SQL Server Compact Destination and SQL Server Destination.
To use the destination optimization technique that performs a fast load, rather than inserting one row at a time into the destination table, set the data access mode to Table or View - Fast Load.
DATA FLOW PATHS:
Data Flow paths can simply be used to connect a data source directly to a data destination. Typically, you use a Data Flow path to determine the order in which transformations take place and to specify the path that is taken should a transformation succeed or fail.
2 Paths:
1.) Success DF Path: green arrow
2.) Failure DF Path: red arrow. You are not required to configure error output Data Flow paths for every Data Flow component.
DATA VIEWERS:
A data viewer is a useful debugging tool that enables you to view the data as it passes through the data flow between two data flow components.
The different types of viewers are grid, histogram, scatter plot and column chart.
Data viewers can be added by right clicking the data flow path.
You can use a data viewer only when running a package in the debug environment. If you have a data viewer configured in the package, it will be ignored when you run the package programmatically or from the command line.
WORKING WITH DATA TRANSFORMATIONS:
Transformations give you the ability to manipulate the data in the data flow. They allow you to change the data as it moves from a source connection to a destination connection, such as from a text file to a table within a database.
1.) ROW LEVEL TRANSFORMATIONS:
These transformations perform operations on rows without needing other rows from the source.
Uses of these transformations include performing mathematical calculations, converting text values to numerical values and replacing NULL values.
The transformations in this category are:
Character Map, Copy Column, Data Conversion, Derived Column, Row Count, Sort, Aggregate (sum, group by), etc.
Note: The Cache transformation can improve the performance of the Lookup transformation.
2.)MULTI INPUT OR MULTI OUTPUT TRANSFORMATIONS:
These transformations can work with more than one data input or generate more than one data output.
They include:
Conditional Split:
The Conditional Split transformation takes a single data flow input and creates multiple data flow outputs based on multiple conditional expressions defined within the transformation. The order of the conditional expressions is important. If a record satisfies the first condition, the data is routed based on that condition even if it also meets the second condition; the record is therefore no longer available to be evaluated against the second condition. An expression can be a combination of functions and operators that define a condition.
Lookup:
Matches pipeline column values against reference columns in external database tables.
Merge Join:
The Merge Join transformation is similar to the Merge transformation. However, you can apply Transact-SQL join semantics to determine how the data is merged: FULL, LEFT or INNER join. Like the Merge transformation, the input columns must be sorted and the columns that are joined must have compatible data types.
Merge:
The Merge transformation takes multiple inputs into the transformation and merges the data together from the separate inputs. A prerequisite to the merge input working successfully is that the input columns are sorted. Furthermore, the columns that are sorted must also be of compatible data types. For example, you cannot merge the input that has a character data type with a second input that has a numeric data type.
Multicast:
The Multicast transformation allows you to output multiple copies of the same data flow input to different data flow outputs.
Union All:
The Union All transformation is very similar to the Merge transformation. The key difference is that the Union All transformation does not require the input columns to be sorted. However, the columns that are mapped must still have compatible data types.
Note: Merge and Merge Join require data to be sorted before.
The Merge transformation is similar to the Union All transformations. Use the Union All transformation instead of the Merge transformation in the following situations:
The transformation inputs are not sorted.
The combined output does not need to be sorted.
The transformation has more than two inputs.
3.) ADVANCED DATA PREPARATION TRANSFORMATIONS: These let you perform advanced operations on the rows in the data flow pipeline.
OLE DB Command transformation:
Performs database operations such as updates and deletes, one row at a time, based on mapped parameters from input rows.
Slowly Changing Dimensions:
Processes dimension changes, including tracking dimension history and updating dimension values. It handles common dimension change types such as Historical Attributes (2 rows are created for 1 record), Fixed Attributes and Changing Attributes (the row data is replaced).
Data Mining Query
Applies input rows against a data mining model for prediction.
Fuzzy Grouping
The Fuzzy Grouping transformation allows you to standardise and cleanse data by selecting likely duplicate data and comparing it to a canonical (alias) row of data that is used to standardise the input data. As a result, a connection to SQL Server is required, because the Fuzzy Grouping transformation needs a temporary table to perform its work.
Fuzzy Lookup
The Fuzzy Lookup transformation performs data cleansing tasks such as standardizing data, correcting data and providing missing values.
Using the fuzziness capability that is available to the Fuzzy Grouping transformation, this logic can be applied to Lookup operations so that it can return data from a dataset that may closely match the Lookup value required. This is what separates the Fuzzy Lookup transformation from the Lookup transformation, which requires an exact match. Note that the connection to SQL Server must resolve to a user who has permission to create tables in the database.
Script Component
Provides VB.NET scripting capabilities against rows, columns, inputs and outputs (see the sketch at the end of this list).
Term Extraction
Analyses text input for English nouns and noun phrases.
Term Lookup
Analyses the text input column against a user-defined set of words for association.
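As a rough illustration of the Script Component used as a row-by-row transformation, here is a minimal C# sketch (SSIS 2008 also allows C# in the Script Component); the Email column and its cleansing rule are assumptions made purely for illustration:

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Trim and lower-case the (assumed) Email column; the _IsNull flag guards against NULLs
    if (!Row.Email_IsNull)
    {
        Row.Email = Row.Email.Trim().ToLower();
    }
}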
Monday, March 21, 2011
CONTROL FLOW TASK:
DTS (Data Transformation Services) is the predecessor of SSIS (SQL Server Integration Services).
Control Flow Tasks and Data Flow Tasks:
1.) Control Flow Tasks:
Tasks that include working with data, the file system and network locations. These are the workflow engine of the package and define the structure of the control flow.
These tasks can also interact with .NET technologies, ActiveX components and the command line.
Data Flow task:
The Data Flow task is commonly used in SSIS packages that load data warehouses. Separate Data Flow tasks may be created to populate staging tables, for example one Data Flow task to populate the dimension tables and another Data Flow task to populate the fact tables.
Bulk Insert task:
The Bulk Insert task is particularly useful in situations where data needs to be retrieved from a system that cannot be connected to SSIS. Text files can act as a great intermediary data source and the Bulk Insert task will minimise the time required to import the data into a SQL Server table.
Execute SQL task:
You can use the Execute SQL task for the following purposes (a configuration sketch follows this list):
Truncating a table or view in preparation for inserting data
Creating, altering and dropping database objects such as tables and views
Recreating fact and dimension tables before loading data into them
Running stored procedures
Saving the rowset returned from a query into a variable
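A minimal C# sketch of configuring an Execute SQL task programmatically through the Microsoft.SqlServer.Dts.Runtime API; the connection string and the staging table name are assumptions made purely for illustration:

using Microsoft.SqlServer.Dts.Runtime;

class ExecuteSqlDemo
{
    static void Main()
    {
        Package pkg = new Package();

        // Hypothetical OLE DB connection manager for a staging database
        ConnectionManager cm = pkg.Connections.Add("OLEDB");
        cm.Name = "StagingDb";
        cm.ConnectionString =
            "Provider=SQLNCLI10;Data Source=(local);Initial Catalog=Staging;Integrated Security=SSPI;";

        // Execute SQL task that truncates a staging table in preparation for a load
        TaskHost sqlTask = (TaskHost)pkg.Executables.Add("STOCK:SQLTask");
        sqlTask.Properties["Connection"].SetValue(sqlTask, "StagingDb");
        sqlTask.Properties["SqlStatementSource"].SetValue(sqlTask, "TRUNCATE TABLE dbo.StagingSales;");
    }
}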
File System task:
A File System task may be used to create a folder on the file system so that it is ready to receive files that are retrieved from elsewhere. For example, an FTP task may use the folder created by the File System task as a destination to move files.
FTP task:
An FTP task is a useful medium to retrieve files from a third-party source on the Internet. Using the FTP task, you can connect to an FTP site, retrieve files and load the files into a local folder that may be used as data within the SSIS package.
XML task:
Note that the XML task is not used to connect to an XML document that will be used as source data to be loaded into a separate destination. The XML task allows you to manipulate the XML data itself. An example could include using the Merge function to merge the contents of many XML documents into a single XML file for consolidation purposes.
Send Mail task:
The Send Mail task is commonly used within SSIS packages to send an e-mail message to SSIS administrators about failures that occur within the package itself. This provides a useful notification mechanism if a package fails and you need to be informed about the failure.
Web Service task:
Suppose you have an SSIS package that is used to populate a data warehouse and process an Analysis Services cube.
A Web Service task could be used at the end of a package to call the ReportingService2005 Web service. You can use the SetExecutionOptions method within the class to create a snapshot of the report that uses the data, which is being populated.
This way, the report snapshots are created after the data warehouse is loaded and the cube processed, rather than relying on the snapshot being created on a schedule.
Message Queue task:
The Message Queue task is useful in scenarios where your package is dependent on receiving data from other sources. For example, you may have a central data warehouse that relies on updates to the fact table from branch offices. There may be a number of Message Queue tasks that wait to receive the data from each branch office. On arrival, the Message Queue task will pass the data into your SSIS package to add the data to the fact table. When all of the branch office data has arrived it will then allow the SSIS package to continue and complete the load of the data warehouse.
Script task:
The Script task is a powerful feature that helps you to overcome a situation where a built-in task or transformation cannot provide the required functionality. Examples can include connecting to custom data sources.
ActiveX Script task:
Prior to SSIS, the ActiveX Script task was used in Data Transformation Services (DTS), the predecessor to SSIS, to perform loops within DTS packages.
Within SSIS, the For Loop Container and Foreach Loop Container can now provide the same functionality. Consider replacing ActiveX Script tasks that perform this logic with the For Loop Containers and Foreach Loop Containers.
Execute Process task:
Consider using the Execute Process task if your SSIS package can only interact with a third-party application through a batch file.
Create the batch file to interact with the third-party application first, and then use the Execute Process task to call the batch file when needed within the package.
Transfer Database task:
The Transfer Database task is a useful task if you want to move a database from one instance of SQL Server that is on old hardware to a new instance of SQL Server that is held on a server with new hardware.
Organizations may also use this task to create a copy of the database on a development SQL Server.
Transfer SQL Server Objects task:
The Transfer SQL Server Objects task allows you to be more selective than the Transfer Database task about the specific objects to move between databases in SQL Server.
Use this task when you want to incorporate objects from a SQL Server database into your own database.
Transfer Error Messages task:
Suppose you create user-defined error messages for use within a SQL Server instance. SSIS provides a task exclusively for the purpose of transferring these user-defined messages from one SQL Server instance to another, negating the need to manually recreate the messages.
Transfer Jobs task:
Similar to the Transfer Error Message task, you can use the Transfer Job task to move all or specific jobs from one instance of the SQL Server Agent to another without the need to manually recreate the job.
Transfer Logins task:
You can use the Transfer Logins task to move logins from one instance of SQL Server to another without having to manually recreate them.
Transfer Master Stored Procedure task:
The Transfer Master Stored Procedure task can only be used if you save your own user-defined stored procedures within the Master database.
Additional categories of Control Flow tasks are available in SSIS that provide additional functionality, which can be used to interact with components outside of SSIS.
a.)Package Execution Tasks:
Execute Package task:
The Execute Package task is commonly used to bring together multiple SSIS packages and add constraints to control the order in which the packages should run.
This can be useful when loading a data warehouse as well. You can encapsulate separate packages to load the different parts of the data warehouse and then integrate those separate packages within a single package to control the workflow by using the Execute Package task.
Execute DTS 2000 Package task:
Although DTS support is enabled within SSIS, consider upgrading DTS packages to SSIS packages, because this is a deprecated feature available for backward compatibility only.
b.)Analysis Services Task:
The Analysis Services Processing task is commonly used in SSIS packages that are used to load data warehouses.
Typically, this is the last task added to the package and is used to process objects within Analysis Services after the data has been loaded into the data warehouse.
Data Mining Query task:
The Data Mining Query task helps you to issue DMX prediction queries against a pre-existing data mining structure.
If your data load requirements involve automated prediction on existing data, you can use the Data Mining Query task to fulfill this requirement.
Analysis Services Execute DDL task:
You can use the Analysis Services Execute DDL task if you want to automate the creation of Analysis Services objects such as cubes or dimensions.
c.) Maintenance Tasks:
Back Up Database task, Check Database Integrity task, Execute SQL Server Agent Job task, Notify Operator task, Execute T-SQL Statement task, History Cleanup task, etc.
d.) Windows Management Instrumentation Tasks:
WMI Event Watcher task:
A WMI Event Watcher task can be used to monitor events in the Windows system that trigger a package to execute.
For example, a WMI Event Watcher task could be used to monitor files being added to a folder. On the event occurring, the WMI Event Watcher receives a notification that can then allow the package to run using the files that have been loaded. The WQL to monitor files added to a folder is:
SELECT * FROM __InstanceCreationEvent WITHIN 10 WHERE TargetInstance ISA "CIM_DirectoryContainsFile" and
TargetInstance.GroupComponent= "Win32_Directory.Name=\"c:\\\\WMIFileWatcher\""
WMI Data Reader task:
You can store the results that are returned by the WMI Data Reader task in a variable that can be passed on to other tasks within the SSIS package.
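For example, a minimal C# Script task sketch that reacts to a value the WMI Data Reader task stored in a variable; the variable name User::WmiDiskFreeMB and the 1024 MB threshold are assumptions made purely for illustration:

public void Main()
{
    // Assumes the WMI Data Reader task was configured to write its result into User::WmiDiskFreeMB
    double freeMb = Convert.ToDouble(Dts.Variables["User::WmiDiskFreeMB"].Value);

    Dts.TaskResult = freeMb > 1024
        ? (int)ScriptResults.Success   // enough disk space, let the package continue
        : (int)ScriptResults.Failure;  // fail the task so downstream precedence constraints can react
}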
WORKING WITH PRECEDENCE CONSTRAINTS AND CONTAINERS:
PRECEDENCE CONSTRAINTS:
They help define the work flow by prioritizing different tasks.
This helps you to control what should happen should a task succeed, fail or complete. Expressions can also be used to further refine the behavior of the workflow logic.
A control flow container lets you group different tasks together and control whether tasks run in parallel or in a specific order. You can add tasks to the additional containers that are available in the Toolbox in Business Intelligence Development Studio to provide greater flexibility.
PRECEDENCE CONSTRAINTS:
You can use outcome-based precedence constraints that dictate the flow of the control flow based on success, failure or completion of a task.
You can also use expression-based precedence constraints that control the flow of the Control Flow task based on a predefined expression being met.
You can create precedence constraints based on a combination of outcome and expression-based precedence constraints.
When a Control Flow task is added to the Control Flow Designer, a green arrow appears underneath the Control Flow task.
You can click and drag the arrow to connect the precedence executable to another constrained executable.
This will indicate that on successful execution of the precedence executable, the control flow can move on to the constrained executable.
If you click on the task again, a green arrow will appear under the original Control Flow task. You can click and drag this to another constrained executable.
You can also change the control flow so that the arrow indicates on completion.
This means that it will move onto the next Control Flow task regardless of the Control Flow task succeeding or failing and will appear in blue.
You can also change the control flow so that the arrow indicates on failure.
This will indicate that on failed execution of the precedence executable, the control flow can move onto the constrained executable that is connected to it and will appear in red.
If expressions are used to control the tasks in the control flow, a blue line will appear with an fx icon appearing next to the control flow.
Multiple Constraints:
Any Control Flow task can have multiple precedence constraints connecting to it.
When this situation occurs, you need to configure when the constrained executable should execute.
If the constrained executable should execute only when all precedence constraints are satisfied, configure the precedence constraint properties so that the Multiple Constraints property is set to a Logical AND.
If only one of the precedence constraints must complete before the constrained task will execute, configure the precedence constraint properties so that the Multiple Constraints property is set to a Logical OR.
These settings allow you to refine the control for complex SQL Server Integration Services packages.
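A minimal C# sketch of the Logical AND/OR behaviour using the Microsoft.SqlServer.Dts.Runtime API; the three Script tasks are placeholders used purely for illustration:

using Microsoft.SqlServer.Dts.Runtime;

class MultipleConstraintsDemo
{
    static void Main()
    {
        Package pkg = new Package();
        Executable taskA = pkg.Executables.Add("STOCK:ScriptTask");
        Executable taskB = pkg.Executables.Add("STOCK:ScriptTask");
        Executable taskC = pkg.Executables.Add("STOCK:ScriptTask");

        PrecedenceConstraint pcA = pkg.PrecedenceConstraints.Add(taskA, taskC);
        PrecedenceConstraint pcB = pkg.PrecedenceConstraints.Add(taskB, taskC);

        // Logical OR: taskC runs when either taskA or taskB succeeds.
        // Setting LogicalAnd = true on both constraints would require both tasks to succeed.
        pcA.LogicalAnd = false;
        pcB.LogicalAnd = false;
    }
}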
CONTAINERS:
Control flow containers enable you to group a set of Control Flow tasks or containers within a single container so that they can be organized and managed as a single entity.
Containers can also have precedence constraints connecting to and from them.
SQL Server Integration Services provides three types of containers:
Sequence containers
For Loop containers
Foreach Loop containers
A Sequence container is typically used to contain a number of tasks within it, so that the container can act as one logical unit within the Sequence container.
If the container has a precedence constraint, all the tasks within the container must complete first before the control flow path moves onto the next Control Flow task.
Furthermore, properties that are defined at the container level are inherited by the tasks within it, such as the DisableEventHandlers property.
Sequence containers can also act as a scope for variables.
For Loop containers allow you to define a repeating workflow that uses an expression and loops until the expression evaluates to false.
Within the For Loop container, you must define an initialisation value as the basis to start the loop.
The EvalExpression property specifies the condition that keeps the loop running (the loop exits when it evaluates to false), and the AssignExpression property is used to increment the loop counter.
The For Loop container is useful when you know how many times the container should run through a loop.
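A minimal C# sketch of a For Loop container configured to run ten times, using the Microsoft.SqlServer.Dts.Runtime API; the User::Counter variable name is an assumption made purely for illustration:

using Microsoft.SqlServer.Dts.Runtime;

class ForLoopDemo
{
    static void Main()
    {
        Package pkg = new Package();
        pkg.Variables.Add("Counter", false, "User", 0);   // loop counter variable (assumed name)

        ForLoop loop = (ForLoop)pkg.Executables.Add("STOCK:ForLoop");
        loop.InitExpression = "@[User::Counter] = 0";
        loop.EvalExpression = "@[User::Counter] < 10";     // loop continues while this is true
        loop.AssignExpression = "@[User::Counter] = @[User::Counter] + 1";

        // Tasks added to loop.Executables run on every iteration
        loop.Executables.Add("STOCK:ScriptTask");
    }
}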
When the number of loops required is not known, a Foreach Loop container can loop through a collection of objects until the object no longer exists.
Objects can include files, ADO recordsets or XML nodes.
A task can be performed on the objects as they are enumerated by the Foreach Loop container.
WORKING WITH VARIABLES:
To work with variables in a package, choose Variables from the SSIS menu.
Variables can be used to add dynamic capabilities and to pass information between two different components in a package. They add flexibility to the package logic.
1.) System Defined Variables
2.) User Defined Variables
Static values can be assigned to a variable within a variable definition itself, by setting a default value. You can also dynamically assign values to a variable from control flow tasks. For example, an Execute SQL task can be used to perform a count of the rows within a result set. You can then use the result mapping page of the Execute SQL task to pass the result to a variable. The variable can then use this as an input to other control flow tasks within the package.
SYSTEM DEFINED VARIABLES:
1.) System variables in a package:
Package-based variables are system variables that hold information about the package. Some of the common variables include the following:
System::PackageName. The name of the package.
System::PackageID. The unique ID of the package.
System::MachineName. The machine on which the package was executed.
System::UserName. The name of the user that executed the package.
System::StartTime. The time at which the package started to execute.
2.)Task-based variables
Task-based variables can be specific to the task that you add to the SSIS package including the following:
System::TaskName. The name of the task.
System::TaskID. The ID for the task within the package.
System::TaskTransactionOption. This variable holds information about the transaction setting that the task is using.
3.)Container-based variables
The container-based system variables can be used with Sequence, For Loop and Foreach Loop containers. Not all of the system variables are applicable to all container types. Some of the common variables include the following:
System::ParentContainerGUID. Holds the globally unique ID of the parent container that holds the container.
System::ContainerStartTime. Holds the time when the container started to execute.
System::LocaleID. Holds the value of the locale that the container uses.
USER DEFINED VARIABLES:
At the top of the Variables window are buttons that let you create or delete variables as well as view other variables within a package. The scope defines at what level within a package the variable is created. For example, if you select a Foreach Loop container and then click the Add Variable button on the toolbar of the Variables window, the variable is scoped at that level. If no tasks or containers are selected, the variable is scoped at the package level.
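A minimal C# sketch of the two scopes using the Microsoft.SqlServer.Dts.Runtime API; the variable names are placeholders used purely for illustration:

using Microsoft.SqlServer.Dts.Runtime;

class VariableScopeDemo
{
    static void Main()
    {
        Package pkg = new Package();

        // Package-scoped user variable (equivalent to adding it with nothing selected in the designer)
        Variable rowCount = pkg.Variables.Add("RowCount", false, "User", 0);

        // Container-scoped user variable, added through the container's own Variables collection
        ForEachLoop loop = (ForEachLoop)pkg.Executables.Add("STOCK:ForEachLoop");
        Variable fileName = loop.Variables.Add("FileName", false, "User", string.Empty);
    }
}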
USING SCRIPT TASK AND DATA PROFILING TASK:
SCRIPT TASK:
It is used within an SSIS package to execute VB.NET or C#.NET code. If variables are to be used, you need to list them in the ReadOnlyVariables or ReadWriteVariables property in the Script Task Editor, depending on whether you will only read the variable or also update it.
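A minimal C# sketch of a Script task body, assuming hypothetical variables User::SourceFolder listed in ReadOnlyVariables and User::FileCount listed in ReadWriteVariables:

public void Main()
{
    // Read-only variable: consumed but never changed by this task
    string sourceFolder = Dts.Variables["User::SourceFolder"].Value.ToString();

    // Read-write variable: updated with the number of files found in the folder
    Dts.Variables["User::FileCount"].Value = System.IO.Directory.GetFiles(sourceFolder).Length;

    Dts.TaskResult = (int)ScriptResults.Success;
}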
DATA PROFILING TASKS:
It is used to review data source entities to check for data cleanliness and completeness and to understand how the data is organized structurally.
It has 2 parts:
1.) The Data Profiling task in the control flow, which performs the analysis
2.) The Data Profile Viewer, which is used to review the results.
Data Profiling task features:
Column Null Ratio Profile: Returns the percentage of NULL values in a column relative to the total number of rows
Column Statistics Profile: Returns statistics such as minimum, maximum, average and standard deviation values
Column Value Distribution Profile: Reports the distinct values in a column and how often each occurs
Column Length Distribution Profile: Reports the distinct lengths of string values in a column and how often each occurs
Column Pattern Profile: Displays any patterns found in the column data
Candidate Key Profile: Identifies one or more columns that are unique across all rows
Functional Dependency Profile: Reports the extent to which the values in one column depend on the values in another column