Microsoft BI Certification: DEVELOPING SSAS CUBES

Dimensional Modeling is the methodology to architect schemas required for reporting by reducing the number of tables and deformalizing the data. There are facts(containing the measure) and dimension(subject areas on which the data is sliced) tables in the schema.
For a BI solution the OLAP methodology is used which extracts the data in the underlying transactional database and loaded into a data mart. Data Mart is designed using the dimensional modeling. On top of data mart a UDM layer is built which has advantages of exceptional performance for large data volumes.
SSAS understands semantics of data so helps extend the cube with important business calculations in form of MDX (multi dimensional expressions). Eg KPIs that measure company performance. Lot of 3rd part tools support SSAS and MDX query language and dashboards can be developed using SharePoint or other tools.

UDM consists of the below components:
1.) Data Source
2.) Data Source View- helps us visualize the database schema and augment it using calculated columns and calculated queries
3.) Dimensional Model
4.) Calculations (optional)- augment the cube using custom logic of MDX such as QTD, YTD etc
5.) End User Model (optional)- Additional features that can be built to give richer scemantics including KPI’s, actions, perspectives and translations.
6.) Management Settings-Admin configures the cube to meet various operational requirements including availability, latency and security.

Defining a Data Source:
Note:
1.)Choose native OLE DB provider because SSAS is implemented in native code.
2.)Authentication option:
Standard security –less secure than windows authentication since need to manage the passwords
Windows Authentication-If the user is a system admin then all the operations will succeed.
3.)Impersonation information:
-Use a specific username n pwd
-Use service account- this is appropriate if SSAS service acctn is the domain or local account and has read access to the underlying data source.
-Use credentials of the current user- this is seldom used since it applies to certain data retrieval scenarios involving querying data mining models.
-Inherit- lets the server decide the mode. The server uses service acctn for cube processing and current user for querying data mining models.

Creating DSV:
IF the primary key and table relationship are defined in the underlying database the wizard automatically detects it . If not you need to add manually by going to the DSV designer and right click and set Logical Primary Key. For a foreign key drag the foreign key of a table to its related Primary key in another table and the relationship automatically creates.
Note: The DSV designer does not validate any logical Pk and table relationships that are manually created so if the underlying tables break any relationship one would not know until the cube is fully completed.
Defining DSV Named Calculations and Named Queries:
Named Calculations: A specific table might have columns that are derived from other columns from the same table or columns of another table.
Named Queries: Helps creating a new table
Note: As the requirements evolve it might be required to replace a table using a new table so in that case named Query can be used to write a query to replace the old table.

Creating and Modifying Cubes:
The next task is to create the dimension layer of the UDM. This includes design dimensions, cubes, measure groups and measures. The cube wizard reads metadata from the DSV and the underlying relational database to deduce the structure of the cube and its related dimensions.
Notes:
Use existing tables is the most common choice for cube creation but the option generate tables in the data source can be used to design the UDM from top to bottom in absence of data source.
A measure group in the UDM is equivalent to the fact table in dimensional modeling. It contains numeric measures that change over time.
Measures and dimensions can be renamed and removed by clearing the check box.

Cube designer tabs:
1.) Cube Structure- used to edit the cube structure
2.) Dimension usage- Defines the relationship between dimension and measures
3.) Calculations- Contains MDX formulas
4.) KPIs- Defines KPIs
5.) Actions- used to create actions eg reporting or launching URLs available to the end user while browsing the cube
6.) Partitions-storage structure is defined
7.) Aggregations- Creates custom aggregation designs
8.) Perspectives-used to create subsets of cube content for easier end user browsing
9.) Translations-defines language translations for cube metadata
10.) Browser- used to browse the cube

Measure/ Dimension Properties:
Measure pane lists the measure groups and measure defined in the cube. The property window describes the properties related to the selected object.
Name: How the measure id called in the end user tool.
FormatString: How the measure should be formatted eg currency or percentage.
Note: It is important to note that any properties you change in the cube do not affect the design of the dimension modeling outside the cube. A dimension included in the cube is an instance of dimension (called the cube dimension) that is defined within the dimension folder of the project.

Creating and Modifying Dimensions:
A dimension gives the user the abilities to isolate, drill down, roll up, categories, filter, summaries and perform other actions on data. It gives the user the ability to analyze data by subject areas.
Creating a dimension:
Notes:
1.)While creating a dimension in the dimension wizard the option Generate a time table in the source option lets you create a time dimension in the data source and then lets you populate the table with range of dates. The option Generate a time table in the server is useful when the fact table contains columns of DateTime date type and resolving these columns to a designated time table is not an option.
2.)Best practice is to use a designated time dimension table in the data source. This helps to enhance the time table with additional attributes like holiday flag etc.
3.)In the dimension wizard, in the select dimension attribute page by default each attribute forms a hierarchy called the Attribute Hierarchy and lets the user analyze data by the members of this hierarchy. Eg Color attribute contains the distinct values of the color column. This lets the user see the aggregated measure values of each member of the attribute hierarchy. Eg user can browse sales amount measure by a product color.
Furthermore you can improve the performance of browsing dimension data through user defined hierarchies by defining attribute relationships between attributes within the hierarchies.
You can clear the enable browsing checkbox to hide the attribute when a user browses a cube.
4.)SSAS supports assigning a pre defined attribute type to an attribute. Imp in time dimension because it tells the server that the attribute represents time periods (year, quarter, Month etc) when you write related calculating eg YTD, MTD , QTD.
5.)The properties are set both on the dimension object and the attributes of the dimension.
6.)Configuring Column Bindings:
Three properties used for column bindings: KeyColumn, NameColumn and ValueColumn.
By default the Key column property of English Month Name is configured to EngMonthName column and if you process the dimension the output can be viewed in the browser and it shows the English Month Name attribute hierarchy ill show the month in alpha order.
Can manipulate this property by using namecolumn property so first change the keyColumn property to point to MonthNumberOfYear and when the dimension is processed the result is 1 to 12 ascending order. This is where NameColumn property becomes useful so here we provide a mapping to the EnglishName. Processing the dimension gives Jan till Dec.

Modifying a Dimension:
You can add/delete an attribute by removing and adding in the attribute pane.
Note:
1.)For the dimension the following 2 properties define the storage mode:
--Storage Mode-defines if the attribute is stored using MOLAP or ROLAP. Most Common MOLAP deployment.
--Proactive Caching- determines whether SSAS processes the updates automatically .Most solutions do not deploy Proactive Caching

Attribute Properties:
1.) Name
2.) Usage- setting of key means that it’s a key column which other attributes relate to. A setting of parent means that it is a part of parent child relationship.
3.) KeyColumns - columns used for Pk
4.) NameColums - optional specifies the columns that provide name values
5.) AttributeHierarchy -Enabled- True if the attribute is used in a multi level hierarcy.
6.) AttributeHierarchy -Visible – if the attribute can be seen and selected by the users
7.) IsAggregatable – Specifies whether the values of the attribute members can be aggregated.
8.) OrderBy - specifies if the members of the attribute are ordered by its own key or name columns.
9.) OrderByAttribute - provides the name of the attribute that provides ordering
10.) Type- specifies the type of info contained in the attribute
Note:
It is important to add the dimension that is created after the cube id processed. They can be added by clicking add cube dimension button at the toolbar.
1 dimension can be added more than once eg time dimension might have to be added more than once to support analysis by different dates eg due date, order date, ship date etc. A dimension that is added multiple times to the same cube is called the Role-Playing dimension.
A role-playing dimension is a dimension that is used in a cube more than one time, each time for a different purpose. For example, you might add a Time dimension to a cube three times to track the times that products are ordered, shipped, and received. Each role-playing dimension is joined to a fact table on a different foreign key.
Dimension usage tab can be used to assign meaningful names to the multiple used dimensions with the appropriate measure groups.

Creating Measure Groups and Measures:
2 scenarios when new measure groups are required:
1.)When new fact table has been added to the database:
As relational table is a container of columns, a cube measure group contains related measures. As new column is added to a fact table, there is a need to add corresponding measures to the measure group.
Use theusage drop down list to select its aggregate behavior . Based on how they aggregate they can be classified as additive (can be summed across all dimensions eg sales amount), semi additive (can be summed across all dimensions except time account balance) or non additive (cannot be summed eg interest rates).
2.) When you need a distinct count measure that calculates the number of fact rows with unique values eg number of unique customers that have places orders. Because of the required server overhead to store distinct count measures a measure group can have only 1 measure with the distinct count aggregation function.
The new measure Group Dialouge box does not let you select a fact table that is already being used by another measure group. The distinct count measure must be placed in a separate measure group. As a workaround the cube will automatically create a new measure group when you try to add a measure that contains a DistinctCount aggregate function to the existing measure group.
Make sure the underlying fact table is included.
Best Practices:
The end user can browse data in the measure group only by dimensions that are joined to the measure group. If the dimension is not joined to a measure group the IgnoreUnrelatedDimension measure group property controls the result when a user browses the measure group by that dimension. When a new measure group is added, the cube designer examines the relationship between fact table and the cube dimensions and sets the dimension usage accordingly. Go to Dimension usage tab to review the settings.
Notes:
A measure group must contain at least 1 measure.
The cube designer adds a system generated measure that contains a count aggregate function. This is useful if you need the server to automatically calculate the number of fact table rows.
Eg if the Internet Sales Measures group represents a customer order and the user slices the cube by year the Internet sales count measure will return the number of orders places in the year. You can remove the count measure after the cube designer has generated the measure group.

Measure Group Properties:
ErrorConfiguration-Specifies default or custom error config
EstimatedRows-number of rows of fact table already know in this case help the server choose optimum data structure.
IgnoreUnrelatedDimensions- Defines the server behavior when the user browses the measure by unrelated dimensions to the measure group.
ProactiveCaching-none
ProcessingMode-Regular (users cannot access the measure group until processing is complete), LazyAggregations (data is accessible as soon as it is processed but processing takes longer)
StorageMode- MOLAP, ROLAP, HOLAP
Tyoe-information about the contents of the measure group

Measure Properties:
AggreagateFunction:
1.)Additive (additive across all dimensions) eg sales amt, volume of participant’s etc, Semiadditive (additive across dimensions except time dimension) eg account balance and nonAdditive (cannot be added across dimensions) eg rates etc
2.)Display folders: Groups measures into logical folders
3.)DataType: Needs to be explicitly set for count and distinct count measures. Inherit gets the data type from the underlying DSV.
4.)FormatString : Defines the measure display format
5.)MeasureExpression: Defines the measure expressions.
6.)Source: Defines the measure bindings. A count measure can be bound to a row if it needs to count rows.
7.)Visible: When set to false the measure is hidden from the end users though can be used in MDX expressions.

Microsoft BI Certification

Sunday, August 28, 2011

DEVELOPING SSAS CUBES

No comments:

Post a Comment