Microsoft BI Certification: Developing SSAS Cubes: Creating Data Sources and data Source Views

CREATING DATA SOURCES AND DATA SOURCE VIEWS:
SSAS works best with database schema's that are designed to support data analytic s and reporting. The methodology you use to architect such schemes is called dimensional Modeling.
The fact table and its dimensions make a star pattern typical of OLAP. With star schema the dimension table is contained in a single table.
After the dimension table is in place you can build the UDM (Unified Dimensional model) or cube on top of it.

Options for developing Analysis Services:
1.)Project mode:
You can either select the Analysis Services Project template or the Import Analysis Services 9.0 Database Wizard in the Business Intelligence template of the Add New Project dialog box.
Working in the project mode helps you to develop Analysis Services solutions without the need for an active connection to the database. Furthermore, projects created in Business Intelligence Development Studio can work with source control technologies such as Microsoft Visual SourceSafe, allowing multiple developers to work with the same Analysis Services objects without overwriting each other’s work.
The project mode also provides the flexibility to deploy the solution to a test environment before deploying the solution to a production environment. It is best practice to develop Analysis Services solutions in the project mode.
2.)Online mode:
In Business Intelligence Development Studio, you can connect directly to cubes in an Analysis Services database and make modifications to a live cube. The changes that are made occur immediately, and no Analysis Services project is created.

To connect to Analysis Services 2008 in online mode, perform the following steps:
Open Business Intelligence Development Studio.
On the File menu, point to Open, and then click Analysis Services Database.
Select Connect to existing database.
Specify the server name and the database name.
Click OK.

UDP Consists of several components:
1.)Data Source: represents a connection to the database where the data is stored.
2.)Data Source Views: Abstracts the underlying db schema. Can add calculated columns to a DSV when the security prevents you from changing the schema.
3.)Dimension Model: After the DSV is created the next step is to create a cube dimensional model with dimension and fact tables.
4.)Calculations: Only in rare cases will the dimensional model alone meet the needs. Custom logic like MDX expressions such as Quater to date(QTD) and year to date(YTD) can be built from such calculations.
5.)End User Model: It defines the additional features you built on top of dimensional layer to provide richer data semantics. They include KPI's, actions, perspectives and translations.
6.)Management Settings: At the last the savvy admin would configure the cube to meet operational requirements including availability, latency and security.

1.)Defining a new data source:
Setting the Connection Information:
When setting the connection information you need to decide the authentication option the server will use to connect to the data source to read data.
1.)Windows integrated Security- This option connects to the data source under the identity of windows account. Data source operations initiated in BIDS are performed under the identity of the interactive user ie you.
2.)Standard Authentication- User name and Password
By selecting the SQL Server Auth option it is selected. Same credentials will be used both by BIDS and SSAS connections. This authentication is less secure then Windows Authentication.

Impersonation Information:
1.)Specify Windows Username and Password-
Lets you connect to the data source through a predefined windows account
2.)The service Account-
It performs the data operations under the identity of the SSAS windows service. This option is appropriate if the SSAS service account is domain or local account and has read access to the underlying data source data.
3.)Use the credentials of the current user-
It impersonates the identity of the interactive user. This option is rarely used since certain data retrieval scenarios involving data mining models.
4.)Inherit-
Lets the sever decide what impersonation mode is appropriate. For cube processing the server uses service account and for querying data mining models it uses credentials of current user.
Note:
Use Service Account option generally

2.)DATA SOURCE VIEWS(DSV):
Data Source Views provide a logical data model of related tables, views and queries from one or more data sources. As a result, you can define a subset of data in the data source view. This is typically done when you want to expose a data mart from a data warehouse that will be used as a basis for creating a cube. The data source view will hold metadata about the underlying tables and relationships from the source data.

As the data source view is a logical model, changes can be made to the data source view without changing the underlying schema in the data source.

DSV Keys and Table Relationships:
One purpose of DSV is to define the Primary key for the dimension table and the table relationships between dimension and fact tables. If Primary keys and relationships are defined within the underlying tables then the DSV will detect it automatically and add to DSV. If they are not defined eg when using views instead of tables then one must manually add them logically in the DSV schema.
This can be done by:
1.)Within the DSV designer select one or more columns that make up the key for the table by holding down the control key and then clicking the relevant columns.
2.)Right click the selected columns and select "Set logical Primary Key".
To define a logical relationship drag the foreign key from a fact table to its related primary key within a dimension table.The resulting relationship will appear as a arrow pointing from foreign key to the primary key table.

Note: Make sure you understand the underlying relational structure and the data before you create a logical Primary key and table relationships. The DSV designer does not validate any logical primary keys or table relationships that are created manually. If data issues in the underlying table break any logical definitions, one will only come to know once the cubes are designed and processed using the DSV.

Named calculations and Named Queries:
Named Calculation: It is a column based on an expression. It is a computed column which is a part of a DSV.
eg A named calculation can reference another table. Our eg has 2 tables Product and ProductCategory table and both tables have a ProductCategoryID column. The productcategory table has a CategoryName column. You need a named calculation in the product table that shows the Product Category by referencing the CategoryName column from Product Category table. Following SQL statement is used.
(Select CategoryName from ProductCategory
where ProductCategoryID=products.ProductCategoryID)
Named Queries:
It is created with a full select statement with one or more columns.

3.)CREATING AND MODIFYING DATA CUBES:
After creating a data source and DSV that model the underlying data structures, the next task is to create the multidimensional layer of UDP.
There are 3 ways by which wizard for generating the cube definition:
1.)Use Existing Tables:
Helps to generate the cube on the bottoms up approach based on the DSV created.
2.)Create empty Cube:
Need to design the dimensional model from scratch.
3.)Generate tables in Data Source:
This is a top down approach. Select this option when you prefer to design the UDM from top to bottom during the absence of data. After the UDM is in place you can auto generate the supporting schema.
Note: Common to use the first option.

Modifying the Cube and the Cube Designer:
BIDS opens the resulting cube in a cube designer. The cube designer interface has 10 design tabs across the top. You can use these tabs to modify he various aspects of the cube.
Cube Designer Tabs:
The tab names and their functions are as below:
1.)Cube Structure:
Used to edit the basic structure of a cube.
2.)Dimension Usage:
Defines relationships between dimension and measure groups in a cube.
3.)Calculations:
Contains MDX formulas and scripts to define calculated members.
4.)KPI's:
Defines KPI's within a cube including formulas for KPI status and trends.
5.)Actions:
Used to create actions such as running reports or launching urls that will be avaliable to the end users as they browse the cube.
6.)Partitions:
Defines the storage structures used for each measure group within a cube.
7.)Aggregations:
Used to create custom Aggregation designs.
8.)Perspectives:
Used to create subsets of the cube content for easier end user browsing.
9.) Translations:
Defines the lang for cube metadata eg measures and dimension names.
10.)Browser:
Used to browse the cube data

Note:When you modify certain properties for a dimension to change its behavior in a cube, the physical design of this dimension is inherited from the dimension at the database level.

CREATING AND MODIFYING DIMENSIONS:
Note: dimension first is more preferable. In this way, you now control the attributes that exist in the dimension, and define any hierarchies that you deem appropriate for your business users. By taking this approach, you can run the Cube Wizard to create the cube, and then add existing dimensions within the Cube Wizard so that after the wizard completes, the dimensions have already been defined without the need to further edit the dimensions.
When the cube wizard generates a set of dimensions it only adds primary key columns as dimension attributes. As the requirements evolve, you might need to add new dimensions or modify existing dimensions to the UDM dimensional model.

Creating a new Dimension:
Attributes can be organized into user-defined hierarchies that provide navigational paths to assist users when browsing the data in a cube. Instead of browsing a single flat hierarchy of attributes, you can improve the user's experience of reading the dimension data by creating user-defined hierarchies. For example, a Time dimension consists of a user-defined hierarchy named Calendar Year that contains the Year, Quarter and Month levels within the hierarchy for users to navigate through the time dimension data.
Furthermore, you can improve the performance of browsing dimension data through user-defined hierarchies by defining attribute relationships between attributes within the hierarchy. Although attribute relationships are not visible to the users, they can have an impact on performance by creating more effective aggregates within the cube that users may query.
Dimensions, attributes, user-defined hierarchies and attribute relationships are created using the Dimension Wizard.

Microsoft BI Certification

Thursday, April 7, 2011

Developing SSAS Cubes: Creating Data Sources and data Source Views

No comments:

Post a Comment