Monday, February 7, 2011

Overview of DataWarehouse

DataWarehouse:
A data warehouse is a relational database that centrally stores a consistent view of the data from transactional systems in a denormalised format. This makes the retrieval of the data more efficient than it would be from a transactional system.
A data warehouse will consist of one or more data marts. A data mart typically represents an area of the business such as sales or finance departments. Within each data mart, there are two types of tables created—fact tables and dimension tables.

Dimension tables contain columns that hold information that represents a business entity. You can define many dimension tables. An example of a business entity that is represented as a dimension table could be customers. The columns will contain information that is specific to that entity. In the case of the customers dimension table, columns can include Firstname and Lastname. Dimension tables will also contain a primary key column that uniquely identifies the records in the dimension table. The primary key also defines a one-to-many relationship with an associated column or columns within a fact table.

The fact table contains information about historical transactions and typically represents this information as integer data. For example, you can have three dimension tables named Customers, Region and Time. Each has a primary key column that relates to the fact table. In this example, Customer 6231 in the fact table is equal to Francois Ajenstat in the dimension table. Region 4 is equal to the value of West and the Time value of 3423 is equal to the 18th of August, 1975 in the Time dimension table. The additional columns will contain measures about the transaction such as order quantity and sales amount. Using integer data to represent historical transactions allow SQL Server Analysis Services to process the data much more efficiently.
It consists of three groups of columns. The first is a primary key column to maintain the integrity of the fact table itself. The second group of columns are known as foreign keys. Each of these columns relate to a dimension table primary key to provide a relationship between the fact table and the contextual information provided by the dimension table. Imagine that the image represents a Sales data mart, and that the three smaller tables represent dimension tables called CustomerDim, TimeDim and RegionDim. The fact table, which would be the big table in the image, would contain three foreign key columns that would relate directly to each dimension table.
The fact table also contains a third group of columns known as measures. This group of columns hold business metrics such as sales units, sales amounts and cost amounts. As a result, the fact table ideally consists of business information represented by integer values.

Also note that dimension tables can be shared between different data marts. A common example of this is a Time dimension table. This could be used by different data marts to provide consistent information about time across multiple data marts. These types of dimension tables are known as conformed dimensions. In order to manage the movement of data from source systems to the data warehouse, staging tables may also be created in the data warehouse. This allows tables to act as intermediary tables for cleansing data before loading the cleansed data into the respective dimension tables.

DATA-WAREHOUSE STRUCTURES
Star Schema:
A star schema describes the layout of the fact tables and the dimension tables within a data mart. In a star schema, the fact table resides at the center of the data mart. There are dimension tables that have a direct join with the fact table.

SnowFlake Schema:
Within a snowflake schema, the fact table resides at the center of a data mart. Some dimension table have a direct relationship to the fact table. However, there are dimension tables that have a direct relationship with the fact table, but have another relationship to another table. These relationships can also extend to further tables as well.
An example of this can be related to a data mart about a products business entity. In this example, you have a one-dimension table that lists columns about the product including color, size, category and subcategory. If you have 10,000 products spread across 4 categories and 25 subcategories, it is more efficient to store the subcategory information within a separate table with one record for each subcategory and relate this to a separate category table. Rather than repeating the same information many times within a single dimension table to unnecessarily increase the size of the dimension table, it is more efficient to store this information in separate tables.

USING A DATAWAREHOUSE IN A BI SOLUTION:
SQL Server 2008 provides Business Intelligence tools that can be used to populate and utilize a data warehouse. Once the data warehouse has been created in the database engine, SSIS can be used to extract data from source systems.
1.)This data may then be loaded into a staging table where the data is then transformed, and then extracted as cleansed data from the staging table and loaded into the dimension tables within the data warehouse. Once the dimension tables are loaded, SSIS will then populate the associated fact tables with the correct records.
2.)You also have the option to extract data from source systems and load data directly into the data warehouse without the need for staging tables.
When the data warehouse has been fully populated, SSAS will then use the data warehouse as a basis for creating the cube. The dimension tables in the data warehouse acts as the axis of the cube to provide contextual information. These are known as dimensions. The data within the fact table is used as the basis for information that is intersected at each axis within the cube. Furthermore, SQL Server Analysis Services will generate preaggregated information of the data found within the fact table. These are known as measures. You can build cubes without the use of a data warehouse; however, this approach is not common.

SQL Server Reporting Services can then be used to create reports by querying data within the cube, within the data warehouse or from a wide range of data sources.

1 comment:

  1. Jammy Casino - KSH 7
    Jammy Casino 양산 출장안마 has 50000 slots games 전주 출장안마 and is 여주 출장마사지 the 여주 출장샵 only place to 안성 출장안마 play with real money. For a low deposit, we have to give away 5,000 coins a day.

    ReplyDelete