Showing posts with label SSAS. Show all posts

Monday 22 December 2014

Partition Benefits in SSAS

There are following benefits of using partition in SSAS

Ø Minimize downtime: Cube partitioning supports reducing downtime associated with cube processing. In almost all the cases, a portion of data warehouse is volatile and needs to be processed often. However other portions are relatively static. For example, in a sales cube, we need to change the current year's data nightly, but sales from previous years might change only occasionally - in case if account for merchandise returns and exchanges. If your warehouse tracks last 10 years sales then processing only the current partition may be 10 times quicker than processing the entire cube.

Ø Better Query Performance: Cube partition is a powerful mechanism for improving query performance. Queries that summarize data over 10 years could take considerably longer than those that only search through the current year data. If we have proper partitions then SSAS only has to scan a small subset of data to return query results hence dramatic performance improvements compared to queries running against a cube with a single partition.

Ø Aggregations benefits: The partition queried frequently could benefit from additional aggregations, which in turn could improve performance. Partition(s) that are used less can be processed less frequently with considerably fewer aggregations.

Ø Customized storage and processing settings: Frequently accessed partitions might benefit from proactive caching and ROLAP storage. On the other hand, other forms of storage and processing might be better for less frequently queried partitions.

Ø Distributed query and processing load: SSAS allows you to create remote partitions - a remote partition resides on a server different from its parent cube. This way the queries that affect the remote partition are processed on a server separate from its parent cube, allowing you to take advantage of additional processing power.
Ø Parallel Partitions Processing: SSAS allows processing multiple partitions in parallel on a server that has multiple processors. This can further reduce the total cube processing time

Saturday 20 December 2014

Creating Cube in SSAS 2008 Step by Step

Nowadays, analytical solutions are becoming mission critical for many organizations. Microsoft SQL Server 2008 Analysis Services (SSAS) is designed to provide exceptional performance and scalable support with millions of records and thousands of users from different locations.

Why to Build a Cube?
There are many advantages of cube over relational data mart.

While querying a data mart, you can get most of the results but not everything you need for business analysis and decision making. Cube can help you to get answers of all "What-If" scenarios.
Building a cube helps to house your data to centralize the business rules for calculations that you can't easily store in a relational data mart.
The structure of the cube makes it much easier to write queries to compare data year over year (YOY), or to create cumulative values such as year-to-date (YTD) sales.
Scalable Infrastructure - Analysis Services can scale to support databases of many terabytes in size with many thousands of users.
Superior Performance - Analysis Services cubes are multidimensional structures that enable fast access to high volumes of pre-aggregated data, empowering end users to gain insight into relevant business data at the speed of thought.
You gain the ability to manage aggregated data in the cube. To improve query performance in a relational data mart, we often create summary tables to prepare data for queries that don't require transaction-level detail. SSAS creates the logical equivalent of summary tables (called aggregations) and keeps them up-to-date.

Creating a Cube consists of the following steps:

Creating Analysis Services Project
Creating Data Source
Creating Data Source View
Creating Cube and Dimensions
Creating Dimension Hierarchies
Deploying Cube Database from BIDS

1. Creating Analysis Services Project.

First step is to create a project in Business Intelligence Development Studio (BIDS). Launch BIDS from Start --> All Programs --> Microsoft SQL Server 2008 --> SQL Server Business Intelligence Development Studio and then click File --> New --> Project. In the New Project dialog box, select Analysis Services Project. In the Name text box, type MYSSASFirstProject and, if you like, change the location for your project. I'll store this project at location D:\SSASProjects. Click OK to create the project.

2. Creating Data Source

Now add a data source to define the connection string for data mart AdventureWorksDW2008R2. In Solution Explorer, right-click the Data Sources folder and click New Data Source.

In the Data Source Wizard, click Next on the Welcome to the Data Source Wizard page if it hasn't been disabled. On the Select how to define the connection page, click New to set up a new connection. In the Connection Manager, the default provider is the Native OLE DB\SQL Server Native Client 10.0, which is correct for our project.

To define the connection, type the name of your server in the Server Name text box. Alternatively you can select it from the drop-down list, then select AdventureWorksDW2008R2 in the database drop-down list and click Test Connection button to check the connection. Finally click on OK as shown below:

When you're back in the Data Source Wizard, click Next. On the Impersonation Information page, select Use the service account option so that service account will be used to read data from the source when loading data into your SSAS database and service account must have read permissions to do so. Click Next and then Finish to complete the wizard.

3. Creating Data Source View

Now next step is to create a data source view (DSV) from the data source to define dimensions and cubes. You can make changes to the DSV without modifying the actual data source, which is very useful if you have only read permissions to the data mart. In Solution Explorer, right-click on the Data Source Views folder and then click New Data Source Views...You can see Data Source View Wizard. Click Next on the Welcome page. On the Select a Data Source page, select the data source just added to the project (Adventure Works DW2008R2.ds) and click Next. Now add required objects to the DSV by double-clicking each table or view on Select Tables and Views page. I want to add the following tables to the DSV to make it easy to understand for beginners:

DimDate, DimProduct, DimProductCategory, DimProductSubcategory, and FactInternetSales. You can always add more tables later if you want to explore advance BI questions. Now click Next in the Data Source View Wizard once you are finished adding required tables followed by click on Finish. You can give a name to your DSV before clicking Finish button.

I would recommend you to change the name of objects by selecting each one in the DSV designer and remove the Dim and Fact prefixes from the FriendlyName property because when you create dimensions and cubes, only FriendlyName property will be assigned to the objects.

The DSV is shown below:

4. Creating Cube and Dimensions

Next step is to create a Cube and Dimensions from the data source view. In Solution Explorer, right-click on the Cubes folder and then click New Cube...You can see Cube Wizard. Click on Next in Welcome to the Cube Wizard page.On the Select Creation Method page, keep the default option Use existing tables and click Next button. On the Select Measure Group Tables page, choose InternetSales table and click Next.

Now the wizard displays all the measures available in the selected measure group tables. Measures are basically numeric values e.g. OrderQuantity, Unit Price, Sales Amount, Tax Amount etc. Select only the following measures from Internet Sales Group: Order Quantity, Unit Price, Total Product Cost, Sales Amount, and Internet Sales Count

Now click on Next button to open Select New Dimensions page and select Date and Product dimensions. Click Next to proceed.

In the Completing the Wizard page, enter the cube name as AdventureWorksCube and click Finish button to complete the wizard. Cube layout is shown below:

Now click on each dimensions and add required attributes from the Data Source View.

Date Dimension:

Drag and drop FullDateAlternateKey, CalendarYear, CalendarQuarter, EnglishMonthName, and EnglishDayNameOfWeek. Rename FullDateAlternateKey with Full Date, EnglishMonthName with Calendar Month, and EnglishDayNameOfWeek with Calendar Week as shown below:

Product Dimension:

Drag and drop EnglishProductCategoryName from ProductCategory table, EnglishProductSubcategoryName from ProductSubcategory table and Color, ModelName, Size and Weight from Product table.

5. Creating Dimension Hierarchies

Navigate to Date Dimension Structure. Drag and drop Calendar Year attribute into Hierarchies surface area following by Calendar Quarter, Calendar Month, Calendar Week , and Full Date attributes.

Rename hierarchy with Calendar. You'll see a warning mark in the hierarchy because attribute relationship is not set properly.

Set Attribute Relationships

Click on the Attribute Relationships tab in the dimension designer. This tab is available only in Analysis Services 2008. By default, all attributes relate directly to the key attribute, Date Key as shown below:

To optimize the design by reassigning relationships, Right Click on Full Date and select New Attribute Relationship. Select Related Attribute as Calendar Week and Relationship type as Rigid (will not change over time). Repeat same thing for remaining attributes. Finally Attribute Relation will look like below image:

6. Deploying Cube Database from BIDS

Now it’s time to deploy the cube at required server. Right click on the project (MYSSASFirstProject in this example) and click properties to open project properties page. Enter Server Name name Server property and Database name in Database property as shown below:

Click OK to save changes. Now right click on the project and click on Process to Build and Deploy the project. You will a message while deploying the database first time.

Click on Yes to proceed. Now you can see Process Database - PreojectName window. Click on Run to continue.

Once the database is deployed and processed successfully, you can see the data through Browser tab or directly through SQL Server Analysis Services. You can also pull cube data in Excel using Excel OLAP Pivot Tables, which is the preferred option used by business managers.

Structure of Cube

In Cube Designer, you can view and edit various properties of a cube. The designer contains the following tabs, which display different views of the cube.

Cube Structure

Use this tab to modify the architecture of a cube.

Dimension Usage

Use this tab to define the relationships between dimensions and measure groups, and the granularity of each dimension within each measure group. If you use multiple fact tables, you might have to identify whether measures do not apply to one or more dimensions. Each cell represents a potential relationship between the intersecting measure group and dimension.

Calculations

Use this tab to examine calculations that are defined for the cube, to define new calculations for the whole cube or for a subcube, to reorder existing calculations, and to debug calculations step by step by using breakpoints. Calculations let you define new members and measures based on existing values, such as a profit calculation, and to define named sets.

KPIs

Use this tab to create, edit, and modify the Key Performance Indicators (KPIs) in a cube. KPIs enable the designer to quickly determine useful information about a value, such as whether the defined value exceeds a goal or falls short of the goal, or whether the trend for the defined value is getting better or worse.

Actions

Use this tab to create or modify drill through, reporting, and other actions for the selected cube. Actions provide to client applications context-sensitive information, commands, and reports that end users can access.

Partitions

Use this tab to create and manage the partitions for a cube. Partitions let you store sections of a cube in different locations with different properties, such as aggregation definitions.

Perspectives

Use this tab to create and manage the perspectives in a cube. A perspective is a defined subset of a cube, and is used to reduce the perceived complexity of a cube to the business user.

Translations

Use this tab to create and manage translated names for cube objects, such as month or product names.

Browser

Use this tab to view data in the cube.

Review Cube and Dimension properties in Cube Designer

1. Measure Pane

In the Measures pane of the Cube Structure tab in Cube Designer, expand the Internet Sales measure group.
The measures that are defined for the Internet Sales measure group appear. You can change the order of these measures by dragging the measures into the order that you want. The order will affect how certain client applications order these measures. The measure group is named Internet Sales because the underlying fact table had the friendly name of InternetSales in the data source view. Notice that a space was added automatically, based on the capitalized letter "S", to increase the user-friendliness of the name. The measure group and each measure that it contains have properties that you can edit in the Properties window.
The following image shows the measure group and measures in the Measures pane of Cube Designer.

2. Dimension Pane

In the Dimensions pane of the Cube Structure tab in Cube Designer, review the cube dimensions that are in the Analysis Services Tutorial cube.
Notice that while only three dimensions were created at the database level, as displayed in Solution Explorer, there are five cube dimensions in the Analysis Services Tutorial cube. The cube contains more dimensions than the database because the Time database dimension is used as the basis for three separate time-related cube dimensions, based on different time-related facts in the fact table. These time-related dimensions are also called role playing dimensions. The three time-related cube dimensions let users dimension the cube by three separate facts that are related to each product sale:
The product order date, the due date for fulfillment of the order, and the ship date for the order.
By reusing a single database dimension for multiple cube dimensions, Analysis Services simplifies dimension management, uses less disk space, and reduces overall processing time.
3. In the Dimensions pane of the Cube Structure tab, expand Customer, and then click Edit Customer.
The Customer dimension appears in Dimension Designer. (Note that Data Source View Designer and Cube Designer remain open.) Dimension Designer contains three tabs: Dimension Structure, Translations, and Browser. Notice that the Dimension Structure tab includes three panes: Attributes, Hierarchies and Levels, and Data Source View. The attributes that the Cube Wizard designed appear in the Attributes pane and the user hierarchy that the Cube Wizard defined appears in the Hierarchies and Levels pane. The Data Source View pane displays the tables in the data source view from which columns are used as attributes in this dimension.
You add, remove, and edit hierarchies, levels, and attributes on the Dimension Structure tab of Dimension Designer.
The following image shows the Dimension Structure tab of Dimension Designer.

4. Switch to Cube Designer by clicking the tab in the design environment or by right-clicking the Analysis Services Tutorial cube in the Cubes node in Solution Explorer and then clicking View Designer.
5. In Cube Designer, click the Dimension Usage tab.
In this view of the Analysis Services Tutorial cube, you can see the cube dimensions that are used by the Internet Sales measure group. When a cube has multiple measure groups, cube dimensions might be used with some measure groups but not with others. Also, you define the type of relationship between each dimension and each measure group in which it is used.
The following image shows the Dimension Usage tab of Cube Designer.

6. Click the Customer field next to Customer at the intersection of the Internet Sales measure group and the Customer dimension, and then click the ellipsis button (…).
The Define Relationship dialog box appears. In this dialog box, you define the custom dimension properties within a specific measure group. By default, dimensions have the same behavior in each measure group. However, they can have different behavior in different measure groups. Notice that the relationship of the Customer dimension to the Internet Sales measure group is a Regular relationship, which means that the DimCustomer dimension table is directly joined to the FactInternetSales measure group table. Notice also that the granularity of this dimension is at the lowest level, namely the Customer level, but that you can define different levels of granularity. In Lesson 5, you will learn about defining a custom granularity level.
The following image shows the Define Relationships dialog box.

7. Click Advanced.
The Measure Group Bindings dialog box appears, which lets you change the binding of each attribute and define null processing settings. The binding for an attribute specifies the column in the underlying dimension table to which the attribute is bound. By default, this setting is inherited from the dimension; this setting is rarely changed at the measure group level. Null processing settings let you define how Analysis Services treats null values during processing at the measure group level; these settings override any settings at the dimension level.
The following image shows the Measure Group Bindings dialog box.

8. Click Cancel, and then click Cancel again, to return to Cube Designer.

Accessing Cubes from Excel

Introduction

if reporting access to cubes can be provided from Microsoft Excel, then report building can be performed by an end user. Majority of the time, using this method, users can construct reports the way they wish. Improvements in Excel 2007 have provided a number of new fancy features that can be used with cubes.

Connecting to SSAS using Microsoft Excel

Launch Microsoft Excel first. Select the option, select the Data table, then select the From Other Services button in the Data ribbon, and then select From Analysis Services. As shown in Figure 2, a list of available data sources is provided.

In the Analysis Services option is the screen shown below. To connect to the SSAS servers, login credentials need to be provided. However, SSAS does not support SQL Server authentication, hence the Windows Authentication option must be selected.

After providing the login credentials, the SSAS database and the Cube need to be selected as shown below.

All cubes for the selected SSAS database will be listed. Only one cube can be specified at this point. The next step is to set the configurations to the Data connection file. There are options to specify the file name and the path for the Data connection file.

Next select sheet whether a Report or chart is wanted.

Viewing Data
now are the most fascinating steps, viewing the SSAS data from an Excel file. In the right side of the excel sheet, notice a PivotTable Field List. Within this list the KPI, measures and dimension attributes can be seen; the user is able to select the attributes they want.

To see the Sales amount for each product color and class.; simply select Class and Color from the Product dimension and Sales Amount.

A pivot table can be designed with columns and row headers as shown in below.

Monday 15 December 2014

Defining SSAS Data Warehouse

SSAS Cubes rely on relational data structure. Essentially, the SSAS Cubes is a collection of tables and relationships in DBMS. However, nomenclature is different in the OLAP world. We call tables not just entity tables but fact tables and dimension tables.

Fact Table is a relational database table that houses a measure.
Dimensions Table is a relational database table that houses a lookup data.
Combination of Fact and Dimensions can be stored in one table and this table is also called Fact Table.

There are two different data warehouse schemas exist in the OLAP world.

1. Star schema

2. Snowflake schema

Star Schema

SSAS Star Schema is designed with one table in the center or SSAS Fact Table and lookup tables or SSAS Dimension Tables linked to this central table.

SSAS Star schemas are easier to visualize and it is faster to retrieve data from the SSAS start schema. However, there is a need to flatten data or de-normalize and it takes more space on the disk. There are products that are configured to use only start schemas and not the others. So it may be a natural choice to use just this type.

Snowflake schema

SSAS Snowflake Schema is any, other than start, schema design. However, it is usually normalized. Data hierarchies are defined with the help of multiple related tables with the help of primary and foreign key relationships. SSAS Snowflake schemas contain multiple fact and dimensional tables. The SSAS Snowflake schemas have few advantages. They require less storage space due to highly normalized structure of its scheme. However, they are slower for data processing due to multiple joins which take most of the computational power. They are also harder to update.

Understanding SQL Server Analysis Services (SSAS) and Components

SQL Server Analysis Services (SSAS) is the technology from the Microsoft Business Intelligence stack, to develop Online Analytical Processing (OLAP) solutions. In simple terms, you can use SSAS to create cubes using data from data marts / data warehouse for deeper and faster data analysis.

Microsoft SQL Server Analysis Services (SSAS) delivers both online analytical processing (OLAP) and data mining functionality for business intelligence applications.

OLAP

Online analytical processing (OLAP) allows the user to access aggregated and organized data from business data sources, such as data warehouses, in a multidimensional structure called a cube. Microsoft provides tools and features for OLAP that user can use to design, deploy, and maintain cubes and other supporting objects.

Data Mining

Data mining gives the user access to the information that is needed to make intelligent decisions about difficult business problems. Microsoft provides tools for data mining with which user can identify rules and patterns in the data, so that the user can determine why things happen and predict what will happen in the future.

Cubes are multi-dimensional data sources which have dimensions and facts (also known as measures) as its basic constituents. From a relational perspective dimensions can be thought of as master tables and facts can be thought of as measureable details. These details are generally stored in a pre-aggregated proprietary format and users can analyze huge amounts of data and slice this data by dimensions very easily. Multi-dimensional expression (MDX) is the query language used to query a cube, similar to the way T-SQL is used to query a table in SQL Server.

Simple examples of dimensions can be product / geography / time / customer, and similar simple examples of facts can be orders / sales. A typical analysis could be to analyze sales in Asia-pacific geography during the past 5 years. You can think of this data as a pivot table where geography is the column-axis and years is the row axis, and sales can be seen as the values. Geography can also have its own hierarchy like Country->City->State. Time can also have its own hierarchy like Year->Semester->Quarter. Sales could then be analyzed using any of these hierarchies for effective data analysis.

A typical higher level cube development process using SSAS involves the following steps:

1) Reading data from a dimensional model
2) Configuring a schema in BIDS (Business Intelligence Development Studio)
3) Creating dimensions, measures and cubes from this schema
4) Fine tuning the cube as per the requirements
5) Deploying the cube

The basic concepts of OLAP include:

Cube
Dimension table
Dimension
Hierarchy
Level
Fact table
Measure
Schema

Cube

The basic unit of storage and analysis in Analysis Services is the cube. A cube is a collection of data that's been aggregated to allow queries to return data quickly. For example, a cube of order data might be aggregated by time period and by title, making the cube fast when you ask questions concerning orders by week or orders by title.

Cubes are ordered into dimensions and measures.

The data for a cube comes from a set of staging tables, sometimes called a star-schema database. Dimensions in the cube come from dimension tables in the staging database, while measures come from fact tables in the staging database.

Dimension table

A dimension table lives in the staging database and contains data that you'd like to use to group the values you are summarizing. Dimension tables contain a primary key and any other attributes that describe the entities stored in the table. Examples would be a Customers table that contains city, state and postal code information to be able to analyze sales geographically, or a Products table that contains categories and product lines to break down sales figures.

Dimension

Each cube has one or more dimensions, each based on one or more dimension tables. A dimension represents a category for analyzing business data: time or category in the examples above. Typically, a dimension has a natural hierarchy so that lower results can be "rolled up" into higher results. For example, in a geographical level you might have city totals aggregated into state totals, or state totals into country totals.

Hierarchy

A hierarchy can be best visualized as a node tree. A company's organizational chart is an example of a hierarchy. Each dimension can contain multiple hierarchies; some of them are natural hierarchies (the parent-child relationship between attribute values occur naturally in the data), others are navigational hierarchies (the parent-child relationship is established by developers.)

Level

Each layer in a hierarchy is called a level. For example, you can speak of a week level or a month level in a fiscal time hierarchy, and a city level or a country level in a geography hierarchy.

Fact table

A fact table lives in the staging database and contains the basic information that you wish to summarize. This might be order detail information, payroll records, drug effectiveness information, or anything else that's amenable to summing and averaging. Any table that you've used with a Sum or Avg function in a totals query is a good bet to be a fact table. The fact tables contain fields for the individual facts as well as foreign key fields relating the facts to the dimension tables.

Measure

Every cube will contain one or more measures, each based on a column in a fact table that you';d like to analyze. In the cube of book order information, for example, the measures would be things such as unit sales and profit.

Schema

Fact tables and dimension tables are related, which is hardly surprising, given that you use the dimension tables to group information from the fact table. The relations within a cube form a schema. There are two basic OLAP schemas: star and snowflake. In a star schema, every dimension table is related directly to the fact table. In a snowflake schema, some dimension tables are related indirectly to the fact table. For example, if your cube includes OrderDetails as a fact table, with Customers and Orders as dimension tables, and Customers is related to Orders, which in turn is related to OrderDetails, then you're dealing with a snowflake schema.

Microsoft Business Intelligence

Pages