
Case Study on OLAP

Name: VIJAY S. GACHANDE Class: TEIT Roll No.: 06 Subject: DBT Batch No.: B1

OLAP (On-Line Analytical Processing):

Why OLAP?

Need of OLAP:
1. Multidimensional data, making it easy to:
   -Select data
   -Navigate data
   -Integrate data
   -Explore data
2. Analytical query language, providing the ability to:
   -Filter the data relationship
   -Aggregate the data relationship
   -Merge the data relationship
   -Explore complex data relationships

OLAP CUBE

An OLAP cube, as shown in Fig. 1.1.1, is a data structure that allows fast analysis of data. The arrangement of data into cubes overcomes a limitation of relational databases. Relational databases are not well suited for near-instantaneous analysis and display of large amounts of data. Instead, they are better suited for creating records from a series of transactions, known as OLTP or On-Line Transaction Processing. Although many report-writing tools exist for relational databases, these are slow when the whole database must be summarized.

Fig. 1.1.1: OLAP Cube

The OLAP cube consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.
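The relationship between fact table, dimension tables and cube cells can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table contents and the names `dim_product`, `dim_time`, `fact_sales` and `build_cube` are hypothetical and do not come from any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical dimension tables: surrogate key -> descriptive attributes.
dim_product = {1: {"name": "Widget"}, 2: {"name": "Gadget"}}
dim_time = {10: {"month": "2006-01"}, 11: {"month": "2006-02"}}

# Hypothetical fact table: foreign keys into the dimensions plus a measure.
fact_sales = [
    {"product_id": 1, "time_id": 10, "units": 5},
    {"product_id": 1, "time_id": 11, "units": 7},
    {"product_id": 2, "time_id": 10, "units": 3},
    {"product_id": 1, "time_id": 10, "units": 2},
]


def build_cube(facts):
    """Sum the measure into cells addressed by dimension members."""
    cube = defaultdict(int)
    for row in facts:
        product = dim_product[row["product_id"]]["name"]
        month = dim_time[row["time_id"]]["month"]
        cube[(product, month)] += row["units"]
    return dict(cube)


cube = build_cube(fact_sales)
print(cube[("Widget", "2006-01")])  # 7
```

Each key of `cube` is a point in the space spanned by the dimensions, and each value is the aggregated measure for that point.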

CLASSIFICATIONS OF OLAP
In the OLAP world, there are mainly three different types: Multidimensional OLAP (MOLAP), Relational OLAP (ROLAP) and Hybrid OLAP (HOLAP).

1. MOLAP: This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats. Multidimensional OLAP is one of the oldest segments of the OLAP market. The business problem MOLAP addresses is the need to compare, track, analyze and forecast high-level budgets based on allocation scenarios derived from actual numbers. The first forays into data warehousing were led by the MOLAP vendors, who created special-purpose databases that provided a cube-like structure for performing data analysis.
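A MOLAP engine answers queries from a pre-built cube rather than from the relational tables. The following is a minimal sketch of that idea, assuming a tiny two-dimensional data set; the rows, the `ALL` marker and the `precompute` helper are hypothetical, and real MOLAP products use their own proprietary storage formats.

```python
from itertools import product

# Hypothetical detail rows: (region, year, amount).
rows = [("North", 2005, 100), ("North", 2006, 150), ("South", 2006, 200)]

ALL = "All"  # marker meaning "aggregated over this dimension"


def precompute(rows):
    """Store a total for every combination of member or ALL per dimension."""
    cells = {}
    for region, year, amount in rows:
        for key in product((region, ALL), (year, ALL)):
            cells[key] = cells.get(key, 0) + amount
    return cells


cells = precompute(rows)
print(cells[("North", ALL)])  # 250
print(cells[(ALL, 2006)])     # 350
print(cells[(ALL, ALL)])      # 450
```

Every later query is a dictionary lookup, which is why response time does not depend on the level of summarization. The price is that the number of stored cells grows with every added dimension.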

MOLAP tools restructure the source data so that it can be accessed, summarized, filtered and retrieved almost instantaneously. As a general rule, MOLAP tools provide a robust solution to data warehousing problems. Administration, distribution, metadata creation and deployment are all controlled from a central point. Deployment and distribution can be achieved over the Web and with client/server models.

Functions of MOLAP
-Produces a hypercube
-Pre-aggregated and pre-calculated
-Rapid response times
-Limited in the amount of data that can be managed

When do MOLAP tools bring value?
-Need to process information with consistent response time regardless of the level of summarization or calculations selected.
-Need to avoid many of the complexities of creating a relational database to store data for analysis.
-Need the fastest possible performance.

Who should have access to MOLAP tools?
-Users who are connected to a network and need to analyze larger, less defined data.
-Users who want to access predefined reports, but need to have the ability to perform additional analysis on information that may not be contained in the report.

Advantages:

-Excellent performance: MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
-Can perform complex calculations: All calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly.

Disadvantages:
-Limited in the amount of data it can handle: Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data. Indeed, this is possible. But in this case, only summary-level information will be included in the cube itself.
-Requires additional investment: Cube technology is often proprietary and does not already exist in the organization. Therefore, to adopt MOLAP technology, chances are that additional investments in human and capital resources are needed.

2. ROLAP: This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.

As data warehouse sizes have grown, users have come to realize that they cannot store all of the information that they need in MOLAP databases. The business problem that ROLAP addresses is the need to analyze massive volumes of data without having to be a systems or software expert. Relational OLAP databases seek to resolve this dilemma by providing a multidimensional front end that creates queries to process information in a relational format. These tools provide the ability to transform two-dimensional relational data into multidimensional information.

Due to the complexity and size of ROLAP implementations, the tools provide a robust set of functions for metadata creation, administration and deployment. The focus of these tools is to provide administrators with the ability to optimize system performance and generate maximum analytical throughput and performance for users. All of the ROLAP vendors provide the ability to deploy their solutions via the Web or within a multitier client/server environment.

Functions of ROLAP
-Data remains in a relational format
-Some degree of aggregation
-Slower response times
-Scales to large amounts of data

Advantages:
-Can handle large amounts of data: The data size limitation of ROLAP technology is the limitation on data size of the underlying relational database. In other words, ROLAP itself places no limitation on data amount.
-Can leverage functionalities inherent in the relational database: Often, the relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities.

Disadvantages:
-Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large.
-Limited by what SQL can do: Because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building out-of-the-box complex functions into the tool, as well as the ability to allow users to define their own functions.
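The claim that a ROLAP slice amounts to adding a "WHERE" clause can be illustrated with Python's standard `sqlite3` module. This is a sketch under stated assumptions: the `sales` table, its rows and the `total_by_region` helper are hypothetical stand-ins for a real warehouse and a real ROLAP front end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", 2005, 100), ("North", 2006, 150), ("South", 2006, 200)],
)


def total_by_region(conn, year=None):
    """Generate the SQL for a report; slicing on year adds a WHERE clause."""
    sql = "SELECT region, SUM(amount) FROM sales"
    params = ()
    if year is not None:
        sql += " WHERE year = ?"
        params = (year,)
    sql += " GROUP BY region ORDER BY region"
    return conn.execute(sql, params).fetchall()


print(total_by_region(conn))        # [('North', 250), ('South', 200)]
print(total_by_region(conn, 2006))  # [('North', 150), ('South', 200)]
```

Nothing is pre-computed here: every report runs a fresh query against the relational data, which is the source of both ROLAP's scalability and its slower response times.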

3. HOLAP:

Functions of HOLAP
-Can manage data both as ROLAP and MOLAP
-Currently evolving
-MOLAP vendors are finding it easier to move into the HOLAP market space

The Basic Structure

[Figure: the basic structure of a data warehouse. Source Data is extracted into the Data Staging Area (storage: flat files (fastest), RDBMS, other; processing: clean, prune, combine, remove duplication, standardize, conform dimensions, store awaiting replication, export to data marts; no user query services). The staging area populates, replicates and recovers the Corporate View, made up of Data Marts #1, #2 and #3 connected by the DW Bus (OLAP (ROLAP, MOLAP, HOLAP); dimensional access; subject oriented; user group driven; refresh frequency; conforms to the Bus). The data marts feed User Access: ad hoc query tools; reporting tools and writers; customized applications; models (forecasting, scoring, allocating, data mining, scenario analysis, etc.).]
Operations:
Unfortunately, there is no consensus on the set of multidimensional operations or on how to name them. However, a comparison of algebraic proposals can be found in the academic literature, together with a set of operations subsuming all of them. A sequence of these operations is known as an OLAP session. An OLAP session allows transforming a starting query into a new query. Figure 3 draws the transitions generated by each one of these operations (circles and triangles represent different attributes for Fact instances):

[Figure 3: transitions generated by each operation: (a) Selection, (b) Roll-up/Drill-down, (c) ChangeBase, (d) Drill-across, (e) Projection, (f) Set operations (Union)]

Selection or Dice: By means of a logic predicate over the dimension attributes, this operation allows users to choose the subset of points of interest out of the whole n-dimensional space.
-Slice and Dice: Look at a specific interest of the business
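Selection can be sketched as filtering cube cells with a predicate over their coordinates. The cube contents and the `select` helper below are hypothetical and serve only to illustrate the operation.

```python
# Hypothetical cube cells: (product, region, month) -> units sold.
cube = {
    ("Widget", "North", "2006-01"): 5,
    ("Widget", "South", "2006-01"): 2,
    ("Gadget", "North", "2006-02"): 7,
    ("Gadget", "South", "2006-02"): 4,
}


def select(cube, predicate):
    """Keep only the cells whose coordinates satisfy the predicate."""
    return {
        coords: value for coords, value in cube.items() if predicate(*coords)
    }


north_widgets = select(
    cube,
    lambda product, region, month: product == "Widget" and region == "North",
)
print(north_widgets)  # {('Widget', 'North', '2006-01'): 5}
```

The result has the same dimensions as the input; only the set of points is smaller.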

Roll-up: Also called Drill-up, it groups cells in a Cube based on an aggregation hierarchy. This operation modifies the granularity of data by means of a many-to-one relationship which relates instances of two aggregation levels in the same Dimension, corresponding to a part-whole relationship (figure 3.b from left to right). For example, you could roll-up monthly sales into yearly sales moving from Month to Year aggregation level along the temporal dimension.
-Roll Up: Move from detail to summary
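The Month to Year example can be written as a short aggregation. This is a sketch with hypothetical figures; `month_to_year` stands in for the many-to-one relationship between the two aggregation levels, and `roll_up` is an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical monthly sales, keyed by "YYYY-MM".
monthly_sales = {"2005-11": 40, "2005-12": 60, "2006-01": 30, "2006-02": 50}


def month_to_year(month):
    """Many-to-one mapping from the Month level to the Year level."""
    return month.split("-")[0]


def roll_up(cells, parent_of):
    """Group cells by their parent member and sum the measure."""
    totals = defaultdict(int)
    for member, value in cells.items():
        totals[parent_of(member)] += value
    return dict(totals)


yearly_sales = roll_up(monthly_sales, month_to_year)
print(yearly_sales)  # {'2005': 100, '2006': 80}
```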

Drill-down: This is the counterpart of Roll-up. Thus, it removes the effect of that operation by going down through an aggregation hierarchy and showing more detailed data.
-Drill Down: Move from summary to detail
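Drill-down needs the finer-grained data to be available, since a summary alone cannot be split back into its parts. A minimal sketch, assuming the monthly detail is kept alongside the yearly view (the data and the `drill_down` helper are hypothetical):

```python
# Hypothetical detail-level data, keyed by "YYYY-MM".
monthly_sales = {"2005-11": 40, "2005-12": 60, "2006-01": 30, "2006-02": 50}


def drill_down(detail_cells, parent_of, parent):
    """Return the detail cells that make up one summary member."""
    return {
        member: value
        for member, value in detail_cells.items()
        if parent_of(member) == parent
    }


months_of_2006 = drill_down(monthly_sales, lambda m: m.split("-")[0], "2006")
print(months_of_2006)  # {'2006-01': 30, '2006-02': 50}
```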

ChangeBase: This operation reallocates exactly the same instances of a Cube into a new n-dimensional space with exactly the same number of points (figure 3.c). Actually, it allows two different kinds of changes in the space: it can rearrange the multidimensional space by reordering the Dimensions, interchanging rows and columns in the cross-tab (this is also known as Pivoting), or it can add or remove dimensions to or from the space.
-Pivot and Rotate: Look at data from varying perspectives
-Drill Through: Move to a near transaction level of detail
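Pivoting a cross-tab can be sketched as swapping the row and column keys of a nested dictionary. The table and the `pivot` helper are hypothetical; the point is that the same values reappear at rearranged coordinates.

```python
# Hypothetical cross-tab: rows are regions, columns are years.
crosstab = {
    "North": {"2005": 100, "2006": 150},
    "South": {"2005": 80, "2006": 200},
}


def pivot(table):
    """Interchange rows and columns without changing any cell value."""
    rotated = {}
    for row_key, row in table.items():
        for col_key, value in row.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


by_year = pivot(crosstab)
print(by_year["2006"])  # {'North': 150, 'South': 200}
```

Applying `pivot` twice returns the original table, which reflects that no instance is added or lost.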

Drill-across: This operation changes the subject of analysis of the Cube by showing measures regarding a new Fact. The n-dimensional space remains exactly the same; only the data placed in it change, so that new measures can be analyzed (figure). For example, if your Cube contains data about sales, you could use this operation to analyze data regarding production using the same Dimensions.

Projection: It selects a subset of measures from those available in the Cube (figure).

Set operations: These operations allow users to operate on two Cubes defined over the same n-dimensional space. Usually, Union (figure f), Difference and Intersection are considered.

This set of algebraic operations is minimal in the sense that none of the operations can be expressed in terms of the others, nor can any operation be dropped without affecting its functionality. (Some tools consider that the set of measures of a Fact forms an artificial analysis dimension as well; if so, Projection should be removed from the set of operations for it to be considered minimal, since it would be done by Selection over this artificial Dimension.) Thus, other operations can be derived as sequences of these. This is the case of Slice (which reduces the dimensionality of the original Cube by fixing a point in a Dimension), which is obtained by means of Selection and ChangeBase operations. It is also common for OLAP implementations to use the term Slice & Dice to refer to the selection of fact instances, and some also introduce Drill-through to refer to directly accessing the data sources in order to lower the aggregation level below that in the OLAP repository or data mart.
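Drill-across, Projection and Union can be sketched on small dictionary-based cubes. Everything below is an illustrative assumption: the cubes, the measure names and the helpers `project`, `drill_across` and `union` are hypothetical, and the union shown simply lets the second cube win if both define the same cell.

```python
# Two hypothetical Facts over the same dimensions (product, month).
sales = {
    ("Widget", "2006-01"): {"units": 5, "revenue": 50.0},
    ("Gadget", "2006-01"): {"units": 3, "revenue": 45.0},
}
production = {
    ("Widget", "2006-01"): {"produced": 8},
    ("Gadget", "2006-01"): {"produced": 4},
}
sales_feb = {("Widget", "2006-02"): {"units": 6, "revenue": 60.0}}


def project(cube, measures):
    """Keep only the named measures in every cell."""
    return {
        coords: {name: cell[name] for name in measures}
        for coords, cell in cube.items()
    }


def drill_across(cube, other_fact):
    """Show another Fact's measures at the same coordinates."""
    return {c: other_fact[c] for c in cube if c in other_fact}


def union(cube_a, cube_b):
    """Combine the cells of two cubes defined over the same space."""
    merged = dict(cube_a)
    merged.update(cube_b)  # assumption: cube_b wins on overlapping cells
    return merged


print(project(sales, ["units"])[("Widget", "2006-01")])    # {'units': 5}
print(drill_across(sales, production)[("Gadget", "2006-01")])  # {'produced': 4}
print(len(union(sales, sales_feb)))                        # 3
```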

Data integration

1. Data resides in many distributed, heterogeneous OLTP (On-Line Transaction Processing) sources
   -Sales, inventory, customer, ...
   -NC branch, NY branch, CA branch
2. Need to support OLAP (On-Line Analytical Processing) over an integrated view of the data
3. Possible approaches to integration
   I) Eager: integrate in advance and store the integrated data at a central repository called the data warehouse
   II) Lazy: integrate on demand; process queries over distributed sources
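The difference between the eager and lazy approaches can be sketched as follows. The branch data and the helper names `integrate`, `eager_query` and `lazy_query` are hypothetical; in-memory lists stand in for the distributed OLTP sources.

```python
# Hypothetical OLTP sources, one list of (item, units) rows per branch.
sources = {
    "NC": [("widget", 5)],
    "NY": [("widget", 3), ("gadget", 2)],
    "CA": [("gadget", 4)],
}


def integrate(sources):
    """Combine all branch rows into one total per item."""
    totals = {}
    for rows in sources.values():
        for item, units in rows:
            totals[item] = totals.get(item, 0) + units
    return totals


# Eager: integrate in advance and keep the result (the "warehouse").
warehouse = integrate(sources)


def eager_query(item):
    return warehouse.get(item, 0)


# Lazy: integrate on demand, every time a query arrives.
def lazy_query(item):
    return integrate(sources).get(item, 0)


sources["CA"].append(("widget", 1))  # a new transaction at one branch
print(eager_query("widget"))  # 8, stale until the warehouse is refreshed
print(lazy_query("widget"))   # 9, always current but recomputed per query
```

The eager approach trades freshness for fast, predictable queries, which is why it is the usual basis for OLAP.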

OLTP versus OLAP:


                        OLTP                           OLAP
Usage                   Application specific           Decision support
Workload                Predefined                     Unforeseeable
Access                  Read/Write                     Read-only
Query structure         Simple                         Complex
Records per operation   Tens/Hundreds                  Thousands/Millions
Number of users         Thousands/Millions             Tens/Hundreds
Function                Mostly updates                 Mostly reads
Transactions/queries    Short, simple transactions     Long, complex queries
User types              Clerical users                 Analysts, decision makers
Goal                    ACID, transaction throughput   Fast queries

The main differences between OLTP and OLAP are:

1. User and System Orientation
   OLTP: customer-oriented, used for transaction and query processing by clerks, clients and IT professionals.
   OLAP: market-oriented, used for data analysis by knowledge workers (managers, executives, analysts).
2. Data Contents
   OLTP: manages current data, very detail-oriented.
   OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, and stores information at different levels of granularity to support the decision-making process.
3. Database Design
   OLTP: adopts an entity-relationship (ER) model and an application-oriented database design.
   OLAP: adopts a star, snowflake or fact constellation model and a subject-oriented database design.
4. View
   OLTP: focuses on the current data within an enterprise or department.
   OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information and data from many organizational locations.
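The contrast between the two workloads can be made concrete with Python's standard `sqlite3` module. This is only a sketch: the `orders` table and its rows are hypothetical, and a real deployment would keep the OLTP database and the OLAP store separate rather than querying one table both ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
    [("North", 2005, 100), ("North", 2006, 150), ("South", 2006, 200)],
)
conn.commit()

# OLTP-style work: a short transaction that touches a single record.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE orders SET amount = amount + 10 WHERE id = ?", (1,))

# OLAP-style work: a read-only query that summarizes many records.
summary = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(summary)  # [(2005, 110), (2006, 350)]
```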

APPLICATIONS OF OLAP

KEY APPLICATIONS

Managers are usually not trained to query databases by means of SQL. Moreover, if the query is relatively complex (several joins and subqueries, grouping, and functions) and the database schema is not small (with maybe hundreds of tables), using interactive SQL could be a nightmare even for SQL experts. Thus, OLAP is used to ease the task of these managers in extracting knowledge from the data warehouse by means of drag and drop, instead of typing SQL queries by hand. The OLAP market was estimated at around 6 billion US$ in 2006, and is mainly devoted to decision making. However, this paradigm can also be used in any other field with non-expert users, where schemas and queries are relatively complex. For example, its usage is under investigation in bioinformatics [8] and the semantic web [9].

Declarative languages
There are some research proposals of declarative query languages for OLAP. [1] proposes a graphical query language, while [3] proposes a calculus. From the industry point of view, MDX (standing for Multidimensional Expressions [5]) is the de facto standard. It was introduced in 1997 and, in spite of the specification being owned by Microsoft, it has been widely adopted. Its syntax resembles that of SQL:

    [ WITH <MeasureDefinition>+ ]
    SELECT <DimensionSpecification>+
    FROM <CubeName>
    [ WHERE <SlicerClause> ]

However, its semantics are completely different. Roughly speaking, an MDX query gets the instances of a given Cube stated in the FROM clause and places them in the space defined by the SELECT clause. Moreover, complex calculations can be defined in the WITH clause, and the dimensions not used in the SELECT clause can be sliced in the WHERE clause (if not explicitly sliced, it is assumed that dimensions that do not appear in the SELECT are sliced at the highest aggregation level: All).

    WITH MEMBER [Measures].[pending] AS
        [Measures].[Units Ordered] - [Measures].[Units Shipped]
    SELECT {[Time].[2006].children} ON COLUMNS,
           {[Warehouse].[Warehouse Name].members} ON ROWS
    FROM Inventory
    WHERE ([Measures].[pending], [Trademark].[Acme]);

In the previous MDX query, an ad hoc measure pending is first defined as the difference between units ordered and units shipped. Then, the children of the instance representing year 2006 (i.e. the twelve months of that year) are placed on columns, and the different members of the aggregation level Warehouse Name on rows. Finally, this matrix is filled with the data in the Inventory cube, showing the previously defined measure pending and slicing on the Acme trademark.
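To make the semantics of the example query more tangible, the same computation can be mimicked in plain Python. This is not how an MDX engine works internally; the `inventory` rows and the `pending_by_warehouse` helper are hypothetical and only mirror the steps described above (define pending, slice on trademark, put warehouses on rows and the months of one year on columns).

```python
# Hypothetical Inventory rows:
# (trademark, warehouse, month, units_ordered, units_shipped)
inventory = [
    ("Acme", "Warehouse A", "2006-01", 10, 7),
    ("Acme", "Warehouse A", "2006-02", 6, 6),
    ("Acme", "Warehouse B", "2006-01", 4, 1),
    ("Other", "Warehouse A", "2006-01", 9, 2),
    ("Acme", "Warehouse A", "2005-12", 5, 0),
]


def pending_by_warehouse(rows, trademark, year):
    """Build a warehouse-by-month matrix of pending units for one slice."""
    table = {}
    for mark, warehouse, month, ordered, shipped in rows:
        # Slicer: keep one trademark and the months of the requested year.
        if mark != trademark or not month.startswith(f"{year}-"):
            continue
        row = table.setdefault(warehouse, {})
        # Calculated measure: pending = ordered - shipped.
        row[month] = row.get(month, 0) + (ordered - shipped)
    return table


result = pending_by_warehouse(inventory, "Acme", 2006)
print(result["Warehouse A"])  # {'2006-01': 3, '2006-02': 0}
print(result["Warehouse B"])  # {'2006-01': 3}
```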

FUTURE DIRECTIONS

OLAP is used to extract knowledge from the data warehouse. Another kind of tool used for this purpose is the data mining tool (see Data Mining definitional entry). Until now, both research communities have been evolving separately. The former must be interactive, while the latter presents computational complexity problems. However, it seems promising to integrate both kinds of tools so that each can benefit from the other. In fact, this was already suggested in [4], and some tools like Microsoft Analysis Services already integrate them in some way. Nevertheless, there is still much work to do in this field.

On the other hand, security is usually a flaw in data warehousing projects. [7] contains a survey of OLAP security problems. In the past, OLAP tools used to have just a few users, all of whom had high responsibilities in the company, so confidentiality was not really a concern. Nowadays, with the increase in potential users of OLAP systems inside as well as outside the company, security has become a priority in these projects (see Security in DWs definitional entry). Moreover, personal data (like those of customers) are analyzed in almost all companies. Thus, inference control mechanisms need to be studied in data mining as well as in OLAP tools.

Other research directions in OLAP include the improvement of user interaction and flexibility in the calculation of statistics (see Visual OLAP definitional entry), and the integration of what-if analysis (see What-if Analysis definitional entry).
URL TO CODE

Some OLAP vendors:
-Microsoft Analysis Services: http://www.microsoft.com/sql/technologies/analysis/default.mspx
-Hyperion Solutions: http://www.hyperion.com
-Cognos PowerPlay: http://www.cognos.com/products/business_intelligence/analysis/index.html
-Business Objects: http://www.businessobjects.com/products/queryanalysis/olapaccess/businessobjects.asp
-MicroStrategy: http://www.microstrategy.com/Solutions/5Styles/olap_analysis.asp

Some open source OLAP tools:
-Mondrian: http://mondrian.pentaho.org
-Palo: http://www.palo.net
