Data Warehouse and OLAP
Data Warehouse
A data warehouse is a historical database and should be considered as the long term memory of an organization.
A Data Warehouse is a centralized system used to store large amounts of data from different sources.
It is designed to support reporting, analysis, and decision-making by providing clean, consistent, and historical data.
Example: University Management System.
For the duration of study by a student at the university, his or her data will be retained in the main data store and all
database operations such as insert, update, delete and retrieve will be performed on it.
Need of Data Warehousing
1. Centralized Data Storage: Combines data from 6. Enhanced Decision-Making: Facilitates
multiple sources into a single, unified repository. data-driven decisions by providing timely and
2. Improved Data Quality and Consistency: relevant information.
Cleanses, transforms, and standardizes data to
ensure accuracy and uniformity. 7. Time-Saving: Reduces the time required to
gather and prepare data for analysis.
3. Faster Query Performance: Optimized for
complex queries and analytical processing, 8. Data Security and Control: Offers better data
enabling faster decision-making. governance, access control, and auditing
capabilities.
4. Historical Data Analysis: Stores large volumes
of historical data for trend analysis and 9. Scalability: Can handle increasing volumes of
forecasting.
data as the organization grows.
5. Supports Business Intelligence (BI): Provides
the foundation for BI tools to generate reports,
dashboards, and insights.
Benefits of data warehousing
Potential high ROI (Return on Investment) High Productivity in corporate decision making
and business intelligence
Investing in Data Warehousing is itself a very big
investment, but past reports suggest ROI growth up to Data Warehousing combines data from multiple
400% with Data Warehousing, making it valuable for sources into meaningful information which could be
business. analyzed and referred by managers to improve their
decisions for organization.
Unbeatable competitive advantage
Cost effective With Data Warehousing, it is possible
Implementation of Data Warehousing could give to streamline the organization thereby reducing
overheads and so reducing product costs.
companies a competitive edge over their rivals. With
Data Warehousing companies could discover Enhanced customer service Data Warehousing
previously unavailable facts and figures, trends and provides essential support when communicating with
untapped information. Such new revelations would customers and so helps improve customer
enhance the quality of decisions. satisfaction and retain them.
Features
Subject-Oriented: Organized around key subjects Optimized for Query and Analysis: Designed for
like customers, sales, products, etc., rather than fast retrieval and complex analytical queries, unlike
applications. transactional databases.
Integrated: Combines data from various sources Supports OLAP (Online Analytical Processing):
into a consistent format with unified naming Allows multidimensional analysis of data for better
conventions, measurements, and encoding insights.
structures.
Data Transformation and Cleansing: Performs
Time-Variant: Stores historical data, allowing for ETL (Extract, Transform, Load) operations to ensure
trend analysis over time. data quality.
Non-Volatile: Once data is entered, it is not Metadata Management: Stores information about
changed or deleted — only read and appended for data origin, definitions, structures, and usage.
analysis.
Data Warehouse Architecture
Every data warehouse has three fundamental components.These are as follows.
1. Load Manager
2. Warehouse Manager
3. Data Access Manager
Load manager
The Load manager is responsible for Data collection from operational systems. It also performs data conversion into
some usable form to be further utilized by the user. It includes all the programs and application interfaces which are
required for extracting data from the operational systems, it’s preparation and finally loading of data into the data
warehouse itself.
It should perform the following tasks:
• Data Identification
• Data Validation for its accuracy
• Data Extraction from the original source
• Data Cleansing
• Data formatting
• Data standardization (i.e. bringing data into conformity with some standard format)
• Consolidates data from multiple sources to one place
• Establishment of Data Integrity using Integrity Constraints
Warehouse manager
The Warehouse manager is the main part of Data Warehousing system as it holds the massive amount of information
from infinite sources. It organizes data in a way so it becomes easy for anyone to analyze or find the required
information. It is the core of the data warehouse itself. It maintains three levels of information, i.e, detailed, lightly
summarized and highly summarized. It also maintains metadata, i.e., data about data.
Query manager
Finally the Query manager is that interface which connects the end users with the information stored in data
warehouse through the usage of specialized end-user tools. These tools are known as Data mining access tools. The
market today is flooded with such tools which have common functionalities and a provision for customizing more
features specific to an enterprise. These have various categories such as query and reporting, statistics, data
discovery, etc
Data Mart
● A department specific data warehouse is termed as ‘data mart’, which is a small localized data
warehouse built for a single purpose.
● It is typically designed to serve the specific needs of a particular group of users or a department within
an organization. For instance, an organization may consist of multiple departments such as Finance, IT,
and others, each requiring tailored data support.
● Each of these departments can have their own data warehouses, which is nothing but the data mart of
that particular department.
● Thus, a data mart can be defined as ‘a specialized, subject-oriented, integrated, time-variant, volatile
data store in support of specific subset of management’s decisions’.
Data Mart
● In a simplified way we can define data marts as ‘A subset of data warehouses that support the
requirements of a particular department or business function’.
● An organization could maintain both a data warehouse (representing data of all departments in a
unified way) and individual departmental data marts.
● Hence, the data mart can be standalone (individual) or linked centrally to the corporate data warehouse
Advantages of Data Mart
● Compared to a data warehouse, implementing a data mart is a lot more economical.
● The potential users of a data mart can be grouped in a better way rather than involving large numbers
of unnecessary members.
● Data marts are designed on the basis that there is no need to serve the entire enterprise. Therefore, the
department can summarize, select and structure their own departments’ data independently.
● Data marts can allow each department to work on a specific piece of historical data rather than the
whole data.
● Departments can customize software for their data mart as per their needs.
● Data marts are cost efficient
Disadvantages of Data Mart
● The limitations of data marts are as follows:
● Once in operation, due to inherent limitations in design it becomes cumbersome to extend their
scope to other departments.
● Data Integration problems are encountered often.
● When the data mart develops multiple dimensions then the scalability problem becomes
common.