Data Warehouse Architecture
Agenda
Business Intelligence Architecture Architectural Components Architectural Approaches Implementing architecture in SAP BI
Data Warehouse Architecture
Data Architecture
Business processes
Infrastructure
Hardware, networking, operating systems, desktops
Technical
Technologies used a various levels of the architecture i.e. query tools, extraction tools
Data Architecture
Determined by business processes
If you are a manufacturer then production orders, orders, shipping and billing are important
Dimensional or E-R
Depends on level of query detail Different models at different levels (query, data staging, etc)
MOLAP, ROLAP
Infrastructure
Network capacity, servers, RAM, storage Where you invest in infrastructure depends somewhat on needs
Nightly updates - adequate staging horsepower Worldwide availability - parallel or distributed servers Customer-level analysis - large servers
Technical Architecture
The technical architecture is driven by the metadata catalog/repository
Metadata refers to the data that describes the data objects, structures, data flows, etc defined in the data warehouse Metadata is stored independently of the business data allowing for more flexibility
For example, because connectivity parameters are not hardcoded in the ETL process data sources can easily be changed
Technical Architecture
Extract Data comes from multiple sources and is of multiple types. Data compression and encryption handling must be considered at this area, if it applies. Transform Data transformation includes surrogate key management, integration, de-normalization, cleansing, conversion, aggregation, and auditing. Load Loading is often done to multiple targets, with load optimization and support for the entire load cycle. Security Administrator access and data encryption policies. Job control This includes job definition, job scheduling (time and event), monitoring, logging, exception handling, error handling, and notification.
Agenda
Business Intelligence Architecture Architectural Components Architectural Approaches Implementing architecture in SAP BI
Generic Architecture
Data Mart Layer Operational Data Store Data Warehouse Layer
Data Staging Layer
Source System
Source System
Source System
Source System
Data Staging Layer Decouples extraction processes from data cleansing processes
Reduces impact of loads on transaction system Snapshot allows repeatable / re-startable ETL processes Area to store data using different load schedules to the DW
Data at this layer is not typically available to end users Area to store 3rd party data
Data Staging Layer Employ data manipulation not appropriate for the DW Data in Data Staging Area is: a copy of operational data deleted after a short period so nonhistorical in flat tables atomic
Do you need a Data Staging Layer
Is real-time reporting required? Does OLTP run 24-7? How complex is the reporting requirement? Does OLTP hardware have enough resources to handle reporting and transaction processing? What are the long term needs of the organization?
Operational Data Store
ODS is a hybrid of data warehouse and operational systems Allows OLTP response times, update capabilities and DSS capabilities Can be used for operational reporting because:
Data from multiple source systems are integrated at this point Operational reporting is typically more structured so dimensional models are not required
Data in the ODS is:
kept for 6 months so non-historical in flat tables or 3NF often a copy of operational data atomic
Operational Data Store
Class I
Real-time
Class II
Two-hour to four-hour
Class III
Daily
Class IV
Aggregated data from the data warehouse
Do you need Data Staging Layer
Is real-time reporting required? Is cross-domain reporting required (multiple source systems)? How complex is the reporting requirement (in terms of aggregation)?
Enterprise Data Warehouse
EDW is the central hub of the BI infrastructure in some architectural approaches Single point of truth for enterprise The question of the appropriate data model for the EDW layer remains controversial Data in the EDW is:
cleansed and harmonized during the ETL process historical distributed to various other systems often 3NF but can be dimensional usually atomic but sometimes aggregate
To Normalize or Not To Normalize
Property Redundancy Data volume Number of records Administration effort Support for realignment Delta handling Transformation Normalized Lower Lower Higher Higher Good Easy Difficult Denormalized Higher Higher Lower Lower Poor Difficult Easy
Choice of whether to normalize or not depends on the complexity and performance of transformations when transferring data between layers and reporting requirements
Data Marts
Data mart is a departmental, or subject oriented, subset of the EDW Data in the data mart comes from the EDW if one exists Relationship between EDW and data marts is often called a hub-and-spoke architecture Data in Data Mart is:
often aggregated but may be atomic dimensional may or may not be historical volatile
Agenda
Business Intelligence Architecture Architectural Components Architectural Approaches Implementing architecture in SAP BI
Two Competing Camps
Corporate Information Factory
Hub-and-spoke architecture Top-Down Inmon
Data Bus Architecture
Bottom-up Kimball
Corporate Information Factory
Data Bus Architecture
Corporate Information Factory
Relies on a single centralized data warehouse
Data is atomic and normalized
Data is distributed to data marts
Exploration warehouses, data mining warehouses
Often 3NF and atomic because its easier to use for analytical models
OLAP data bases
Often star schema and aggregated because its easier to perform complex analyses
Data Bus Architecture
No centralized Data Warehouse
In the data bus architecture the staging area serves a similar purpose as the data warehouse in the CIF
Define conformed dimensions and fact tables
Conformed dimensions form the data bus
Create data marts in conformance with the conformed dimensions and fact tables
A metadata repository forms the basis of the data bus
No reliance on normalized databases
Summarized and atomic data both stored in star schemas
Data Mart Dimensions
Service Call Status Service Call Type
Date Billing Date
End Date
Employee Time
Organization/ Location Fact Groups Customer
GL Account Product
Purchased Product
Pricing Package Vendor Destination Phone # Originating Phone #
Usage Type
Bus Architecture Matrix
Conformed Dimensions and Facts
A conformed dimensions and fact table are semantically the same across the enterprise Architecture defines and reconciles the attributes associated with each conformed dimension or fact table Often as much a political issue to create these as a technical issue Conformed dimension and fact definitions should include:
Organization wide definitions Equations used to derive them (for facts) Units of measure Reporting periods Granularity (for facts)
Dependent Data Marts / Hub & Spoke
Report
Report
Report
Report
Report
An approach also used in the early days, but refined over time: - Originally suggested extensive effort in building the DW - Now recommends building DW incrementally
Data Mart Bus (conformed)
Report
Report
Report
Report
Report
An approach also used in the early days, but refined over time: - Originally suggested building silos - Now recommends enterprise perspective
Independent Data Marts
Report
Report
Report
Report
How Data Warehousing was often performed in the early days - Individual projects developing solutions into functional silos - No program / enterprise perspective - No conformed dimensions
Central Data Warehouse
Report
Report
Report
Report
Report
Seeks to overcome the limitations of previous architectures: - Highly variable with many individual approaches
Federated Data Warehouse
Report
Report
Report
Report
Report
An attempt to consolidate legacy Data Marts
Architecture Alternatives
Hub and Spoke Architecture (Corporate Information Factory) Source System Staging Area
Normalized Relational Warehouse (atomic data)
End user Applications
Dependent Data Marts (summarized/ some atomic data)
Data Mart Bus Architecture with Linked Dimensional Data Marts Source System Staging Area
Dimensionalized Data Marts linked by Conformed Dimensions (atomic/summarized data)
End user Applications
Architecture Alternatives
Independent Data Marts Architecture Source System Staging Area
Independent Data Marts (atomic/summarized data)
End user Applications
Centralized Data Warehouse Architecture Source System Staging Area
Normalized Relational Warehouse (atomic/some summarized data)
End user Applications
Architecture Alternatives
Federated Architecture
Existing Data Warehouses, Data Marts and Legacy Systems Logical/Physical Integration of common data elements
End user Applications
Architecture Hub-and-Spoke Bus Centralized Independent Data Marts Federated
Usage 39% 26% 17% 12% 4%
The Bottom Line
Differences between these in practice are blurry
The argument is mostly a distraction
Best idea is to follow best practices:
Build incrementally according to a business function using portfolio approach Employ an enterprise perspective through governance, standards and a portfolio approach Dimensionally model data at the reporting level and store atomic data at other levels Utilise conformed dimensional models Employ a Staging Area or Data Warehouse
Agenda
Business Intelligence Architecture Architectural Components Architectural Approaches Implementing architecture in SAP BI
InfoObjects
Attributes that are assigned to characteristic InfoObjects are stored independently of any InfoProvider to which the characteristic is attached so characteristics serve as conformed dimensions in the SAP data model
InfoProviders
InfoProviders are different metaobjects in the data basis that can be seen within query definition Physical
DataStore InfoCube InfoObjects
Logical
VirtualProviders InfoSets (Join) MultiCubes (Union) Aggregation Level
Overview of InfoProviders
DataStore Objects
DataStores are used mostly as a staging area for data before it is transferred to one or more InfoProviders
Can be used for reporting, data cleansing, data harmonization and data-quality management Data is atomic Unlike InfoCubes, DataStores allow record updates
DataStores consist of one or three flat tables depending on the type DataStores can be used as a source for queries but are not as efficient as InfoCubes
Required when reporting is at atomic level
DataStore Objects
Standard DataStore Objects
Building blocks for operational data stores Permits delta updates Data must be activated after loading
Direct-update DataStore Objects
Used when access to data is required immediately after it is loaded i.e. as part of an analytical process Accessed via the BI API so that external applications can write to it directly
Write-optimized DataStore Objects
Optimized for uploading large amounts of data Doesnt support delta updates
Modeling Layers of BI Architecture in SAP Data Staging Layer
Persistent Staging Area (PSA)
Used by the SAP provided Business Content
DataStore Object
Allows reporting which is helpful for administrators and developers
Operational Data Store
DataStore Objects
Often in conjunction with SAP PI for near real-time messaging
VirtualProviders
Care must be taken with this approach because performance of source systems can be affected
Modeling Layers of BI Architecture in SAP
EDW
DataStore Objects
Data in a DataStore Object is in a flat table but you can build normalized structures using multiple DataStore Objects
Data Mart Layer
Dimensional objects such as InfoCubes, Virtual InfoCubes, Multiproviders
Metadata in SAP BI
Metadata in SAP BI is stored in the metadata repository Documents all objects and the data flows defined among the various objects Metadata can be exported to HTML and serve as project documentation
Distributing Data in SAP BI
The Data Mart Interface Is used to distribute data from one BI system to another.
The Open Hub Service allows you to distribute data from a BI system to non-SAP data marts
SAP BI Architecture Alternatives
Single Global Data Warehouse Global Data Warehouse Feeding Local Data Warehouses Local Data Warehouses Feeding a Global Data Warehouse Template-Driven Local Data Warehouses Virtual Data Warehouse
Single Global Data Warehouse
Dependent Data Marts
InfoMart
InfoMart
Single Global DWH
Global Data Warehouse Feeding Local Data Warehouses
Dependent Data Marts
InfoMart
InfoMart
InfoMart
InfoMart
Local DWH
Local DWH
InfoMart
Global DWH
InfoMart
Local Data Warehouses Feeding a Global Data Warehouse
Dependent Data Marts
InfoMart
InfoMart
InfoMart
Global DWH
InfoMart
Local DWH
Local DWH
Template-Driven Local Data Warehouses
Data Mart Bus Architecture with Linked Dimensional Data Marts
InfoMart
InfoMart
InfoMart
InfoMart
Local DWH
Template
Local DWH
Template
InfoMart
Global DWH
InfoMart