Chapter 10
Distributed Database Management Systems

In this chapter, you will learn:
- What a distributed database management system (DDBMS) is and what its components are
- How database implementation is affected by different levels of data and process distribution
- How transactions are managed in a distributed database environment
- How database design is affected by the distributed database environment

The Evolution of Distributed Database Management Systems
- Distributed database management system (DDBMS)
  - Governs storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites

The Evolution of Distributed Database Management Systems (continued)
- Centralized databases required that corporate data be stored in a single central site
- The dynamic business environment and the centralized database’s shortcomings spawned a demand for applications based on data access from different sources at multiple locations

[Figure: Centralized Database Management System]

DDBMS Advantages
- Data are located near the “greatest demand” site
- Faster data access
- Faster data processing
- Growth facilitation
- Improved communications
- Reduced operating costs
- User-friendly interface
- Less danger of a single-point failure
- Processor independence

DDBMS Disadvantages
- Complexity of management and control
- Security
- Lack of standards
- Increased storage requirements
- Greater difficulty in managing the data environment
- Increased training cost

[Figure: Distributed Processing Environment]
[Figure: Distributed Database Environment]
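
Characteristics of Distributed Database Management Systems
- Application interface: interacts with the end user, application programs, and other DBMSs within the distributed database
- Validation: analyzes data requests for syntax correctness
- Transformation: decomposes complex requests into atomic data request components
- Query optimization: finds the best access strategy
- Mapping: determines the data location of local and remote fragments
- I/O interface: reads or writes data from or to permanent local storage
- Formatting: prepares the data for presentation to the end user or to an application program
- Security: provides data privacy at both local and remote databases
- Backup and recovery: ensures the availability and recoverability of the database in case of a failure
- DB administration: provides features for the database administrator
- Concurrency control: manages simultaneous data access and ensures data consistency across database fragments
- Transaction management: ensures that the data move from one consistent state to another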

Characteristics of Distributed Database Management Systems (continued)
- Must perform all the functions of a centralized DBMS
- Must handle all necessary functions imposed by the distribution of data and processing
- Must perform these additional functions transparently to the end user

[Figure: A Fully Distributed Database Management System]

DDBMS Components
- Must include (at least) the following components:
  - Computer workstations
  - Network hardware and software
  - Communications media
  - Transaction processor (TP), also called the application processor or transaction manager
    - Software component found in each computer that requests data
  - Data processor (DP) or data manager
    - Software component residing on each computer that stores and retrieves data located at the site
    - May be a centralized DBMS
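
To make the division of labor between the TP and the DP concrete, here is a minimal Python sketch, assuming rows are plain dictionaries and requests are Python predicates rather than SQL. The class names `DataProcessor` and `TransactionProcessor` are hypothetical. This TP simply broadcasts each request to every DP it knows; a real TP would consult the distributed data catalog to contact only the sites that hold relevant data.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class DataProcessor:
    """Hypothetical DP: stores and retrieves the rows located at one site."""
    site: str
    rows: list = field(default_factory=list)

    def select(self, predicate: Predicate) -> list:
        # Works only on local data; it knows nothing about other sites.
        return [row for row in self.rows if predicate(row)]


class TransactionProcessor:
    """Hypothetical TP: accepts a user's request and routes it to the DPs."""

    def __init__(self, data_processors: list):
        self.data_processors = data_processors

    def select(self, predicate: Predicate) -> list:
        # Naive routing: send the request to every DP and merge the answers.
        results = []
        for dp in self.data_processors:
            results.extend(dp.select(predicate))
        return results


ny = DataProcessor("NY", [{"id": 1, "state": "NY"}])
tn = DataProcessor("TN", [{"id": 2, "state": "TN"}, {"id": 3, "state": "TN"}])
tp = TransactionProcessor([ny, tn])
print([row["id"] for row in tp.select(lambda r: r["state"] == "TN")])  # [2, 3]
```

The end user talks only to the TP; which DP actually holds a row is invisible at that interface.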

[Figure: Distributed Database System Components]

Database Systems: Levels of Data and Process Distribution
- Single-Site Processing, Single-Site Data (SPSD)
  - All processing is done on a single CPU or host computer (mainframe, midrange, or PC)
  - All data are stored on the host computer’s local disk
  - Processing cannot be done on the end user’s side of the system
  - Typical of most mainframe and midrange computer DBMSs
  - The DBMS is located on the host computer, which is accessed by dumb terminals connected to it
  - Also typical of the first generation of single-user microcomputer databases

[Figure: Single-Site Processing, Single-Site Data (Centralized)]

Multiple-Site Processing, Single-Site Data (MPSD)
- Multiple processes run on different computers sharing a single data repository
- The MPSD scenario requires a network file server running conventional applications that are accessed through a LAN
- Many multiuser accounting applications running under a personal computer network fit such a description

[Figure: Multiple-Site Processing, Single-Site Data]

Multiple-Site Processing, Multiple-Site Data (MPMD)
- Fully distributed database management system with support for multiple data processors and transaction processors at multiple sites
- Classified as either homogeneous or heterogeneous
- Homogeneous DDBMSs
  - Integrate only one type of centralized DBMS over a network

Multiple-Site Processing, Multiple-Site Data (MPMD) (continued)
- Heterogeneous DDBMSs
  - Integrate different types of centralized DBMSs over a network
- Fully heterogeneous DDBMSs
  - Support different DBMSs that may even support different data models (relational, hierarchical, or network) running under different computer systems, such as mainframes and microcomputers

[Figure: Heterogeneous Distributed Database Scenario]

Distributed Database Transparency Features
- Allow the end user to feel like the database’s only user
- Features include:
  - Distribution transparency
  - Transaction transparency
  - Failure transparency
  - Performance transparency
  - Heterogeneity transparency
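
Distribution Transparency
- Allows management of a physically dispersed database as though it were a centralized database
- Three levels of distribution transparency are recognized:
  - Fragmentation transparency: the end user does not need to know that the database is partitioned; queries name only the logical table
  - Location transparency: the end user must specify the fragment names but not their locations
  - Local mapping transparency: the end user must specify both the fragment names and their locations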
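
The three levels differ in how much of the distribution the end user must spell out in a query. The Python sketch below is an illustration, not DDBMS code: it assumes a hypothetical EMPLOYEE table split into fragments E1, E2, and E3 stored at three hypothetical sites, with queries written as Python predicates. With fragmentation transparency the user names only the table; with location transparency the user names the fragments; with local mapping transparency the user names both fragments and sites.

```python
# Hypothetical catalog: fragment name -> site that stores it.
CATALOG = {"E1": "NEW_YORK", "E2": "ATLANTA", "E3": "MIAMI"}

# Hypothetical local storage at each site.
SITES = {
    "NEW_YORK": {"E1": [{"emp": "A", "dob_year": 1955}, {"emp": "B", "dob_year": 1970}]},
    "ATLANTA": {"E2": [{"emp": "C", "dob_year": 1958}]},
    "MIAMI": {"E3": [{"emp": "D", "dob_year": 1980}]},
}


def query_at(fragment_sites, predicate):
    """Local mapping transparency: the user supplies (fragment, site) pairs."""
    rows = []
    for fragment, site in fragment_sites:
        rows.extend(r for r in SITES[site][fragment] if predicate(r))
    return rows


def query_fragments(fragments, predicate):
    """Location transparency: the user names fragments; the catalog supplies sites."""
    return query_at([(f, CATALOG[f]) for f in fragments], predicate)


def query_table(predicate):
    """Fragmentation transparency: the user names only the logical table."""
    return query_fragments(list(CATALOG), predicate)


def born_before_1960(row):
    return row["dob_year"] < 1960


print([r["emp"] for r in query_table(born_before_1960)])  # ['A', 'C']
```

All three entry points can return the same rows; what changes is how much knowledge of fragments and sites the caller must supply.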

[Figure: A Summary of Transparency Features]
[Figure: Fragment Locations]

Transaction Transparency
- Ensures that database transactions will maintain the distributed database’s integrity and consistency

Distributed Requests and Distributed Transactions
- Distributed transaction
  - Can update or request data from several different remote sites on a network
- Remote request
  - Lets a single SQL statement access data to be processed by a single remote database processor
- Remote transaction
  - Composed of several requests, all of which access data at a single remote site

Distributed Requests and Distributed Transactions (continued)
- Distributed transaction
  - Allows a transaction to reference several different (local or remote) DP sites
- Distributed request
  - Lets a single SQL statement reference data located at several different local or remote DP sites
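
The four cases can be told apart from which sites each SQL statement touches. In this hypothetical Python sketch, a transaction is modeled as a list of requests and each request as the set of DP sites it references; the function name `classify` and this representation are assumptions. For simplicity it assumes that in the single-site cases the site is remote from the user.

```python
def classify(transaction: list) -> str:
    """Classify a transaction given, for each request, the set of DP sites it references."""
    if not transaction or any(not sites for sites in transaction):
        raise ValueError("every request must reference at least one site")
    # A single statement spanning several sites is a distributed request.
    if any(len(sites) > 1 for sites in transaction):
        return "distributed request"
    all_sites = set().union(*transaction)
    # Several statements, each at one site, but not all at the same site.
    if len(all_sites) > 1:
        return "distributed transaction"
    # Everything happens at one (remote) site.
    return "remote request" if len(transaction) == 1 else "remote transaction"


print(classify([{"B"}, {"C"}]))  # distributed transaction
```

Checking for multi-site statements first reflects that a distributed request is the most demanding case: the DDBMS must split one statement across sites.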

[Figure: A Remote Request]
[Figure: A Remote Transaction]
[Figure: A Distributed Transaction]
[Figure: A Distributed Request]
[Figure: Another Distributed Request]

Distributed Concurrency Control
- Multisite, multiple-process operations are much more likely to create data inconsistencies and deadlocked transactions than are single-site systems

[Figure: The Effect of a Premature COMMIT]
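
Two-Phase Commit Protocol
- Distributed databases make it possible for a transaction to access data at several sites
- The final COMMIT must not be issued until all sites have committed their parts of the transaction
- The two-phase commit protocol requires that each individual DP’s transaction log entry be written before the database fragment is actually updated
- Phase 1 (preparation): the coordinator asks each DP to prepare; each DP writes its transaction log entry and replies whether it is prepared to commit
- Phase 2 (final COMMIT): if every DP is prepared, the coordinator tells all DPs to commit; if any DP is not, all DPs abort their parts of the transaction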
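
The protocol can be illustrated with a simplified, single-process Python sketch. It assumes no network, timeouts, or crashes, and the names `Participant`, `prepare`, and `two_phase_commit` are hypothetical. Each participant writes its log entry before voting, mirroring the requirement that the log be written before the fragment is updated, and the coordinator issues the final COMMIT only when every participant votes yes; otherwise every participant aborts.

```python
class Participant:
    """Hypothetical DP site taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit  # simulates a local failure when False
        self.log = []                 # transaction log, written before any data change
        self.data = {}                # committed data at this site
        self.pending = None           # updates prepared but not yet committed

    def prepare(self, updates: dict) -> bool:
        # Phase 1: record the intended updates in the log first, then vote.
        self.log.append(("PREPARE", dict(updates)))
        if not self.can_commit:
            self.log.append(("VOTE", "NO"))
            return False
        self.pending = dict(updates)
        self.log.append(("VOTE", "YES"))
        return True

    def commit(self) -> None:
        # Phase 2: log the decision, then apply the prepared updates.
        self.log.append(("COMMIT",))
        self.data.update(self.pending)
        self.pending = None

    def abort(self) -> None:
        self.log.append(("ABORT",))
        self.pending = None


def two_phase_commit(work: dict) -> bool:
    """Coordinator: `work` maps each Participant to the updates for its site."""
    # Phase 1 (preparation): collect a vote from every participant.
    votes = [participant.prepare(updates) for participant, updates in work.items()]
    # Phase 2 (final COMMIT): commit everywhere only if every vote was YES.
    if all(votes):
        for participant in work:
            participant.commit()
        return True
    for participant in work:
        participant.abort()
    return False


a, b = Participant("A"), Participant("B")
print(two_phase_commit({a: {"x": 1}, b: {"y": 2}}))  # True
```

If any site votes no, no site applies its updates, which avoids the inconsistency caused by a premature COMMIT at only some sites.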

Performance Transparency and Query Optimization
- The objective of a query optimization routine is to minimize the total cost associated with the execution of a request
- Costs associated with a request are a function of the:
  - Access time (I/O) cost
  - Communication cost
  - CPU time cost
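
As a rough illustration, the Python sketch below treats a request's cost as the plain sum of its three components and picks the cheapest candidate plan. The additive model, the `Plan` class, and the cost figures in the usage example are all hypothetical; real optimizers estimate and weight these components in DBMS-specific ways.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical execution plan with estimated cost components."""
    name: str
    io_cost: float             # access time (I/O) cost
    communication_cost: float  # cost of shipping data between sites
    cpu_cost: float            # CPU time cost

    @property
    def total_cost(self) -> float:
        # Assumed additive model: total = I/O + communication + CPU.
        return self.io_cost + self.communication_cost + self.cpu_cost


def cheapest(plans: list) -> Plan:
    """Return the plan with the lowest estimated total cost."""
    return min(plans, key=lambda plan: plan.total_cost)


# Hypothetical estimates: ship the whole remote table vs. filter it remotely first.
ship_table = Plan("ship whole table", io_cost=10, communication_cost=500, cpu_cost=5)
filter_remotely = Plan("filter at remote site", io_cost=12, communication_cost=20, cpu_cost=8)
print(cheapest([ship_table, filter_remotely]).name)  # filter at remote site
```

In this made-up example the remote filter costs slightly more I/O and CPU but wins because it moves far less data across the network.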
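
Performance Transparency and Query Optimization (continued)
- Must provide distribution transparency as well as replica transparency
- Replica transparency:
  - The DDBMS’s ability to hide the existence of multiple copies of data from the user
- Query optimization techniques:
  - Manual or automatic: the access strategy is chosen by the end user or programmer, or automatically by the DDBMS
  - Static or dynamic: optimization takes place when the query is compiled, or at execution time
  - Statistically based or rule-based algorithms: decisions are driven by statistics about the database, or by a set of user-defined rules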
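
Replica transparency means the user asks for data by fragment name and the DDBMS decides which copy to read. A minimal Python sketch, assuming a hypothetical replica catalog (`REPLICAS`) and fixed per-site communication costs (`COMM_COST`); because the costs are fixed in advance this resembles a static choice, whereas a dynamic optimizer would use costs observed at execution time.

```python
# Hypothetical catalog: fragment -> sites holding a copy of it.
REPLICAS = {"C1": ["NY", "ATL"], "C2": ["MIA"]}

# Hypothetical communication cost from the requesting site to each site.
COMM_COST = {"NY": 9, "ATL": 2, "MIA": 5}


def choose_replica(fragment: str, replicas=REPLICAS, comm_cost=COMM_COST) -> str:
    """Pick the cheapest copy of `fragment`; the caller never names a copy."""
    return min(replicas[fragment], key=lambda site: comm_cost[site])


print(choose_replica("C1"))  # ATL
```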

Distributed Database Design
- Data fragmentation:
  - How to partition the database into fragments
- Data replication:
  - Which fragments to replicate
- Data allocation:
  - Where to locate those fragments and replicas

Data Fragmentation
- Breaks a single object into two or more segments, or fragments
- Each fragment can be stored at any site over a computer network
- Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the TP to process user requests
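
A DDC entry for a horizontally fragmented table can be as simple as each fragment's defining condition plus its site. The Python sketch below assumes a hypothetical CUSTOMER table fragmented by CUS_STATE, with invented fragment and site names (`PART_1`, `SITE_A`, and so on); it shows how a TP could use the catalog to contact only the fragments that can contain matching rows.

```python
# Hypothetical DDC entries: fragment name, CUS_STATE values it holds, and its site.
DDC = {
    "CUSTOMER": [
        {"fragment": "PART_1", "states": {"TN"}, "site": "SITE_A"},
        {"fragment": "PART_2", "states": {"FL"}, "site": "SITE_B"},
        {"fragment": "PART_3", "states": {"GA"}, "site": "SITE_C"},
    ]
}


def fragments_to_contact(table: str, state=None) -> list:
    """Return (fragment, site) pairs for a query on `table`,
    optionally restricted to CUS_STATE = state."""
    return [
        (entry["fragment"], entry["site"])
        for entry in DDC[table]
        if state is None or state in entry["states"]
    ]


print(fragments_to_contact("CUSTOMER", "FL"))  # [('PART_2', 'SITE_B')]
```

A query with no state restriction must visit every fragment, while a restricted one touches a single site.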

Data Fragmentation Strategies
- Horizontal fragmentation:
  - Division of a relation into subsets (fragments) of tuples (rows)
- Vertical fragmentation:
  - Division of a relation into attribute (column) subsets
- Mixed fragmentation:
  - Combination of horizontal and vertical strategies
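
The three strategies can be sketched on in-memory rows. This Python example assumes hypothetical CUSTOMER rows and column names; it also shows how each kind of fragment is reassembled: horizontal fragments by a union of rows, vertical fragments by joining on a key column that every vertical fragment keeps.

```python
# Hypothetical sample rows and column names.
CUSTOMER = [
    {"CUS_NUM": 10, "CUS_NAME": "A", "CUS_STATE": "TN", "CUS_LIMIT": 3500},
    {"CUS_NUM": 11, "CUS_NAME": "B", "CUS_STATE": "FL", "CUS_LIMIT": 6000},
    {"CUS_NUM": 12, "CUS_NAME": "C", "CUS_STATE": "TN", "CUS_LIMIT": 4000},
]


def horizontal(rows, predicate):
    """Horizontal fragment: the subset of rows (tuples) that satisfy `predicate`."""
    return [row for row in rows if predicate(row)]


def vertical(rows, key, columns):
    """Vertical fragment: a subset of columns; the key is kept in every fragment
    so the original rows can be rebuilt by joining on it."""
    return [{key: row[key], **{col: row[col] for col in columns}} for row in rows]


def join_on_key(left, right, key):
    """Rebuild rows from two vertical fragments that share `key`."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


def in_tn(row):
    return row["CUS_STATE"] == "TN"


# Horizontal: TN customers vs. everyone else; their union is the original table.
tn = horizontal(CUSTOMER, in_tn)
others = horizontal(CUSTOMER, lambda row: not in_tn(row))

# Vertical: descriptive columns in one fragment, the credit limit in another.
v1 = vertical(CUSTOMER, "CUS_NUM", ["CUS_NAME", "CUS_STATE"])
v2 = vertical(CUSTOMER, "CUS_NUM", ["CUS_LIMIT"])

# Mixed: a vertical fragment taken from a horizontal fragment.
tn_limits = vertical(tn, "CUS_NUM", ["CUS_LIMIT"])
print(tn_limits)  # [{'CUS_NUM': 10, 'CUS_LIMIT': 3500}, {'CUS_NUM': 12, 'CUS_LIMIT': 4000}]
```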

[Figure: A Sample CUSTOMER Table]
[Figure: Horizontal Fragmentation of the CUSTOMER Table by State]
[Figure: Table Fragments in Three Locations]
[Figure: Vertically Fragmented Table Contents]
[Figure: Mixed Fragmentation of the CUSTOMER Table]

Data Replication
- Storage of data copies at multiple sites served by a computer network
- Fragment copies can be stored at several sites to serve specific information requirements
  - Can enhance data availability and response time
  - Can help to reduce communication and total query costs

[Figure: Table Contents After the Mixed Fragmentation Process]
[Figure: Data Replication]

Replication Scenarios
- Fully replicated database:
  - Stores multiple copies of each database fragment at multiple sites
  - Can be impractical due to the amount of overhead
- Partially replicated database:
  - Stores multiple copies of some database fragments at multiple sites
  - Most DDBMSs are able to handle the partially replicated database well
- Unreplicated database:
  - Stores each database fragment at a single site
  - No duplicate database fragments
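
Following the definitions above literally (multiple copies of every fragment, of some fragments, or of none), the three scenarios can be distinguished by counting copies. This Python sketch uses a hypothetical `placement` mapping from fragment name to the set of sites storing a copy. Note that some texts reserve "fully replicated" for a copy of every fragment at every site, which this check does not require.

```python
def replication_scenario(placement: dict) -> str:
    """Classify a placement mapping each fragment to the set of sites storing a copy."""
    copies = [len(sites) for sites in placement.values()]
    if not copies or 0 in copies:
        raise ValueError("every fragment must be stored at least once")
    if all(n == 1 for n in copies):
        return "unreplicated"
    if all(n > 1 for n in copies):
        return "fully replicated"
    return "partially replicated"


print(replication_scenario({"F1": {"A", "B"}, "F2": {"C"}}))  # partially replicated
```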

Data Allocation
- Deciding where to locate data
- Allocation strategies:
  - Centralized data allocation
    - The entire database is stored at one site
  - Partitioned data allocation
    - The database is divided into several disjoint parts (fragments) stored at several sites
  - Replicated data allocation
    - Copies of one or more database fragments are stored at several sites
- Data distribution over a computer network is achieved through data partition, data replication, or a combination of both

Client/Server vs. DDBMS
- Client/server architecture refers to the way in which computers interact to form a system
- Features a user of resources, or a client, and a provider of resources, or a server
- Can be used to implement a DBMS in which the client is the TP and the server is the DP

Client/Server Advantages
- Less expensive than alternative minicomputer or mainframe solutions
- Allows the end user to use the microcomputer’s GUI, thereby improving functionality and simplicity
- More people with PC skills than with mainframe skills are available in the job market
- The PC is well established in the workplace
- Numerous data analysis and query tools available in the PC market facilitate interaction with DBMSs
- Considerable cost advantage to offloading applications development from the mainframe to powerful PCs

Client/Server Disadvantages
- Creates a more complex environment, in which different platforms (LANs, operating systems, and so on) are often difficult to manage
- An increase in the number of users and processing sites often paves the way for security problems
- Because data access can be spread to a much wider circle of users, demand increases for people with broad knowledge of computers and software, and the added training burden raises the cost of maintaining the environment

C. J. Date’s Twelve Commandments for Distributed Databases
1. Local site independence
2. Central site independence
3. Failure independence
4. Location transparency
5. Fragmentation transparency
6. Replication transparency
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence

Summary
- A distributed database stores logically related data in two or more physically independent sites connected via a computer network
- The database is divided into fragments
- Distributed databases require distributed processing
- The main components of a DDBMS are the transaction processor and the data processor

Summary (continued)
- Current database systems can be classified by the extent to which they support processing and data distribution
- DDBMS characteristics are best described as a set of transparencies
- A transaction is formed by one or more database requests
- A database can be replicated over several different sites on a computer network
- Client/server architecture refers to the way in which computers interact over a computer network to form a system