Distributed Database
Management Systems
In this chapter, you will learn:
• What a distributed database management system (DDBMS)
 is and what its components are
• How database implementation is affected by different levels
 of data and process distribution
• How transactions are managed in a distributed database
 environment
• How database design is affected by the distributed database
 environment
The Evolution of Distributed
Database Management Systems
• Distributed database management system
 (DDBMS)
 • Governs storage and processing of logically related
 data over interconnected computer systems in
 which both data and processing functions are
 distributed among several sites
The Evolution of Distributed Database
Management Systems (continued)
• Centralized database required that corporate data be
 stored in a single central site
• Dynamic business environment and centralized
 database’s shortcomings spawned a demand for
 applications based on data access from different
 sources at multiple locations
DDBMS Advantages and
Disadvantages
•Advantages include:
 • Data are located near “greatest demand”
 site
 • Faster data access
 • Faster data processing
 • Growth facilitation
 • Improved communications
DDBMS Advantages and
Disadvantages (continued)
•Advantages include (continued):
 •Reduced operating costs
 •User-friendly interface
 •Less danger of a single-point failure
 •Processor independence
DDBMS Advantages and
Disadvantages (continued)
•Disadvantages include:
 —Complexity of management and control
 —Security
 —Lack of standards
 —Increased storage requirements
 —Increased training cost
Characteristics of Distributed Database
Management Systems
●Application interface
●Validation
●Transformation
●Query optimization
●Mapping
●I/O interface
Characteristics of Distributed Database
Management Systems (continued)
• Formatting
• Security
• Backup and recovery
• DB administration
• Concurrency control
• Transaction management
Characteristics of Distributed Database
Management Systems (continued)
• Must perform all the functions of
 centralized DBMS
• Must handle all necessary functions
 imposed by distribution of data and
 processing
 • Must perform these additional functions
 transparently to the end user
DDBMS Components
• Must include (at least) the following
 components:
 • Computer workstations
 • Network hardware and software
 • Communications media
 • Transaction processor (application processor,
 transaction manager)
• Software component found in each computer
 that requests data
DDBMS Components (continued)
• Must include (at least) the following
 components (continued):
 • Data processor or data manager
• Software component residing on each
 computer that stores and retrieves data
 located at the site
• May be a centralized DBMS
Single-Site Processing, Single-
Site Data (SPSD)
• All processing is done on single CPU or host
 computer (mainframe, midrange, or PC)
• All data are stored on host computer’s local disk
• Processing cannot be done on end user’s side of
 system
Single-Site Processing,
Single-Site Data (SPSD)
(continued)
• Typical of most mainframe and midrange
 computer DBMSs
• DBMS is located on host computer, which is
 accessed by dumb terminals connected to it
• Also typical of first generation of single-user
 microcomputer databases
Multiple-Site Processing, Single-
Site Data (MPSD)
• Multiple processes run on different computers sharing
 single data repository
• MPSD scenario requires network file server running
 conventional applications that are accessed through
 LAN
• Many multiuser accounting applications, running
 under personal computer network, fit such a
 description
Multiple-Site Processing,
Multiple-Site Data (MPMD)
• Fully distributed database management system with
 support for multiple data processors and transaction
 processors at multiple sites
• Classified as either homogeneous or heterogeneous
• Homogeneous DDBMSs
 – Integrate only one type of centralized DBMS over a
 network
Multiple-Site Processing, Multiple-
Site Data (MPMD) (continued)
• Heterogeneous DDBMSs
 • Integrate different types of centralized DBMSs over a
 network
• Fully heterogeneous DDBMS
 • Support different DBMSs that may even support different
 data models (relational, hierarchical, or network) running
 under different computer systems, such as mainframes
 and microcomputers
Distributed Database
Transparency Features
• Allow end user to feel like database’s only user
• Features include:
 –Distribution transparency
 –Transaction transparency
 –Failure transparency
 –Performance transparency
 –Heterogeneity transparency
Distribution Transparency
• Allows management of physically dispersed
 database as though it were a centralized
 database
• Following three levels of distribution
 transparency are recognized:
 • Fragmentation transparency
 • Location transparency
 • Local mapping transparency
Transaction Transparency
• Ensures database transactions will maintain
 distributed database’s integrity and consistency
Distributed Requests and
Distributed Transactions
• Distributed transaction
 • Can update or request data from several different remote
 sites on network
• Remote request
 • Lets single SQL statement access data to be processed by
 single remote database processor
• Remote transaction
 • Accesses data at single remote site
Distributed Requests and Distributed
Transactions (continued)
• Distributed transaction
 • Allows transaction to reference several
 different (local or remote) DP sites
• Distributed request
 • Lets single SQL statement reference data
 located at several different local or remote DP
 sites
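The four access types above differ in how many SQL statements are involved and how many DP sites each statement touches. A minimal sketch of that distinction follows, assuming every site named is a DP site other than the requester's own. The function name `classify` and the input shape (one set of site names per SQL statement) are hypothetical and chosen only for illustration.

```python
from typing import List, Set


def classify(statements: List[Set[str]]) -> str:
    """Classify a unit of work from the DP sites each SQL statement references.

    `statements` holds one set of site names per SQL statement (assumed input shape).
    """
    if not statements or any(not sites for sites in statements):
        raise ValueError("each statement must reference at least one site")

    # A single statement that spans several DP sites is a distributed request.
    if any(len(sites) > 1 for sites in statements):
        return "distributed request"

    # Every statement touches one site, but the statements together span several.
    all_sites = set().union(*statements)
    if len(all_sites) > 1:
        return "distributed transaction"

    # Everything happens at one site: one statement or several.
    return "remote request" if len(statements) == 1 else "remote transaction"
```

For example, `classify([{"B"}, {"C"}])` yields a distributed transaction, while `classify([{"B", "C"}])` yields a distributed request.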
Distributed Concurrency Control
• Multisite, multiple-process operations are much more
 likely to create data inconsistencies and deadlocked
 transactions than are single-site systems
Two-Phase Commit Protocol
• Distributed databases make it possible for transaction
 to access data at several sites
• Final COMMIT must not be issued until every site has confirmed
 that it is ready to commit its part of transaction
• Two-phase commit protocol requires each individual
 DP’s transaction log entry be written before database
 fragment is actually updated
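The protocol described above can be sketched in a few lines. This is a simplified illustration, not a production implementation: the `DataProcessor` class, its `can_commit` flag, and the log record names are hypothetical, and failure handling such as timeouts and recovery is left out. It shows only the two points from the slide: each DP writes its log entry before its fragment is updated, and nothing is committed until every DP has voted yes.

```python
class DataProcessor:
    """Hypothetical DP holding one fragment and a transaction log."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumed switch to simulate a "no" vote
        self.log = []
        self.fragment = {}
        self._pending = None

    def prepare(self, updates):
        # Phase 1: the log entry is written before the fragment is touched.
        if not self.can_commit:
            self.log.append(("VOTE-ABORT", dict(updates)))
            return False
        self.log.append(("PREPARED", dict(updates)))
        self._pending = dict(updates)
        return True

    def commit(self):
        # Phase 2: only now is the fragment actually updated.
        self.fragment.update(self._pending)
        self.log.append(("COMMITTED",))
        self._pending = None

    def abort(self):
        self._pending = None
        self.log.append(("ABORTED",))


def two_phase_commit(work):
    """`work` is a list of (DataProcessor, updates) pairs. Returns True on commit."""
    votes = [dp.prepare(updates) for dp, updates in work]
    if all(votes):
        for dp, _ in work:
            dp.commit()
        return True
    for dp, _ in work:
        dp.abort()
    return False
```

If any single DP votes no, every participant aborts and no fragment changes, which is what keeps the distributed database consistent.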
Performance Transparency and
Query Optimization
• Objective of query optimization routine is to minimize
 total cost associated with execution of request
• Costs associated with request are function of:
 • Access time (I/O) cost
 • Communication cost
 • CPU time cost
• Must provide distribution transparency as well as
 replica transparency
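To make the cost idea concrete, the sketch below compares candidate execution plans by total cost. It assumes the total is a plain sum of the three components listed above; real optimizers typically weight them differently. The `Plan` class, the plan names, and the cost figures are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Plan:
    name: str
    io_cost: float    # access time (I/O) cost
    comm_cost: float  # communication cost
    cpu_cost: float   # CPU time cost

    def total(self) -> float:
        # Assumption: total cost is the unweighted sum of the three components.
        return self.io_cost + self.comm_cost + self.cpu_cost


def cheapest(plans: Iterable[Plan]) -> Plan:
    """Return the plan with the lowest total cost."""
    return min(plans, key=Plan.total)


plans = [
    Plan("ship whole table to requesting site", io_cost=5, comm_cost=40, cpu_cost=2),
    Plan("filter at remote site, ship result", io_cost=8, comm_cost=6, cpu_cost=3),
]
best = cheapest(plans)
```

With these invented numbers, the second plan wins because its lower communication cost outweighs its slightly higher I/O and CPU costs.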
Performance Transparency and
Query Optimization (continued)
• Replica transparency
 • DDBMS’s ability to hide existence of multiple copies
 of data from user
• Query optimization techniques include:
 • Manual or automatic
 • Static or dynamic
 • Statistically based or rule-based algorithms
Distributed Database Design
• Data fragmentation
 • How to partition database into fragments
• Data replication
 • Which fragments to replicate
• Data allocation
 • Where to locate those fragments and replicas
Data Fragmentation
• Breaks single object into two or more segments or
 fragments
• Each fragment can be stored at any site over
 computer network
• Information about data fragmentation is stored in
 distributed data catalog (DDC), from which it is
 accessed by TP to process user requests
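A small sketch of how a TP might consult the DDC follows. The catalog layout, the table and fragment names, and the site names are hypothetical; an actual DDC stores far more detail. The point is only that the TP asks the catalog which fragments, at which sites, are needed for a request.

```python
# Hypothetical DDC: for each table, the fragments, their sites, and the
# attribute value that decides which rows each fragment holds.
DDC = {
    "CUSTOMER": [
        {"fragment": "CUST_H1", "site": "Nashville", "state": "TN"},
        {"fragment": "CUST_H2", "site": "Atlanta", "state": "GA"},
    ]
}


def locate(table, state=None):
    """Return the (fragment, site) pairs a TP would contact for a request."""
    entries = DDC.get(table)
    if entries is None:
        raise KeyError(f"{table} is not described in the catalog")
    return [
        (e["fragment"], e["site"])
        for e in entries
        if state is None or e["state"] == state
    ]
```

A request restricted to one state needs only one fragment, while an unrestricted request must visit every site that holds a fragment.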
Data Fragmentation Strategies
• Horizontal fragmentation
 • Division of a relation into subsets (fragments) of
 tuples (rows)
• Vertical fragmentation
 • Division of a relation into attribute (column) subsets
• Mixed fragmentation
 • Combination of horizontal and vertical strategies
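The three strategies can be illustrated on a small in-memory relation. The sample rows, column names, and helper functions below are invented for the example. The vertical helper repeats the key column in every fragment, on the assumption that fragments must be rejoinable into the original rows.

```python
ROWS = [
    {"cus_num": 10, "name": "Ann", "state": "TN", "balance": 120.0},
    {"cus_num": 11, "name": "Bo", "state": "GA", "balance": 0.0},
    {"cus_num": 12, "name": "Cy", "state": "TN", "balance": 45.5},
]


def horizontal(rows, predicate):
    """Horizontal fragment: the subset of tuples (rows) matching a condition."""
    return [r for r in rows if predicate(r)]


def vertical(rows, key, columns):
    """Vertical fragment: a subset of attributes, always including the key."""
    return [{c: r[c] for c in (key, *columns)} for r in rows]


def mixed(rows, predicate, key, columns):
    """Mixed fragment: a horizontal cut followed by a vertical cut."""
    return vertical(horizontal(rows, predicate), key, columns)


tn_rows = horizontal(ROWS, lambda r: r["state"] == "TN")
balances = vertical(ROWS, "cus_num", ["balance"])
tn_names = mixed(ROWS, lambda r: r["state"] == "TN", "cus_num", ["name"])
```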
Data Replication
• Storage of data copies at multiple sites served by
 computer network
• Fragment copies can be stored at several sites to
 serve specific information requirements
 • Can enhance data availability and response time
 • Can help to reduce communication and total query
 costs
Data Replication Scenarios
• Fully replicated database
 • Stores multiple copies of each database fragment at
 multiple sites
 • Can be impractical due to amount of overhead
• Partially replicated database
 • Stores multiple copies of some database fragments
 at multiple sites
• Most DDBMSs are able to handle the partially
 replicated database well
Data Replication Scenarios (continued)
•Unreplicated database
•Stores each database fragment at
 single site
•No duplicate database fragments
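The three scenarios can be told apart by counting how many sites hold each fragment. The sketch below assumes a hypothetical input shape, a dictionary mapping each fragment name to the set of sites that store a copy; the function name is likewise illustrative.

```python
def replication_scenario(allocation):
    """Name the replication scenario for a fragment -> set-of-sites mapping."""
    if not allocation or any(not sites for sites in allocation.values()):
        raise ValueError("every fragment must be stored at one site or more")

    copies = [len(sites) for sites in allocation.values()]
    if all(n == 1 for n in copies):
        return "unreplicated"          # each fragment at a single site
    if all(n > 1 for n in copies):
        return "fully replicated"      # multiple copies of each fragment
    return "partially replicated"      # multiple copies of some fragments
```

For instance, `replication_scenario({"F1": {"A", "B"}, "F2": {"C"}})` reports a partially replicated database because only F1 has more than one copy.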
Data Allocation
• Deciding where to locate data
• Allocation strategies
 • Centralized data allocation
• Entire database is stored at one site
 • Partitioned data allocation
• Database is divided into several disjoint parts
 (fragments) and stored at several sites
Data Allocation (continued)
•Replicated data allocation
•Copies of one or more database
 fragments are stored at several sites
•Data distribution over computer
 network is achieved through data
 partition, data replication, or
 combination of both
Client/Server vs. DDBMS
•Client/server architecture: way in which computers
 interact to form system
•Features user of resources, or client, and
 provider of resources, or server
•Can be used to implement a DBMS in
 which client is the TP and server is the
 DP
Client/Server vs. DDBMS
(continued)
•Client/server advantages
 • Less expensive than alternate
 minicomputer or mainframe solutions
 • Allow end user to use microcomputer’s
 GUI, thereby improving functionality and
 simplicity
 • More people in job market have PC skills
 than mainframe skills
 • PC is well established in workplace
Client/Server vs. DDBMS
(continued)
•Client/server advantages
 (continued)
 •Numerous data analysis and query
 tools exist to facilitate interaction with
 DBMSs available in PC market
 •Considerable cost advantage to
 offloading applications development
 from mainframe to powerful PCs
Client/Server vs. DDBMS
(continued)
• Client/server disadvantages
 • Creates more complex environment
• Different platforms (LANs, operating
 systems, and so on) are often difficult to
 manage
 • An increase in number of users and processing
 sites often paves the way for security
 problems
Client/Server vs. DDBMS
(continued)
•Client/server disadvantages (continued)
 •Possible to spread data access to much
 wider circle of users
 •Increases demand for people with
 broad knowledge of computers and
 software
 •Increases burden of training and cost
 of maintaining the environment
C. J. Date’s Twelve Commandments for DDBMS
• Local site independence
 • Each site acts as an independent, autonomous, centralized DBMS
 • Each site is responsible for its own security, concurrency control,
 backup & recovery
• Central site independence
 • No site relies on a central or any other site
 • All sites have same capabilities
• Failure independence
 • System is not affected by node failures
 • System must remain in continuous operation even in case of node
 failure or network expansion
C. J. Date’s Twelve
Commandments for DDBMS (continued)
• Location transparency
 • User does not need to know location of data in order to
 retrieve that data
• Fragmentation transparency
 • User only sees one logical database
 • User is not aware of data fragmentation
• Replication transparency
 • User only sees one logical database
 • User is not aware that data is replicated
C. J. Date’s Twelve Commandments for DDBMS (continued)
• Distributed query processing
 • A query may be processed at several sites
 • Query optimization is performed transparently by the
 DDBMS
• Distributed transaction processing
 • A transaction may update data at several sites
 • A transaction is transparently executed at several sites
• Hardware independence
 • System must run on any hardware platform
 • IBM, DEC, HP, PCs, etc.
C. J. Date’s Twelve
Commandments for DDBMS (continued)
• Operating system independence
 • The system must run on any operating system
 • Some sites may run UNIX; some PC/DOS
• Network independence
 • System must run on any network platform
 • Different hardware, different operating systems, different
 communication networks
• Database independence
 • System must support any vendor’s database product
 • One site may run ORACLE while other sites may run INGRES
Summary
• Distributed database stores logically related data in
 two or more physically independent sites connected
 via computer network
• Distributed processing is division of logical database
 processing among two or more network nodes
• Distributed databases require distributed processing
• Main components of DDBMS are transaction
 processor and data processor
Summary (continued)
• Current database systems can be classified by extent
 to which they support processing and data
 distribution
• Homogeneous distributed database system integrates
 only one particular type of DBMS over computer
 network
• Heterogeneous distributed database system
 integrates several different types of DBMSs over
 computer network
Summary (continued)
• DDBMS characteristics are best described as set of
 transparencies
• Transaction is formed by one or more database
 requests
• Distributed concurrency control is required in network
 of distributed databases
• Distributed DBMS evaluates every data request to find
 optimum access path in distributed database
Summary (continued)
• The design of distributed database must consider
 fragmentation and replication of data
• Database can be replicated over several different sites
 on computer network
• Client/server architecture refers to way in which two
 computers interact over computer network to form a
 system