DISTRIBUTED DATABASE
SYSTEMS
 Distributed Databases 1
Outline
 Introduction
 Distributed DBMS Architecture
 Distributed Database Design Issues
 Date’s Twelve Rules for DDBMS1
 Parallel Database Systems
 Distributed Databases 2
Motivation
 Database Computer
 Technology Networks
Integration Distribution
 Distributed
 Database Systems
 Integration
 Integration // Centralization
 Distributed Databases 3
Distributed Computing
Synonymous terms
 distributed function
 distributed data processing
 multiprocessors/multicomputers
 satellite processing
 backend processing
 dedicated/special purpose computers
 timeshared systems
 functionally modular systems
 Distributed Databases 4
What is Distributed….
 Processing logic
 Functions
 Data
 Control
 Distributed Databases 5
What is Distributed Database
System?
 A distributed database system(DDB) exists where
 logically related data is physically distributed between
 a number of separate processors linked by a
 communications network.
 A distributed database management system (DDBMS)
 is the software that manages the DDB and provides an
 access mechanism that makes this distribution
 transparent to the users.
 Distributed database system (DDBS) = DDB +
 DDBMS
 Distributed Databases 6
What is not a DDBS?
 A timesharing computer system
 A loosely or tightly coupled multiprocessor
 system
 A database system which resides at one of the
 nodes of a network of computers - this is a
 centralized database on a network node
 Distributed Databases 7
Centralized DBMS on a
Network
 Site 1 Site 2
 Database
 Communications
 Network
 Site 3 Site 4
 Distributed Databases 8
Distributed DBMS Environment
 Site 1 Site 2
 Database2
 Database 1
 Communications
 Network
 Site 3 Site 4
 Database4
 Distributed Databases 9
Implicit Assumptions
 Data stored at a number of sites => each site logically
 consists of a single processor.
 Processors at different sites are interconnected by a
 computer network => no multiprocessors -->
 parallel database systems
 Distributed database is a database, not a collection of
 files => data logically related as exhibited in the users’
 access patterns --> relational data model
 DDBMS is a full-fledged DBMS
 --> not remote file system, not a TP system
 Distributed Databases 10
Parallel Database Architectures
 Distributed Databases 11
Shared Memory Architecture
 Processors and disks have access to a
 common memory, typically via a bus or
 through an interconnection network.
 Distributed Databases 12
Shared Disk Architecture
 All processors can directly access all disks
 via an interconnection network, but the
 processors have private memories.
 Architecture provides a degree of fault-
 tolerance — if a processor fails, the other
 processors can take over its tasks since
 the database is resident on disks that are
 accessible from all processors.
 Distributed Databases 13
Shared Nothing Architecture
 Node consists of a processor, memory,
 and one or more disks. Processors at one
 node communicate with another processor
 at another node using an interconnection
 network. A node functions as the server for
 the data on the disk or disks the node
 owns.
 Distributed Databases 14
Applications of DDB
 Manufacturing - especially multi-plant
 manufacturing
 Military command and control
 Banking
 Corporate MIS
 Airlines
 Hotel chains
 Any organization which has a decentralized
 organization structure
 Distributed Databases 15
Advantages of DDBS
 Organizational Structure - fits into
 organizations distributed over several locations
 Shareability and Local Autonomy - Local users
 can control their own data while being
 accessible ‘globally’
 Improved availability – if there is failure at one
 site, others are accessible
 Distributed Databases 16
Advantages of DDBS
 Improved reliability - replication of data
 Improved performance - local data is located
 where demand for it is likely to be greatest
 Transparent management of distributed,
 fragmented, and replicated data
 Economical - centralized processing power in a
 single piece of hardware is not necessarily
 cheaper than separate smaller units
 Modular growth – simpler to expand
 Distributed Databases 17
Disadvantages
 Complexity – by hiding their distributed nature
 and trying to ensure optimum performance,
 reliability and availability, DDBS are more
 complex
 10
 Cost – procurement and maintenance cost
 Security – more difficult
 Integrity Control more difficult
 Lack of Standards
 Lack of Experience
 Database Design more Complex
 Distributed Databases 18
Types of DDBMS
 Homogeneous DDBMS
 Heterogeneous DDBMS
 Distributed Databases 19
Homogeneous DDBMS
 All sites use same DBMS product.
 Much easier to design and manage.
 Approach provides incremental growth
 and allows increased performance.
 Distributed Databases 20
Heterogeneous DDBMS
Sites may run different DBMS products, with
possibly different underlying data models.
Occurs when sites have implemented their
own databases and integration is considered
later
Translations required to allow for:
 Different hardware
 Different DBMS products
 Different hardware and different DBMS products
Typical solution is to use gateways
 Distributed Databases 21
Reference Architecture for
DDBMS
 Due to diversity, no accepted architecture
 equivalent to ANSI/SPARC 3-level
 architecture.
 A reference architecture consists of:
 Set of global external schemas.
 Global conceptual schema (GCS).
 Fragmentation schema and allocation schema.
 Set of schemas for each local DBMS conforming
 to 3-level ANSI/SPARC.
 Some levels may be missing, depending on
 levels of transparency supported.
 Distributed Databases 22
Reference Architecture for
DDBMS
 Distributed Databases 23
Key Design Issues
 Division and Location of Data
 Why fragment at all?
 How to fragment?
 How much to fragment?
 Division and Location of Control
 Performance
 Transparency to the User
 Degree of homogeneity
 Distributed Databases 24
 Fragmentation
Why fragment?
Usage:
 - Apps work with views rather than entire relations.
Efficiency:
 - Data stored close to where most frequently used.
 - Data not needed by local applications is not stored.
Security:
 - and so not available to unauthorized users.
Parallelism:
 - With fragments as unit of distribution, T can be divided
 into several subqueries that operate on
fragments.
 Distributed Databases 25
Horizontal Fragmentation
Projects with Budget less than/greater than or equal to 400,000
 Pno pname budget loc
 H90 CAD/CAM 200000 Nairobi
 S67 Database Dev 600000 H/Q
 T67 Maintenance 450000 Kisumu
 T90 Networks 300000 Mombasa
 S45 School System 100000 Nairobi
 Distributed Databases 26
Vertical Fragmentation
Info about project budgets/Info about project names and
locations
 Pno pname budget loc
 H90 CAD/CAM 200000 Nairobi
 S67 Database Dev 600000 H/Q
 T67 Maintenance 450000 Kisumu
 T90 Networks 300000 Mombasa
 S45 School System 100000 Nairobi
 Distributed Databases 27
Allocation Alternatives
 Non-replicated
 partitioned : each fragment resides at only one site
 Replicated
 fully replicated : each fragment at each site
 partially replicated : each fragment at some of the
 sites
 Distributed Databases 28
 Replication Alternatives
 Full Partial Partioned
 Replication Replication
Query Easy Same Same
Processing Difficulty Difficulty
Directory Easy or Same Same
Management Non-existent Difficulty Difficulty
Concurrency Moderate Difficulty Easy
Control
Reliability Very High High Low
Realty Possible Realistic Possible
 Application Application
 Distributed Databases 29
Parallelism Requirements
 Have as much of the data required by each
 application at the site where the application
 executes
 Full replication
 How about updates?
 Updates to replicated data requires implementation
 of distributed concurrency control and commit
 protocols
 Distributed Databases 30
System Expansion
 Issue is database scaling
 Emergence of microprocessor and workstation
 technologies
 Client-server model of computing
 Data communication cost vs telecommunication
 cost
 Distributed Databases 31
Distributed DBMS Issues
Distributed Database Design
 how to distribute the database
 replicated & non-replicated database
 distribution
 related problem in directory management
Query Processing
 convert user transactions to data
 manipulation instructions
 optimization problem
 Distributed Databases 32
Distributed DBMS Issues
Concurrency Control
 synchronization of concurrent accesses
 consistency and isolation of transactions' effects
 deadlock management
Reliability
 how to make the system resilient to failures
 atomicity and durability
 Distributed Databases 33
Transparency in a DDBMS
Transparency hides implementation details from users.
Overall objective: equivalence to user of DDBMs to
 centralised DBMS. FULL transparency not universally accepted
 objective
Four main types:
1. Distribution transparency
2. Transaction transparency
3. Performance transparency
4. DBMS transparency (only applicable to heterogeneous)
 Distributed Databases 34
 1. Distribution Transparency
Distribution transparency: allows user to perceive database as
 single, logical entity.
If DDBMS exhibits distribution transparency, user does not need to know:
• fragmentation transparency: data is fragmented
• Location transparency: location of data items
• otherwise call this local mapping transparency
• replication transparency: user unaware of replication of
 fragments
 Distributed Databases 35
 2. Transaction Transparency
Transaction transparency: Ensures all distributed tx
 maintain distributed database’s integrity and
 consistency.
• Distributed Tx accesses data stored at more than one
 location.
• Each Tx is divided into no. of subTs, one for each site
 that has to be accessed.
• DDBMS must ensure the indivisibility of both the global
 Tx and each of the subTxs.
 Distributed Databases 36
 2. Transaction Transparency
Concurrency transparency: All Txs must execute independently and
 be logically consistent with results obtained if Txs executed in some
 arbitrary serial order.
 • Replication makes concurrency more complex
Failure transparency: must ensure atomicity and durability of global
 Tx.
 • Means ensuring that subTxs of global Tx either all commit or all
 abort.
• Classification transparency: In IBM’s Distributed Relational
 Database Architecture (DRDA), four types of Txs:
 – Remote request
 – Remote unit of work
 – Distributed unit of work
 – Distributed request. Distributed Databases 37
 3. Performance Transparency
DDBMS: - no performance degradation due to distributed architecture.
 - determine most cost-effective strategy to execute a request.
Distributed Query Processor (DQP) maps data request into ordered
 sequence of operations on local databases.
 - Must consider fragmentation, replication, and allocation schemas.
DQP has to decide:
 1. which fragment to access
 2. which copy of a fragment to use
 3. which location to use.
- produces execution strategy optimized with respect to some cost
 function.
Typically, costs associated with a distributed request include: I/O cost;
CPU cost, communication cost.Distributed Databases 38
Date’s Twelve Rules for DDBMS 1
0 Fundamental Principle
 To the user, a distributed system should look exactly like a
 non-distributed system
1 Local Autonomy
 The sites in a distributed system should be autonomous.
 In this context, autonomy means that:
  Local data is locally owned and managed;
  Local operations remain purely local;
  All operations at a given site are controlled by that
 site
 Distributed Databases 39
Date’s Twelve Rules for DDBMS 2
2 No reliance on a Central Site
 There should be no one site without which the system cannot
 operate.
 This implies that there should be no central servers for services
 such as transaction management, deadlock detection, query
 optimization, and management of the Global System Catalog
3 Continuous operation
 Ideally, there should never be a need for a planned system
 shutdown;
 for operations such as:
 adding or removing a site from the system;
 the dynamic creation and deletion of fragments at one or more
 sites
 Distributed Databases 40
Date’s Twelve Rules for DDBMS 3
4 Location Independence (Transparency)
 The user should be able to access the database from
 any site. Furthermore, the user should be able to
 access all data as if it were stored at the user’s site,
 no matter where it is physically stored
5 Fragmentation Independence
 The user should be able to access the data, no
 matter how it is fragmented.
 Distributed Databases 41
Date’s Twelve Rules for DDBMS 4
6 Replication independence
 The user should be unaware that data has been
 replicated.
 Thus, the user should not be able to access a
 particular copy of a data item directly, nor should
 the user have to specifically update all copies of a
 data item
7 Distributed query processing
 The system should be capable of processing
 queries that reference data at more than one site
 Distributed Databases 42
Date’s Twelve Rules for DDBMS 5
8 Distributed Transaction Processing
 The system should support the transaction as the unit of
 recovery.
 The system should ensure that both the global and local
 transactions conform to the ACID rules for transactions,
 namely: atomicity, consistency, isolation, and durability.
9 Hardware independence
 It should be possible to run the DDBMS on a variety of
 hardware platforms.
10 Operating system independence
 As a corollary to the previous rule, it should be possible to
 run the DDBMS on a variety of operating systems
 Distributed Databases 43
Date’s Twelve Rules for DDBMS 6
11Network Independence
 Again, it should be possible to run the
 DDBMS on a variety of disparate
 communication networks
12 Database Independence
 It should be possible to run different local
 DBMSs, perhaps supporting different
 underlying data models.
 In other words, the system should support
 heterogeneity
 Distributed Databases 44