Distributed File Systems
A file system is a subsystem of the operating system that performs file management activities such as organization, naming, storage, retrieval, sharing, and protection of files. These functions are handled by the name service, the file service, and the directory service. The file service is the specification of what the file system offers to its clients, while a file server is a process that runs on some machine and helps implement the file service.
Advantages of DFS
- Permanent storage on disks
- Remote sharing of information: a file created by one application can be shared transparently by other applications
- User mobility: users can access their files from anywhere, without physically relocating themselves or their secondary storage devices
- Availability: multiple copies of files give better fault tolerance
- Diskless workstations: reduce cost, noise, and heat
DFS Services
A DFS provides three types of services:

1. Storage service (also known as disk service or block service)
- Allocation and management of space on a secondary storage device (disk)
- Allocates disk space in units of fixed-size blocks

2. True file service
- Concerned with operations on individual files, such as create, delete, and modify
- Design issues to be handled include the file-access mechanism, file sharing, replication, caching, and concurrency control

3. Name service (directory service)
- Provides a mapping between text names and file IDs
- Uses directories to perform the mapping
- Performs services such as creating and deleting directories, adding and deleting files, and renaming files
Features of DFS
- Transparency: should include structure, access, naming, and replication transparency.
- User mobility: should not force a user to work on a specific node.
- Performance: should be comparable to that of a centralized file system, even though a DFS involves not only secondary-storage access time but also network communication overhead.
- Simplicity: should give the same semantics as a centralized file system; the user interface must be simple and the number of commands small, yet the DFS should support the entire range of applications.
- Availability: should not suffer a total stop on a single failure, and should maintain backup copies.
- Scalability: should cope with growth in the number of nodes without disrupting service, and should withstand high service load.
- Synchronization: should complete concurrent access requests consistently.
- Security: should protect files from network intruders.
- Heterogeneity: should allow a variety of nodes to share files stored on different storage media.
- Reliability: failure and loss of data should be minimal.
File Models - Unstructured and Structured Files
- Unstructured: a file appears as an uninterpreted sequence of bytes. The OS is not interested in the contents of a file; only application programs are (e.g., UNIX and MS-DOS).
- Structured: a file appears as an ordered sequence of records. Many record types of various sizes and properties exist:
  - Non-indexed records: a record is accessed by specifying its position within the file (e.g., IBM mainframes)
  - Indexed records: hash tables or B-trees are used for indexing (e.g., the Research Storage System (RSS) and Oracle)
- Modern operating systems use the unstructured file model, since it makes sharing easy.
- Along with data, files carry attribute information describing the file, such as owner, size, access rights, creation date, and modification date. Each attribute has a NAME and a VALUE.
- File attributes are maintained and used by the directory service but stored with the corresponding file.
File Models - Mutable and Immutable Files
According to the modifiability criterion, there are two types of files:
- Mutable files: an update performed on a file overwrites its old contents to produce the new contents (e.g., UNIX and MS-DOS). Other participants holding a copy can be informed of changes in several ways:
  - Immediate notification: every operation on a file is instantaneously visible to every participant holding a copy of the file. This method is very difficult and impractical.
  - Notification on close: other participants are notified of modifications only when a participant closes the file and thereby terminates its access.
  - Notification on transaction completion: a transaction is a fixed set of operations; when this set of operations completes, members of the system are notified.
- Immutable files: a file cannot be altered once created, except to be deleted. Updates create new versions, producing a history of immutable versions, one per update (e.g., the Cedar File System). Storage space may be reduced by keeping only a record of the differences between the old and new versions.
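The delta-storage idea for immutable versions can be sketched in a few lines of Python (an illustration only, not the Cedar implementation; the class name and structure are hypothetical, and for clarity full copies are kept rather than only the diffs):

```python
import difflib

class ImmutableFile:
    """Each update creates a new version; old versions are never modified."""

    def __init__(self, content):
        # Full copies kept here for clarity; a real system would keep diffs.
        self.versions = [content.splitlines(keepends=True)]

    def update(self, new_content):
        old = self.versions[-1]
        new = new_content.splitlines(keepends=True)
        # The record of differences between the old and new version,
        # which is all that needs to be stored to save space.
        delta = list(difflib.unified_diff(old, new, lineterm=""))
        self.versions.append(new)
        return delta

f = ImmutableFile("a\nb\n")
delta = f.update("a\nb\nc\n")
print(len(f.versions))  # 2 -- version 1 still exists, unchanged
```

The key property is that `f.versions[0]` is never touched by `update`, so readers of an old version can never observe a concurrent modification.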
File-Accessing Models
A file-accessing model is based on two factors: the method of accessing remote files and the unit of data access.

1. Accessing remote files
- Remote service model (client-server model): the client's request is processed at a server.
  - Merits: simple implementation
  - Demerits: data packing and communication overhead
- Data caching model: the request is processed at a client that has cached a copy of the file.
  - Merits: reduced network traffic
  - Demerits: cache consistency problem

2. Unit of data access (the first three are used with the unstructured file model)
- File-level transfer:
  - Merits: simple; less communication overhead; reduced server load; immune to server and network failures once the file is transferred
  - Demerits: the client is required to have large storage space
- Block-level transfer (page-level model):
  - Merits: the client is not required to have large storage space; usable on diskless workstations
  - Demerits: more network traffic and overhead when a large amount of data is required
- Byte-level transfer:
  - Merits: maximum flexibility, since requests can specify data of arbitrary sequential size
  - Demerits: difficult cache management to handle the variable-length data
- Record-level transfer:
  - Merits: suited to handling structured and indexed files
  - Demerits: more network traffic; more overhead to reconstruct a file
File-Sharing Semantics
File-sharing semantics define when modifications of file data made by one user become observable by other users. There are several file-sharing semantics:
1. UNIX semantics
2. Session semantics
3. Immutable shared-files semantics
4. Transaction-like semantics
File-Sharing Semantics - UNIX Semantics
- Enforces an absolute time ordering on all operations.
- Every read operation on a file sees the effects of all previous write operations performed on that file.
- [Figure: Clients A and B operate on a file containing "ab"; B's Append(c) and a later read are delayed by the network while A performs Append(d) and Append(e), so a read can return an un-updated copy instead of the expected "abcde".]
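UNIX semantics are trivial to state when there is a single master copy, which is exactly what is hard to preserve across a network. A minimal sketch (class and method names are illustrative, not from any real DFS):

```python
class SharedFile:
    """Single master copy: every read sees all previous writes (UNIX semantics)."""

    def __init__(self, data=b""):
        self.data = data

    def append(self, more):
        self.data += more   # the write is applied immediately to the one copy

    def read(self):
        return self.data    # a read always observes every prior write

f = SharedFile(b"ab")
f.append(b"c")
f.append(b"d")
f.append(b"e")
print(f.read())  # b'abcde'
```

In a DFS, cached copies and network delays break this single-copy ordering, which is why the weaker semantics below exist.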
File-Sharing Semantics - Session Semantics
- A session is a series of file accesses made between an open and a close operation.
- All changes made to a file during a session are visible only to the client node that opened the session and are invisible to other nodes.
- Once the file is closed, the changes become visible to sessions started later on other nodes.
- Multiple clients are allowed to perform read and write operations concurrently on the same file, each in its own session.
- The final image of the file is sent back to the server when the file is closed, so the final image on the server depends on which client closes last.
- Due to network delays, the final images may be stored at the server out of order.
- [Figure: Clients A, B, and C open sessions on the file "ab"; A appends c, d, e while B appends x, y, z, and C later appends m to A's image; depending on close order, the server's final image is "abxyz" or "abcdem".]
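The "last close wins" behavior of session semantics can be sketched as follows (an illustrative model only; all class names are hypothetical):

```python
class SessionServer:
    """Server holds the master copy; each session works on a private copy."""

    def __init__(self, data):
        self.data = data

    def open(self):
        return Session(self, self.data)      # the session gets a snapshot

class Session:
    def __init__(self, server, snapshot):
        self.server, self.local = server, snapshot

    def append(self, more):
        self.local += more                   # visible only inside this session

    def close(self):
        self.server.data = self.local        # whole image replaces master copy

srv = SessionServer("ab")
a, b = srv.open(), srv.open()
a.append("cde")
b.append("xyz")
a.close()
b.close()                                    # last close wins
print(srv.data)  # 'abxyz' -- A's appends are lost
```

Note that neither session ever sees the other's appends, and the close order alone decides the final image, exactly as in the figure.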
File-Sharing Semantics
Transaction-Like Semantics (Concurrency Control)
- A transaction is a set of file operations bracketed by Trans_start and Trans_end; concurrent transactions are validated against one another before they are allowed to commit.
- Backward validation: at commit time, a transaction's reads are compared with the writes of previously committed transactions; on a conflict, the transaction aborts itself (Trans_abort) and restarts (Trans_restart).
- Forward validation: at commit time, a transaction's writes are compared with the reads of later, still-active transactions; on a conflict, either the validating transaction or the conflicting active transactions are aborted.
- [Figure: four clients (A-D) run interleaved transactions with read/write sequences such as R1 R2 W3 R4 W5; the backward-validation and forward-validation timelines show validation, commitment, Trans_end, and Trans_abort/Trans_restart events.]
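The backward-validation check reduces to a set-intersection test: a committing transaction may proceed only if its read set does not overlap the write set of any transaction that committed while it was running. A minimal sketch (function name and data layout are illustrative):

```python
def backward_validate(read_set, committed_write_sets):
    """Backward validation: the committing transaction's reads must not
    overlap the writes of transactions that committed after it started."""
    for write_set in committed_write_sets:
        if read_set & write_set:
            return False    # conflict: Trans_abort, then Trans_restart
    return True             # no conflict: safe to commit

# This transaction read items 1, 2, and 4; a concurrent commit wrote item 2.
print(backward_validate({1, 2, 4}, [{2, 5}]))  # False -> abort
print(backward_validate({1, 2, 4}, [{3, 5}]))  # True  -> commit
```

Forward validation is symmetric: compare the committing transaction's write set against the read sets of still-active transactions instead.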
File-Sharing Semantics
Immutable Shared-Files Semantics
- Since a file, once created, cannot be modified, this semantics allows files to be shared only in read-only mode.
- A client that wants to change a file creates a tentative new version based on the current version. When two clients both base tentative versions on version 1.0, a version conflict arises as soon as one of them commits (creating version 1.1).
- How the conflict is resolved depends on the file system: abort one tentative version, ignore the conflict, or merge the versions. Abortion is the simplest; later, the aborted client can still decide to overwrite the file with its tentative version by changing the corresponding directory entry.
- [Figure: Clients A and B each hold a tentative version based on server version 1.0; one commit produces version 1.1, so the other commit attempt detects a version conflict before it can become version 1.2.]
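Conflict detection for immutable versions is a compare-and-swap on the version identifier: a tentative version may commit only if it was based on the still-current version. A minimal sketch (the class and the abort policy are illustrative assumptions):

```python
class VersionedFile:
    """Immutable versions: a tentative copy commits only if it was based
    on the current version; otherwise a version conflict is reported."""

    def __init__(self, content):
        self.versions = {"1.0": content}
        self.current = "1.0"

    def commit(self, based_on, new_content, new_version):
        if based_on != self.current:
            return False                 # conflict: abort (or merge)
        self.versions[new_version] = new_content  # old versions untouched
        self.current = new_version
        return True

f = VersionedFile("ab")
print(f.commit("1.0", "abc", "1.1"))  # True: the first commit wins
print(f.commit("1.0", "abd", "1.2"))  # False: based on a stale version
```

The second client's commit fails because version 1.0 is no longer current; whether it then aborts, overwrites, or merges is a per-file-system policy.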
File Caching Schemes
File caching reduces disk transfers, giving better performance, and also increases reliability. A file-caching scheme in a DFS should address the following key issues:
- Cache location
- Modification propagation
- Cache validation
File-Caching Schemes - Cache Location
- [Figure: possible cache locations relative to the client/server node boundary: (1) no cache, file on server disk; (2) cache in server main memory; (3) cache on client disk; (4) cache in client main memory.]

1. No cache
- Merits: no modifications to propagate, so no consistency problems
- Demerits: busy network traffic

2. Cache in server main memory
- Merits: one-time disk access; easy implementation; total transparency; multiple accesses from different clients are easily synchronized
- Demerits: still busy network traffic, since every access must cross the network

3. Cache on client disk
- Merits: no network access cost on a hit; reliable (a crash of volatile main memory does not affect the data on the client's disk); large storage capacity, so more data can be cached for a higher cache-hit ratio
- Demerits: cache consistency problem; costly on cache hits due to frequent disk access; frequent network access on a cache miss; not usable on diskless workstations

4. Cache in client main memory
- Merits: maximum performance, eliminating both network and disk access cost; usable on diskless workstations; reliable and scalable
- Demerits: cache size restriction; cache consistency problem; complicates the file-access semantics
File-Caching Schemes - Modification Propagation
When should modified data in a client's cache be written back to the server? Two schemes exist.

Write-through scheme
- Every write is propagated to the server immediately.
- Used when the read-to-write access ratio is large.
- Pros: suitable for UNIX-like semantics; high reliability.
- Cons: poor write performance.
- [Figure: two clients with main-memory copies; each write W is immediately written through to the server's disk file.]
Delayed-write scheme
- Writes are accumulated in the cache and propagated to the server later. Three variants exist:
1. Write on ejection from cache: modified data is sent to the server only at the end of the cache entry's use, i.e., during cache replacement. This gives good performance, but changed data is unavailable to other clients for a long time.
2. Periodic write: the cache is scanned periodically (e.g., every 30 seconds), and any changes made since the last scan are written to the server.
3. Write on close: modified data is sent to the server when the client closes the file. This is a perfect match for session semantics. It does not reduce network traffic for files that are opened and closed frequently, so it is used when files are kept open for long periods and frequently modified.

Pros of delayed write:
- Write accesses complete quickly, since the new value is written only in the client's cache.
- Earlier writes may be superseded by later writes before propagation; sending only the latest update is sufficient.
- Gathering all writes and sending them at once mitigates network overhead.
Cons of delayed write:
- Reliability suffers: a client crash loses all unpropagated updates.
- Delaying write propagation results in fuzzier file-sharing semantics.
- [Figure: a client performs several writes W against its main-memory copy; the server's disk file receives them later as one delayed write.]
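The coalescing effect of delayed write can be shown in a short sketch (a toy model: the "server" is just a dict, and all names are hypothetical):

```python
class DelayedWriteCache:
    """Write-back cache: writes update only the local copy; flush() sends
    each file's final value once, so intermediate writes are coalesced."""

    def __init__(self, server):
        self.server = server        # dict standing in for the server's store
        self.dirty = {}             # name -> latest unpropagated value

    def write(self, name, data):
        self.dirty[name] = data     # completes quickly, no network traffic

    def flush(self):                # on close / ejection / periodic timer
        self.server.update(self.dirty)
        sent = len(self.dirty)      # one message per file, latest value only
        self.dirty.clear()
        return sent

server = {}
cache = DelayedWriteCache(server)
cache.write("f", "v1")
cache.write("f", "v2")              # supersedes v1 before propagation
print(cache.flush(), server["f"])  # 1 v2 -- one message, latest value
```

The cost is visible too: if the client crashes before `flush()`, everything in `dirty` is lost, which is exactly the reliability drawback listed above.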
File-Caching Schemes - Cache Validation
- Cache validation means checking whether the cached data on a client node is consistent with the master copy on the server.
- There are basically two approaches to validating cached data: the client-initiated approach and the server-initiated approach.
- [Figure: two clients with main-memory copies; validity checks can happen before every access, periodically, or on open/close, combined with write-through or write-on-close propagation.]
Client-Initiated Approach
- The client contacts the server to check consistency. The frequency of the validity check varies:
1. Checking before every access: defeats the main purpose of caching and is too slow, but is good for UNIX-like semantics.
2. Checking periodically: a check is initiated at every fixed interval of time; better performance, but fuzzier file-sharing semantics.
3. Checking on file open: the cached copy is validated only when the client opens the file for use; simple, and suitable for session semantics.
- Problem: high network traffic.
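The check-on-open variant reduces to comparing a cached version number against the server's in one round trip. A minimal sketch (all class names and the version-number scheme are illustrative assumptions):

```python
class Server:
    def __init__(self):
        self.files = {"f": (1, "ab")}   # name -> (version, data)

    def version(self, name):
        return self.files[name][0]

    def read(self, name):
        return self.files[name][1]

class ClientCache:
    """Client-initiated validation: on open, ask the server for the current
    version and refetch only if the cached copy is stale (check-on-open)."""

    def __init__(self, server):
        self.server = server
        self.entries = {}               # name -> (version, data)

    def open(self, name):
        cached = self.entries.get(name)
        current = self.server.version(name)   # one round trip to validate
        if cached and cached[0] == current:
            return cached[1]                  # cache hit: still valid
        data = self.server.read(name)         # stale or missing: refetch
        self.entries[name] = (current, data)
        return data

srv = Server()
cache = ClientCache(srv)
print(cache.open("f"))        # 'ab' (fetched and cached)
srv.files["f"] = (2, "abc")   # the master copy changes
print(cache.open("f"))        # 'abc' (version mismatch forces a refetch)
```

Even on a hit, the version query itself costs a network round trip per open, which is the "high network traffic" problem noted above.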
Server-Initiated Approach
- The server keeps track of which clients hold a copy of each file, when they opened it, and in what mode.
- The server denies a new open request if another client is currently writing the file, queuing the request and disabling caching for it.
- The server notifies all clients holding a copy of any update to the original file, invalidating their cached copies.
- Problems: it violates the client-server model of immediate response to client requests; stateful file servers are required; a check-on-open is still needed when a client opens a file for a second time.
- [Figure: four clients with main-memory copies; when one client writes, the server denies a new open and sends invalidation notifications to the other copy holders.]
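The stateful bookkeeping and invalidation callbacks can be sketched as follows (an illustrative in-process model; a real server would send the notifications over the network):

```python
class Client:
    def __init__(self):
        self.cache = {}

    def invalidate(self, name):          # callback pushed by the server
        self.cache.pop(name, None)

class StatefulServer:
    """Server-initiated validation: the server records which clients cache
    each file and notifies (invalidates) them when the file is written."""

    def __init__(self):
        self.data = {"f": "ab"}
        self.holders = {}                # name -> set of client objects

    def open(self, name, client):
        self.holders.setdefault(name, set()).add(client)
        return self.data[name]

    def write(self, name, data, writer):
        self.data[name] = data
        for client in self.holders.get(name, set()):
            if client is not writer:
                client.invalidate(name)  # push notification to copy holders

srv, a, b = StatefulServer(), Client(), Client()
a.cache["f"] = srv.open("f", a)
b.cache["f"] = srv.open("f", b)
srv.write("f", "abc", a)
print("f" in b.cache)  # False: b's stale copy was invalidated
```

The `holders` map is precisely the per-client state that makes the server stateful, and losing it in a crash is what makes this approach harder to recover than the client-initiated one.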
DFS Replication
Reasons for replicating files:
1. To increase reliability by having independent backups of each file. If one server goes down, or is even lost permanently, no data are lost.
2. To allow file access to occur even if one file server is down. A server crash should not bring the entire system down until the server can be rebooted.
3. To split the workload over multiple servers. As the system grows in size, having all the files on one server can become a performance bottleneck. With files replicated on two or more servers, the least heavily loaded one can be used.

Differences between replication and caching:
- A replica is associated with a server, whereas a cache is associated with a client.
- A replica focuses on availability, while a cache focuses on locality.
- A replica is more persistent, secure, available, complete, and accurate than a cache.
- A cache is dependent upon a replica.
Advantages of replication
The replication of data in a distributed system offers the following potential benefits: 1. Increased availability
Replication masks and tolerates failures in the network: the system remains operational and available to users despite failures. By replicating critical data on servers with independent failure modes, the probability that at least one copy of the data will be accessible increases. Alternate copies of replicated data can therefore be used when the primary copy is unavailable.
2. Increased reliability
Many applications require extremely high reliability of the data stored in their files. Due to the presence of redundant information in the system, recovery from catastrophic failures becomes possible.
3. Improved response time
Replication enables data to be accessed either locally or from a node whose access time is lower than that of the primary copy. The access-time differential may arise either from the network topology or from uneven loading of nodes.
4. Reduced network traffic
If a file's replica resides on a client's node, the client's access requests can be serviced locally
5. Improved system throughput
Replication enables several clients' requests for access to the same file to be serviced in parallel by different servers.
6. Better scalability
As the number of users of a shared file grows, their requests can be serviced more efficiently by multiple servers, because the workload is distributed among them.
7. Autonomous operation
Replication allows a node to keep operating on local copies while disconnected; a distributed system with this feature can support detachable, portable machines.
Replication Transparency
A replicated file service must function exactly like a non-replicated service. Two important issues related to replication transparency are the naming of replicas and replication control.

Naming of Replicas
- For immutable objects, all replicas of an object can be assigned the same identifier.
- For mutable objects, replicas may be inconsistent at a particular instant of time, so the kernel has difficulty finding the up-to-date version.
- Hence the naming system should be able to map a user-supplied identifier onto an appropriate replica.
Replication Control
- Although replication is transparent, users are sometimes given the flexibility to create replicas on local nodes.
- Depending on who controls it, the replication process is of two types:
1. Explicit replication: users are given the flexibility to control the replication process.
2. Implicit (lazy) replication: replication is controlled automatically by the system, without the user's knowledge.
Multi-Copy Update Problem
When a replicated file is updated, all of its copies must be kept consistent. Approaches include:
- Read-only replication: allow the replication of only immutable files.
- Primary-backup replication: designate one copy as the primary copy and all the others as secondary copies.
- Access any or all replicas, using protocols such as:
  - Read-any-write-all protocol
  - Available-copies protocol
  - Quorum-based consensus
Primary-Copy Replication
- [Figure: clients send requests through front ends to a primary replica manager, which propagates updates to the backup replica managers.]
The protocol proceeds in five steps:
1. Request: the front end sends a request to the primary replica.
2. Coordination: the primary takes each request atomically.
3. Execution: the primary executes the request and stores the results.
4. Agreement: the primary sends the updates to all the backups and receives an acknowledgment from each of them.
5. Response: the primary replies to the front end.

Advantage: easy to implement; linearizable; copes with n-1 crashes.
Disadvantage: large overhead, especially if a failing primary must be replaced with a backup.
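The execution/agreement/response steps can be sketched in a few lines (an in-process toy, with all names hypothetical; a real system sends the updates and acknowledgments over the network and handles failures):

```python
class Backup:
    def __init__(self):
        self.state = {}

    def apply(self, key, value):        # 4. agreement: backup applies update
        self.state[key] = value
        return "ack"

class Primary:
    """Primary-copy replication: the primary executes the update, then waits
    for an acknowledgment from every backup before replying."""

    def __init__(self, backups):
        self.state, self.backups = {}, backups

    def write(self, key, value):
        self.state[key] = value                              # 3. execution
        acks = [b.apply(key, value) for b in self.backups]   # 4. agreement
        assert all(a == "ack" for a in acks)
        return "ok"                                          # 5. response

backups = [Backup(), Backup()]
primary = Primary(backups)
print(primary.write("f", "abc"))                       # ok
print(all(b.state == {"f": "abc"} for b in backups))   # True
```

Because the reply is delayed until every backup acknowledges, a reader contacting any replica after the response sees the update, which is the source of both the linearizability and the overhead noted above.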
Read-Any-Write-All Protocol
- [Figure: a client's front end reads from any one replica manager but writes to all of them.]
- Read: lock any one of the replicas for a read.
- Write: lock all of the replicas for a write.
- Provides sequential consistency.
- Cannot tolerate even one failed replica at the time of a write, since every replica must be locked and updated.
Available-Copies Protocol
- [Figure: a client's front end reads from any one replica manager but writes to all currently available replicas.]
- Read: lock any one of the replicas for a read.
- Write: lock all available replicas for a write.
- A recovering replica brings itself up to date by copying from other servers before accepting any user request.
- Provides better availability than read-any-write-all.
- Cannot cope with network partitions: the two sub-divided network groups may update independently and become inconsistent.
Quorum-Based Protocols
- Read: retrieve a read quorum of r replicas, select the one with the latest version, and perform the read on it.
- Write: retrieve a write quorum of w replicas, find the latest version among them and increment it, then perform the write on the entire write quorum.
- If a sufficient number of replicas cannot be obtained for the read or write quorum, the operation must be aborted.
- The quorum sizes must satisfy: #replicas in read quorum + #replicas in write quorum > n, i.e., r + w > n, which guarantees that every read quorum intersects every write quorum.
- Read-any-write-all is the special case r = 1, w = n.
- [Figure: a client's front end selects a read quorum or a write quorum out of six replica managers.]
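The quorum read and write steps can be sketched directly (a toy model with n = 5 and r = w = 3; the replica selection is simplified to slicing, where a real system would contact any reachable subset):

```python
class Replica:
    def __init__(self):
        self.version, self.data = 0, None

def quorum_write(replicas, w, data):
    """Quorum write: contact w replicas, increment the highest version
    seen, and install the new version on the entire write quorum."""
    quorum = replicas[:w]                       # any w reachable replicas
    new_version = max(r.version for r in quorum) + 1
    for r in quorum:
        r.version, r.data = new_version, data
    return new_version

def quorum_read(replicas, r):
    """Quorum read: contact r replicas and read the latest version."""
    quorum = replicas[:r]
    latest = max(quorum, key=lambda rep: rep.version)
    return latest.data

n = 5
replicas = [Replica() for _ in range(n)]
r, w = 3, 3                                     # r + w = 6 > n = 5
quorum_write(replicas, w, "abc")
# Read from a *different* subset: r + w > n forces at least one overlap,
# so the latest version is always present in the read quorum.
print(quorum_read(replicas[::-1], r))  # abc
```

With r = 1 and w = n the same code degenerates to read-any-write-all, since a write quorum of n trivially intersects any single-replica read quorum.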