The document provides an introduction to distributed systems, defining them as systems where multiple independent computers communicate only through message passing to achieve a common goal. It highlights key characteristics, applications such as online stores and IoT, and challenges including concurrency, fault tolerance, and scalability. The document also discusses design goals like transparency and availability, as well as the evolution of distributed systems over time.
Distributed Systems – AnIntroduction CS4262 Distributed Systems Dilum Bandara Dilum.Bandara@uom.lk Some slides extracted from Dr. Srinath Perera & Dr. Rajkumar Buyya’s Presentation Deck
2.
What is aDistributed System? 2 Source: http://people.reed.edu/~jimfix/442ds/
3.
What is aDistributed System? 3 Source: www.cs.bham.ac.uk/~nagarajs/courses/DSS/
4.
What is aDistributed System? “A distributed system is one on which I cannot get any work done because some machine I have never heard of has crashed.” --Leslie Lamport 4
5.
What is aDistributed System? “A system in which hardware or software components located at networked computers communicate & coordinate their actions only by message passing.” - [Coulouris] “A distributed system is a collection of independent computers that appear to the users of the system as a single coherent system.” - [Tanenbaum] 5
6.
Key Characteristics Manyindependent computers Only communicate via message passing Coordinate/communicate to fulfill a common goal 6
Question Which ofthe following is true? a) Horizontal scaling is used in enterprise applications b) Vertical scaling is used in cloud computing c) Elasticity is the ability to scale in & out d) Stateful servers/components are easier to scale 8
9.
Why Distributed Systems? Some systems are inherently distributed e.g., file sharing, mobile social networks, vehicles Some problems are too big for a single system For scalability e.g., Google, Amazon, YouTube For better QoS/QoE e.g., YouTube, Google For reliability e.g., Google, Amazon For specific demographics e.g., Amazon, Yahoo, Google, CircInfo Economic reasons Resource sharing, Cloud System evolution Heterogeneous system with time 9
10.
Applications – OnlineStores Many items List of all items, with their specs Index items by many dimensions & support search Many sellers Many byers Byers behaviour Support checkout, tracks delivery, returns, ratings, & complains Supported by partitioning sellers/ items across many nodes Business analytics 10
11.
Applications – DesktopGrids Many volunteering their computing power Scientists submit computing jobs to system Broker match resources with jobs Resource run jobs & return results Handle failures Avoid free riding Biggest computers on earth www.seti.cl/aprendiendo-mas-sobre-boinc-y-la-computacion-voluntaria/ 11
12.
Applications – IoT Many sensors – Weather, travel, traffic, surveillance, stock exchange, smart grid, production lines Monitor, understand, & react to events Usually handled with Stream Processing, Complex Event Processing, or custom applications Source: www.flickr.com/photos/imuttoo/4257813689/ by Ian Muttoo, www.flickr.com/photos/eastcapital/4554220770/ , www.flickr.com/photos/patdavid/4619331472/ by Pat David copyright CC 12
13.
Applications – MobileCrowdsourcing Modern mobile phones are like a weather centre GPS, Barometer, Temperature, Light Proximity, & Moisture sensors Get volunteer phones to send sensor data (Crowd source) Report on weather Crop diseases (agriculture officials) Epidemics (from hospitals, doctors) Use that to do weather predictions, crop disease & epidemic spread Source: www.fotopedia.com/items/flickr-2548697541 , www.geograph.org.uk/photo/1534209, and www.yourbdnews.com/2011/10/17/samsung-files-to-halt-iphone-4s- in-japan-australia/iphone-4s, Licensed CC Mobile: Solving the Last Mile Problem 13
14.
Applications – DataStorages & Provenance (Sky Server) Telescopes (Square Kilometer Array) keep collecting data from sky (TBs per day) Sky Survey lets scientists to come & see the sky of a given location, as seen at a given time Moving data take a long time 1TB takes 100 Mbps network - 30 hrs 1 Gbps network - 3 hrs 10 Gbps network - 20 minuteswww.geograph.org.uk/photo/103069, Licensed CC 14 http://cas.sdss.org
15.
Applications – TheoreticalComputer Science Concerns with Coordination algorithms Leader election, multicast, distributed locks, barriers, snapshot algorithms Impossibility results, upper & lower bounds Distributed versions of some centralized algorithms e.g., shortest path A lot of work done on 70s lay the ground work for Distributed Systems http://www.flickr.com/photos/lodz_na_nowo/5690492370/ http://xkcd.com/384/ http://www.flickr.com/photos/quinnanya/4990131194/sizes/z/in/photostream/, Licensed CC 15
16.
Parallel vs. DistributedSystems Convergence of concepts of parallel & distributed systems Differentiation with parallel systems is blurring New middleware Extensibility of clusters leads to heterogeneity As new hardware is added 16 Parallel Systems Distributed Systems Tight coupling Loose coupling Physical proximity Server room to Global Homogeneous Heterogeneous Threads & MPI RPC, Web Services, REST
17.
Distributed Systems Timeline/History 17 PeriodTopics 1965-late 70s Parallel Programming, Self Stabilization, Fault Tolerance, ER Model/ Transactions, Time Clock 1980s Consensus & impossibility, SQL, Distributed Snapshots, Replications, Group Communication Early 90s Linearizability, Parallel DB, Transactional Memory, RAID, MPI Late 90s Volunteer Computing, P2P file sharing, Complex event processing Early 2000 Oceanostore, Web Services, Symantec Web, REST, DHT, Pub/Sub, Grid, Autonomic Computing, Google File System, Virtualization, SOA, Map reduce 2005-2010 Cloud, NoSQL, Mobile Apps, Data Provenance
18.
Design Goals Transparency Differences between computers & the way they communicate are hidden from users Single System View (SSV/SSI) Users & applications interact with a distributed system in a consistent & uniform way regardless of where or when the interaction takes place Scalability Relatively easy to expand or scale Availability Continuously available even though certain parts may be temporarily out of order 18
Fallacies of DistributedSystems Network is reliable Latency is zero Bandwidth is infinite Transport cost is zero Network is secure Topology doesn't change There is one administrator Infrastructure is homogeneous 20 By Arnon Rotem-Gal-Oz
21.
Distributed System Challenges Concurrency No global state Failures in different elements Transparency Fault tolerance Scalability Heterogeneity Security Openness 21
22.
1. Concurrency Challenges Every software or hardware component Autonomous, enable resource sharing, & synchronize & coordinate via message passing A & B are concurrent, if either A can happen before B, or B can happen before A Typical problems Deadlocks Unreliable communication Provide & manage concurrent access to shared resources Preserve dependencies, e.g., distributed transactions Fair scheduling 22
23.
2. Lack ofGlobal State Absence of a global state Typically no single process would have a knowledge of current global state of the system Due to concurrency & message passing communication Absence of a global clock Hard to say who’s first There are limits on precision with which processes in a can synchronize their clocks However, problem can now be addressed in some application with GPS time stamps 23
24.
3. Failures inDifferent Elements Failures are more common than in centralized systems Processes run autonomously, in isolation Failures of individual processes may remain undetected Individual processes may be unaware of failures in the system context Network failures isolate processes & partition system 24
25.
4. Transparency Presentsystem to users & applications as a single computer system Hides the fact that resources are physically distributed across multiple computers There’s a trade-off between high degree of transparency & system performance Attempting to blindly hide all distribution aspects from users is not always a good idea Transparency can be applied to several aspects 25
26.
Forms of Transparency Access transparency Access to local or remote resources is identical e.g., Network File System Location transparency Access without knowledge of location e.g., URLs, e-mail addresses Failure transparency Tasks can be completed despite failures e.g., retransmission of e-mails, failure of a Web server node should not bring down the website Replication transparency Access to replicated resources as if there was just one Provide enhanced reliability & performance 26
27.
Forms of Transparency(Cont.) Migration (mobility/relocation) transparency Movement of resources & clients within a system without affection operation of users or applications e.g., switching from one name server to another, migration of a VM from physical server to another Concurrency transparency A process shouldn’t notice that there are others sharing same resources Performance transparency Allows system to be dynamically reconfigured to improve performance as loads vary Scaling transparency Allows system & applications to expand in scale without change to system structure or application algorithms 27
28.
Question(s) Migration transparency a)Allows access without knowledge of location b) Enables multiple instances of resources c) Enables the movement of resources d) Enables the concealment of faults Higher degree of transparency is always desirable True / False 28
29.
5. Fault Tolerance Failure When an offered service no longer complies with its specification Fault Cause of a failure Fault tolerance No failure despite faults Failures in distributed systems are partial Some components fail while others continue to function Therefore, handling failure is particularly difficult 29
6. Scalability Atmany different scales No of applications, users, transactions, products to be sold, attributes of products Goal is to remain effective when there is a significant increase in no of resources & users 3 different dimensions: Size scalability Limitations due to centralized services, centralized data, centralized algorithms Geographic scalability Unreliable communication, lack of performance guarantees Administrative scalability Conflicting policies for resource usage, security, etc. 31
32.
Scaling Techniques Scalabilityproblems typically appear as performance problems 3 basic scaling techniques Hiding communication latencies Distribution Replication 32
33.
Scalability Concerns Costof physical resources Cost should linearly increase with system size Performance loss Finding things in large & distributed systems are hard Looking for algorithms with O(log n), n is size of data Preventing software resources from running out Nos used to represent resources, users, services, etc. e.g., IP v4 to V6, Y2K problem, Year 2038 problem Avoiding performance bottlenecks Maintaining global state Difficulty of maintaining up to date state 33
34.
7-9. Other Challenges Heterogeneity Heterogeneous components must be able to interoperate Applies to hardware, software, middleware, & protocols Security System should only be used in the way intended Distributed resources, networks, & users Distributed authentication, authorization, enforcing integrity, non-repudiation, & accounting is hard Openness Interfaces should be publicly available to ease adding new components 34
35.
Summary We usethem without realizing they are distributed Goals Transparency, single system view, scalability, availability Challenges Concurrency, no global state, failures, transparency, fault tolerance, scalability, heterogeneity, security, debugging 35
Editor's Notes
#15 Sky Server – 200 GB a night Chilbolton Radio Telescope - 7 PB of raw data will be produced each year ~19 TB a day
#20 According to the Fundamentals of Software Engineering book by Ghezzi et. al., software engineering principles include (1) Rigor and Formality (2) Separation of Concerns (3) Modularity (4) Abstraction (5) Anticipation of Change (6) Generality and (7) Incrementality. Specific issues need to be resolved for the design of software for distributed systems.