Designing Distributed Systems Malisa Ncube inbox@malisancube.com malisancube.com
What this talk is about
- The distributed paradigm
- The Reactive Manifesto
- Cloud computing considerations
- Akka
What is distributed computing?
“Distributed computing is a field of computer science that studies distributed systems. A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages.” https://en.wikipedia.org/wiki/Distributed_computing
What is cloud computing?
- Lets the platform do the hard work by leveraging application services.
- Uses non-blocking, asynchronous communication in a loosely coupled architecture.
- Scales horizontally and elastically.
- Does not waste resources.
- Handles scaling events, node failures, and transient failures without downtime or performance degradation.
- Uses geographic distribution to minimize network latency.
- Upgrades without downtime.
- Scales automatically using proactive and reactive actions.
- Monitors and manages application logs as nodes come and go.
What are microservices?
In computing, microservices is a software architecture style in which complex applications are composed of small, independent processes communicating with each other using language-agnostic APIs. These services are small building blocks, highly decoupled and focused on doing a small task, facilitating a modular approach to system building.
Moore's law
Brewer's CAP theorem
- Eric Brewer introduced the idea that there is a fundamental trade-off between consistency, availability, and partition tolerance.
- In a network subject to communication failures, it is impossible for any web service to implement an atomic read/write shared memory that guarantees a response to every request.
Amdahl's Law: gives the theoretical speedup in latency of the execution of a task at a fixed workload that can be expected of a system whose resources are improved.
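For reference (not spelled out on the slide), the law is usually written as below, where p is the fraction of the work that benefits from the improvement and s is the speedup of that fraction:

```latex
S_{\text{latency}}(s) = \frac{1}{(1 - p) + \dfrac{p}{s}}
```

As s grows without bound the overall speedup is capped at 1/(1 - p); for example, if only 90% of a task can run in parallel, adding nodes can never push the overall speedup past 10x. This is why the serial portion of a distributed workload dominates its scalability.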
Concerns
Horizontal scaling compute pattern
- Horizontal scaling is reversible: it supports both scaling out and scaling in.
- Stateful nodes keep user session information locally, so a failed node can lose sessions and becomes a single point of failure.
- Stateless nodes store session information externally, outside the nodes themselves (see the sketch below).
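As an illustration only (the SessionStore trait, InMemorySessionStore and WebNode below are hypothetical, not from the talk), a stateless node delegates session state to a shared store so that any node can serve any request and nodes can be added or removed freely:

```scala
import scala.collection.concurrent.TrieMap

// Stand-in for an external session store; in production this would be a
// distributed cache or database shared by all nodes.
trait SessionStore {
  def save(sessionId: String, data: Map[String, String]): Unit
  def load(sessionId: String): Option[Map[String, String]]
}

class InMemorySessionStore extends SessionStore {
  private val sessions = TrieMap.empty[String, Map[String, String]]
  def save(sessionId: String, data: Map[String, String]): Unit = sessions.update(sessionId, data)
  def load(sessionId: String): Option[Map[String, String]] = sessions.get(sessionId)
}

// A stateless web node: it keeps no session data itself, so losing the node
// loses no user state and scaling in or out is safe.
class WebNode(store: SessionStore) {
  def handleRequest(sessionId: String): String = {
    val session = store.load(sessionId).getOrElse(Map.empty)
    val visits  = session.getOrElse("visits", "0").toInt + 1
    store.save(sessionId, session + ("visits" -> visits.toString))
    s"visit #$visits"
  }
}
```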
Queue-Centric Workflow Pattern
- Used in web applications to decouple the web tier from the service tier by focusing on the flow of commands.
- A service tier that is unreliable or slow can otherwise affect the web tier negatively.
- All communication is asynchronous, as messages over a queue.
- The sender and receiver are loosely coupled; neither knows about the implementation of the other.
- There are edge cases where the risk of invisibility windows arises when processing takes longer than allowed.
- Idempotency concerns: database transactions, compensating transactions.
- Poison messages are placed in a dead letter queue.
- QCW is not full CQRS, as it does not articulate the read model.
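A minimal sketch of the flow, using an in-memory queue as a stand-in for a real cloud queue; the Command type, the duplicate delivery and the id-based idempotency check are illustrative assumptions, not details from the talk:

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

// A command flowing from the web tier to the service tier.
final case class Command(id: String, payload: String)

object QueueCentricWorkflowSketch extends App {
  val queue           = new LinkedBlockingQueue[Command]() // stand-in for a cloud queue
  val deadLetterQueue = new LinkedBlockingQueue[Command]() // where poison messages end up
  val processedIds    = mutable.Set[String]()              // enables idempotent handling

  // Web tier: enqueue the command and return immediately; it never calls the
  // service tier directly, so a slow or failing service tier cannot block it.
  queue.put(Command("42", "resize-image"))
  queue.put(Command("42", "resize-image")) // at-least-once delivery may duplicate messages

  // Service tier: poll, process, and acknowledge.
  while (!queue.isEmpty) {
    val cmd = queue.take()
    if (processedIds.contains(cmd.id)) {
      println(s"Skipping duplicate command ${cmd.id}")
    } else {
      try {
        println(s"Processing ${cmd.id}: ${cmd.payload}")
        processedIds += cmd.id
      } catch {
        case _: Exception => deadLetterQueue.put(cmd) // poison message
      }
    }
  }
}
```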
Eventual Consistency
- Simultaneous requests for the same data may return different values.
- Leads to better performance and lower cost.
- Relates to Brewer's CAP theorem (Consistency, Availability and Partition tolerance): of the three guarantees, an application can pick only two.
- Consistency: everyone gets the same answer.
- Availability: clients have ongoing access (even if there is a partial system failure).
- Partition tolerance: correct operation even if some nodes are cut off from the network.
- DNS updates and NoSQL stores are examples of eventually consistent services.
Busy Signal Pattern
- Applies to services or resources accessed over a network where the response may be a busy signal, e.g. the busy signal on a telephone.
- These include management services, data services and more, and periodic transient failures should be expected.
- A good application should be able to retry and properly handle failures.
- Over HTTP, the busy signal is typically a 503 Service Unavailable response.
- Clearly distinguish busy signals from errors, retry on a busy state after an interval, and log occurrences for further analysis of patterns (see the sketch below).
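A hedged sketch of the retry logic; BusyException, the attempt count and the doubling delay are assumptions chosen for illustration:

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object BusySignalRetry {
  // Marker for a busy response, e.g. an HTTP 503 Service Unavailable mapped
  // to an exception by the caller (hypothetical).
  final case class BusyException(message: String) extends RuntimeException(message)

  /** Retry `op` only when the failure is a busy signal; fail fast on real errors. */
  @tailrec
  def withRetry[T](attemptsLeft: Int, delayMs: Long)(op: () => T): T =
    Try(op()) match {
      case Success(result) => result
      case Failure(_: BusyException) if attemptsLeft > 1 =>
        println(s"Busy signal received, retrying in $delayMs ms") // log for later pattern analysis
        Thread.sleep(delayMs)
        withRetry(attemptsLeft - 1, delayMs * 2)(op)              // exponential backoff
      case Failure(other) => throw other                          // a genuine error: do not retry blindly
    }
}
```

A caller would wrap its network call, e.g. withRetry(5, 100)(() => callService()), where callService is whatever hypothetical operation may come back busy.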
Some concerns
- Node Failure Pattern: concerns availability and graceful handling of unexpected application/hardware failures, reboots or node shutdowns. Application state should live in reliable storage, not on the local disk of an individual node.
- Network latency: move application data closer to users, and ensure the nodes within your application are close together (colocation). Windows Azure uses Affinity Groups for this. Also consider data compression, background processing, predictive fetching, and CDNs.
Traditional systems
Traits of a reactive system
- Scalable: reacts to load, minimizes contention, scales up/out.
- Responsive.
- Resilient: must be designed in rather than bolted on; things will fail, so isolate components and prevent cascading failures.
- Event driven (asynchronous).
The Reactive Manifesto: Responsive, Resilient, Elastic, Message Driven. http://www.reactivemanifesto.org/
Actors
The Actor Model
- The actor model in computer science is a mathematical model of concurrent computation that treats "actors" as the universal primitives of concurrent computation.
- In response to a message that it receives, an actor can: make local decisions, create more actors, send more messages, and determine how to respond to the next message received. Actors may modify private state, but can only affect each other through messages (avoiding the need for any locks).
In different scenarios, an Actor may be an alternative to:
- a thread
- an object instance or component
- a callback or listener
- a singleton or service
- a router, load-balancer or pool
- a Java EE Session Bean or Message-Driven Bean
- an out-of-process service
- a Finite State Machine (FSM)
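A minimal classic Akka (Scala) sketch of these ideas; the Greeter and Printer actors and their messages are invented for illustration:

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Actors communicate only through immutable messages.
final case class Greet(name: String)
final case class Greeting(text: String)

// An actor processes one message at a time and may hold private state;
// because that state is never shared, no locks are needed.
class Greeter extends Actor {
  private var greeted = 0

  def receive: Receive = {
    case Greet(name) =>
      greeted += 1
      sender() ! Greeting(s"Hello, $name (greeting #$greeted)")
  }
}

class Printer extends Actor {
  def receive: Receive = {
    case Greeting(text) => println(text)
  }
}

object ActorModelDemo extends App {
  val system  = ActorSystem("MySystem")
  val greeter = system.actorOf(Props[Greeter](), "greeter")
  val printer = system.actorOf(Props[Printer](), "printer")

  // `tell` is fire-and-forget: asynchronous, non-blocking message passing.
  greeter.tell(Greet("world"), printer) // replies are routed to `printer`

  Thread.sleep(500)   // give the asynchronous messages time to flow (demo only)
  system.terminate()
}
```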
The structure of Akka
Actors
- Simple, high-level abstractions for concurrency and parallelism.
- Asynchronous, non-blocking and highly event-driven programming model.
Fault tolerance
- Supervisor hierarchies with let-it-crash semantics.
- Supervisor hierarchies can span multiple VMs.
- Excellent for writing highly fault-tolerant systems that self-heal and never stop.
Location transparency
- Everything in Akka is designed to work in a distributed environment: all interactions between actors use pure message passing and everything is asynchronous.
- akka.tcp://MySystem@localhost:9001/user/actorName1
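A short sketch of what that address buys you: looking an actor up by path and messaging it exactly like a local one. It assumes a remote ActorSystem named MySystem is already running on port 9001 with classic Akka remoting (akka-remote) configured, which is not shown here:

```scala
import akka.actor.{ActorSelection, ActorSystem}

object RemoteLookup extends App {
  val system = ActorSystem("ClientSystem")

  // Resolve the actor by its full path; sending to it looks identical to
  // sending to a local actor, which is what location transparency means.
  val remote: ActorSelection =
    system.actorSelection("akka.tcp://MySystem@localhost:9001/user/actorName1")

  remote ! "hello from another JVM"
}
```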
Multiple threads vs multiple actors: roughly 1 MB per thread (4 MB on 64-bit) vs about 2.7 million actors per gigabyte of memory.
More on Actors
- Let-it-crash philosophy.
- Failure (supervision) strategies:
  - One-for-one supervision: only the failing child actor is restarted.
  - All-for-one: if one child crashes, its siblings are restarted as well.
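A minimal classic Akka (Scala) sketch of a one-for-one strategy; the Worker and Supervisor actors and the exception mapping are illustrative assumptions:

```scala
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Restart, Stop}
import scala.concurrent.duration._

class Worker extends Actor {
  def receive: Receive = {
    case "boom" => throw new IllegalStateException("let it crash")
    case msg    => println(s"worker handled: $msg")
  }
}

class Supervisor extends Actor {
  // Restart only the failing child (one-for-one); an AllForOneStrategy would
  // restart its siblings as well.
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: IllegalStateException => Restart
      case _: Exception             => Stop
    }

  private val worker = context.actorOf(Props[Worker](), "worker")

  def receive: Receive = {
    case msg => worker forward msg
  }
}

object SupervisionDemo extends App {
  val system     = ActorSystem("MySystem")
  val supervisor = system.actorOf(Props[Supervisor](), "supervisor")

  supervisor ! "boom"  // the worker crashes and is restarted by its supervisor
  supervisor ! "hello" // handled by the restarted worker

  Thread.sleep(500)
  system.terminate()
}
```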
Some Code
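The live demo itself is not captured in this transcript; as a stand-in, here is a small hedged example of request-response with Akka's ask pattern (the Calculator actor and Add message are made up):

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

final case class Add(a: Int, b: Int)

class Calculator extends Actor {
  def receive: Receive = {
    case Add(a, b) => sender() ! (a + b)
  }
}

object AskPatternDemo extends App {
  implicit val timeout: Timeout = Timeout(3.seconds)

  val system = ActorSystem("MySystem")
  val calc   = system.actorOf(Props[Calculator](), "calculator")

  // `ask` returns a Future instead of blocking the calling thread.
  val result = (calc ? Add(2, 3)).mapTo[Int]
  println(s"2 + 3 = ${Await.result(result, 3.seconds)}") // Await is for the demo only

  system.terminate()
}
```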
References
Akka - http://akka.io
Reactive Extensions - http://reactivex.io/
Halo - https://www.halowaypoint.com/en-us/games
Carl Hewitt - http://en.wikipedia.org/wiki/Carl_Hewitt
Orbit Framework - https://github.com/electronicarts/orbit, http://orbit.bioware.com/
Project Orleans - https://github.com/dotnet/orleans
That's all, friends
Malisa Ncube
@malisancube
http://malisancube.com


Editor's Notes

  • #19
    Responsive: The system responds in a timely manner if at all possible. Responsiveness is the cornerstone of usability and utility, but more than that, responsiveness means that problems may be detected quickly and dealt with effectively. Responsive systems focus on providing rapid and consistent response times, establishing reliable upper bounds so they deliver a consistent quality of service. This consistent behaviour in turn simplifies error handling, builds end user confidence, and encourages further interaction.
    Resilient: The system stays responsive in the face of failure. This applies not only to highly available, mission-critical systems: any system that is not resilient will be unresponsive after a failure. Resilience is achieved by replication, containment, isolation and delegation. Failures are contained within each component, isolating components from each other and thereby ensuring that parts of the system can fail and recover without compromising the system as a whole. Recovery of each component is delegated to another (external) component and high availability is ensured by replication where necessary. The client of a component is not burdened with handling its failures.
    Elastic: The system stays responsive under varying workload. Reactive Systems can react to changes in the input rate by increasing or decreasing the resources allocated to service these inputs. This implies designs that have no contention points or central bottlenecks, resulting in the ability to shard or replicate components and distribute inputs among them. Reactive Systems support predictive, as well as reactive, scaling algorithms by providing relevant live performance measures. They achieve elasticity in a cost-effective way on commodity hardware and software platforms.
    Message Driven: Reactive Systems rely on asynchronous message passing to establish a boundary between components that ensures loose coupling, isolation and location transparency. This boundary also provides the means to delegate failures as messages. Employing explicit message passing enables load management, elasticity, and flow control by shaping and monitoring the message queues in the system and applying back-pressure when necessary. Location-transparent messaging as a means of communication makes it possible for the management of failure to work with the same constructs and semantics across a cluster or within a single host. Non-blocking communication allows recipients to only consume resources while active, leading to less system overhead.
    Large systems are composed of smaller ones and therefore depend on the Reactive properties of their constituents. This means that Reactive Systems apply design principles so these properties apply at all levels of scale, making them composable. The largest systems in the world rely upon architectures based on these properties and serve the needs of billions of people daily. It is time to apply these design principles consciously from the start instead of rediscovering them each time.