Designing Distributed Systems Malisa Ncube inbox@malisancube.com malisancube.com
What this talk is about
- The distributed paradigm
- The Reactive Manifesto
- Cloud computing considerations
- Akka
What is distributed computing?
“Distributed computing is a field of computer science that studies distributed systems. A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages.” https://en.wikipedia.org/wiki/Distributed_computing
What is cloud computing?
- Lets the platform do the hard work by leveraging application services.
- Uses non-blocking, asynchronous communication in a loosely coupled architecture.
- Scales horizontally and elastically.
- Does not waste resources.
- Handles scaling events, node failures, and transient failures without downtime or performance degradation.
- Uses geographic distribution to minimize network latency.
- Upgrades without downtime.
- Scales automatically using proactive and reactive actions.
- Monitors and manages application logs as nodes come and go.
What are microservices?
In computing, microservices is a software architecture style in which complex applications are composed of small, independent processes communicating with each other using language-agnostic APIs. These services are small building blocks, highly decoupled and focused on doing a small task, facilitating a modular approach to system building.
Moore's law
Brewer's CAP theorem
- Eric Brewer introduced the idea that there is a fundamental trade-off between consistency, availability, and partition tolerance.
- In a network subject to communication failures, it is impossible for any web service to implement an atomic read/write shared memory that guarantees a response to every request.
Amdahl's Law: gives the theoretical speedup in latency of the execution of a task at a fixed workload that can be expected of a system whose resources are improved.
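For reference (not spelled out on the slide), the law is usually written as below, where p is the fraction of the work that benefits from the improvement and s is the speedup of that fraction:

```latex
S_{\text{latency}}(s) = \frac{1}{(1 - p) + \dfrac{p}{s}}
```

As s grows without bound the overall speedup is capped at 1/(1 - p); for example, if only 90% of a task can run in parallel, adding nodes can never push the overall speedup past 10x. This is why the serial portion of a distributed workload dominates its scalability.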
Concerns
Horizontal scaling compute pattern
- Horizontal scaling is reversible: it supports both scaling out and scaling in.
- Stateful nodes keep user session information locally, so a failed node can lose sessions and becomes a single point of failure.
- Stateless nodes store session information externally, outside the nodes themselves (see the sketch below).
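As an illustration only (the SessionStore trait, InMemorySessionStore and WebNode below are hypothetical, not from the talk), a stateless node delegates session state to a shared store so that any node can serve any request and nodes can be added or removed freely:

```scala
import scala.collection.concurrent.TrieMap

// Stand-in for an external session store; in production this would be a
// distributed cache or database shared by all nodes.
trait SessionStore {
  def save(sessionId: String, data: Map[String, String]): Unit
  def load(sessionId: String): Option[Map[String, String]]
}

class InMemorySessionStore extends SessionStore {
  private val sessions = TrieMap.empty[String, Map[String, String]]
  def save(sessionId: String, data: Map[String, String]): Unit = sessions.update(sessionId, data)
  def load(sessionId: String): Option[Map[String, String]] = sessions.get(sessionId)
}

// A stateless web node: it keeps no session data itself, so losing the node
// loses no user state and scaling in or out is safe.
class WebNode(store: SessionStore) {
  def handleRequest(sessionId: String): String = {
    val session = store.load(sessionId).getOrElse(Map.empty)
    val visits  = session.getOrElse("visits", "0").toInt + 1
    store.save(sessionId, session + ("visits" -> visits.toString))
    s"visit #$visits"
  }
}
```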
Queue-Centric Workflow Pattern
- Used in web applications to decouple the web tier from the service tier by focusing on the flow of commands.
- A service tier that is unreliable or slow can otherwise affect the web tier negatively.
- All communication is asynchronous, as messages over a queue.
- The sender and receiver are loosely coupled; neither knows about the implementation of the other.
- There are edge cases where the risk of invisibility windows arises when processing takes longer than allowed.
- Idempotency concerns: database transactions, compensating transactions.
- Poison messages are placed in a dead letter queue.
- QCW is not full CQRS, as it does not articulate the read model.
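A minimal sketch of the flow, using an in-memory queue as a stand-in for a real cloud queue; the Command type, the duplicate delivery and the id-based idempotency check are illustrative assumptions, not details from the talk:

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

// A command flowing from the web tier to the service tier.
final case class Command(id: String, payload: String)

object QueueCentricWorkflowSketch extends App {
  val queue           = new LinkedBlockingQueue[Command]() // stand-in for a cloud queue
  val deadLetterQueue = new LinkedBlockingQueue[Command]() // where poison messages end up
  val processedIds    = mutable.Set[String]()              // enables idempotent handling

  // Web tier: enqueue the command and return immediately; it never calls the
  // service tier directly, so a slow or failing service tier cannot block it.
  queue.put(Command("42", "resize-image"))
  queue.put(Command("42", "resize-image")) // at-least-once delivery may duplicate messages

  // Service tier: poll, process, and acknowledge.
  while (!queue.isEmpty) {
    val cmd = queue.take()
    if (processedIds.contains(cmd.id)) {
      println(s"Skipping duplicate command ${cmd.id}")
    } else {
      try {
        println(s"Processing ${cmd.id}: ${cmd.payload}")
        processedIds += cmd.id
      } catch {
        case _: Exception => deadLetterQueue.put(cmd) // poison message
      }
    }
  }
}
```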
Eventual Consistency
- Simultaneous requests for the same data may return different values.
- Leads to better performance and lower cost.
- Relates to Brewer's CAP theorem (Consistency, Availability and Partition tolerance): of the three guarantees, an application can pick only two.
- Consistency: everyone gets the same answer.
- Availability: clients have ongoing access (even if there is a partial system failure).
- Partition tolerance: correct operation even if some nodes are cut off from the network.
- DNS updates and NoSQL stores are examples of eventually consistent services.
Busy Signal Pattern
- Applies to services or resources accessed over a network where the response may be a busy signal, e.g. the busy signal on a telephone.
- These include management services, data services and more, and periodic transient failures should be expected.
- A good application should be able to retry and properly handle failures.
- Over HTTP, the busy signal is typically a 503 Service Unavailable response.
- Clearly distinguish busy signals from errors, retry on a busy state after an interval, and log occurrences for further analysis of patterns (see the sketch below).
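A hedged sketch of the retry logic; BusyException, the attempt count and the doubling delay are assumptions chosen for illustration:

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object BusySignalRetry {
  // Marker for a busy response, e.g. an HTTP 503 Service Unavailable mapped
  // to an exception by the caller (hypothetical).
  final case class BusyException(message: String) extends RuntimeException(message)

  /** Retry `op` only when the failure is a busy signal; fail fast on real errors. */
  @tailrec
  def withRetry[T](attemptsLeft: Int, delayMs: Long)(op: () => T): T =
    Try(op()) match {
      case Success(result) => result
      case Failure(_: BusyException) if attemptsLeft > 1 =>
        println(s"Busy signal received, retrying in $delayMs ms") // log for later pattern analysis
        Thread.sleep(delayMs)
        withRetry(attemptsLeft - 1, delayMs * 2)(op)              // exponential backoff
      case Failure(other) => throw other                          // a genuine error: do not retry blindly
    }
}
```

A caller would wrap its network call, e.g. withRetry(5, 100)(() => callService()), where callService is whatever hypothetical operation may come back busy.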
Some concerns
- Node Failure Pattern: concerns availability and graceful handling of unexpected application/hardware failures, reboots or node shutdowns. Application state should live in reliable storage, not on the local disk of an individual node.
- Network latency: move application data closer to users, and ensure the nodes within your application are close together (colocation). Windows Azure uses Affinity Groups for this. Also consider data compression, background processing, predictive fetching, and CDNs.
Traditional systems
Traits of a reactive system
- Scalable: reacts to load, minimizes contention, scales up/out.
- Responsive.
- Resilient: must be designed in rather than bolted on; things will fail, so isolate components and prevent cascading failures.
- Event driven (asynchronous).
The Reactive Manifesto: Responsive, Resilient, Elastic, Message Driven. http://www.reactivemanifesto.org/
Actors
The Actor Model
- The actor model in computer science is a mathematical model of concurrent computation that treats "actors" as the universal primitives of concurrent computation.
- In response to a message that it receives, an actor can: make local decisions, create more actors, send more messages, and determine how to respond to the next message received. Actors may modify private state, but can only affect each other through messages (avoiding the need for any locks).
In different scenarios, an Actor may be an alternative to:
- a thread
- an object instance or component
- a callback or listener
- a singleton or service
- a router, load-balancer or pool
- a Java EE Session Bean or Message-Driven Bean
- an out-of-process service
- a Finite State Machine (FSM)
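A minimal classic Akka (Scala) sketch of these ideas; the Greeter and Printer actors and their messages are invented for illustration:

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Actors communicate only through immutable messages.
final case class Greet(name: String)
final case class Greeting(text: String)

// An actor processes one message at a time and may hold private state;
// because that state is never shared, no locks are needed.
class Greeter extends Actor {
  private var greeted = 0

  def receive: Receive = {
    case Greet(name) =>
      greeted += 1
      sender() ! Greeting(s"Hello, $name (greeting #$greeted)")
  }
}

class Printer extends Actor {
  def receive: Receive = {
    case Greeting(text) => println(text)
  }
}

object ActorModelDemo extends App {
  val system  = ActorSystem("MySystem")
  val greeter = system.actorOf(Props[Greeter](), "greeter")
  val printer = system.actorOf(Props[Printer](), "printer")

  // `tell` is fire-and-forget: asynchronous, non-blocking message passing.
  greeter.tell(Greet("world"), printer) // replies are routed to `printer`

  Thread.sleep(500)   // give the asynchronous messages time to flow (demo only)
  system.terminate()
}
```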
The structure of Akka
Actors
- Simple, high-level abstractions for concurrency and parallelism.
- Asynchronous, non-blocking and highly event-driven programming model.
Fault tolerance
- Supervisor hierarchies with let-it-crash semantics.
- Supervisor hierarchies can span multiple VMs.
- Excellent for writing highly fault-tolerant systems that self-heal and never stop.
Location transparency
- Everything in Akka is designed to work in a distributed environment: all interactions between actors use pure message passing and everything is asynchronous.
- akka.tcp://MySystem@localhost:9001/user/actorName1
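A short sketch of what that address buys you: looking an actor up by path and messaging it exactly like a local one. It assumes a remote ActorSystem named MySystem is already running on port 9001 with classic Akka remoting (akka-remote) configured, which is not shown here:

```scala
import akka.actor.{ActorSelection, ActorSystem}

object RemoteLookup extends App {
  val system = ActorSystem("ClientSystem")

  // Resolve the actor by its full path; sending to it looks identical to
  // sending to a local actor, which is what location transparency means.
  val remote: ActorSelection =
    system.actorSelection("akka.tcp://MySystem@localhost:9001/user/actorName1")

  remote ! "hello from another JVM"
}
```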
Multiple threads vs multiple actors: roughly 1 MB per thread (4 MB on 64-bit) vs about 2.7 million actors per gigabyte of memory.
More on Actors
- Let-it-crash philosophy.
- Failure (supervision) strategies:
  - One-for-one supervision: only the failing child actor is restarted.
  - All-for-one: if one child crashes, its siblings are restarted as well.
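A minimal classic Akka (Scala) sketch of a one-for-one strategy; the Worker and Supervisor actors and the exception mapping are illustrative assumptions:

```scala
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Restart, Stop}
import scala.concurrent.duration._

class Worker extends Actor {
  def receive: Receive = {
    case "boom" => throw new IllegalStateException("let it crash")
    case msg    => println(s"worker handled: $msg")
  }
}

class Supervisor extends Actor {
  // Restart only the failing child (one-for-one); an AllForOneStrategy would
  // restart its siblings as well.
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: IllegalStateException => Restart
      case _: Exception             => Stop
    }

  private val worker = context.actorOf(Props[Worker](), "worker")

  def receive: Receive = {
    case msg => worker forward msg
  }
}

object SupervisionDemo extends App {
  val system     = ActorSystem("MySystem")
  val supervisor = system.actorOf(Props[Supervisor](), "supervisor")

  supervisor ! "boom"  // the worker crashes and is restarted by its supervisor
  supervisor ! "hello" // handled by the restarted worker

  Thread.sleep(500)
  system.terminate()
}
```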
Some Code
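The live demo itself is not captured in this transcript; as a stand-in, here is a small hedged example of request-response with Akka's ask pattern (the Calculator actor and Add message are made up):

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

final case class Add(a: Int, b: Int)

class Calculator extends Actor {
  def receive: Receive = {
    case Add(a, b) => sender() ! (a + b)
  }
}

object AskPatternDemo extends App {
  implicit val timeout: Timeout = Timeout(3.seconds)

  val system = ActorSystem("MySystem")
  val calc   = system.actorOf(Props[Calculator](), "calculator")

  // `ask` returns a Future instead of blocking the calling thread.
  val result = (calc ? Add(2, 3)).mapTo[Int]
  println(s"2 + 3 = ${Await.result(result, 3.seconds)}") // Await is for the demo only

  system.terminate()
}
```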
References
Akka - http://akka.io
Reactive Extensions - http://reactivex.io/
Halo - https://www.halowaypoint.com/en-us/games
Carl Hewitt - http://en.wikipedia.org/wiki/Carl_Hewitt
Orbit Framework - https://github.com/electronicarts/orbit, http://orbit.bioware.com/
Project Orleans - https://github.com/dotnet/orleans
That's all, friends
Malisa Ncube
@malisancube
http://malisancube.com


Editor's Notes

  • #19
    Responsive: The system responds in a timely manner if at all possible. Responsiveness is the cornerstone of usability and utility, but more than that, responsiveness means that problems may be detected quickly and dealt with effectively. Responsive systems focus on providing rapid and consistent response times, establishing reliable upper bounds so they deliver a consistent quality of service. This consistent behaviour in turn simplifies error handling, builds end user confidence, and encourages further interaction.
    Resilient: The system stays responsive in the face of failure. This applies not only to highly available, mission-critical systems: any system that is not resilient will be unresponsive after a failure. Resilience is achieved by replication, containment, isolation and delegation. Failures are contained within each component, isolating components from each other and thereby ensuring that parts of the system can fail and recover without compromising the system as a whole. Recovery of each component is delegated to another (external) component and high availability is ensured by replication where necessary. The client of a component is not burdened with handling its failures.
    Elastic: The system stays responsive under varying workload. Reactive Systems can react to changes in the input rate by increasing or decreasing the resources allocated to service these inputs. This implies designs that have no contention points or central bottlenecks, resulting in the ability to shard or replicate components and distribute inputs among them. Reactive Systems support predictive, as well as reactive, scaling algorithms by providing relevant live performance measures. They achieve elasticity in a cost-effective way on commodity hardware and software platforms.
    Message Driven: Reactive Systems rely on asynchronous message passing to establish a boundary between components that ensures loose coupling, isolation and location transparency. This boundary also provides the means to delegate failures as messages. Employing explicit message passing enables load management, elasticity, and flow control by shaping and monitoring the message queues in the system and applying back-pressure when necessary. Location-transparent messaging as a means of communication makes it possible for the management of failure to work with the same constructs and semantics across a cluster or within a single host. Non-blocking communication allows recipients to only consume resources while active, leading to less system overhead.
    Large systems are composed of smaller ones and therefore depend on the Reactive properties of their constituents. This means that Reactive Systems apply design principles so these properties apply at all levels of scale, making them composable. The largest systems in the world rely upon architectures based on these properties and serve the needs of billions of people daily. It is time to apply these design principles consciously from the start instead of rediscovering them each time.