Enterprise Architecture with Ruby (and Rails) Building amazing products, companies and technology using Ruby on Rails and friends. An opinionated overview for MagmaRails.MX by Konstantin Gredeskoul, CTO, Wanelo, Inc twitter: @kig, github.com/kigster
My Background: CTO @ Wanelo.com (“Pinterest for shopping”). Principal @ ModCloth.com, one of the largest independent e-commerce Rails sites. Principal @ Blurb.com, a print-on-demand bookstore and a large e-commerce web site. Professionally building enterprise software since 1995. Converted from Java/Perl/C to Ruby in 2006.
What is Enterprise? It’s an organization with many people, services, and technologies. Enterprise architecture is an ongoing business function that helps an 'enterprise' figure out how to best execute the strategies that drive its development [ref: wikipedia].
From Start-Up To Enterprise: Many modern enterprises started small, as tiny start-ups. Many start-ups choose RoR for productivity. As the start-up grows, so do the technology, the applications, and the stack.
Teams using RoR can be very productive: Productivity is super important for unproven young companies trying things out. “Build quickly, iterate, avoid building features users don’t need” (Lean Start-Up Movement). Do not optimize “prematurely”, but think about tomorrow’s scalability when building today.
Productivity vs Scale: The Dilemma! To move fast, we use Ruby (a dynamic language), a framework (Rails), the cloud, a familiar database, and keep the team small. To truly scale an application, we need multiple languages (Java, C/C++, Scala), custom or no frameworks, a datacenter, and a large team.
But does everyone need mega scale? The majority of Rails projects are OK without mega-scale (only a tiny fraction is like Twitter or Facebook). Ruby/Rails can happily grow into large applications without major rewrites. The best assurance that an application will grow well with its use is to follow best practices.
So what is this talk about? How to start small, but move fast. How to evolve a Rails app, but keep it scalable. How to split things up when the app gets large, and keep everyone sane.
Part 1: How to start small, but move fast
Get a great team together: Keep team size small, 4-6 developers is ideal. Have at least 2-3 Ruby/Rails/front-end experts on the team. Do automated testing (and TDD) from the beginning. It is hard to add later.
Process matters: Pair programming is amazing. Level the field, transfer knowledge, build trust within the team, move faster. Morning stand-ups, weekly sprint planning, technical discussions as needed, retrospectives. A dedicated graphic designer/UXR, and a Product Manager.
Everyday tools matter: RubyMine IDE is very powerful, but costs $69. Other tools also work: Vim, TextMate. When pairing, using a consistent toolset is very important. Pick it and stick to it. If everyone has their own laptop, create a common OS account and use it to pair.
Communication is key: A Continuous Integration server runs all automated tests (Jenkins is great!). Everyone knows when tests break! The open source Pivotal CI Monitor app pulls from Jenkins.
Communication is key: Use chat (e.g., Campfire) to notify the team about check-ins, deploys or failed builds. Review others’ commits (e.g., on GitHub) to learn as much code as possible. Take care of your teammates, and do worry about the project. Success depends on it.
A few more awesome tools*: iTerm2, a free mega awesome Terminal replacement (Cmd-D/Cmd-Shift-D). SizeUp, to align windows on the screen right/left/up/down/middle. iStat Menus, to view CPU, network I/O, and disk in the Mac OS X menu bar. CCMenu, to view CI results in your menu bar.
Choice of libraries matters: MiniTest, Jasmine, Capybara (RackTest + Selenium) for testing. Devise for authentication and user management. Twitter Bootstrap for early UI is amazing, although we prefer SCSS instead of LESS. HAML for views, RABL for APIs.
Data matters the most: Relational databases (PostgreSQL, MySQL): high consistency, reliability, decades of research, great performance, gets tricky at mega-scale. BigTable based (MongoDB, HBase): eventual consistency, recent, have indexes, almost table-like. Also tricky at mega-scale. Amazon Dynamo-like (Riak, Voldemort): distributed hash table, tricky from the very beginning.
What to choose? Without a strong reason otherwise, choose a relational database. I prefer PostgreSQL. Instagram scaled on PostgreSQL very well. If under pressure and in doubt, it’s OK to choose whatever you are familiar with.
Part 2: How to evolve a Rails App
New Rails Project: Day 1
rails new my-awesome-app
cd my-awesome-app
rake db:migrate
(ruby 1.9.3-p125, rails 3.2.3, MacBook Air 1.8GHz)
1. Starting Up: One app server, one DB, nginx. 10 unicorns per app server. nginx for static assets, PostgreSQL for data. Always put your DB on a separate server. Simple, but no app server redundancy and limited throughput. 10 unicorns = 10 concurrent requests at any one time. [Diagram: incoming http → nginx → Unicorn / Passenger (Ruby VM × N) → DB]
2. Growing Up: Split into multiple app servers. HAProxy to distribute load. nginx for static files found on the local file system, proxy requests otherwise. Site usage grows. Responses get slow. Started at 150ms, then 400ms, then 700ms... [Diagram: incoming http → nginx → haproxy → Unicorn / Passenger (Ruby VM) × N → DB]
3. Scaling Up: Add Memcached (1GB+). Use Redis (or cookies) for sessions (reduces DB load). Add action caching (even a short TTL helps, e.g. 1 min). Use AJAX to personalize pages to make them cacheable*. [Diagram: incoming http → nginx → haproxy → Unicorn / Passenger (Ruby VM) × N → memcache, redis, DB]
Personalization with AJAX - A brief detour: 1. Logged in (or not) user requests a page... 2. Page is served from the cache without any personalization (no “Hi John!”, “Logout”, etc). 3. On document.ready: AJAX hits the server, gets tiny JSON data of the current user (or “not logged in”). 4. JS modifies the DOM to show the user’s logged-in state, any other personalization, or “Log In”.
Personalization with AJAX - How?
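One way the server side of step 3 could look, as a minimal sketch rather than the code from the talk: a Rack-compatible endpoint that returns a tiny JSON description of the current user. The `USERS_BY_SESSION` hash and the `X-Session-Id` header are assumptions made up for this example; a real app would read its session cookie and its own user store.

```ruby
require 'json'

# Hypothetical session store: session id => user record.
USERS_BY_SESSION = {
  'abc123' => { 'id' => 42, 'name' => 'John' }
}.freeze

# A Rack-compatible endpoint responds to call(env) and returns
# [status, headers, body]. It renders no templates, only a small hash.
CurrentUserEndpoint = lambda do |env|
  user = USERS_BY_SESSION[env['HTTP_X_SESSION_ID']]
  payload = if user
              { 'logged_in' => true, 'name' => user['name'] }
            else
              { 'logged_in' => false }
            end
  [200, { 'Content-Type' => 'application/json' }, [JSON.generate(payload)]]
end

_status, _headers, body = CurrentUserEndpoint.call('HTTP_X_SESSION_ID' => 'abc123')
puts body.first # prints {"logged_in":true,"name":"John"}
```

The cached page stays identical for everyone; only this small response differs per user, and the JS in step 4 uses it to fill in “Hi John!” or “Log In”.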
Personalization with AJAX - Where?
Personalization with AJAX - Why? Because the entire page can be served from the cache (often 50KB+ per request). No ActiveRecord and no rendering makes it really fast! Recent rough test using Rails 3.2.3, ruby 1.9.3-p194, memcached: 4ms latency!!!
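To see why even a short TTL helps, here is a small sketch of a time-based cache. It is an in-memory stand-in for memcached, written only for illustration; the `TtlCache` class and its injectable clock are assumptions of this example, not part of Rails. In Rails 3.2 the equivalent would be declared on the controller with `caches_action` and an `:expires_in` option.

```ruby
# In-memory stand-in for memcached, for illustration only.
class TtlCache
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock
    @entries = {}
  end

  # Returns the cached value for key if it has not expired; otherwise
  # runs the block, stores its result for ttl_seconds, and returns it.
  def fetch(key, ttl_seconds)
    entry = @entries[key]
    return entry[:value] if entry && entry[:expires_at] > @clock.call

    value = yield
    @entries[key] = { value: value, expires_at: @clock.call + ttl_seconds }
    value
  end
end

renders = 0
page_cache = TtlCache.new
3.times do
  page_cache.fetch('/products', 60) do
    renders += 1 # stands in for ActiveRecord queries plus view rendering
    '<html>...</html>'
  end
end
puts renders # prints 1: only the first request paid for rendering
```

With a 1 minute TTL, a page hit 100 times a minute is rendered once and served from the cache the other 99 times.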
Why not page caching? Because unlike action caching, page caching is file-system based. Because it’s more difficult to expire. Because it’s more difficult to share across many servers.
4. Scaling Images: We are serving lots of images. Nginx is getting slammed. Should we add more balancers? Write our own? HELLZ NO! [Diagram: incoming http → nginx → haproxy → Unicorn / Passenger (Ruby VM) → memcache, redis, DB]
4. Scaling Images: CDN to cache images, JS. Don’t wait to use a CDN to SERVE images, especially user-uploaded images. S3 is a popular choice to STORE images. But it’s smart to keep a local backup copy... [Diagram: incoming http → CDN → nginx → haproxy → Unicorn / Passenger (Ruby VM × N) → memcache, redis, DB]
5. Deployments and Downtime: Our site is popular! And our users hate downtime. They really really do. [Diagram: incoming http → CDN → nginx → haproxy → Unicorn / Passenger (Ruby VM × N) → memcache, redis, DB]
5. Deployments and Downtime: We want to be able to deploy the code while the site is running. So users are happy. There are several ways to do that. This solution uses DNS round robin with two balancers, and two public IP addresses.
Two Cluster Solution = Almost Zero Downtime [Diagram: incoming http → balancer1 and balancer2, each running nginx + haproxy in front of its own cluster of Unicorn / Passenger (Ruby VM × N) with memcache; both clusters share redis and DB]
Temporary Redirect Rule
Two Clusters are cool! Cluster 1 runs old code and is live. Cluster 2 gets new code. Old and new run in parallel, but only one is serving live traffic.
Migrations with Zero Downtime? Almost possible on a live system, if: we are not removing or renaming columns or tables in active use; migrations do not lock tables (for too long). Column/table renames/deletes can be done in two deployments instead of one.
So the app is now faster, and we can deploy without downtime. What about email and other long-running tasks? Don’t forget SPF records.
Background Jobs with Resque: But monitor its queues. Must restart on reboot. resque-cleaner is awesome! [Diagram: incoming http → balancer1 (nginx, haproxy) → Unicorn; Resque workers; memcache, redis, DB]
Different queues for different types of jobs: Relatively easy to implement priorities for jobs (order queues by priority). Group jobs by execution times to avoid delays. [Diagram: Resque::Worker x N with QUEUE=SlowQueue1,SlowQueue2 and Resque::Worker x M with QUEUE=FastQueue1,FastQueue2, both reading from redis]
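The “order queues by priority” idea can be sketched in plain Ruby. This is a simplified model, not Resque itself: the `SketchWorker` class and the job names are made up for the example. It mimics how a worker checks its queues in the order they are listed, so earlier queues win, and how separate fast and slow workers keep quick jobs from waiting behind long ones.

```ruby
# Simplified model of a worker that polls its queues in the listed order,
# so queues named earlier act as higher priority.
class SketchWorker
  def initialize(queue_names, queues)
    @queue_names = queue_names
    @queues = queues # Hash of queue name => Array of pending jobs
  end

  # Returns the next job from the first non-empty queue, or nil.
  def reserve
    @queue_names.each do |name|
      job = @queues.fetch(name, []).shift
      return job if job
    end
    nil
  end
end

queues = {
  'FastQueue1' => ['send_email'],
  'FastQueue2' => ['update_counter'],
  'SlowQueue1' => ['rebuild_report']
}
fast_worker = SketchWorker.new(%w[FastQueue1 FastQueue2], queues)
slow_worker = SketchWorker.new(%w[SlowQueue1 SlowQueue2], queues)

puts fast_worker.reserve # prints send_email (FastQueue1 is listed first)
puts slow_worker.reserve # prints rebuild_report
```

The fast worker never touches the slow queues, so a long report rebuild cannot delay an email.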
DB usage and complexity grow. We are doing big joins with many tables, and they are taking their sweet time.
Solr to the Resque: Use Solr instead of doing complex joins. Solr reads are < 10ms. The Sunspot gem by default writes to Solr from each Ruby VM (i.e. unicorn)! Serialize writes with Resque! One master for writes. Read replicas on each app server.
Putting it together: 1. Model changed. 2. Read model info. 3. Update Solr. 4. Replicate. [Diagram: Unicorn / Passenger (Ruby VM) and Resque workers, with redis, DB, solr_master and solr_replica]
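The flow above could be sketched as a Resque-style job: a plain class with a `@queue` and a `self.perform` method. Everything else here is an assumption for illustration: the `load_record` and `index_document` hooks, and the hashes standing in for the database and the Solr master, are hypothetical so that the sketch runs without either service.

```ruby
# Resque-style job class. The two collaborators are hypothetical hooks,
# injected so this sketch runs without a real database or Solr.
class SolrIndexJob
  @queue = :solr_index

  class << self
    attr_accessor :load_record, :index_document
  end

  # Steps 2 and 3: re-read the record, then send it to the Solr master.
  def self.perform(model_name, id)
    record = load_record.call(model_name, id)
    index_document.call("#{model_name}:#{id}", record)
  end
end

# Stand-ins for the database and the Solr master.
FAKE_DB = { ['Product', 7] => { 'title' => 'Red dress' } }
FAKE_SOLR_MASTER = {}
SolrIndexJob.load_record = ->(model, id) { FAKE_DB.fetch([model, id]) }
SolrIndexJob.index_document = ->(doc_id, fields) { FAKE_SOLR_MASTER[doc_id] = fields }

# Step 1 would enqueue the job when the model changes (with Resque:
# Resque.enqueue(SolrIndexJob, 'Product', 7)). Here a single worker is
# simulated by calling perform directly.
SolrIndexJob.perform('Product', 7)
puts FAKE_SOLR_MASTER.keys.inspect # prints ["Product:7"]
```

Because only the workers on this one queue write to the master, writes arrive in a controlled stream instead of from every unicorn at once.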
At this size... Automate everything: Chef or Puppet is awesome. Monitor everything: tolerate reboots, restarts, partial failures. Use the OS services layer to start/stop everything: ensures recovery after reboot. Capistrano tends to get “complex”: can also deploy with Chef.
Choose Vendors Wisely: You can pick your own, but here is my list. Clouds: JOYENT, EngineYard. Fastest I/O cloud, but on a Solaris derivative. Automation: Chef + OPSCODE. Caching/CDN: FASTLY.COM. Varnish-based CDN, very fast, full power of VCL configuration. Metrics and Performance: NewRelic. Turnkey solution, getting better every day.
In Development: Use foreman to start dependent services (Solr, Redis, Resque) from a Procfile. Do “just enough” testing with Solr, it’s slow! Deploy to single-box demo servers often, using the same Capistrano scripts used for production.
Mature App: Day 700
cd my-awesome-app
rake rspec:models rspec:controllers rspec:libs
rake cucumber:webrat cucumber:selenium jasmine
How Big Exactly? 200+ models. 200K+ lines of Ruby source code, without gems. 100K+ lines of ERB, HTML and HAML templates. 100+ gem dependencies. This is a real-world application that’s in production today.
Is that too big? This cat’s name is Lenin
Here is why I think it is: 1.5+ hours for the full test suite to complete (10 mins of db seeds, 30 minutes for unit tests only, etc). Merges often result in integration tests going RED. 20 seconds boot-up time for the Rails env (rails console, etc)! 500MB of RSS RAM for one single-threaded web process. It’s a difficult undertaking to upgrade dependencies and Rails.
Yeah.
It was much nicer when it was a bit smaller..
Let’s zoom in... Is PERFORMANCE of the app an issue? NO! 150ms per request avg. Is SCALABILITY of the app an issue? NO! 8000+ concurrent users. Is RELIABILITY of the app an issue? NO! Barely any downtime in over one year.
Then WTF is the Problem? Is PRODUCTIVITY of developing the app an issue? YES! Lots of waiting all the time. Is MERGING source code between parallel projects difficult? YES! 30+ people sharing a large codebase. Is KEEPING THE TEST SUITE GREEN challenging? YES! Tests are brittle and long running.
But wait, there’s more! What about DEPLOYMENT of a large app? Takes a long time, and small tweaks require full deploys. What about HOSTING COSTS? Necessary to provide enough RAM for the app to be scalable.
RAM? Latency matters... 1 request = 200ms average latency, so 5 reqs/second on a single-threaded Ruby VM process. 30,000 RPM = 500 r/sec = 100 processes, or 50GB of RAM @ 200ms latency (at 500MB per process). If average latency is 600ms, we need 300 processes and 150GB of RAM!!!
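The arithmetic on this slide is easy to replay. The sketch below assumes, as the slide does, single-threaded processes that each handle one request at a time, and uses the 500MB per process figure quoted earlier in the talk; the method names are made up for the example.

```ruby
# Each single-threaded process serves one request at a time, so the number
# of busy processes is (requests per second) x (seconds per request).
def processes_needed(requests_per_minute, avg_latency_ms)
  (requests_per_minute * avg_latency_ms / 60_000.0).ceil
end

# RAM in GB, using the ~500MB RSS per process quoted for the large app.
def ram_gb(processes, mb_per_process = 500)
  processes * mb_per_process / 1000.0
end

[200, 600].each do |latency_ms|
  n = processes_needed(30_000, latency_ms)
  puts "#{latency_ms}ms -> #{n} processes, #{ram_gb(n)} GB"
end
# prints:
# 200ms -> 100 processes, 50.0 GB
# 600ms -> 300 processes, 150.0 GB
```

Tripling latency triples the RAM bill at the same traffic, which is why both latency and per-process memory matter for hosting costs.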
Smaller is actually better.
So how do we solve this?
Part 3: How to split things up
Couple of main themes: Break up into smaller applications. Extract services and create APIs. Extract libraries (gems).
Smaller Applications: Contain web GUI, logic, and data. May combine with other apps. May rely on common libraries. May rely on services. Typically run in their own Ruby VM.
Consider a Typical E-Commerce Store: Users must be able to register, login, logout (profiles). Users must be able to browse and search products, view, and add to cart. Users must be able to checkout. Probably many other stories, such as admin, but we’ll ignore them for now.
One idea... Application 1: Marketing, Product Catalog Browser, Search + Product Detail Page. Application 2: Checkout, Payment, Order History, Returns Fulfillment. Very clear user flow transfer and data separation.
Some things can be shared: Service: Single Sign-on, User profiles, Login/Registration [devise?, restful_authentication?]. Service: Product Catalog data, Inventory data. Service: Comments, Votes, Ratings, Reviews.
Services Technologies: Rack/Sinatra/Rails are popular, and are often an entirely sufficient choice. Goliath is awesome if performance is important, and if the service is mostly I/O bound. node.js is also a popular choice. Implementation may change in the future, as long as the API stays consistent.
[Diagram: an http balancer / router sends /checkout to the checkout app and /* to the catalog app; each app has its own DB and both share a CSS/UI library; behind them sit three services, each with its own DB: Product Service (Inventory), Reviews (Comments, Votes, Ratings), and User Auth / Login (Profiles)] Extract look and feel (CSS/UI) into a gem to share across apps. Create client API wrapper gems for consumers. Create a single shared “base” client gem library.
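The two routing rules in the diagram (/checkout → checkout app, /* → catalog app) amount to a first-match-wins table. In practice this would live in the balancer configuration; the Ruby below is only a sketch of the logic, and the `ROUTES` table and `route` method are names invented for the example.

```ruby
# First match wins, so the more specific rule must come first.
ROUTES = [
  [%r{\A/checkout(/|\z)}, :checkout_app],
  [%r{\A/},               :catalog_app]
].freeze

# Returns the symbol of the app that should handle the given path.
def route(path)
  ROUTES.each { |pattern, app| return app if pattern.match(path) }
  nil
end

puts route('/checkout/payment') # prints checkout_app
puts route('/products/42')      # prints catalog_app
```

Users see one site, while each app behind the router can be deployed and scaled on its own.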
Rails App with < 30 Models: Can run tests pretty quickly, hopefully under 5 minutes. Is often large enough to describe typical “clusters of functionality”, i.e. mini apps. Ruby VM might actually stay under 100 MB of RSS RAM. Is more comprehensible and can be effectively maintained by a small dev team.
[Diagram] 3rd Party Integrations: the catalog app and checkout app (each with its own DB, sharing the CSS/UI Library) now sit next to an analytics and reporting system, a financial ERP system, and a warehouse management system. Services, each with its own DB: Product Service, Inventory; Reviews, Comments, Votes, Ratings; User Auth / Login, Profiles.
Ecosystem of Applications: Is inevitable in large companies. Scales better from a team perspective. Offers decoupling and implementation hiding. Each application can be individually optimized and scaled.
But then... Must every app know about every other app?
[Diagram] API Proxy / Router: the catalog app, checkout app, analytics and reporting system, and financial ERP system reach the services through a single entry point, http://api.mycompany.com/. Services behind it, each with its own DB: Product Service, Inventory; Reviews, Comments, Votes, Ratings; User Auth / Login, Profiles.
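The routing decision inside such an API proxy can be sketched as a longest-prefix lookup from public path to internal service. This is only an illustration of the idea: `ApiRouter`, the path prefixes and the `*.internal` host names are all made up, and a production setup would more likely express the same table in the balancer's own configuration.

```ruby
# Hypothetical routing table for an API proxy:
# public path prefix => internal service base URL.
class ApiRouter
  def initialize(routes)
    # Longest prefixes first, so a more specific route wins over a shorter one.
    @routes = routes.sort_by { |prefix, _backend| -prefix.length }
  end

  # Returns the internal URL for a public path, or nil when nothing matches.
  def resolve(path)
    prefix, backend = @routes.find do |candidate, _backend|
      path == candidate || path.start_with?("#{candidate}/")
    end
    prefix && "#{backend}#{path[prefix.length..]}"
  end
end

router = ApiRouter.new(
  '/products' => 'http://product-service.internal',
  '/reviews'  => 'http://reviews-service.internal',
  '/users'    => 'http://auth-service.internal'
)
router.resolve('/products/42')  # maps onto the product service host
```

With this in place each app only needs to know one host name, and services can move or be re-implemented behind it.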
Example: Order Placed. Warehouse Management System needs to be updated. Analytics Engine needs to be notified. Financials need to be updated. Question: which component is responsible for updating each application?
1995 Was Great. GoF Design Patterns: Observer. “...One-to-Many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically...”
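For reference, the in-process form of Observer is only a few lines of Ruby. The sketch below is hand-rolled rather than taken from any library; `Order`, `Recorder` and the `order_placed` callback name are illustrative assumptions.

```ruby
# Minimal hand-rolled Observer: the subject keeps a list of dependents and
# notifies each of them when its state changes.
class Order
  attr_reader :id, :state

  def initialize(id)
    @id = id
    @state = :new
    @observers = []
  end

  def add_observer(observer)
    @observers << observer
  end

  # State change: every registered dependent is told about it.
  def place!
    @state = :placed
    @observers.each { |observer| observer.order_placed(self) }
  end
end

# Hypothetical dependent (warehouse, analytics, financials...) that only
# records which orders it was told about.
class Recorder
  attr_reader :seen

  def initialize
    @seen = []
  end

  def order_placed(order)
    @seen << order.id
  end
end

warehouse = Recorder.new
analytics = Recorder.new
order = Order.new(42)
order.add_observer(warehouse)
order.add_observer(analytics)
order.place!  # both recorders now hold order 42
```

This only works while the subject and all observers live in the same Ruby process, which is exactly the limit once the dependents are separate applications.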
Distributed Version: Publish/Subscribe and Point-to-Point Asynchronous Middleware.
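The two delivery styles differ in who receives a message. The toy broker below is an in-memory stand-in written for this comparison only (`ToyBroker` is not a real library, and it delivers synchronously): publish/subscribe hands a copy to every subscriber, point-to-point hands each message to exactly one consumer. A real broker adds networking, persistence, acknowledgements and asynchronous delivery on top.

```ruby
# In-memory stand-in for messaging middleware, to contrast the two styles.
class ToyBroker
  def initialize
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }  # topic => handlers
    @queues      = Hash.new { |hash, name| hash[name] = [] }    # queue => pending messages
  end

  # Publish/subscribe: register a handler for a topic.
  def subscribe(topic, &handler)
    @subscribers[topic] << handler
  end

  # Publish/subscribe: every subscriber of the topic receives a copy.
  def publish(topic, message)
    @subscribers[topic].each { |handler| handler.call(message) }
  end

  # Point-to-point: park a message on a named queue.
  def enqueue(queue, message)
    @queues[queue] << message
  end

  # Point-to-point: whichever consumer asks first takes the message;
  # returns nil when the queue is empty.
  def dequeue(queue)
    @queues[queue].shift
  end
end
```

An order-placed notification fits the first style (several systems care), while something like "send this one email" fits the second (exactly one worker should do it).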
RabbitMQ is Great
Some Options for Pub/Sub: RabbitMQ, with the ruby-amqp gem to interface. EventMachine::Channel.
Other Distributed Options: DRb - distributed Ruby (also Rinda, Starfish, beanstalkd, etc). DCell - actor based, built on 0MQ: http://www.unlimitednovelty.com/2012/04/introducing-dcell-actor-based.html. All of them are a bit too low level for sharing and consuming business events.
I would love a library for publishing business events built on top
Dependency Stack (top to bottom): Business Events Collection (Commerce::OrderPlaced) → SharedEvent::Base → AMQP-ruby Library (AMQP::connect) → RabbitMQ Middleware.
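The top two layers of that stack might look roughly like this. The class names `SharedEvent::Base` and `Commerce::OrderPlaced` come from the slide; everything else (the `transport` accessor, `routing_key`, `serialize`, the payload shape) is an assumption, since the library does not exist yet. The transport is left abstract: in the real stack it would be an adapter over the AMQP library.

```ruby
require 'json'

module SharedEvent
  # Hypothetical base class for business events.
  class Base
    class << self
      # Anything responding to #publish(routing_key, payload).
      attr_accessor :transport

      # Derives a key from the class name,
      # e.g. "Commerce::OrderPlaced" becomes "commerce.order_placed".
      def routing_key
        name.split('::').map do |part|
          part.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
        end.join('.')
      end
    end

    attr_reader :attributes

    def initialize(attributes = {})
      @attributes = attributes
    end

    # Wire format assumed here: a JSON envelope with event name and data.
    def serialize
      JSON.generate(event: self.class.routing_key, data: attributes)
    end

    # Publish and forget: the publisher is done once the transport accepts.
    def fire!
      SharedEvent::Base.transport.publish(self.class.routing_key, serialize)
      self
    end
  end
end

# The "Business Events Collection" layer: one small class per event.
module Commerce
  class OrderPlaced < SharedEvent::Base; end
end
```

A publisher would then call something like `Commerce::OrderPlaced.new(order_id: 42).fire!` after configuring `SharedEvent::Base.transport` once at boot.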
Future Library: Hides the complexities of queues and exchanges. Consumers declare interest in the events they care about, and define persistence and retry policy. Publishers fire! events and forget about them.
Future Library, ctd.: Once registered, consumers get messages even after being offline. Once a publisher has submitted an event to the queue, its job is done. The library of business events becomes a complement to the set of business APIs.
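The "registered consumers still get messages after being offline" behaviour can be modelled in a few lines. This is a toy, in-memory model of the idea and not the library itself: `DurableBus` and its method names are invented here, and persistence to disk and retry policy are deliberately left out.

```ruby
# Toy model: each registered consumer gets its own queue per event type, so
# events published while the consumer is offline wait there until drained.
class DurableBus
  def initialize
    # event name => { consumer name => [pending payloads] }
    @queues = Hash.new { |hash, event_name| hash[event_name] = {} }
  end

  # Consumers declare interest once; the queue outlives their process.
  def register(consumer, event_name)
    @queues[event_name][consumer] ||= []
  end

  # The publisher's job is done as soon as the event is stored.
  def publish(event_name, payload)
    @queues[event_name].each_value { |queue| queue << payload }
    nil
  end

  # Called when a consumer is (back) online: hand over everything pending.
  # Unregistered consumers get an empty list.
  def drain(consumer, event_name)
    queue = @queues[event_name][consumer]
    queue ? queue.shift(queue.size) : []
  end
end
```

Note that the publisher never learns who the consumers are; adding a new interested application is a `register` call on the consumer side only.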
[Diagram] Untangle Communications: the catalog app, checkout app, analytics and reporting system, financial ERP system, and inventory management all connect to a Messaging Bus. Steps shown: 1. publish OrderPlaced. 2. consume OrderPlaced. 3. publish InventoryChanged. 4. consume InventoryChanged (shown at two consumers).
I don’t think this library exists yet, but I would like to write one soon =)
Distributed Ruby Reading
Thank you. twitter: @kig, github.com/kigster

Enterprise Architectures with Ruby (and Rails)

  • 1.
    Enterprise Architecture with Ruby(and Rails) Building amazing products, companies and technology using Ruby on Rails and friends. An opinionated overview for MagmaRails.MX by Konstantin Gredeskoul, CTO, Wanelo, Inc twitter: @kig, github.com/kigster
  • 2.
    My Background CTO@ Wanelo.com — “Pinterest for shopping” Principal @ ModCloth.com — is one of the largest independent e-commerce Rails sites Principal @ Blurb.com — print-on-demand bookstore, and a large e-commerce web site Professionally building enterprise software since 1995 Converted from Java/Perl/C to ruby in 2006
  • 3.
  • 4.
    What is Enterprise? It’san organization with many people, services, technologies
  • 5.
    What is Enterprise? It’san organization with many people, services, technologies Enterprise architecture is an ongoing business function that helps an 'enterprise' figure out how to best execute the strategies that drive its development [ref: wikipedia]
  • 6.
    From Start-Up ToEnterprise
  • 7.
    From Start-Up ToEnterprise Many modern enterprises started small, as tiny start-ups
  • 8.
    From Start-Up ToEnterprise Many modern enterprises started small, as tiny start-ups Many start-ups choose RoR for productivity
  • 9.
    From Start-Up ToEnterprise Many modern enterprises started small, as tiny start-ups Many start-ups choose RoR for productivity As the start-up grows, so does the technology, applications, and the stack.
  • 10.
    Teams using RoRcan be very productive
  • 11.
    Teams using RoRcan be very productive Productivity is super important for unproven young companies trying things out
  • 12.
    Teams using RoRcan be very productive Productivity is super important for unproven young companies trying things out “Build quickly, iterate, avoid building features users don’t need” — Lean Start-Up Movement
  • 13.
    Teams using RoRcan be very productive Productivity is super important for unproven young companies trying things out “Build quickly, iterate, avoid building features users don’t need” — Lean Start-Up Movement Do not optimize “prematurely”, but think about tomorrow’s scalability when building today.
  • 14.
  • 15.
    Productivity vs Scale: TheDilemma! To move fast - we use Ruby (dynamic languages), a framework (Rails), cloud, a familiar database, and keep the team small
  • 16.
    Productivity vs Scale: TheDilemma! To move fast - we use Ruby (dynamic languages), a framework (Rails), cloud, a familiar database, and keep the team small To truly scale an application - need multiple languages (Java, C/C++, Scala), custom or no frameworks, datacenter, large team
  • 17.
    But does everyoneneed mega scale?
  • 18.
    But does everyoneneed mega scale? Majority of Rails projects are OK without mega- scale (only a tiny fraction is like Twitter or Facebook)
  • 19.
    But does everyoneneed mega scale? Majority of Rails projects are OK without mega- scale (only a tiny fraction is like Twitter or Facebook) Ruby/Rails can happily grow into a large applications without major rewrites
  • 20.
    But does everyoneneed mega scale? Majority of Rails projects are OK without mega- scale (only a tiny fraction is like Twitter or Facebook) Ruby/Rails can happily grow into a large applications without major rewrites Best assurance that an application will grow well with it’s use, is to follow best practices.
  • 21.
    So what isthis talk about?
  • 22.
    So what isthis talk about? How to start small But move fast
  • 23.
    So what isthis talk about? How to start small But move fast How to evolve a Rails app But keep it scalable
  • 24.
    So what isthis talk about? How to start small But move fast How to evolve a Rails app But keep it scalable How to split things up When the app gets large, and keep everyone sane
  • 25.
    Part 1: How tostart small, but move fast
  • 26.
    Get a greatteam together
  • 27.
    Get a greatteam together Keep team size small, 4-6 developers is ideal
  • 28.
    Get a greatteam together Keep team size small, 4-6 developers is ideal Have at least 2-3 ruby/rails/front-end experts on the team
  • 29.
    Get a greatteam together Keep team size small, 4-6 developers is ideal Have at least 2-3 ruby/rails/front-end experts on the team Do automated testing (and TDD) from the beginning. Hard to add later.
  • 30.
  • 31.
    Process matters PairedProgramming is amazing. Level the field, transfer knowledge, build trust within the team, move faster
  • 32.
    Process matters PairedProgramming is amazing. Level the field, transfer knowledge, build trust within the team, move faster Morning stand-ups, weekly sprint planners, technical discussions as needed, retrospectives
  • 33.
    Process matters PairedProgramming is amazing. Level the field, transfer knowledge, build trust within the team, move faster Morning stand-ups, weekly sprint planners, technical discussions as needed, retrospectives Dedicated graphic designer/UXR, and a Product Manager
  • 34.
  • 35.
    Everyday tools matter RubyMine IDE is very powerful, but $69 Other tools also work, VIM, TextMate
  • 36.
    Everyday tools matter RubyMine IDE is very powerful, but $69 Other tools also work, VIM, TextMate When pairing, using consistent toolset is very important. Pick it and stick to it.
  • 37.
    Everyday tools matter RubyMine IDE is very powerful, but $69 Other tools also work, VIM, TextMate When pairing, using consistent toolset is very important. Pick it and stick to it. If everyone has their own laptop, create a common OS account and use it to pair
  • 38.
  • 39.
    Communication is key Continuous Integration server runs all automated tests (Jenkins is great!) Everyone knows when tests break!
  • 40.
    Communication is key Continuous Integration server runs all automated tests (Jenkins is great!) Everyone knows when tests break!
  • 41.
    Communication is key Continuous Integration server runs all automated tests (Jenkins is great!) Everyone knows when tests break! Pivotal CI Monitor open source app pulls from Jenkins
  • 42.
  • 43.
    Communication is key Use Chat (eg, Campfire) to notify team about check-ins, deploys or failed builds
  • 44.
    Communication is key Use Chat (eg, Campfire) to notify team about check-ins, deploys or failed builds Review other’s commits (ie, on GitHub) to learn as much code as possible
  • 45.
    Communication is key Use Chat (eg, Campfire) to notify team about check-ins, deploys or failed builds Review other’s commits (ie, on GitHub) to learn as much code as possible Take care of your team mates, and do worry about the project. Success depends on it.
  • 46.
    A few moreawesome tools*
  • 47.
    A few moreawesome tools* iTerm2 - free mega awesome Terminal replacement (Cmd-D/Cmd-Shift-D)
  • 48.
    A few moreawesome tools* iTerm2 - free mega awesome Terminal replacement (Cmd-D/Cmd-Shift-D) SizeUp - align windows on the screen right/left/ up/down/middle.
  • 49.
    A few moreawesome tools* iTerm2 - free mega awesome Terminal replacement (Cmd-D/Cmd-Shift-D) SizeUp - align windows on the screen right/left/ up/down/middle. iStat Menus - view CPU, Network IO, Disk in Mac OS-X Toolbar
  • 50.
    A few moreawesome tools* iTerm2 - free mega awesome Terminal replacement (Cmd-D/Cmd-Shift-D) SizeUp - align windows on the screen right/left/ up/down/middle. iStat Menus - view CPU, Network IO, Disk in Mac OS-X Toolbar CCMenu - view results of CI in your toolbar
  • 51.
  • 52.
    Choice of librariesmatters MiniTest, Jasmine, Capybara (RackTest + Selenium) for testing
  • 53.
    Choice of librariesmatters MiniTest, Jasmine, Capybara (RackTest + Selenium) for testing Devise for authentication, user mgmt
  • 54.
    Choice of librariesmatters MiniTest, Jasmine, Capybara (RackTest + Selenium) for testing Devise for authentication, user mgmt Twitter Bootstrap for early UI is amazing although we prefer SCSS instead of LESS
  • 55.
    Choice of librariesmatters MiniTest, Jasmine, Capybara (RackTest + Selenium) for testing Devise for authentication, user mgmt Twitter Bootstrap for early UI is amazing although we prefer SCSS instead of LESS HAML for views, RABL for APIs
  • 56.
  • 57.
    Data matters themost Relational Databases: PostgreSQL, MySQL High consistency, reliability, decades of research, great performance, gets tricky at mega-scale
  • 58.
    Data matters themost Relational Databases: PostgreSQL, MySQL High consistency, reliability, decades of research, great performance, gets tricky at mega-scale BigTable based: MongoDB, HBase Eventual consistency, recent, have indexes, almost table-like. Also tricky at mega scale.
  • 59.
    Data matters themost Relational Databases: PostgreSQL, MySQL High consistency, reliability, decades of research, great performance, gets tricky at mega-scale BigTable based: MongoDB, HBase Eventual consistency, recent, have indexes, almost table-like. Also tricky at mega scale. Amazon Dynamo like: RIAK, Voldemort Distributed hash-table, tricky from the very beginning.
  • 60.
  • 61.
    What to choose? Without a strong reason otherwise, choose a relational database. I prefer PostgreSQL.
  • 62.
    What to choose? Without a strong reason otherwise, choose a relational database. I prefer PostgreSQL. Instagram scaled on PostgreSQL very well
  • 63.
    What to choose? Without a strong reason otherwise, choose a relational database. I prefer PostgreSQL. Instagram scaled on PostgreSQL very well If under pressure and in doubt, it’s OK to choose whatever you are familiar with.
  • 64.
    Part 2: How toevolve a Rails App
  • 65.
  • 66.
    New Rails Project:Day 1 rails new my-awesome-app cd my-awesome-app rake db:migrate
  • 67.
    New Rails Project:Day 1 rails new my-awesome-app cd my-awesome-app rake db:migrate
  • 68.
    New Rails Project:Day 1 rails new my-awesome-app cd my-awesome-app rake db:migrate ruby 1.9.3-p125 rails 3.2.3 macbook air 1.8Ghz
  • 69.
    incoming http 1. Starting Up One app server, one db, nginx 10 unicorns per app server Unicorn // Passenger Unicorn Passenger Ruby Ruby VM N) VM (times nginx for static assets PostgreSQL for data Always put your DB on a separate server DB Cloud
  • 70.
    incoming http 1. Starting Up nginx Unicorn // Passenger Unicorn Passenger Ruby Ruby VM N) VM (times DB
  • 71.
    incoming http 1. Starting Up nginx Simple, but no app server Unicorn // Passenger Unicorn Passenger redundancy, limited Ruby Ruby VM N) VM (times throughput DB
  • 72.
    incoming http 1. Starting Up nginx Simple, but no app server Unicorn // Passenger Unicorn Passenger redundancy, limited Ruby Ruby VM N) VM (times throughput 10 unicorns = 10 concurrent requests at any DB one time
  • 73.
    incoming http 2. Growing Up nginx Split into multiple App Unicorn // Passenger Unicorn Passenger Servers Ruby Ruby VM N) VM (times HAProxy to distribute load nginx for static files found on local file system, proxy requests otherwise DB
  • 74.
    incoming http 2. Growing Up nginx haproxy Split into multiple App Servers Unicorn Unicorn / Passenger HAProxy to distribute load Ruby VM Ruby VM nginx for static files found on local file system, proxy requests otherwise DB
  • 75.
    incoming http 2. Growing Up nginx haproxy Site usage grows. Responses get slow. Unicorn Unicorn / Passenger Ruby VM Ruby VM DB
  • 76.
    incoming http 2. Growing Up nginx haproxy Site usage grows. Responses get slow. Unicorn Unicorn / Passenger Ruby VM Ruby VM Started at 150ms, then 400ms, then 700ms.... DB
  • 77.
    incoming http 3. Scaling Up nginx haproxy Unicorn Unicorn / Passenger Ruby VM Ruby VM DB
  • 78.
    incoming http 3. Scaling Up nginx Add MemCached (1Gb+) haproxy Unicorn Unicorn / Passenger Ruby VM Ruby VM DB
  • 79.
    incoming http 3. Scaling Up nginx Add MemCached (1Gb+) haproxy Use Redis (or cookies) for sessions (reduce db load) Unicorn Unicorn / Passenger Ruby VM Ruby VM DB
  • 80.
    incoming http 3. Scaling Up nginx Add MemCached (1Gb+) haproxy Use Redis (or cookies) for sessions (reduce db load) Unicorn Unicorn / Passenger Ruby VM Ruby VM memcache redis DB
  • 81.
    incoming http 3. Scaling Up nginx Add MemCached (1Gb+) haproxy Use Redis (or cookies) for sessions (reduce db load) Unicorn Unicorn / Passenger Ruby VM Ruby VM memcache Add action caching (even short TTL helps, i.e. 1min) redis DB
  • 82.
    incoming http 3. Scaling Up nginx Add MemCached (1Gb+) haproxy Use Redis (or cookies) for sessions (reduce db load) Unicorn Unicorn / Passenger Ruby VM Ruby VM memcache Add action caching (even short TTL helps, i.e. 1min) redis Use AJAX to personalize pages to make them cacheable* DB
  • 83.
    Personalization with AJAX- A brief de-tour
  • 84.
    Personalization with AJAX- A brief de-tour 1. Logged in (or not) user requests a page...
  • 85.
    Personalization with AJAX- A brief de-tour 1. Logged in (or not) user requests a page... 2. Page is served from the cache without any personalization (no “Hi John!”, “Logout”, etc)
  • 86.
    Personalization with AJAX- A brief de-tour 1. Logged in (or not) user requests a page... 2. Page is served from the cache without any personalization (no “Hi John!”, “Logout”, etc) 3. on document.ready: AJAX hits the server, gets tiny JSON data of the current user (or “not logged in”)
  • 87.
    Personalization with AJAX- A brief de-tour 1. Logged in (or not) user requests a page... 2. Page is served from the cache without any personalization (no “Hi John!”, “Logout”, etc) 3. on document.ready: AJAX hits the server, gets tiny JSON data of the current user (or “not logged in”) 4. JS modifies the DOM to show user’s logged in state, any other personalization, or “Log In”.
  • 88.
  • 89.
  • 90.
  • 91.
    Personalization with AJAX- Why? Because entire page can be served from the cache (often 50Kb+ per request)
  • 92.
    Personalization with AJAX- Why? Because entire page can be served from the cache (often 50Kb+ per request) No ActiveRecord and no rendering makes it really fast!
  • 93.
    Personalization with AJAX- Why? Because entire page can be served from the cache (often 50Kb+ per request) No ActiveRecord and no rendering makes it really fast! Recent rough test using Rails 3.2.3, ruby 1.9.3-p194, memcached: 4ms latency!!!
  • 94.
    Why not pagecaching?
  • 95.
    Why not pagecaching? Because unlike action caching, page caching is file-system based.
  • 96.
    Why not pagecaching? Because unlike action caching, page caching is file-system based. Because it’s more difficult to expire
  • 97.
    Why not pagecaching? Because unlike action caching, page caching is file-system based. Because it’s more difficult to expire Because it’s more difficult to share across many servers
  • 98.
    incoming http 4. Scaling Images nginx haproxy We are serving lots of images. Nginx is getting slammed. Unicorn Unicorn / Passenger Ruby VM memcache Ruby VM redis DB
  • 99.
    incoming http 4. Scaling Images nginx haproxy We are serving lots of images. Nginx is getting slammed. Unicorn Unicorn / Passenger Ruby VM memcache Ruby VM Should we add more redis balancers? Write our own? DB
  • 100.
    incoming http 4. Scaling Images nginx haproxy We are serving lots of images. Nginx is getting slammed. Unicorn Unicorn / Passenger Ruby VM memcache Ruby VM Should we add more redis balancers? Write our own? DB HELLZ NO!
  • 101.
    incoming incoming http http 4. Scaling Images CDN cache images, JS nginx haproxy Unicorn // Passenger Unicorn Passenger RubyRuby VM N) VM (times memcache redis DB
  • 102.
    incoming incoming http http 4. Scaling Images CDN cache images, JS Don’t wait to use a CDN to SERVE images, especially nginx user-uploaded images. haproxy Unicorn // Passenger Unicorn Passenger RubyRuby VM N) VM (times memcache redis DB
  • 103.
    incoming incoming http http 4. Scaling Images CDN cache images, JS Don’t wait to use a CDN to SERVE images, especially nginx user-uploaded images. haproxy S3 is a popular choice to STORE images. Unicorn // Passenger Unicorn Passenger RubyRuby VM N) VM (times memcache redis DB
  • 104.
    incoming incoming http http 4. Scaling Images CDN cache images, JS Don’t wait to use a CDN to SERVE images, especially nginx user-uploaded images. haproxy S3 is a popular choice to STORE images. Unicorn // Passenger Unicorn Passenger RubyRuby VM N) VM (times memcache But it’s smart to keep a redis local backup copy... DB
  • 105.
    incoming incoming http http 5. Deployments and CDN Downtime cache images, JS Our site is popular! nginx And our users hate haproxy downtime. Unicorn // Passenger Unicorn Passenger RubyRuby VM N) VM (times memcache redis DB
  • 106.
    incoming incoming http http 5. Deployments and CDN Downtime cache images, JS Our site is popular! nginx And our users hate haproxy downtime. Unicorn // Passenger Unicorn Passenger RubyRuby VM N) VM (times memcache They really really do. redis DB
  • 107.
  • 108.
    5. Deployments and Downtime We want to be able to deploy the code while the site is running. So users are happy.
  • 109.
    5. Deployments and Downtime We want to be able to deploy the code while the site is running. So users are happy. There are several ways to do that.
  • 110.
    5. Deployments and Downtime We want to be able to deploy the code while the site is running. So users are happy. There are several ways to do that. This solution uses DNS round robin with two balancers, and two public IP addresses.
  • 111.
    Two Cluster Solution= Almost Zero Downtime incoming http incoming http balancer1 balancer2 nginx nginx haproxy haproxy Unicorn // Passenger Unicorn Passenger Unicorn // Passenger Unicorn Passenger RubyRuby VM N) VM (times memcache memcache RubyRuby VM N) VM (times redis DB
  • 112.
    Two Cluster Solution= Almost Zero Downtime incoming http incoming http balancer1 balancer2 nginx nginx haproxy haproxy Unicorn // Passenger Unicorn Passenger Unicorn // Passenger Unicorn Passenger RubyRuby VM N) VM (times memcache memcache RubyRuby VM N) VM (times redis DB
  • 113.
  • 114.
  • 115.
    Two Clusters arecool! Cluster 1 runs old code and is live
  • 116.
    Two Clusters arecool! Cluster 1 runs old code and is live Cluster 2 gets new code
  • 117.
    Two Clusters arecool! Cluster 1 runs old code and is live Cluster 2 gets new code Old and new run in parallel, but only one is serving live traffic
  • 118.
  • 119.
    Migrations with Zero Downtime? Almost possible on a live system, if:
  • 120.
    Migrations with Zero Downtime? Almost possible on a live system, if: We are not removing or renaming columns or tables in active use
  • 121.
    Migrations with Zero Downtime? Almost possible on a live system, if: We are not removing or renaming columns or tables in active use Migrations do not lock tables (for too long)
  • 122.
    Migrations with Zero Downtime? Almost possible on a live system, if: We are not removing or renaming columns or tables in active use Migrations do not lock tables (for too long) Column/Table renames/deletes can be done in two deployments instead of one
  • 124.
    So the appis now faster, and we can deploy without a downtime
  • 125.
    So the appis now faster, and we can deploy without a downtime What about email and other long-running tasks?
  • 126.
    So the appis now faster, and we can deploy without a downtime What about email and other long-running tasks? Don’t forget SPF records.
  • 127.
    incoming http Background Jobswith Resque balancer1 nginx But monitor it’s queues haproxy Must restart on reboot Resque Workers Unicorn memcache redis DB
  • 128.
    incoming http Background Jobswith Resque balancer1 nginx But monitor it’s queues haproxy Must restart on reboot Resque Workers Unicorn memcache redis resque-cleaner is awesome! DB
  • 129.
    Different queues fordifferent types of jobs
  • 130.
    Different queues fordifferent types of jobs Relatively easy to implement priorities for Jobs (order queues by priority)
  • 131.
    Different queues fordifferent types of jobs Relatively easy to implement priorities for Jobs (order queues by priority) Group Jobs by execution times to avoid delays
  • 132.
    Different queues fordifferent types of jobs Relatively easy to implement priorities for Jobs (order queues by priority) Group Jobs by execution times to avoid delays Resque::Worker x N QUEUE=SlowQueue1,SlowQueue2 redis Resque::Worker x M QUEUE=FastQueue1,FastQueue2
  • 133.
    DB Usage andcomplexity grows. We are doing big joins with many tables, and they are taking their sweet time.
  • 134.
  • 135.
    Solr to theResque Use Solr instead of doing complex joins
  • 136.
    Solr to theResque Use Solr instead of doing complex joins Solr reads are < 10ms
  • 137.
    Solr to theResque Use Solr instead of doing complex joins Solr reads are < 10ms Sunspot Gem by default writes to Solr from each ruby VM (i.e. unicorn)!
  • 138.
    Solr to theResque Use Solr instead of doing complex joins Solr reads are < 10ms Sunspot Gem by default writes to Solr from each ruby VM (i.e. unicorn)! Serialize writes with Resque!
  • 139.
    Solr to theResque Use Solr instead of doing complex joins Solr reads are < 10ms Sunspot Gem by default writes to Solr from each ruby VM (i.e. unicorn)! Serialize writes with Resque! One master for writes
  • 140.
    Solr to theResque Use Solr instead of doing complex joins Solr reads are < 10ms Sunspot Gem by default writes to Solr from each ruby VM (i.e. unicorn)! Serialize writes with Resque! One master for writes Read replicas on each app server
  • 141.
    Putting it together Unicorn Unicorn / Workers Resque Passenger Unicorn / Passenger Ruby VM Ruby VM 3. Update Solr solr_replica 4. Replicate solr_master 2. Read Model Info 1. Model Changed redis DB
  • 142.
  • 143.
    At this size... Automateeverything Chef or Puppet is awesome
  • 144.
    At this size... Automateeverything Chef or Puppet is awesome Monitor everything Tolerate reboots, restarts, partial failures
  • 145.
    At this size... Automateeverything Chef or Puppet is awesome Monitor everything Tolerate reboots, restarts, partial failures Use OS services layer to start/stop everything Ensures recovery after reboot
  • 146.
    At this size... Automateeverything Chef or Puppet is awesome Monitor everything Tolerate reboots, restarts, partial failures Use OS services layer to start/stop everything Ensures recovery after reboot Capistrano tends to gets “complex” Can also deploy with Chef
  • 147.
    Choose Vendors Wisely Youcan pick your own, but here is my list:
  • 148.
    Choose Vendors Wisely Youcan pick your own, but here is my list: Clouds - JOYENT, EngineYard fastest I/O cloud, but on Solaris derivative
  • 149.
    Choose Vendors Wisely Youcan pick your own, but here is my list: Clouds - JOYENT, EngineYard fastest I/O cloud, but on Solaris derivative Automation - Chef + OPSCODE
  • 150.
    Choose Vendors Wisely Youcan pick your own, but here is my list: Clouds - JOYENT, EngineYard fastest I/O cloud, but on Solaris derivative Automation - Chef + OPSCODE Caching/CDN - FASTLY.COM varnish based CDN, very fast, full power of VCL configuration
  • 151.
    Choose Vendors Wisely Youcan pick your own, but here is my list: Clouds - JOYENT, EngineYard fastest I/O cloud, but on Solaris derivative Automation - Chef + OPSCODE Caching/CDN - FASTLY.COM varnish based CDN, very fast, full power of VCL configuration Metrics and Performance - NewRelic Turnkey solution, getting better every day
  • 152.
  • 153.
    In Development Use foremanto start dependent services (Solr, Redis, Resque) from a Procfile
  • 154.
    In Development Use foremanto start dependent services (Solr, Redis, Resque) from a Procfile Do “Just enough” testing with Solr - it’s slow!
  • 155.
    In Development Use foremanto start dependent services (Solr, Redis, Resque) from a Procfile Do “Just enough” testing with Solr - it’s slow! Deploy to single-box demo servers often using the same Capistrano scripts used for production
  • 156.
  • 157.
    Mature App: Day700 cd my-awesome-app rake rspec:models rspec:controllers rspec:libs rake cucumber:webrat cucumber:selenium jasmine
  • 158.
    Mature App: Day700 cd my-awesome-app rake rspec:models rspec:controllers rspec:libs rake cucumber:webrat cucumber:selenium jasmine
  • 159.
  • 160.
    How Big Exactly? 200+models 200K+ lines of RUBY source code without gems 100K+ lines of ERB, HTML and HAML templates 100+ gem dependencies
  • 161.
    How Big Exactly? 200+models 200K+ lines of RUBY source code without gems 100K+ lines of ERB, HTML and HAML templates 100+ gem dependencies this is a real world application that’s in production today.
  • 162.
  • 163.
    Is that toobig? This cat’s name is Lenin
  • 164.
    Here is whyI think it is.
  • 165.
    Here is whyI think it is. 1.5+ hours for the full the test suite to complete 10 mins of db seeds, 30 minutes for unit tests only, etc
  • 166.
    Here is whyI think it is. 1.5+ hours for the full the test suite to complete 10 mins of db seeds, 30 minutes for unit tests only, etc merges often result in integration tests going RED
  • 167.
    Here is whyI think it is. 1.5+ hours for the full the test suite to complete 10 mins of db seeds, 30 minutes for unit tests only, etc merges often result in integration tests going RED 20 seconds boot-up time for Rails env (r console, etc)!
  • 168.
    Here is whyI think it is. 1.5+ hours for the full the test suite to complete 10 mins of db seeds, 30 minutes for unit tests only, etc merges often result in integration tests going RED 20 seconds boot-up time for Rails env (r console, etc)! 500Mb of RSS RAM for one single-threaded web process
  • 169.
    Here is whyI think it is. 1.5+ hours for the full the test suite to complete 10 mins of db seeds, 30 minutes for unit tests only, etc merges often result in integration tests going RED 20 seconds boot-up time for Rails env (r console, etc)! 500Mb of RSS RAM for one single-threaded web process it’s a difficult undertaking to upgrade dependencies and rails
  • 171.
  • 172.
    It was muchnicer when it was a bit smaller..
  • 173.
    It was muchnicer when it was a bit smaller..
  • 174.
  • 175.
    Let’s zoom in... IsPERFORMANCE of the app an issue?
  • 176.
    Let’s zoom in... IsPERFORMANCE of the app an issue? NO! 150ms per request avg
  • 177.
    Let’s zoom in... IsPERFORMANCE of the app an issue? NO! 150ms per request avg Is SCALABILITY of the app an issue?
  • 178.
    Let’s zoom in... IsPERFORMANCE of the app an issue? NO! 150ms per request avg Is SCALABILITY of the app an issue? NO! 8000+ concurrent users
  • 179.
    Let’s zoom in... IsPERFORMANCE of the app an issue? NO! 150ms per request avg Is SCALABILITY of the app an issue? NO! 8000+ concurrent users Is RELIABILITY of the app an issue?
  • 180.
    Let’s zoom in... IsPERFORMANCE of the app an issue? NO! 150ms per request avg Is SCALABILITY of the app an issue? NO! 8000+ concurrent users Is RELIABILITY of the app an issue? NO! barely any downtime in over one year
  • 181.
    Then WTF is the Problem?
    - Is PRODUCTIVITY of developing the app an issue? YES! Lots of waiting all the time
    - Is MERGING source code between parallel projects difficult? YES! 30+ people sharing a large codebase
    - Is KEEPING the TEST SUITE GREEN challenging? YES! Tests are brittle and long-running
  • 189.
    But wait, there’s more!
    - What about DEPLOYMENT of a large app? It takes a long time, and small tweaks require full deploys
    - What about HOSTING COSTS? Enough RAM must be provisioned for the app to be scalable
  • 192.
    RAM? Latency matters...
    - 1 request = 200 ms average latency
    - That is 5 requests/second on a single-threaded Ruby VM process
    - 30,000 RPM = 500 requests/second = 100 processes, or 50 GB of RAM at 200 ms latency (at 500 MB of RSS per process)
    - If average latency is 600 ms, you need 300 processes and 150 GB of RAM!!!
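The arithmetic above can be sketched as a small Ruby calculation. The 500 MB per process figure comes from the earlier slide; the method name `capacity_for`, the constant name, and the 1 GB = 1000 MB rounding are assumptions made for illustration only.

```ruby
# Back-of-the-envelope capacity estimate for single-threaded web processes.
# MB_PER_PROCESS is the RSS figure quoted earlier in the talk; everything
# else here (names, rounding) is a hypothetical sketch.
MB_PER_PROCESS = 500

def capacity_for(requests_per_minute:, latency_ms:)
  requests_per_second = requests_per_minute / 60.0
  # Each single-threaded process handles one request at a time, so the number
  # of requests in flight (arrival rate x latency) is the process count needed.
  processes = (requests_per_second * latency_ms / 1000.0).ceil
  { processes: processes, ram_gb: processes * MB_PER_PROCESS / 1000.0 }
end

p capacity_for(requests_per_minute: 30_000, latency_ms: 200)
p capacity_for(requests_per_minute: 30_000, latency_ms: 600)
```

Tripling the latency triples the process count, and therefore the RAM bill, with no change in traffic.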
  • 195.
    So how do we solve this?
  • 196.
    Part 3: How to split things up
  • 198.
    A couple of main themes
    - Break up into smaller applications
    - Extract services and create APIs
    - Extract libraries (gems)
  • 202.
    Smaller Applications
    - Contain web GUI, logic, and data
    - May combine with other apps
    - May rely on common libraries
    - May rely on services
    - Typically run in their own Ruby VM
  • 207.
    Consider a Typical E-Commerce Store
    - Users must be able to register, log in, and log out (profiles)
    - Users must be able to browse and search products, view them, and add them to a cart
    - Users must be able to check out
    - There are probably many other stories, such as admin, but we’ll ignore them for now
  • 213.
    One idea...
    - Application 1: Marketing, Product Catalog Browser, Search + Product Detail Page
    - Application 2: Checkout, Payment, Order History, Returns Fulfillment
    - Very clear user flow transfer and data separation
  • 216.
    Some things can be shared
    - Service: Single Sign-on, User Profiles, Login/Registration [devise? restful-authentication?]
    - Service: Product Catalog data, Inventory data
    - Service: Comments, Votes, Ratings, Reviews
  • 221.
    Services Technologies
    - Rack/Sinatra/Rails are popular, and are often an entirely sufficient choice
    - Goliath is awesome if performance is important and the service is mostly I/O bound
    - node.js is also a popular choice
    - Implementation may change in the future, as long as the API stays consistent
  • 225.
    [Diagram: proposed architecture]
    - http balancer / router: /checkout → checkout app, /* → catalog app
    - catalog app and checkout app, each with its own DB, sharing a CSS/UI Library
    - Services, each with its own DB: Product Service and Inventory; Reviews, Comments, Votes, Ratings; User Auth / Login and Profiles
    - Extract look and feel (CSS/UI) into a gem to share across apps
    - Create client API wrapper gems for consumers
    - Create a single shared “base” client gem library
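The routing rule in the diagram (/checkout goes to the checkout app, everything else to the catalog app) can be sketched in plain Ruby using the Rack calling convention. In practice this rule would usually live in the HTTP balancer itself; the class name `PrefixRouter` and the two placeholder lambdas are hypothetical and stand in for separately deployed applications.

```ruby
# Each "app" follows the Rack convention: call(env) -> [status, headers, body].
# These lambdas are placeholders for the real catalog and checkout apps.
CATALOG_APP  = ->(env) { [200, { 'content-type' => 'text/plain' }, ["catalog: #{env['PATH_INFO']}"]] }
CHECKOUT_APP = ->(env) { [200, { 'content-type' => 'text/plain' }, ["checkout: #{env['PATH_INFO']}"]] }

# Hypothetical router: dispatches on the first matching path prefix and
# falls back to a default app, mirroring "/checkout -> checkout, /* -> catalog".
class PrefixRouter
  def initialize(routes, default:)
    @routes = routes
    @default = default
  end

  def call(env)
    path = env['PATH_INFO'].to_s
    # Match the prefix exactly or as a leading path segment, so that
    # "/checkout/cart" matches but "/checkouts" does not.
    _, app = @routes.find { |prefix, _| path == prefix || path.start_with?("#{prefix}/") }
    (app || @default).call(env)
  end
end

ROUTER = PrefixRouter.new({ '/checkout' => CHECKOUT_APP }, default: CATALOG_APP)

puts ROUTER.call('PATH_INFO' => '/checkout/cart')[2].first
puts ROUTER.call('PATH_INFO' => '/products/42')[2].first
```

Because both apps sit behind the same router, users move between them without noticing that they are separate deployments.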
  • 231.
    Rails App with < 30 Models
    - Can run tests pretty quickly, hopefully in under 5 minutes
    - Is often large enough to describe typical “clusters of functionality”, i.e. mini apps
    - Ruby VM might actually stay under 100 MB of RSS RAM
    - Is more comprehensible and can be effectively maintained by a small dev team
  • 236.
    3rd Party Integrations
    [Diagram]
    - catalog app and checkout app, each with its own DB, sharing a CSS/UI Library
    - Third-party systems: analytics and reporting, financial ERP system, warehouse management system
    - Services, each with its own DB: Product Service and Inventory; Reviews, Comments, Votes, Ratings; User Auth / Login and Profiles
  • 238.
    Ecosystem of Applications
    - Is inevitable in large companies
    - Scales better from a team perspective
    - Offers decoupling and implementation hiding
    - Its apps can be individually optimized and scaled
  • 242.
    But then... Must every app know about every other app?
  • 243.
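    API Proxy / Router
    [Diagram]
    - Consumers: catalog app and checkout app (each with its own DB, sharing a CSS/UI Library), analytics and reporting, financial ERP system
    - http://api.mycompany.com/ sits between these consumers and the services
    - Services, each with its own DB: Product Service and Inventory; Reviews, Comments, Votes, Ratings; User Auth / Login and Profiles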
  • 245.
    Example: Order Placed
    - Warehouse Management System needs to be updated
    - Analytics Engine needs to be notified
    - Financials need to be updated
    - Question: which component is responsible for updating each application?
  • 250.
    1995 Was Great
    GoF Design Patterns: Observer
    “...One-to-Many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically...”
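For reference, here is a minimal in-process sketch of the Observer pattern applied to the order example. It is hand-rolled rather than taken from any library, and the names `Order`, `subscribe`, `place!` and `NOTIFICATIONS` are hypothetical.

```ruby
# Minimal hand-rolled Observer: one subject, many dependents.
# All names here are illustrative, not from an existing codebase.
class Order
  attr_reader :state

  def initialize
    @observers = []
    @state = :new
  end

  # Register anything that responds to call(event_name, order).
  def subscribe(observer)
    @observers << observer
    self
  end

  # Changing state notifies every registered dependent, in order.
  def place!
    @state = :placed
    @observers.each { |observer| observer.call(:order_placed, self) }
  end
end

NOTIFICATIONS = []
ORDER = Order.new
ORDER.subscribe(->(event, _order) { NOTIFICATIONS << "warehouse: #{event}" })
ORDER.subscribe(->(event, _order) { NOTIFICATIONS << "analytics: #{event}" })
ORDER.subscribe(->(event, _order) { NOTIFICATIONS << "financials: #{event}" })
ORDER.place!

puts NOTIFICATIONS
```

This works inside a single Ruby VM. Once the warehouse, analytics and financial systems are separate applications, the same idea needs a distributed transport.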
  • 252.
    Distributed Version: Publish/Subscribe and Point-to-Point Asynchronous Middleware
  • 256.
    Some Options for Pub/Sub
    - RabbitMQ, with the ruby-amqp gem to interface
    - EventMachine::Channel
  • 259.
    Other Distributed Options
    - DRb: distributed Ruby (also Rinda, Starfish, beanstalkd, etc.)
    - DCell: actor-based, built on 0MQ. http://www.unlimitednovelty.com/2012/04/introducing-dcell-actor-based.html
    - All of them are a bit too low-level for sharing and consuming business events
  • 262.
    I would love a library for publishing business events built on top
  • 263.
    Dependency Stack (top to bottom)
    - Business Events Collection: Commerce::OrderPlaced
    - SharedEvent::Base
    - AMQP-ruby Library: AMQP::connect
    - RabbitMQ Middleware
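A rough sketch of what the top two layers of this stack might look like. The class names `SharedEvent::Base` and `Commerce::OrderPlaced` come from the slide; the methods `routing_key` and `to_message`, and the JSON message shape, are assumptions, since the library does not exist yet. The AMQP and RabbitMQ layers are deliberately left out.

```ruby
require 'json'

module SharedEvent
  # Hypothetical base class for all business events.
  class Base
    attr_reader :payload

    def initialize(payload = {})
      @payload = payload
    end

    # Derive a routing key from the class name (an assumed convention):
    # "Commerce::OrderPlaced" -> "commerce.order_placed"
    def self.routing_key
      name.gsub('::', '.').gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
    end

    # Serialize the event into the message a transport layer would publish.
    def to_message
      JSON.generate('event' => self.class.routing_key, 'payload' => payload)
    end
  end
end

module Commerce
  # One entry in the "business events collection".
  class OrderPlaced < SharedEvent::Base; end
end

puts Commerce::OrderPlaced.new('order_id' => 42).to_message
```

The event classes know nothing about queues or exchanges, which is what would let the transport underneath change without touching publishers or consumers.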
  • 265.
    Future Library
    - Hides complexities of queues and exchanges
    - Consumers declare interest in events they care about, and define persistence and retry policy
    - Publishers fire! events and forget about them
  • 269.
    Future Library, ctd.
    - Once registered, consumers get messages even after being offline
    - Once the publisher has submitted an event to the queue, its job is done
    - Library of business events becomes a complement to the set of business APIs
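To make the intended behavior concrete, here is a toy in-memory stand-in for such a library. It is only a sketch: `InMemoryBus`, `register`, `fire!` and `drain` are invented names, a real version would sit on top of RabbitMQ queues and exchanges, and persistence and retry policy are not modeled at all.

```ruby
# Toy stand-in for the messaging layer: one pending queue per registered
# consumer, so events fired while a consumer is "offline" wait for it.
class InMemoryBus
  def initialize
    @queues = Hash.new { |hash, key| hash[key] = [] }    # consumer -> pending events
    @interests = Hash.new { |hash, key| hash[key] = [] } # event name -> consumers
  end

  # A consumer declares interest in an event it cares about.
  def register(consumer, event_name)
    @interests[event_name] << consumer unless @interests[event_name].include?(consumer)
    @queues[consumer] # make sure the consumer's queue exists
    self
  end

  # The publisher's job ends as soon as the event is queued.
  def fire!(event_name, payload = {})
    @interests[event_name].each { |consumer| @queues[consumer] << [event_name, payload] }
    nil
  end

  # A consumer coming back online picks up everything it missed.
  def drain(consumer)
    pending = @queues[consumer].dup
    @queues[consumer].clear
    pending
  end
end

BUS = InMemoryBus.new
BUS.register(:warehouse, 'commerce.order_placed')
BUS.fire!('commerce.order_placed', { 'order_id' => 42 }) # warehouse is offline here
BUS.fire!('commerce.order_placed', { 'order_id' => 43 })
RECEIVED = BUS.drain(:warehouse)

p RECEIVED
```

The publisher never learns who consumed the event, or when.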
  • 272.
    Untangle Communications
    [Diagram: catalog app, checkout app, analytics and reporting, financial ERP system, and inventory management, all connected through a Messaging Bus]
    1. publish OrderPlaced
    2. consume OrderPlaced
    3. publish InventoryChanged
    4. consume InventoryChanged (shown at two consumers)
  • 273.
    I don’t think this library exists yet, but I would like to write one soon =)

Editor's Notes