About Me • Joined Achievers in June 2009 • Prior to Achievers, I was the CTO of ZipLocal • I have spent the last 7 years worrying about how to build scalable applications • Academic Background: – Ph.D. from the University of Toronto – Naval Research Labs Post Doctoral Fellow of Secure Systems at Cambridge University
Goals • Tell you about our journey to a scalable architecture • Give you insight into common scaling problems • Give you a way to think about the issues of scaling that you can apply today
ACHIEVERS
What Does Achievers Do • Achievers started in the rewards and recognition space in 2007 • We provide reward and recognition software – Points based system to reward performance – Catalog to redeem the points • Our mission is to “Change the way the world works”
The Achievers Home Page
Our Traffic Growth • From 2009 to today – Visits up 903% – Unique Visitors up 832% • Last month we did 2.5 million page views • During business hours we have about 250 people on the site at any given moment
Funding • 3.3 million Series A from JLA Ventures • 6.9 million Series B from Grandbanks • 24 million Series C from Sequoia Capital
PRELIMINARIES
Definitions • Performance – Performance measures the speed at which a single request can be executed • Scalability – Scalability is the ability to handle a growing number of requests in a capable manner • Scalability != Performance
Which Language Scales the Best? • Languages Don’t Scale Architectures Do • If you hear “language X doesn’t scale” then turn around and walk away. – That person doesn’t understand scalability
There is a bit more to Scalability • Scalability is also about how you scale the development team • If you are successful and need to add people how easy is it for them to contribute • How fast can you write code – Your competitors are right behind you – He who can develop good code fast wins!
OUR SAAS PLATFORM
The Achievers Platform • Multi tenant architecture – One code base – One database • Module based platform – Hundreds of configuration options for each module – Lots of legacy configurations
Backend Processing • We handle many millions of dollars of orders every month • We send out hundreds of thousands of emails a month
THE ARCHITECTURE CIRCA 2009
The Stack • Pretty Standard J2EE stack • Hibernate • Spring • JMS • MySql • All running on Amazon EC2
Aside – Amazon EC2 • EC2 is great • Spin up machines for testing then shut them down • A must for any startup – Don’t manage your own servers when you are small. It isn’t worth it
Architecture • [Diagram; labels: Presentation, Business Logic, JSP Pages, Servlet, HTML, Hibernate Objects, MySql]
LOOKS GREAT SO WHAT'S THE PROBLEM?
Architecture – Data Center View • [Diagram; label: Server 1]
But J2EE Scales • Sure it does BUT • The devil is in the details
MEET THE DEVIL DETAILS
Scaling Was an Afterthought • We had to scale vertically since the underlying design did not consider what would happen if we had 2 web servers • We had the largest EC2 instance money could buy • You cannot retrofit scalability – Your architecture and design either have it or they don’t
Design Decisions • Your basic approach and philosophy to a few things will determine how hard it will be to scale your infrastructure
COMPLEXITY
Who doesn’t like magic • Extensive use of Aspect Oriented Programming (AOP) – Allows you to define ‘cut-points’ to insert code before or after a function call • As an academic AOP is brilliant • As a CTO not so much
There is a Pattern for That • Use of design patterns for the sake of using a design pattern • Don’t get me wrong every developer must know and understand design patterns • But it isn’t a competition to see who can use the most design patterns in any given day – The right tool for the right job – Don’t force it!
Overly complex object model • The Access Control model had so many objects and relationships that other than the original author no other person ever understood it
Why is Complexity Bad? • If the system dies at two o'clock in the morning and I'm staring at your code, can I easily figure out what's going on? • People Forget about Magic – Code needs to be in front of you not buried in an XML file or magically invoked
What Does This Have To Do With Scalability? • Complex systems are really, really hard to scale – In a clustered environment you need to first figure out if the problem is because of clustering or because of your code – This isn’t trivial even for simple systems • Too many things to worry about • When you hit a wall (and you will) it becomes very hard to figure out what to do
Don’t Forget About the People • As you grow your team you need to ramp everybody up • A complex system takes longer to learn than a simple one • Complexity ALWAYS increases over time. If you start with something that is complex it will quickly get beyond the scope of a mere mortal
Desire for Complex Solutions • [Chart; axis labels: Complexity, Experience]
THE DATABASE
The Database • ORMs make you stupid … kidding … sort of • You need to understand your data – Do not let an ORM define your database you will be sorry • Generating reports out of an ORM is painful • Developers must understand how a DB works – You will forget about what a DB is good for if you don’t consider it explicitly – New developers usually do not understand the importance of the DB in scaling
ORMs • Can they scale? – Sure • Is it hard? – Yup • A quote from Stack Overflow on scaling ORMs – “… a good ORM will provide plenty of hooks that allow you to optimize quite a bit. You just need to spend some time learning it.”
Is that all? • Initially ORMs might allow you to write code quickly – I would challenge this but that is another topic • Your system runs into a brick wall. Customers are complaining. Your CEO is chewing out the CTO. The VP Engineering is curled up in a ball in the corner. They turn to you as the architect and you answer: “We just need to learn how to use all the hooks”
Just Learn the ORM • I have yet to meet somebody that could convince me that they knew how to scale an ORM – It HAS been done, so yes it is possible but it takes patience and a CEO that likes to wait – I’ve had people tell me “we just have to rewrite the ORM with a new ORM that could scale”
Know your database • I believe that your DB should own all your data – Let it do what it is good at • If that is true then simple replication strategies and a little bit of coding can get you reading data from a replica • You can then start denormalizing the DB to get better performance
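Speaker-note sketch for the "little bit of coding" on this slide: one possible way to send reads to a replica and everything else to the master. The class name `ReadWriteRouter` and the host names are hypothetical, not taken from the Achievers code base, and the SELECT check is deliberately naive.

```python
import random


class ReadWriteRouter:
    """Pick a database host per statement (hypothetical helper)."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)

    def pick(self, sql):
        # Naive rule: plain SELECTs may go to a replica,
        # every other statement goes to the master.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self.replicas:
            return random.choice(self.replicas)
        return self.master


# Host names are placeholders for illustration only.
router = ReadWriteRouter("mysql-master", ["mysql-slave"])
read_host = router.pick("SELECT * FROM orders WHERE id = 7")
write_host = router.pick("UPDATE orders SET status = 'shipped' WHERE id = 7")
print(read_host, write_host)
```

With asynchronous replication a replica can lag behind the master, so a read issued right after a write may need to go to the master as well. That trade-off is one reason the database has to be considered explicitly.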
Scaling Your Data • Scaling a DB is a well understood problem with well understood solutions • Don’t confuse this with easy!
SESSIONS
Server Side Sessions • Very developer friendly • You have 2 choices to scale: – Session replication – Sticky Sessions
Session Replication • Yuck! • Lots of network chatter • Slow propagation of the session means the user has a bad experience • You could be moving lots of data around – Our sessions were huge
Sticky Sessions • Works but you now need to worry about a machine being overloaded while the others are idle • A machine failure logs out everybody from that machine • You have to be very careful when configuring – If all IP addresses go to one server then you essentially have one company per server
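Speaker-note sketch of the "one company per server" problem, assuming stickiness is keyed on the client IP address (one common configuration, not necessarily the one we ran). The server names and the office address are made up for illustration.

```python
import hashlib
from collections import Counter

SERVERS = ["web1", "web2", "web3"]  # hypothetical server pool


def sticky_server(client_ip):
    """Map a client IP to a fixed server by hashing the address."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]


# 200 employees of one customer sit behind a single office NAT address
# (203.0.113.10 is a documentation-only address).
office_requests = ["203.0.113.10"] * 200
load = Counter(sticky_server(ip) for ip in office_requests)
print(load)
```

Every request from that office lands on the same machine, whichever one the hash happens to select, while the other two servers see none of that customer's traffic.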
CACHING
When to Cache • Our platform made extensive use of caches • That has to be good right? • Not in our case – Items were cached by Java – Shared state posed a problem when adding another server – Yes there are Java based solutions but all you are doing is adding complexity
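Speaker-note sketch of why an in-process cache becomes a problem the moment a second server is added. The `WebServer` class and the key name are hypothetical; a plain dict stands in for the database.

```python
class WebServer:
    """A server that caches database rows inside its own process."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # local to this server, invisible to the others

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.db[key]
        return self.cache[key]

    def put(self, key, value):
        self.db[key] = value
        self.cache[key] = value  # only this server's cache is updated


db = {"points:alice": 100}
server1, server2 = WebServer(db), WebServer(db)

server2.get("points:alice")        # server 2 caches the old value
server1.put("points:alice", 250)   # the write arrives on server 1

stale = server2.get("points:alice")
fresh = server1.get("points:alice")
print(stale, fresh)
```

Server 2 keeps answering from its own cache after the write, so the same user can see different balances depending on which machine handles the request.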
ADMITTING YOU HAVE A PROBLEM
It Won’t Love You Back • Never fall in love with your technology. It will break your heart. • You must always challenge your assumptions and be prepared to throw away something – Hard to throw away your ‘baby’ – Remember it is just a bunch of 1’s and 0’s
THE JOURNEY
Basic Premise • Every web application follows the same basic flow: 1. User makes a request 2. Validate the request 3. Grab some data 4. Process it a bit 5. Build a Page for the user
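Speaker-note sketch of the five steps above as a single handler. The request shape, the `DB` dict, and the HTML are invented for illustration; the point is only that each step is visible in the code.

```python
# A dict stands in for the database; the contents are made up.
DB = {1: {"name": "Alice", "points": [100, 50]}}


def handle(request, db):
    # 1. User makes a request (represented here as a dict).
    # 2. Validate the request.
    user_id = request.get("user_id")
    if not isinstance(user_id, int):
        return "400 Bad Request"

    # 3. Grab some data.
    row = db.get(user_id)
    if row is None:
        return "404 Not Found"

    # 4. Process it a bit.
    total = sum(row["points"])

    # 5. Build a page for the user.
    return f"<h1>{row['name']}</h1><p>{total} points</p>"


page = handle({"user_id": 1}, DB)
print(page)
```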
Guiding Architectural Principles • Initial deployment would be on 3 machines – Forcing us to understand how we are going to scale upfront • Servers must be stateless • The database owns all the data • Caching is an explicit choice to solve a real problem • Always use the right tool for the job • Minimize complexity
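Speaker-note sketch of what "servers must be stateless" can look like in practice: per-user state lives in a shared store, so any of the 3 machines can serve any request. The `StatelessServer` class and the session id are hypothetical, and a dict stands in for the shared store.

```python
import itertools

shared_store = {}  # stand-in for the database or another shared store


class StatelessServer:
    """Keeps no per-user data on the instance between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, add_points):
        # Load state from the shared store, update it, write it back.
        session = self.store.get(session_id, {"points": 0})
        updated = {"points": session["points"] + add_points}
        self.store[session_id] = updated
        return self.name, updated["points"]


servers = [StatelessServer(f"web{i}", shared_store) for i in (1, 2, 3)]
round_robin = itertools.cycle(servers)

# Four consecutive requests from one user, each sent to the next server.
results = [next(round_robin).handle("sess-42", 10) for _ in range(4)]
print(results)
```

The running total stays correct even though no two consecutive requests hit the same machine, which is what makes adding or removing a server uneventful.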
Other Goals • Zero downtime deployments • We wanted to be able to upgrade customers one at a time • Maximize developer productivity
The Target • [Diagram; labels: Load Balancer, Web Server (×3), Background Processing, MemcacheD Cluster, NAS Device, MySql Master, MySql Slave]
The Language Choice • Why PHP – Faster code/debug cycles • This has increased our productivity – Zero downtime deployments • We have patched running servers multiple times in a day and nobody has noticed anything – Shared nothing philosophy • Forces a good frame of mind for server development
Doesn’t PHP Suck? • Languages don’t suck only the developers using them do • PHP isn’t perfect – Google ‘why php sucks’ for an extensive list • But PHP doesn’t scale – Remember, languages don’t scale … – If you don’t believe me ask Wikipedia, Facebook, Digg etc.
Sure but PHP is Slow • If your web application is not database bound then you are probably doing it wrong • Yes, Java might perform better at some things but that will not be a limiting factor
Surely There are Down Sides? • Because PHP does not have strong typing you need really good error detection and reporting – We will do another talk on our struggles and solutions • Coding standards are a must since PHP lets you pretty much do whatever you want – Naming conventions are super important – Don’t start a religious war over bracket placement. There really is only one right way
The Framework • We use CodeIgniter (CI) • Simple MVC framework – The code is very easy to follow • Works out of the box, but is very extensible – Strictly follows the Open/Closed principle – We have extended CI a lot to meet our needs • Doesn’t require learning anything but PHP
Using the Right Tool • Have Apache (or a faster web server) serve all static content • A Network Attached Storage (NAS) device was used for a shared file system. – This makes life a TON easier • Have your web servers serve requests • Move background work to another server
The Problem • We had about 120 customers and we couldn’t just go away to do what we needed to do – Not a bad problem to have
THE MIGRATION
Step 1 • We wrote a controller that would forward requests to the new code base • GET requests could be easily forwarded • POST requests were a bit more complicated • This step allowed us to start developing the new platform AND keep releasing features
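Speaker-note sketch of a forwarding controller, to show why POST is the harder case: a GET only needs its URL rebuilt, while a POST also needs its body re-encoded and replayed. This is not the controller we wrote; `NEW_BASE`, `MIGRATED_PATHS`, and `build_forward` are hypothetical names, and the sketch only builds the outgoing request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

NEW_BASE = "http://new-platform.internal"       # hypothetical address
MIGRATED_PATHS = {"/catalog", "/recognitions"}  # hypothetical routes


def build_forward(method, path, query=None, form=None):
    """Return the request to replay on the new code base, or None."""
    if path not in MIGRATED_PATHS:
        return None  # not migrated yet: the old code handles it

    url = NEW_BASE + path
    if query:
        url += "?" + urlencode(query)

    if method == "GET":
        # A GET is fully described by its URL.
        return Request(url, method="GET")

    # A POST also carries a body that has to be re-encoded and resent.
    body = urlencode(form or {}).encode()
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


get_req = build_forward("GET", "/catalog", query={"page": 2})
post_req = build_forward("POST", "/recognitions", form={"to": "bob", "points": 50})
print(get_req.full_url, post_req.get_method(), post_req.data)
```

File uploads, cookies, and redirects after a POST would each need extra handling that this sketch leaves out.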
Step 2 • Start migrating customers to the new platform • We put a proxy server in front of our old and new platforms. • We then proxied specific requests to the version they were running on
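Speaker-note sketch of the routing decision the proxy makes during the migration, assuming each customer is identified by the first label of the host name. That assumption, the customer names, and the backend names are all hypothetical, and in production this rule lived in the proxy configuration rather than in application code.

```python
# Customers already moved to the new platform (made-up names).
MIGRATED_CUSTOMERS = {"acme", "globex"}


def choose_backend(host):
    """Pick the platform that should serve a request for this host."""
    customer = host.split(".", 1)[0].lower()
    if customer in MIGRATED_CUSTOMERS:
        return "new_platform"
    return "old_platform"


for host in ("acme.example.com", "initech.example.com"):
    print(host, "->", choose_backend(host))
```

Migrating a customer then means adding one name to the set, and rolling back means removing it.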
The Setup • [Diagram; labels: HAProxy, Express Platform, Achievers Platform, MySql]
HAProxy • If you don’t have it installed, go back to the office, download it and install it! • It isn’t just a load balancer – We can move specific traffic to specific machines for whatever reason – We have a machine with profiling capabilities that we have used to profile production problems – Fine-grained control over your requests
We did it! • It took us almost 6 months to migrate every customer but we did get there • Our productivity has improved • And we have an architecture that we know can handle whatever we can throw at it – At least in the short term
CONCLUSIONS
Scaling is Hard • Don’t make it harder on yourself – Reduce complexity – Understand your database – Have an upfront strategy to deal with state • We picked stateless but you don’t have to
Never let anybody tell you a language or framework does or doesn’t scale. It is all in the details
Scaling a High Traffic Web Application: Our Journey from Java to PHP
