1 A view Inside PHP
2 Hello
● Julien PAULI
● Programming in PHP since the early 2000s
● PHP Internals hacker and trainer
● PHP 5.5/5.6 Release Manager
● Working at SensioLabs in Paris - Blackfire
● Writing PHP tech articles and books
● http://phpinternalsbook.com
● @julienpauli - http://jpauli.tech - jpauli@php.net
● Like working on OSS such as PHP :-)
3 A look into the engine
4 PHP
5 Kernel: Zend Engine
● 125K LOC
● ZendE VM
● ZendE Core
● ZendE Tools
● Thread-Safety (TSRM)
6 Core: main and ext/standard
● 55K LOC
● str_
● array_
● files and streams
● ...
7 Extensions: ext/xxx
● 530K LOC for ext/
● "Extensions" and "Zend extensions"
● Static or dynamic compilation and linking
● Add features
● Consume resources
● php -m ; php --re
● Mandatory extensions:
    ● core / date / pcre / reflection / SPL / standard / hash
● Other extensions:
    ● http://pecl.php.net
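The list printed by php -m can also be read from userland. A minimal sketch, assuming only the standard get_loaded_extensions() function; the helper name list_extensions() is made up for this example:

```php
<?php
// Hypothetical helper: loaded extension names, sorted case-insensitively.
function list_extensions(): array
{
    $names = get_loaded_extensions(); // regular PHP extensions
    sort($names, SORT_STRING | SORT_FLAG_CASE);
    return $names;
}

foreach (list_extensions() as $name) {
    echo $name, PHP_EOL;
}

// Passing true returns the Zend extensions instead (they are tracked separately).
foreach (get_loaded_extensions(true) as $name) {
    echo '[Zend] ', $name, PHP_EOL;
}
```

The exact output depends on how the PHP binary was built and configured.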
8 PHP
● A program in itself
● Written in C
● Goal: define a programming language for the Web
● High level, interpreted
● Interpreted language
    ● Less efficient than a language compiled to native instructions
    ● but simpler to handle
9 PHP from inside
● A software virtual machine
    ● Compiler/Executor
    ● Intermediate OPCode
    ● Single thread, single process
● Automatic dynamic memory management
    ● Memory Manager
    ● Garbage collector
10 Request handling steps
● Startup (memory allocations)
● Compilation
    ● Lexical and syntactic analysis
    ● Compilation (OPCode generation)
● Execution
    ● OPCode interpretation
    ● Several VM flavors
    ● include/require/eval = go back to compilation
● Shutdown (free resources)
● "Share nothing architecture"
[Diagram labels: Startup, zend_compile_file(), zend_execute(), Shutdown]
11 PHP startup
12 Request startup
13 Script execution ● Compilation ● Execution ● Destruction
14 Lexical analysis (lexing)
● Character recognition
● Transforms characters into tokens
● Lexer generator: re2c
    ● http://re2c.org/
    ● http://www.php.net/tokens.php
● highlight_file()
● highlight_string()
● compile_file()
● compile_string()
15 zend_language_scanner.l
● int lex_scan(zval *zendlval)
● re2c is also used in:
    ● PDO: prepared statement (PS) emulation
    ● dates: strtotime()
    ● serialize()/unserialize()

    /*!re2c
    HNUM "0x"[0-9a-fA-F]+
    BNUM "0b"[01]+
    LABEL [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*
    TABS_AND_SPACES [ \t]*
    NEWLINE ("\r"|"\n"|"\r\n")

    <ST_IN_SCRIPTING>"("{TABS_AND_SPACES}("int"|"integer"){TABS_AND_SPACES}")" {
        return T_INT_CAST;
    }

    $(RE2C) $(RE2C_FLAGS) --case-inverted -cbdFt $(srcdir)/zend_language_scanner_defs.h -o $(srcdir)/zend_language_scanner.l
16 Accessing the lexical analyzer
● Lexer from PHP userland:
    ● https://github.com/sebastianbergmann/phptok
    ● https://github.com/nikic/PHP-Parser
    ● ext/tokenizer

    function display_data(array $data) {
        $buf = '';
        foreach ($data as $k=>$v) {
            $buf .= sprintf("%s: %s \n", $k, $v);
        }
        return $buf;
    }

    Line  Token          Text
    ---------------------------------------------------------
    1     OPEN_TAG       <?php
    2     WHITESPACE
    3     FUNCTION       function
    3     WHITESPACE
    3     STRING         display_data
    3     OPEN_BRACKET   (
    3     ARRAY          array
    3     WHITESPACE
    3     VARIABLE       $data
    3     CLOSE_BRACKET  )
    3     WHITESPACE
    4     OPEN_CURLY     {
    4     WHITESPACE
    …     …              ...
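A token listing like the one on this slide can be produced with ext/tokenizer itself. A minimal sketch, assuming the tokenizer extension is enabled (it is by default); the input script and the 'CHAR' label for single-character tokens are choices made for this example:

```php
<?php
// Tokenize a short script with ext/tokenizer and collect one row per token.
$source = "<?php\nprint 'foo';\n";

$rows = [];
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Array tokens are [token id, text, line number].
        [$id, $text, $line] = $token;
        $rows[] = [$line, token_name($id), trim($text)];
    } else {
        // Single-character tokens such as ';' come back as plain strings.
        $rows[] = [null, 'CHAR', $token];
    }
}

foreach ($rows as [$line, $name, $text]) {
    printf("%-4s %-28s %s\n", $line ?? '', $name, $text);
}
```

token_name() returns the symbolic names (T_OPEN_TAG, T_PRINT, ...) documented at http://www.php.net/tokens.php.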
17 Syntactic analysis (parsing)
● "Understands" a set of tokens
● Defines the language syntax
● Parser generator: GNU Bison (LALR)
● For each token or token set:
    → Execute a function to generate an AST statement
    → Go to the next token
    → Can generate a "Parse error" and halt
● Tightly coupled to the lexical analyzer
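The "Parse error" case can be observed from userland: since PHP 7, code that fails to parse inside eval() throws a ParseError instead of halting the script. A minimal sketch; the helper name parse_error_of() is hypothetical:

```php
<?php
// Hypothetical helper: returns the parser's message, or null if the code compiled.
// Caution: eval() also executes code that compiles, so only pass trusted snippets.
function parse_error_of(string $code): ?string
{
    try {
        eval($code);
        return null;
    } catch (ParseError $e) {
        return $e->getMessage();
    }
}

var_dump(parse_error_of('$x = 1 + 2;')); // valid: compiles and runs
var_dump(parse_error_of('print ;'));     // invalid: print needs an expression
```

The wording of the error message differs between PHP versions, so it is best not to match on it.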
18 zend_language_parser.y
● ext/tokenizer

    statement:
            '{' inner_statement_list '}' { $$ = $2; }
        |   if_stmt { $$ = $1; }
        |   alt_if_stmt { $$ = $1; }
        |   T_WHILE '(' expr ')' while_statement
                { $$ = zend_ast_create(ZEND_AST_WHILE, $3, $5); }
        |   T_DO statement T_WHILE '(' expr ')' ';'
                { $$ = zend_ast_create(ZEND_AST_DO_WHILE, $2, $5); }
        |   T_FOR '(' for_exprs ';' for_exprs ';' for_exprs ')' for_statement
                { $$ = zend_ast_create(ZEND_AST_FOR, $3, $5, $7, $9); }
        |   T_SWITCH '(' expr ')' switch_case_list
                { $$ = zend_ast_create(ZEND_AST_SWITCH, $3, $5); }
        |   T_BREAK optional_expr ';' { $$ = zend_ast_create(ZEND_AST_BREAK, $2); }
        |   T_CONTINUE optional_expr ';' { $$ = zend_ast_create(ZEND_AST_CONTINUE, $2); }
        |   T_RETURN optional_expr ';' { $$ = zend_ast_create(ZEND_AST_RETURN, $2); }

    $(YACC) -p zend -v -d $(srcdir)/zend_language_parser.y -o zend_language_parser.c
19 Wuups
20 Compilation
● Invoked on the final AST
    ● Userland AST: https://github.com/nikic/php-ast
● Creates an OPCodes array
● OPCode = low-level VM instruction
    ● Somewhat similar to low-level assembly
    ● Example: ADD(a,b) → c ; CONCAT(c,d) → e ; etc.
● The compilation step is very heavy
    ● Lots of checks
    ● Address resolutions
    ● Many stacks and memory pools
● Some early optimizations/computations are performed
21 Compilation easy example

    <?php
    print 'foo';
22 Compilation easy example

    <?php
    print 'foo';

lexing:

    <ST_IN_SCRIPTING>"print" {
        return T_PRINT;
    }

parsing:

    T_PRINT expr { $$ = zend_ast_create(ZEND_AST_PRINT, $2); }
23 Compilation easy example

parsing:

    T_PRINT expr { $$ = zend_ast_create(ZEND_AST_PRINT, $2); }

compiling:

    case ZEND_AST_PRINT:
        zend_compile_print(result, ast);
        return;

    void zend_compile_print(znode *result, zend_ast *ast) /* {{{ */
    {
        zend_op *opline;
        zend_ast *expr_ast = ast->child[0];

        znode expr_node;
        zend_compile_expr(&expr_node, expr_ast);

        opline = zend_emit_op(NULL, ZEND_ECHO, &expr_node, NULL);
        opline->extended_value = 1;

        result->op_type = IS_CONST;
        ZVAL_LONG(&result->u.constant, 1);
    }
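The compiler code above emits a ZEND_ECHO opcode and sets the result of the print expression to the constant 1. That matches what userland sees: print writes its operand and evaluates to 1. A small check, using output buffering only to capture what was written:

```php
<?php
ob_start();
$result = print 'foo';   // print is an expression, so it has a value
$output = ob_get_clean();

var_dump($output);       // string(3) "foo"
var_dump($result);       // int(1)
```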
24 OPCode ?
● phpdbg -p file.php

    try {
        get_logger()->log($message, $priority, $extras);
    } catch(Exception $e) {
    }

    L5   #0  INIT_FCALL_BY_NAME  "get_logger"
    L5   #1  DO_FCALL            @0
    L5   #2  INIT_METHOD_CALL    @0 "log"
    L5   #3  SEND_VAR_EX         $message 1
    L5   #4  SEND_VAR_EX         $priority 2
    L5   #5  SEND_VAR_EX         $extras 3
    L5   #6  DO_FCALL
    L10  #7  RETURN              1
    L6   #8  CATCH               "Exception" $e 1
    L10  #9  RETURN              1
25 Execution
● Execute OPCodes
    ● Most complex part of the Zend Engine
    ● VM executor
    ● zend_vm_execute.h
● Each OPCode is run through a handler() function
    ● "zend_vm_handler"
● The executor runs the instructions in an infinite dispatch loop
● Branching is possible (loops, catch blocks, gotos, etc.)
[Diagram labels: Startup, zend_compile_file(), zend_execute(), Shutdown]
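The handler-per-opcode dispatch loop can be illustrated with a toy interpreter written in PHP. This is only a sketch of the idea, not the Zend Engine's implementation: the opcode constants, the [opcode, op1, op2, result] instruction layout and the toy_execute() function are all invented for this example. It runs the ADD(a,b) → c ; CONCAT(c,d) → e sequence from the compilation slide.

```php
<?php
// Toy opcodes; names and numbering are illustrative only.
const OP_ADD = 0;
const OP_CONCAT = 1;
const OP_ECHO = 2;
const OP_RETURN = 3;

// Runs a toy op array. Each instruction is [opcode, op1, op2, result];
// operands are slot names in $slots. Returns everything that was echoed.
function toy_execute(array $opArray, array $slots): string
{
    $output = '';

    // One handler per opcode; a handler returns false to stop the loop.
    $handlers = [
        OP_ADD => function (array $op, array &$slots, string &$output): bool {
            $slots[$op[3]] = $slots[$op[1]] + $slots[$op[2]];
            return true;
        },
        OP_CONCAT => function (array $op, array &$slots, string &$output): bool {
            $slots[$op[3]] = $slots[$op[1]] . $slots[$op[2]];
            return true;
        },
        OP_ECHO => function (array $op, array &$slots, string &$output): bool {
            $output .= $slots[$op[1]];
            return true;
        },
        OP_RETURN => function (array $op, array &$slots, string &$output): bool {
            return false;
        },
    ];

    $ip = 0; // instruction pointer
    while (true) { // infinite dispatch loop, left only through OP_RETURN
        $op = $opArray[$ip++];
        if (!$handlers[$op[0]]($op, $slots, $output)) {
            break;
        }
    }

    return $output;
}

$program = [
    [OP_ADD, 'a', 'b', 'c'],
    [OP_CONCAT, 'c', 'd', 'e'],
    [OP_ECHO, 'e', null, null],
    [OP_RETURN, null, null, null],
];

echo toy_execute($program, ['a' => 1, 'b' => 2, 'd' => ' apples']), PHP_EOL; // 3 apples
```

A real VM adds jumps by letting handlers modify the instruction pointer, which is how loops, catch blocks and gotos are implemented.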
26 ZEND_ECHO

    ZEND_VM_HANDLER(40, ZEND_ECHO, CONST|TMPVAR|CV, ANY)
    {
        USE_OPLINE
        zend_free_op free_op1;
        zval *z;

        SAVE_OPLINE();
        z = GET_OP1_ZVAL_PTR_UNDEF(BP_VAR_R);

        if (Z_TYPE_P(z) == IS_STRING) {
            zend_string *str = Z_STR_P(z);

            if (ZSTR_LEN(str) != 0) {
                zend_write(ZSTR_VAL(str), ZSTR_LEN(str));
            }
        } else {
            zend_string *str = _zval_get_string_func(z);

            if (ZSTR_LEN(str) != 0) {
                zend_write(ZSTR_VAL(str), ZSTR_LEN(str));
            } else if (OP1_TYPE == IS_CV && UNEXPECTED(Z_TYPE_P(z) == IS_UNDEF)) {
                GET_OP1_UNDEF_CV(z, BP_VAR_R);
            }
            zend_string_release(str);
        }
        /* ... */
27 BREAK
28 OPCode Cache
● First time
    ● Compile
    ● Cache to shared memory (SHM)
    ● Execute
● Then, if the file did not change
    ● Load from SHM
    ● Execute
● Compilation is too heavy
    ● Avoid it with an OPCode cache
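Whether an OPCode cache is active, and how often it avoided a compilation, can be checked with OPcache's opcache_get_status() function. A minimal sketch, assuming the bundled OPcache extension; the helper name opcache_summary() is hypothetical, and on the CLI the cache is normally off unless opcache.enable_cli is set:

```php
<?php
// Hypothetical helper: one-line summary of the OPcache state for this SAPI.
function opcache_summary(): string
{
    if (!function_exists('opcache_get_status')) {
        return 'OPcache extension not loaded';
    }

    $status = @opcache_get_status(false); // false: skip per-script details
    if (!is_array($status) || empty($status['opcache_enabled'])) {
        return 'OPcache loaded but not enabled here';
    }

    $stats = $status['opcache_statistics'] ?? [];

    return sprintf(
        'cached scripts: %d, hits: %d, misses: %d',
        $stats['num_cached_scripts'] ?? 0,
        $stats['hits'] ?? 0,   // requests served from shared memory
        $stats['misses'] ?? 0  // requests that had to compile
    );
}

echo opcache_summary(), PHP_EOL;
```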
29 Example

    <?php
    function foo()
    {
        $data = file('/etc/fstab');
        sort($data);
        return $data;
    }

    for($i=0; $i<=$argv[1]; $i++) {
        $a = foo();
        $a[] = range(0, $i);
        $result[] = $a;
    }
    var_dump($result);
30 Compilation / Execution
(same script as slide 29)

    argv   main()==>run_init::tmp/php.php//1   main()==>compile::tmp/php.php//1
    1      241                                 89
    10     1731                                89
41 memory consumption
42 Zend Memory Manager
● ZendMM: request-bound dynamic memory allocator
● Owns its heap, obtained from the OS using malloc() / mmap()
● Used by PHP, the Zend Engine and extensions while handling a request
● Tunable
● zend_alloc.c/h
44 Memory consumption
● memory_get_usage(): size used by your runtime code
● memory_get_usage(true): size allocated through the OS
● ZendMM caches blocks
    ● use gc_mem_caches() to reclaim them if needed
● Use your OS to be accurate

    php> echo memory_get_usage();
    625272
    php> echo memory_get_usage(1);
    786432

    cat /proc/13399/status
    Name:    php
    State:   S (sleeping)
    VmPeak:  154440 kB
    VmSize:  133700 kB
    VmRSS:   10304 kB
    VmData:  4316 kB
    VmStk:   136 kB
    VmExe:   9876 kB
    VmLib:   13408 kB
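The difference between the two counters, and the effect of gc_mem_caches(), can be watched from a script. A minimal sketch, assuming PHP 7 or later with the default ZendMM allocator; the array size is arbitrary and the numbers will vary from one build to another:

```php
<?php
$before = memory_get_usage();      // bytes in use by the script
$data   = range(1, 100000);        // allocate a sizeable array
$during = memory_get_usage();
unset($data);                      // blocks go back to ZendMM, not to the OS
$after  = memory_get_usage();
$real   = memory_get_usage(true);  // bytes ZendMM currently holds from the OS
$freed  = gc_mem_caches();         // ask ZendMM to release its cached chunks

printf(
    "before: %d\nduring: %d\nafter: %d\nreal: %d\nreclaimed by gc_mem_caches(): %d\n",
    $before,
    $during,
    $after,
    $real,
    $freed
);
```

For the process-level picture, the OS view (for example /proc/<pid>/status on Linux) remains the reference, as the slide says.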
45 Thank you for listening
