DEV Community

Cover image for Nginx parser with PHP and Bison
Anton Sukhachev
Anton Sukhachev

Posted on • Edited on

Nginx parser with PHP and Bison

Read this post if you don't know what Bison is.

Today I'll try to parse Nginx config into AST.
I get the actual Nginx config from official Symfony documentation to test the parser.
nginx.conf

server { server_name domain.tld www.domain.tld; root /var/www/project/public; location / { # try to serve file directly, fallback to index.php try_files $uri /index.php$is_args$args; } location /bundles { try_files $uri =404; } location ~ ^/index\.php(/|$) { fastcgi_pass unix:/var/run/php/php-fpm.sock; fastcgi_split_path_info ^(.+\.php)(/.*)$; include fastcgi_params; # optionally set the value of the environment variables used in the application fastcgi_param APP_ENV prod; fastcgi_param APP_SECRET <app-secret-id>; fastcgi_param DATABASE_URL "mysql://db_user:db_pass@host:3306/db_name"; fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name; fastcgi_param DOCUMENT_ROOT $realpath_root; internal; } location ~ \.php$ { return 404; } error_log /var/log/nginx/project_error.log; access_log /var/log/nginx/project_access.log; } 
Enter fullscreen mode Exit fullscreen mode

First, we need to install PHP dependencies.

composer require --dev mrsuh/php-bison-skeleton composer require mrsuh/tree-printer composer require doctrine/lexer 
Enter fullscreen mode Exit fullscreen mode

We will store our files like this:

. ├── /ast-parser ├── /bin │ └── parse.php # entry point to parse nginx configs ├── /lib │ └── parser.php # generated file ├── /src │ ├── Lexer.php │ └── Node.php # AST node └── grammar.y 
Enter fullscreen mode Exit fullscreen mode

The Node class must implement Mrsuh\Tree\NodeInterface to print AST.
src/Node.php

<?php namespace App; use Mrsuh\Tree\NodeInterface; class Node implements NodeInterface { private string $name; /** @var array<string, mixed> */ private array $attributes; /** @var Node[] */ private array $children; public function __construct(string $name, array $attributes = [], array $children = []) { $this->name = $name; $this->attributes = $attributes; $this->children = $children; } public function getChildren(): array { return $this->children; } public function __toString(): string { $line = $this->name; if (!empty($this->attributes)) { $line .= ' {'; foreach ($this->attributes as $key => $value) { $line .= sprintf( " %s: '%s'", $key, is_array($value) ? implode(', ', $value) : $value ); } $line .= ' }'; } return $line; } } 
Enter fullscreen mode Exit fullscreen mode

This time I'll use Doctrine lexer library. It can help to parse complex text.

src/Lexer.php

<?php namespace App; use Doctrine\Common\Lexer\AbstractLexer; class Lexer extends AbstractLexer implements LexerInterface { public function __construct($resource) { $this->setInput(stream_get_contents($resource)); $this->moveNext(); } protected function getCatchablePatterns(): array { return [';']; } protected function getNonCatchablePatterns(): array { return [' ','[\n]+','#[^\n]+']; // skip spaces, eol, and comments  } protected function getType(&$value): int { switch ($value) { case 'server': return LexerInterface::T_SERVER; case 'server_name': return LexerInterface::T_SERVER_NAME; ... } return ord($value); } public function yyerror(string $message): void { printf("%s\n", $message); } public function getLVal() { return $this->token->value; } public function yylex(): int { if (!$this->lookahead) { return LexerInterface::YYEOF; } $this->moveNext(); return $this->token->type; } } 
Enter fullscreen mode Exit fullscreen mode

For example, Lexer will translate the Nginx config below

server { server_name domain.tld www.domain.tld; root /var/www/project/public; location / { # try to serve file directly, fallback to index.php try_files $uri /index.php$is_args$args; } } 
Enter fullscreen mode Exit fullscreen mode

into this:

word token
server LexerInterface::T_SERVER (258)
{ ASCII (123)
server_name LexerInterface::T_SERVER_NAME (259)
domain.tld LexerInterface::T_SERVER_NAME_VALUE (260)
www.domain.tld LexerInterface::T_SERVER_NAME_VALUE (260)
; ASCII (59)
root LexerInterface::T_SERVER_ROOT (261)
/var/www/project/public LexerInterface::T_SERVER_ROOT_PATH (262)
; ASCII (59)
location LexerInterface::T_LOCATION (263)
/ ASCII (264)
{ ASCII (123)
try_files LexerInterface::T_TRY_FILES (283)
$uri LexerInterface::T_TRY_FILES_PATH (284)
/index.php$is_args$args LexerInterface::T_TRY_FILES_PATH (284)
; ASCII (59)
} ASCII (125)
} ASCII (125)
LexerInterface::YYEOF (0)

Time to create grammar.y file and build lib/parser.php

We will use block %code parser to define variables and methods to store AST into the Parser class.
You can find full grammar file here.
grammar.y

%define api.parser.class {Parser} %define api.namespace {App} %code parser { private Node $ast; public function setAst(Node $ast): void { $this->ast = $ast; } public function getAst(): Node { return $this->ast; } } %token T_SERVER %token T_SERVER_NAME %token T_SERVER_NAME_VALUE %token T_SERVER_ROOT %token T_SERVER_ROOT_PATH ... %token T_TRY_FILES %token T_TRY_FILES_PATH %% server: T_SERVER '{' server_body_list '}' { self::setAst(new Node('T_SERVER', [], $3)); } ; server_name_values: T_SERVER_NAME_VALUE { $$ = [$1]; } | server_name_values T_SERVER_NAME_VALUE { $$ = $1; $$[] = $2; } ; server_body: T_SERVER_NAME server_name_values ';' { $$ = new Node('T_SERVER_NAME', ['names' => $2]); } | T_SERVER_ROOT T_SERVER_ROOT_PATH ';' { $$ = new Node('T_SERVER_ROOT', ['path' => $2]); } | T_ERROR_LOG T_ERROR_LOG_PATH ';' { $$ = new Node('T_ERROR_LOG', ['path' => $2]); } | T_ACCESS_LOG T_ACCESS_LOG_PATH ';' { $$ = new Node('T_ACCESS_LOG', ['path' => $2]); } ; ... 
Enter fullscreen mode Exit fullscreen mode
bison -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 -o lib/parser.php grammar.y 
Enter fullscreen mode Exit fullscreen mode

Command options:

  • -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 - path to skeleton file
  • -o parser.php - output parser file
  • grammar.y - our grammar file

The final PHP file is the entry point for the parser.
bin/parse.php

<?php require_once __DIR__ . '/../vendor/autoload.php'; use App\Parser; use App\Lexer; use Mrsuh\Tree\Printer; $lexer = new Lexer(fopen($argv[1], 'r')); $parser = new Parser($lexer); if (!$parser->parse()) { exit(1); } $printer = new Printer(); $printer->print($parser->getAst()); 
Enter fullscreen mode Exit fullscreen mode

Autoload for generated lib/parser.php file.
composer.json

{ "autoload": { "psr-4": { "App\\": "src/" }, "files": ["lib/parser.php"] }, ... } 
Enter fullscreen mode Exit fullscreen mode

Finally, we can test our parser.

php bin/parse.php nginx.conf . ├── T_SERVER ├── T_SERVER_NAME { names: 'domain.tld, www.domain.tld' } ├── T_SERVER_ROOT { path: '/var/www/project/public' } ├── T_LOCATION { regexp: '' path: '/' } │ └── T_TRY_FILES { paths: '$uri, /index.php$is_args$args' } ├── T_LOCATION { regexp: '' path: '/bundles' } │ └── T_TRY_FILES { paths: '$uri, =404' } ├── T_LOCATION { regexp: '~' path: '^/index\.php(/|$)' } │ ├── T_FAST_CGI_PATH { path: 'unix:/var/run/php/php-fpm.sock' } │ ├── T_FAST_CGI_SPLIT_PATH_INFO { path: '^(.+\.php)(/.*)$' } │ ├── T_INCLUDE { path: 'fastcgi_params' } │ ├── T_FAST_CGI_PARAM { APP_ENV: 'prod' } │ ├── T_FAST_CGI_PARAM { APP_SECRET: '<app-secret-id>' } │ ├── T_FAST_CGI_PARAM { DATABASE_URL: '"mysql://db_user:db_pass@host:3306/db_name"' } │ ├── T_FAST_CGI_PARAM { SCRIPT_FILENAME: '$realpath_root$fastcgi_script_name' } │ ├── T_FAST_CGI_PARAM { DOCUMENT_ROOT: '$realpath_root' } │ └── T_INTERNAL ├── T_LOCATION { regexp: '~' path: '\.php$' } │ └── T_RETURN { code: '404' body: '' } ├── T_ERROR_LOG { path: '/var/log/nginx/project_error.log' } └── T_ACCESS_LOG { path: '/var/log/nginx/project_access.log' } 
Enter fullscreen mode Exit fullscreen mode

It works!

You can get the parser source code here and test it by yourself.

Some useful links:

Top comments (0)