DEV Community

Cover image for AST parser with PHP and Bison
Anton Sukhachev
Anton Sukhachev

Posted on • Edited on

AST parser with PHP and Bison

Read this post if you don't know what Bison is.

I already have the Bison AST parser, but this time I will do it without PHP FFI.

First, we need to install the PHP skeleton package to build the parser and tree printer package to print AST.

composer require --dev mrsuh/php-bison-skeleton composer require mrsuh/tree-printer 
Enter fullscreen mode Exit fullscreen mode

It will more readable if we separate code from printer.php to individual files.

. ├── /ast-parser ├── /bin │ └── parse.php # entry point for parser ├── /lib │ └── parser.php # generated file ├── /src │ ├── Lexer.php │ └── Node.php # AST node └── grammar.y 
Enter fullscreen mode Exit fullscreen mode

To print AST with the tree printer package Node class must implement Mrsuh\Tree\NodeInterface.

src/Node.php

<?php namespace App; use Mrsuh\Tree\NodeInterface; class Node implements NodeInterface { private string $name; private string $value; /** @var Node[] */ private array $children; public function __construct(string $name, string $value, array $children = []) { $this->name = $name; $this->value = $value; $this->children = $children; } public function getChildren(): array { return $this->children; } public function __toString(): string { $line = $this->name; if (!empty($this->value)) { $line .= sprintf(" '%s'", $this->value); } return $line; } } 
Enter fullscreen mode Exit fullscreen mode

Lexer is not modified from previous post, but this time we will put it in a separate file src/Lexer.php.

src/Lexer.php

<?php namespace App; class Lexer implements LexerInterface { private array $words; private int $index = 0; private int $value = 0; public function __construct($resource) { $this->words = explode(' ', trim(fgets($resource))); } public function yyerror(string $message): void { printf("%s\n", $message); } public function getLVal() { return $this->value; } public function yylex(): int { if ($this->index >= count($this->words)) { return LexerInterface::YYEOF; } $word = $this->words[$this->index++]; if (is_numeric($word)) { $this->value = (int)$word; return LexerInterface::T_NUMBER; } return ord($word); } } 
Enter fullscreen mode Exit fullscreen mode

For example, Lexer will translate the expression 10 + 20 - 30 into this:

word token value
10 LexerInterface::T_NUMBER (258) 10
+ ASCII (43)
20 LexerInterface::T_NUMBER (258) 20
- ASCII (45)
30 LexerInterface::T_NUMBER (258) 30
LexerInterface::YYEOF (0)

It's time to create the grammar.y file and build lib/parser.php
You can define %code blocks, so Bison will render code as is in printer.php

grammar.y

%code imports { // code imports }; %code parser { // code parser };  %code init { // code init };  
Enter fullscreen mode Exit fullscreen mode

printer.php

<?php // code imports class Parser { // code parser public function __construct() { // code init } } 
Enter fullscreen mode Exit fullscreen mode

We will use block %code parser to define variables and methods to store AST into the Parser class.
Bison has reserved the symbol $ in grammar actions.
It's very sad for PHP developers, but we can call the function setAst() with self::setAst() instead of $this->setAst().

grammar.y

%define api.parser.class {Parser} %define api.namespace {App} %code parser { private Node $ast; public function setAst(Node $ast): void { $this->ast = $ast; } public function getAst(): Node { return $this->ast; } } %token T_NUMBER %left '-' '+' %% start: expression { self::setAst($1); } ; expression: T_NUMBER { $$ = new Node('NUMBER', $1); } | expression '+' expression { $$ = new Node('OPERATION_PLUS', '', [$1, $3]); } | expression '-' expression { $$ = new Node('OPERATION_MINUS', '', [$1, $3]); } ; 
Enter fullscreen mode Exit fullscreen mode
bison -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 -o lib/parser.php grammar.y 
Enter fullscreen mode Exit fullscreen mode

Command options:

  • -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 - path to skeleton file
  • -o parser.php - output parser file
  • grammar.y - our grammar file

And final PHP file is the entry point bin/parse.php.

bin/parse.php

<?php require_once __DIR__ . '/../vendor/autoload.php'; use App\Lexer; use App\Parser; use Mrsuh\Tree\Printer; $lexer = new Lexer(STDIN); $parser = new Parser($lexer); if (!$parser->parse()) { exit(1); } $printer = new Printer(STDOUT); $printer->print($parser->getAst()); 
Enter fullscreen mode Exit fullscreen mode

We need to add a special autoload section to composer.json for generated lib/parser.php file.

composer.json

{ "autoload": { "psr-4": { "App\\": "src/" }, "files": ["lib/parser.php"] }, ... } 
Enter fullscreen mode Exit fullscreen mode

Ok. Our parser is ready and we can test it:

php bin/parse.php <<< "1 + 2 - 3" . ├── OPERATION_MINUS ├── OPERATION_PLUS │ ├── NUMBER '1' │ └── NUMBER '2' └── NUMBER '3' 
Enter fullscreen mode Exit fullscreen mode

Try to parse big expression:

php bin/parse.php <<< "1 + 2 - 3 + 4 - 5 + 6 - 7 + 8 - 9 + 10" . ├── OPERATION_PLUS ├── OPERATION_MINUS │ ├── OPERATION_PLUS │ │ ├── OPERATION_MINUS │ │ │ ├── OPERATION_PLUS │ │ │ │ ├── OPERATION_MINUS │ │ │ │ │ ├── OPERATION_PLUS │ │ │ │ │ │ ├── OPERATION_MINUS │ │ │ │ │ │ │ ├── OPERATION_PLUS │ │ │ │ │ │ │ │ ├── NUMBER '1' │ │ │ │ │ │ │ │ └── NUMBER '2' │ │ │ │ │ │ │ └── NUMBER '3' │ │ │ │ │ │ └── NUMBER '4' │ │ │ │ │ └── NUMBER '5' │ │ │ │ └── NUMBER '6' │ │ │ └── NUMBER '7' │ │ └── NUMBER '8' │ └── NUMBER '9' └── NUMBER '10' 
Enter fullscreen mode Exit fullscreen mode

Great!

You can get the parser source code here and test it by yourself.

Some useful links:

Top comments (0)