Read this post if you don't know what Bison is.
I have already written an AST parser with Bison, but this time I will do it without PHP FFI.
First, we need to install the PHP skeleton package to build the parser, and the tree printer package to print the AST.

    composer require --dev mrsuh/php-bison-skeleton
    composer require mrsuh/tree-printer

It will be more readable if we split the code from printer.php into individual files.

    .
    ├── /ast-parser
        ├── /bin
        │   └── parse.php # entry point for parser
        ├── /lib
        │   └── parser.php # generated file
        ├── /src
        │   ├── Lexer.php
        │   └── Node.php # AST node
        └── grammar.y

To print the AST with the tree printer package, the Node class must implement Mrsuh\Tree\NodeInterface.

src/Node.php

    <?php

    namespace App;

    use Mrsuh\Tree\NodeInterface;

    class Node implements NodeInterface
    {
        private string $name;
        private string $value;
        /** @var Node[] */
        private array $children;

        public function __construct(string $name, string $value, array $children = [])
        {
            $this->name     = $name;
            $this->value    = $value;
            $this->children = $children;
        }

        public function getChildren(): array
        {
            return $this->children;
        }

        public function __toString(): string
        {
            $line = $this->name;
            // Compare with '' instead of using empty(): empty('0') is true and would hide the value 0.
            if ($this->value !== '') {
                $line .= sprintf(" '%s'", $this->value);
            }

            return $line;
        }
    }

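To see why getChildren() and __toString() are all a printer needs, here is a minimal sketch of a recursive tree walk. It is not the implementation of mrsuh/tree-printer: SketchNode and renderTree() are hypothetical names made up for this illustration, and the output uses plain indentation instead of box-drawing characters.

```php
<?php
// Hypothetical stand-in for App\Node, so that the sketch runs without Composer.
final class SketchNode
{
    private string $name;
    private string $value;
    /** @var SketchNode[] */
    private array $children;

    public function __construct(string $name, string $value = '', array $children = [])
    {
        $this->name     = $name;
        $this->value    = $value;
        $this->children = $children;
    }

    public function getChildren(): array
    {
        return $this->children;
    }

    public function __toString(): string
    {
        return $this->value === '' ? $this->name : sprintf("%s '%s'", $this->name, $this->value);
    }
}

// Depth-first walk: render the node label, then each child one level deeper.
function renderTree(SketchNode $node, int $depth = 0): string
{
    $out = str_repeat('  ', $depth) . $node . "\n";
    foreach ($node->getChildren() as $child) {
        $out .= renderTree($child, $depth + 1);
    }

    return $out;
}

$tree = new SketchNode('OPERATION_PLUS', '', [
    new SketchNode('NUMBER', '1'),
    new SketchNode('NUMBER', '2'),
]);

echo renderTree($tree);
```

The walk only touches the two methods the node exposes, so any class with the same two methods could be printed the same way.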
The Lexer is unchanged from the previous post, but this time we will put it in a separate file, src/Lexer.php.

src/Lexer.php

    <?php

    namespace App;

    class Lexer implements LexerInterface
    {
        private array $words;
        private int   $index = 0;
        private int   $value = 0;

        public function __construct($resource)
        {
            $this->words = explode(' ', trim(fgets($resource)));
        }

        public function yyerror(string $message): void
        {
            printf("%s\n", $message);
        }

        public function getLVal()
        {
            return $this->value;
        }

        public function yylex(): int
        {
            if ($this->index >= count($this->words)) {
                return LexerInterface::YYEOF;
            }

            $word = $this->words[$this->index++];

            if (is_numeric($word)) {
                $this->value = (int)$word;

                return LexerInterface::T_NUMBER;
            }

            return ord($word);
        }
    }

For example, the Lexer will translate the expression 10 + 20 - 30 into this:

| word | token | value |
|---|---|---|
| 10 | LexerInterface::T_NUMBER (258) | 10 |
| + | ASCII (43) | |
| 20 | LexerInterface::T_NUMBER (258) | 20 |
| - | ASCII (45) | |
| 30 | LexerInterface::T_NUMBER (258) | 30 |
| | LexerInterface::YYEOF (0) | |
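
The table can be reproduced with a small standalone sketch of the same tokenization rule. The constants and the sketchTokenize() function are hypothetical names for this illustration; in the real project the token ids come from the generated LexerInterface, and the lexer returns tokens one at a time through yylex() rather than as a list.

```php
<?php
// Token ids copied from the table; assumed to match the generated LexerInterface.
const SKETCH_T_NUMBER = 258;
const SKETCH_YYEOF    = 0;

/** @return array[] list of [word, token, value] rows */
function sketchTokenize(string $input): array
{
    $rows = [];
    foreach (explode(' ', trim($input)) as $word) {
        if (is_numeric($word)) {
            // Numbers share one token id and carry their integer value.
            $rows[] = [$word, SKETCH_T_NUMBER, (int)$word];
        } else {
            // Single-character operators are returned as their ASCII code.
            $rows[] = [$word, ord($word), null];
        }
    }
    // End of input is reported with the YYEOF token.
    $rows[] = ['', SKETCH_YYEOF, null];

    return $rows;
}

foreach (sketchTokenize('10 + 20 - 30') as [$word, $token, $value]) {
    printf("%-3s %-4d %s\n", $word, $token, $value ?? '');
}
```

Returning operators as their ASCII code is what lets the grammar refer to them directly as '+' and '-' without declaring extra tokens.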
It's time to create the grammar.y file and build lib/parser.php.
You can define %code blocks, and Bison will render their contents as is into the generated PHP file.

grammar.y

    %code imports {
        // code imports
    };

    %code parser {
        // code parser
    };

    %code init {
        // code init
    };

lib/parser.php

    <?php

    // code imports

    class Parser {

        // code parser

        public function __construct() {
            // code init
        }
    }

We will use the %code parser block to add a property and methods to the Parser class that store the AST.
Bison reserves the symbol $ in grammar actions (for $$, $1 and so on), so $this cannot be written there.
It's very sad for PHP developers, but we can call the method setAst() with self::setAst() instead of $this->setAst(). Inside an instance method, this call still runs on the current object.
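
The self:: trick can be checked in isolation. The sketch below uses a hypothetical SelfCallDemo class that has nothing to do with the generated parser; it only shows that a non-static method called through self:: from an instance method still sees the current object.

```php
<?php
// Hypothetical class for illustration only; not part of the generated parser.
class SelfCallDemo
{
    private string $stored = '';

    public function store(string $value): void
    {
        $this->stored = $value;
    }

    public function run(): string
    {
        // No "$" needed here: self::store() is forwarded to the current object.
        self::store('ast');

        return $this->stored;
    }
}

$demo = new SelfCallDemo();
echo $demo->run(), "\n"; // prints "ast"
```

This only works because the grammar action ends up inside an instance method of the parser; calling a non-static method through self:: from a static context would fail.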
grammar.y

    %define api.parser.class {Parser}
    %define api.namespace {App}

    %code parser {
        private Node $ast;
        public function setAst(Node $ast): void { $this->ast = $ast; }
        public function getAst(): Node { return $this->ast; }
    }

    %token T_NUMBER
    %left '-' '+'

    %%
    start:
      expression                  { self::setAst($1); }
    ;

    expression:
      T_NUMBER                    { $$ = new Node('NUMBER', $1); }
    | expression '+' expression   { $$ = new Node('OPERATION_PLUS', '', [$1, $3]); }
    | expression '-' expression   { $$ = new Node('OPERATION_MINUS', '', [$1, $3]); }
    ;

Build the parser:

    bison -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4 -o lib/parser.php grammar.y

Command options:
- -S vendor/mrsuh/php-bison-skeleton/src/php-skel.m4: path to the skeleton file
- -o lib/parser.php: output parser file
- grammar.y: our grammar file
The final PHP file is the entry point, bin/parse.php.

bin/parse.php

    <?php

    require_once __DIR__ . '/../vendor/autoload.php';

    use App\Lexer;
    use App\Parser;
    use Mrsuh\Tree\Printer;

    $lexer  = new Lexer(STDIN);
    $parser = new Parser($lexer);
    if (!$parser->parse()) {
        exit(1);
    }

    $printer = new Printer(STDOUT);
    $printer->print($parser->getAst());

The generated lib/parser.php file does not live under src/, so the PSR-4 rule will not find it. We need to add a files entry to the autoload section of composer.json for it.

composer.json

    {
        "autoload": {
            "psr-4": {
                "App\\": "src/"
            },
            "files": ["lib/parser.php"]
        },
        ...
    }

Ok. Our parser is ready and we can test it:

    php bin/parse.php <<< "1 + 2 - 3"
    .
    ├── OPERATION_MINUS
        ├── OPERATION_PLUS
        │   ├── NUMBER '1'
        │   └── NUMBER '2'
        └── NUMBER '3'

Let's try to parse a bigger expression:

    php bin/parse.php <<< "1 + 2 - 3 + 4 - 5 + 6 - 7 + 8 - 9 + 10"
    .
    ├── OPERATION_PLUS
        ├── OPERATION_MINUS
        │   ├── OPERATION_PLUS
        │   │   ├── OPERATION_MINUS
        │   │   │   ├── OPERATION_PLUS
        │   │   │   │   ├── OPERATION_MINUS
        │   │   │   │   │   ├── OPERATION_PLUS
        │   │   │   │   │   │   ├── OPERATION_MINUS
        │   │   │   │   │   │   │   ├── OPERATION_PLUS
        │   │   │   │   │   │   │   │   ├── NUMBER '1'
        │   │   │   │   │   │   │   │   └── NUMBER '2'
        │   │   │   │   │   │   │   └── NUMBER '3'
        │   │   │   │   │   │   └── NUMBER '4'
        │   │   │   │   │   └── NUMBER '5'
        │   │   │   │   └── NUMBER '6'
        │   │   │   └── NUMBER '7'
        │   │   └── NUMBER '8'
        │   └── NUMBER '9'
        └── NUMBER '10'

Great!
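
The tree leans to the left because '+' and '-' are declared with %left in grammar.y: each new operator takes everything parsed so far as its left child. Here is a rough sketch of that effect as a plain left fold over the words. It is not how the generated parser works internally, and sketchLeftFold() with its nested-array output is a hypothetical helper for this illustration only.

```php
<?php
// Left fold over "number (operator number)*".
// Every operator wraps the tree built so far as its left operand.
function sketchLeftFold(string $input): array
{
    $words = explode(' ', trim($input));
    $tree  = ['NUMBER', array_shift($words)];

    while (count($words) >= 2) {
        $operator = array_shift($words);
        $right    = ['NUMBER', array_shift($words)];
        $name     = $operator === '+' ? 'OPERATION_PLUS' : 'OPERATION_MINUS';
        $tree     = [$name, $tree, $right];
    }

    return $tree;
}

echo json_encode(sketchLeftFold('1 + 2 - 3')), "\n";
// prints ["OPERATION_MINUS",["OPERATION_PLUS",["NUMBER","1"],["NUMBER","2"]],["NUMBER","3"]]
```

The nesting matches the first test above: OPERATION_MINUS at the root, with OPERATION_PLUS of 1 and 2 as its left child and 3 as its right child.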
You can get the parser source code here and test it yourself.