Lark is a Python parsing library. Unlike parser generators such as Yacc, it doesn’t generate a source code file from a grammar; the parser is built dynamically at runtime. Let’s see how it works. You import Lark:

    from lark import Lark

then specify the grammar:

    grammar = """
    start: WORD "," WORD "!"

    %import common.WORD
    %ignore " "
    """

The grammar can be a Python string or read from a separate file. After that, just create a Lark class instance, initializing it with the grammar:

    parser = Lark(grammar)

and you are ready to parse:

    def main():
        print(parser.parse("Hello, world!"))
        print(parser.parse("Adios, amigo!"))

    if __name__ == '__main__':
        main()

parser.parse returns a Tree instance containing the parse tree:

    Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'world')])
    Tree(start, [Token(WORD, 'Adios'), Token(WORD, 'amigo')])

That’s it, clean and simple. It’s up to you to decide what to do with the parsed string. Let’s see where we can go from there. Here is an example of a simple arithmetic expression parser:

    from lark import Lark

    grammar = """
    start: add_expr | sub_expr
    add_expr: NUMBER "+" NUMBER
    sub_expr: NUMBER "-" NUMBER

    %import common.NUMBER
    %ignore " "
    """

The grammar ignores spaces. Also note that terminals are written in uppercase letters (NUMBER) while rules are written in lowercase letters (start, add_expr and sub_expr). %import and %ignore are directives. You can find the grammar reference in the Lark documentation. We can import definitions from other grammars, in this case common.lark (which just contains some useful definitions). The above grammar will successfully parse addition and subtraction expressions, like:

    1+1
    2-1
    3 - 2

and nothing else. Next, create the Lark object:

    parser = Lark(grammar)

and we are ready to parse:

    def main():
        print(parser.parse("1+1"))
        print(parser.parse("2-1"))
        print(parser.parse("3 - 2"))

    if __name__ == '__main__':
        main()

The output is as expected:

    Tree(start, [Tree(add_expr, [Token(NUMBER, '1'), Token(NUMBER, '1')])])
    Tree(start, [Tree(sub_expr, [Token(NUMBER, '2'), Token(NUMBER, '1')])])
    Tree(start, [Tree(sub_expr, [Token(NUMBER, '3'), Token(NUMBER, '2')])])

Note that this example just prints the parse tree as before. Let’s transform it to something more useful:

    from lark import Lark, Transformer

    grammar = """
    start: add_expr | sub_expr
    add_expr: NUMBER "+" NUMBER -> add_expr
    sub_expr: NUMBER "-" NUMBER -> sub_expr

    %import common.NUMBER
    %ignore " "
    """

The names after -> on the right-hand side of the grammar rules (add_expr and sub_expr, here the same as the rule names) are aliases. They name the transformer methods that are applied when a rule is successfully parsed. Let’s write them:

    class CalcTransformer(Transformer):
        def add_expr(self, args):
            return int(args[0]) + int(args[1])

        def sub_expr(self, args):
            return int(args[0]) - int(args[1])

For instance, when parsing

    2-1

args[0] will contain "2" and args[1] will contain "1". In our transformer methods we convert both to integers and add or subtract them, returning the result. Now create the Lark object:

    parser = Lark(grammar, parser='lalr', transformer=CalcTransformer())

To accept a transformer this way, the parser needs to be an LALR parser. We are finally ready to parse:

    def main():
        print(parser.parse("1+1"))
        print(parser.parse("2-1"))
        print(parser.parse("3 - 2"))

    if __name__ == '__main__':
        main()

The output is now:

    Tree(start, [2])
    Tree(start, [1])
    Tree(start, [1])

Better? 1+1 is 2, 2-1 is 1 and 3-2 is also 1.
Of course, this is just scratching the surface. If you are interested, you can find the full examples on GitHub.
