Parser Combinators Writing external DSLs Wednesday, November 9, 2011
DSLs • DSL is an ultimate abstraction • You create a language that precisely describes the problem you are trying to solve. Wednesday, November 9, 2011
Internal & External • DSL can be internal - made using the host programming language. • Or external - you invent a little language for solving your problem. Wednesday, November 9, 2011
Internal DSL: Good • No external tool dependencies • Nice IDE support • Type checking for free • Easy to implement • Gives full power of your language • Parsing is free Wednesday, November 9, 2011
Internal DSL: Bad • It requires expressive language • If the language is not expressive enough DSL will be verbose and not pleasant to use • It can produce obscure error messages Wednesday, November 9, 2011
External DSL: Good • Platform independent • Nicer syntax • External tool can perform static analysis • Good error messages Wednesday, November 9, 2011
External DSLs: Bad • No IDE support • Hard to implement Wednesday, November 9, 2011
External DSLs • External DSLs can be implemented using interpretation or code generation. • Parse • Analyze • Interpret | Generate Code Wednesday, November 9, 2011
parseJSON :: String -> Either Error JSONObject Wednesday, November 9, 2011
JSON Grammar-like stuff object := '{' fields? '}' fields := field (',' field)* field := id ':' value value := literal | object | array Wednesday, November 9, 2011
Bison (C) ANTLR (Java and many others) You can encode grammar by hand Wednesday, November 9, 2011
function parseObject() { match('{') if (lookahead(ID)) parseFields() match('}') } Wednesday, November 9, 2011
function parseFields() { parseField() while (lookahead(',')) { consume() parseField() } } Wednesday, November 9, 2011
• Parser can either consume some input or not. • Parser can fail or succeed. • Parser can return some value. Wednesday, November 9, 2011
• Usually sequences are just sequential calls. • Optional parts of grammar are just IFs • Zero or more elements is WHILE • Grammar rule is a function Wednesday, November 9, 2011
function parseFields() { separatedBy(',', parseField) } function separatedBy(ch, f) { f() while (lookahead(ch)) { consume() f() } } Wednesday, November 9, 2011
function sepBy(ch, p) { return seq(p, many(seq(ch, p))) } Wednesday, November 9, 2011
Combinator pattern Wednesday, November 9, 2011
Haskell in 60 seconds Wednesday, November 9, 2011
main = putStrLn "Hello World!" times x n = x * n times = x n -> x * n times = x -> n -> x * n times = (*) times 2 3 2 `times` 3 data Point = Point Int Int data Either a b = Left a | Right b Point 10 20 Left “Hello” Wednesday, November 9, 2011
import Text.Parsec parse :: Parser -> SourceName -> String -> Either ParseError a Wednesday, November 9, 2011
p = satisfy (x -> x == 'a') main = print (parse p "<test>" "abc") > Right 'a' p = satisfy (x -> x == 'a') main = print (parse p "<test>" "bcd") > Left "<test>" (line 1, column 1): > unexpected "b" Wednesday, November 9, 2011
-- There is something like this in Parsec char c = satisfy (x -> x == c) -- Example p = char 'a' main = print (parse p "<test>" "bcd") > Left "<test>" (line 1, column 1): > unexpected "b" > expecting "a" Wednesday, November 9, 2011
char c = satisfy (==c) <?> show [c] Wednesday, November 9, 2011
p = char 'a' >>= x -> char 'b' >>= y -> return [x, y] main = print (parse p "<test>" "abc") > Right "ab" Wednesday, November 9, 2011
p = do char 'a' char 'b' return "ab" main = print (parse p "<test>" "abc") > Right "ab" Wednesday, November 9, 2011
p = do char 'a' char 'b' <|> char 'q' char 'c' <|> char 'd' main = print (parse p "<test>" "abc") Wednesday, November 9, 2011
letter = satisfy isAlpha <?> "letter" digit = satisfy isDigit <?> "digit" spaces = skipMany space <?> "white space" many p = ... censored ... Wednesday, November 9, 2011
p = do letter many (letter <|> digit) main = print (parse p "<test>" "hello123") > Right "ello123" Wednesday, November 9, 2011
p = do x <- letter xs <- many (letter <|> digit) return (x:xs) main = print (parse p "<test>" "hello123") > Right "hello123" Wednesday, November 9, 2011
ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) p = ident main = print (parse p "<test>" "123hello") > Left "<test>" (line 1, column 1): > unexpected "1" > expecting letter Wednesday, November 9, 2011
ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) p = ident <?> "variable name" main = print (parse p "<test>" "123hello") > Left "<test>" (line 1, column 1): > unexpected "1" > expecting variable name Wednesday, November 9, 2011
ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name" p = ident main = print (parse p "<test>" "123hello") Wednesday, November 9, 2011
ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name" letKeyword = string "let" p = ident <|> letKeyword main = print (parse p "<test>" "letter") > Right "letter" Wednesday, November 9, 2011
ident = do x <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name" letKeyword = do s <- string "let" notFollowedBy (letter <|> digit) spaces return s p = try(letKeyword) <|> ident main = print (parse p "<test>" "letter") ----- Right "letter" Wednesday, November 9, 2011
Lexer-like example Wednesday, November 9, 2011
type Name = String data Expr = Number Integer | Var Name | Let Name Expr | Seq [Expr] deriving Show Wednesday, November 9, 2011
myDef = emptyDef { identStart = letter <|> char '_', identLetter = alphaNum <|> char '_', opStart = opLetter myDef, opLetter = oneOf "=,;", reservedOpNames = [ ",", ";", "=" ], reservedNames = ["let"], caseSensitive = True } Wednesday, November 9, 2011
tokenParser = PT.makeTokenParser myDef parens = PT.parens tokenParser naturalNumber = PT.natural tokenParser keyword = PT.reserved tokenParser identifier = PT.identifier tokenParser op = PT.reservedOp tokenParser commaSep = PT.commaSep tokenParser commaSep1 = PT.commaSep1 tokenParser Wednesday, November 9, 2011
simple = numberLiteral <|> var numberLiteral = do n <- naturalNumber return $ Number n var = do s <- identifier return $ Var s Wednesday, November 9, 2011
letStmt = do keyword "let" defs <- commaSep1 def op ";" return $ Seq defs def = do name <- identifier op "=" value <- simple return $ Let name value Wednesday, November 9, 2011
main = print ( parse letStmt "<test>" "let one = 1, two = 2;" ) ---- Right ( Seq [ Let "one" (Number 1), Let "two" (Number 2) ] ) Wednesday, November 9, 2011
CSV Parser Example Wednesday, November 9, 2011
import Text.Parsec csvFile = line `endBy1` eol line = cell `sepBy` (char ',') cell = wsopt >> many (noneOf ",n") wsopt = many (oneOf " t") eol = char 'n' main = print (parse csvFile "<test>" input) where input = "aa, b,cnd,e,fn" > Right [["aa","b","c"],["d","e","f"]] Wednesday, November 9, 2011
import scala.util.parsing.combinator._ object CSVParser extends RegexParsers { override def skipWhitespace = false def whitespace = """[ t]*"""r def csv = rep1sep(row, "n") <~ "n" def row = rep1sep(field, ",") def field = opt(whitespace) ~> """[^n,]""".r def parse(s: String) = parseAll(csv, s) } println(CSVParser.parse("a,b,cnd,e, fn")) > [3.1] parsed: List(List(a, b, c), List(d, e, f)) Wednesday, November 9, 2011
Pysec for Python Spirit for C++ In standard library of Scala A lot of others... Wednesday, November 9, 2011
Stuff to read • Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages by Terence Parr Wednesday, November 9, 2011
Stuff to read • Domain Specific Languages by Martin Fowler Wednesday, November 9, 2011
Stuff to read • http://learnyouahaskell.com/ Learn You a Haskell for a Great Good Wednesday, November 9, 2011
The End Wednesday, November 9, 2011

Parser combinators

  • 1.
    Parser Combinators Writing external DSLs Wednesday, November 9, 2011
  • 2.
    DSLs • DSL is an ultimate abstraction • You create a language that precisely describes the problem you are trying to solve. Wednesday, November 9, 2011
  • 3.
    Internal & External • DSL can be internal - made using the host programming language. • Or external - you invent a little language for solving your problem. Wednesday, November 9, 2011
  • 4.
    Internal DSL: Good • No external tool dependencies • Nice IDE support • Type checking for free • Easy to implement • Gives full power of your language • Parsing is free Wednesday, November 9, 2011
  • 5.
    Internal DSL: Bad • It requires expressive language • If the language is not expressive enough DSL will be verbose and not pleasant to use • It can produce obscure error messages Wednesday, November 9, 2011
  • 6.
    External DSL: Good • Platform independent • Nicer syntax • External tool can perform static analysis • Good error messages Wednesday, November 9, 2011
  • 7.
    External DSLs: Bad • No IDE support • Hard to implement Wednesday, November 9, 2011
  • 8.
    External DSLs • External DSLs can be implemented using interpretation or code generation. • Parse • Analyze • Interpret | Generate Code Wednesday, November 9, 2011
  • 9.
    parseJSON :: String-> Either Error JSONObject Wednesday, November 9, 2011
  • 10.
    JSON Grammar-like stuff object := '{' fields? '}' fields := field (',' field)* field := id ':' value value := literal | object | array Wednesday, November 9, 2011
  • 11.
    Bison (C) ANTLR (Java and many others) You can encode grammar by hand Wednesday, November 9, 2011
  • 12.
    function parseObject() { match('{') if (lookahead(ID)) parseFields() match('}') } Wednesday, November 9, 2011
  • 13.
    function parseFields() { parseField() while (lookahead(',')) { consume() parseField() } } Wednesday, November 9, 2011
  • 14.
    • Parser caneither consume some input or not. • Parser can fail or succeed. • Parser can return some value. Wednesday, November 9, 2011
  • 15.
    • Usually sequencesare just sequential calls. • Optional parts of grammar are just IFs • Zero or more elements is WHILE • Grammar rule is a function Wednesday, November 9, 2011
  • 16.
    function parseFields() { separatedBy(',', parseField) } function separatedBy(ch, f) { f() while (lookahead(ch)) { consume() f() } } Wednesday, November 9, 2011
  • 17.
    function sepBy(ch, p){ return seq(p, many(seq(ch, p))) } Wednesday, November 9, 2011
  • 18.
  • 19.
    Haskell in 60seconds Wednesday, November 9, 2011
  • 20.
    main = putStrLn"Hello World!" times x n = x * n times = x n -> x * n times = x -> n -> x * n times = (*) times 2 3 2 `times` 3 data Point = Point Int Int data Either a b = Left a | Right b Point 10 20 Left “Hello” Wednesday, November 9, 2011
  • 21.
    import Text.Parsec parse :: Parser -> SourceName -> String -> Either ParseError a Wednesday, November 9, 2011
  • 22.
    p = satisfy(x -> x == 'a') main = print (parse p "<test>" "abc") > Right 'a' p = satisfy (x -> x == 'a') main = print (parse p "<test>" "bcd") > Left "<test>" (line 1, column 1): > unexpected "b" Wednesday, November 9, 2011
  • 23.
    -- There issomething like this in Parsec char c = satisfy (x -> x == c) -- Example p = char 'a' main = print (parse p "<test>" "bcd") > Left "<test>" (line 1, column 1): > unexpected "b" > expecting "a" Wednesday, November 9, 2011
  • 24.
    char c =satisfy (==c) <?> show [c] Wednesday, November 9, 2011
  • 25.
    p = char'a' >>= x -> char 'b' >>= y -> return [x, y] main = print (parse p "<test>" "abc") > Right "ab" Wednesday, November 9, 2011
  • 26.
    p = dochar 'a' char 'b' return "ab" main = print (parse p "<test>" "abc") > Right "ab" Wednesday, November 9, 2011
  • 27.
    p = dochar 'a' char 'b' <|> char 'q' char 'c' <|> char 'd' main = print (parse p "<test>" "abc") Wednesday, November 9, 2011
  • 28.
    letter = satisfyisAlpha <?> "letter" digit = satisfy isDigit <?> "digit" spaces = skipMany space <?> "white space" many p = ... censored ... Wednesday, November 9, 2011
  • 29.
    p = doletter many (letter <|> digit) main = print (parse p "<test>" "hello123") > Right "ello123" Wednesday, November 9, 2011
  • 30.
    p = dox <- letter xs <- many (letter <|> digit) return (x:xs) main = print (parse p "<test>" "hello123") > Right "hello123" Wednesday, November 9, 2011
  • 31.
    ident = dox <- letter xs <- many (letter <|> digit) return (x:xs) p = ident main = print (parse p "<test>" "123hello") > Left "<test>" (line 1, column 1): > unexpected "1" > expecting letter Wednesday, November 9, 2011
  • 32.
    ident = dox <- letter xs <- many (letter <|> digit) return (x:xs) p = ident <?> "variable name" main = print (parse p "<test>" "123hello") > Left "<test>" (line 1, column 1): > unexpected "1" > expecting variable name Wednesday, November 9, 2011
  • 33.
    ident = dox <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name" p = ident main = print (parse p "<test>" "123hello") Wednesday, November 9, 2011
  • 34.
    ident = dox <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name" letKeyword = string "let" p = ident <|> letKeyword main = print (parse p "<test>" "letter") > Right "letter" Wednesday, November 9, 2011
  • 35.
    ident = dox <- letter xs <- many (letter <|> digit) return (x:xs) <?> "variable name" letKeyword = do s <- string "let" notFollowedBy (letter <|> digit) spaces return s p = try(letKeyword) <|> ident main = print (parse p "<test>" "letter") ----- Right "letter" Wednesday, November 9, 2011
  • 36.
  • 37.
    type Name =String data Expr = Number Integer | Var Name | Let Name Expr | Seq [Expr] deriving Show Wednesday, November 9, 2011
  • 38.
    myDef = emptyDef{ identStart = letter <|> char '_', identLetter = alphaNum <|> char '_', opStart = opLetter myDef, opLetter = oneOf "=,;", reservedOpNames = [ ",", ";", "=" ], reservedNames = ["let"], caseSensitive = True } Wednesday, November 9, 2011
  • 39.
    tokenParser = PT.makeTokenParsermyDef parens = PT.parens tokenParser naturalNumber = PT.natural tokenParser keyword = PT.reserved tokenParser identifier = PT.identifier tokenParser op = PT.reservedOp tokenParser commaSep = PT.commaSep tokenParser commaSep1 = PT.commaSep1 tokenParser Wednesday, November 9, 2011
  • 40.
    simple = numberLiteral<|> var numberLiteral = do n <- naturalNumber return $ Number n var = do s <- identifier return $ Var s Wednesday, November 9, 2011
  • 41.
    letStmt = dokeyword "let" defs <- commaSep1 def op ";" return $ Seq defs def = do name <- identifier op "=" value <- simple return $ Let name value Wednesday, November 9, 2011
  • 42.
    main = print( parse letStmt "<test>" "let one = 1, two = 2;" ) ---- Right ( Seq [ Let "one" (Number 1), Let "two" (Number 2) ] ) Wednesday, November 9, 2011
  • 43.
  • 44.
    import Text.Parsec csvFile = line `endBy1` eol line = cell `sepBy` (char ',') cell = wsopt >> many (noneOf ",n") wsopt = many (oneOf " t") eol = char 'n' main = print (parse csvFile "<test>" input) where input = "aa, b,cnd,e,fn" > Right [["aa","b","c"],["d","e","f"]] Wednesday, November 9, 2011
  • 45.
    import scala.util.parsing.combinator._ object CSVParser extends RegexParsers { override def skipWhitespace = false def whitespace = """[ t]*"""r def csv = rep1sep(row, "n") <~ "n" def row = rep1sep(field, ",") def field = opt(whitespace) ~> """[^n,]""".r def parse(s: String) = parseAll(csv, s) } println(CSVParser.parse("a,b,cnd,e, fn")) > [3.1] parsed: List(List(a, b, c), List(d, e, f)) Wednesday, November 9, 2011
  • 46.
    Pysec for Python Spirit for C++ In standard library of Scala A lot of others... Wednesday, November 9, 2011
  • 47.
    Stuff to read • Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages by Terence Parr Wednesday, November 9, 2011
  • 48.
    Stuff to read • Domain Specific Languages by Martin Fowler Wednesday, November 9, 2011
  • 49.
    Stuff to read • http://learnyouahaskell.com/ Learn You a Haskell for a Great Good Wednesday, November 9, 2011
  • 50.