Using ANTLR on a real example: converting “string combined” queries into parameterized queries
Simon Wiki says:
• ANTLR (pronounced Antler), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) parsing.
• ANTLR takes as input a grammar that specifies a language and generates as output source code for a recognizer for that language. A language is specified using a context-free grammar which is expressed using Extended Backus–Naur Form (EBNF).
• ANTLR allows generating lexers, parsers, tree parsers, and combined lexer-parsers. Parsers can automatically generate abstract syntax trees which can be further processed with tree parsers.
• ANTLR provides a single consistent notation for specifying lexers, parsers, and tree parsers. This is in contrast with other parser/lexer generators and adds greatly to the tool's ease of use.
Used in at least the following products:
• Drools, JBoss rule engine (DRL DSL)
• Hibernate, Java ORM (HQL DSL)
• NHibernate, .NET ORM (HQL DSL)
• Groovy, language for the JVM
• Jython, language for the JVM
Where do we need ANTLR?
• Parsing a text stream of formal data
• Parsing a text stream of incomplete formal data
• Complex parsing
• Parsing with good error handling
• Writing a domain-specific language (DSL)
• You have enough time and some data to parse...
Why not just use regular expressions?
• In most cases you should go with RegEx.
• SO (Stack Overflow): “RegEx is a text search tool. If all you need to do is pull strings out of strings then it's often the hammer of choice.”
• SO: “ANTLR is a parser generator. If you need error messages and parse actions or any of the complicated things that come with a interpreter/compiler then it's a good option.”
• SO: “ANTLR has perfect support for "error-messages": they show line/column numbers and what was wrong. RegEx doesn't have this support.”
• ANTLR adds a lot on top of regular expressions: a grammar can describe nested, recursive structure (such as function calls inside function calls), which a regular expression cannot match.
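To make the last two points concrete, here is a minimal Python sketch (hand-written, not ANTLR output; the token set is an assumption loosely modeled on the VB-style query strings used later in this deck). The lexer layer is nothing more than regular expressions, and it can already report `line:column` on bad input. The parenthesis check is the kind of arbitrary nesting that a classical regular expression (including Python's `re`) cannot express and that a grammar handles naturally. `tokenize` and `check_nesting` are hypothetical names.

```python
import re
from typing import List, NamedTuple

# Lexer layer: every token kind is just a regular expression.
# Hypothetical token set for VB-style query strings, not the deck's grammar.
TOKEN_SPEC = [
    ("STRING",  r'"(?:[^"\n]|"")*"'),  # "" inside a VB string is an escaped quote
    ("ID",      r"[A-Za-z_]\w*"),
    ("OP",      r"[&=.,()]"),
    ("NEWLINE", r"\n"),
    ("WS",      r"[ \t\r]+"),
    ("ERROR",   r"."),                 # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class Token(NamedTuple):
    kind: str
    text: str
    line: int
    col: int


def tokenize(src: str) -> List[Token]:
    """Split src into tokens, tracking 1-based line/column positions."""
    tokens: List[Token] = []
    line, line_start = 1, 0
    for m in MASTER.finditer(src):
        kind, text = m.lastgroup, m.group()
        col = m.start() - line_start + 1
        if kind == "NEWLINE":
            line, line_start = line + 1, m.end()
        elif kind == "ERROR":
            raise SyntaxError(f"{line}:{col} unexpected character {text!r}")
        elif kind != "WS":
            tokens.append(Token(kind, text, line, col))
    return tokens


def check_nesting(tokens: List[Token]) -> None:
    """Parser-level check a single regex cannot do: balanced, nested parentheses."""
    open_parens: List[Token] = []
    for tok in tokens:
        if tok.kind == "OP" and tok.text == "(":
            open_parens.append(tok)
        elif tok.kind == "OP" and tok.text == ")":
            if not open_parens:
                raise SyntaxError(f"{tok.line}:{tok.col} unmatched ')'")
            open_parens.pop()
    if open_parens:
        tok = open_parens[-1]
        raise SyntaxError(f"{tok.line}:{tok.col} '(' is never closed")
```

For example, `tokenize('x = "a" &\n #')` raises `SyntaxError` with the message `2:2 unexpected character '#'`, and `check_nesting(tokenize("F(G(x)"))` reports `1:2 '(' is never closed`.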
ANTLR parsing workflow
Tools under the ANTLR umbrella: ANTLR3 code generation targets
• Java, JavaScript (in sync with development)
• C, C++, C#, Objective-C, Ruby (almost in sync)
• Python, ActionScript (current with 3.1 instead of 3.4)
Tools under the ANTLR umbrella: ANTLR grammars
Java, C, C++, ECMAScript, ANTLR, C#, PHP, Verilog, x86 Assembler, ISO SQL 2003, PL/SQL, Clojure, XPath, Pascal, GraphViz Dot, Fortran, Python, CSS, Objective-C, Lua, Ruby, Eiffel, ECMA CIL (.NET), Classic ASP, CORBA IDL
Tools under the ANTLR umbrella: editors, IDEs, etc.
• ANTLRWorks, GUI IDE. http://antlr.org/works/
• Eclipse, NetBeans, JetBrains IDEA, Visual Studio integration.
• VIM syntax highlighter. https://github.com/rollxx/vim-antlr
• ANTLR-Mode for Emacs. http://antlr-mode.sourceforge.net/
ANTLRWorks. Editor window
ANTLRWorks. Interpreter window
Ambiguous path visualization
ANTLRWorks. Interactive debugger
Eclipse. ANTLR integration
JetBrains IDEA. ANTLR integration
Sample syntax. CSV grammar
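The CSV grammar on this slide is an image that did not survive. As a stand-in, the comment at the top of the sketch below shows a common shape for a CSV grammar (assumed, not copied from the slide), and the Python class hand-codes it as a recursive-descent parser with one method per rule, which is the structure ANTLR generates for parser rules. `CsvParser` and its method names are hypothetical.

```python
from typing import List

# Assumed grammar (ANTLR3-style notation, NOT the slide's exact rules):
#   file   : row+ ;
#   row    : field (',' field)* '\r'? '\n' ;
#   field  : TEXT | STRING | ;            // empty alternative = empty field
#   TEXT   : ~(',' | '\r' | '\n' | '"')+ ;
#   STRING : '"' ('""' | ~'"')* '"' ;


class CsvParser:
    """Hand-written recursive-descent parser: one method per grammar rule."""

    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        """Next character, or "" at end of input (one symbol of lookahead)."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def file(self) -> List[List[str]]:
        rows = [self.row()]          # row+ : at least one row
        while self.peek():
            rows.append(self.row())
        return rows

    def row(self) -> List[str]:
        fields = [self.field()]
        while self.peek() == ",":
            self.pos += 1
            fields.append(self.field())
        if self.peek() == "\r":
            self.pos += 1
        if self.peek() != "\n":
            raise SyntaxError(f"offset {self.pos}: expected end of line")
        self.pos += 1
        return fields

    def field(self) -> str:
        if self.peek() == '"':
            return self.string()
        start = self.pos             # TEXT, or the empty alternative
        while self.peek() not in ("", ",", "\r", "\n", '"'):
            self.pos += 1
        return self.text[start:self.pos]

    def string(self) -> str:
        self.pos += 1                # skip the opening quote
        out = []
        while True:
            ch = self.peek()
            if ch == "":
                raise SyntaxError("unterminated quoted field")
            if ch == '"':
                if self.text.startswith('""', self.pos):
                    out.append('"')  # "" is an escaped quote
                    self.pos += 2
                    continue
                self.pos += 1        # closing quote
                return "".join(out)
            out.append(ch)
            self.pos += 1
```

Each alternative of a rule becomes a branch on the next character (one symbol of lookahead), which is the LL idea behind ANTLR's generated code; ANTLR's LL(*) analysis can look further ahead when one symbol is not enough to choose.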
Real example. Test cases
• Query without any parameters
• Query with concat and variable
• Query with dotted and escaped table names and single quote in SQL
• Query with function call and func args concat
• Query with function call with several func args
• Query with nested function call with several func args
• Query with concat and two variables
• Insert query with four params
• Query with dotted param and function name and function arg
• End-of-line symbol will be dropped from query
• Single-line comment will be dropped from query
• Strip single quote only if it is next to a parameter
• Query with LIKE keyword (FAILED)
• Refactor multiline query (FAILED)
Real example. Syntax tree
    strsql = "SELECT * FROM TABLE_NAME WHERE FIRST_FIELD = " & DOTTED.PARAM_VAR & " AND SECOND_FIELD = " & DOTTED.FUNC_CALL(DOTTED.FUNC_ARG)
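The deck's actual grammar is only shown as images, so the following is a hand-written Python sketch of the same idea rather than the author's ANTLR grammar or generated code. It assumes a VB-style input of the form `name = operand & operand & ...`, where string literals become SQL text and every other operand (dotted names, function calls, nested calls) becomes one `?` parameter whose source text is kept verbatim. The parser returns a flattened tree (a list of string and parameter operands), which is all the emitter needs. The rule names in the comments, `tokenize`, `Parser` and `to_parameterized` are hypothetical, and the quote-stripping rule is only one reading of the "strip single quote only if it is next to a parameter" test case.

```python
import re
from typing import List, Tuple

# Token regex (hypothetical token set). "" inside a VB string is an escaped quote.
TOKEN_RE = re.compile(r'(?P<STRING>"(?:[^"\n]|"")*")|(?P<ID>[A-Za-z_]\w*)'
                      r"|(?P<OP>[&=.,()])|(?P<WS>\s+)")


def tokenize(src: str) -> List[Tuple[str, str, int, int]]:
    """Return (kind, text, start, end) tuples; an operator's kind is the operator itself."""
    toks, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"offset {pos}: unexpected {src[pos]!r}")
        if m.lastgroup != "WS":
            kind = m.group() if m.lastgroup == "OP" else m.lastgroup
            toks.append((kind, m.group(), m.start(), m.end()))
        pos = m.end()
    toks.append(("EOF", "", len(src), len(src)))
    return toks


class Parser:
    """Recursive descent with one method per (hypothetical) grammar rule."""

    def __init__(self, src: str):
        self.src, self.toks, self.i = src, tokenize(src), 0

    def peek(self) -> str:
        return self.toks[self.i][0]

    def expect(self, kind: str):
        tok = self.toks[self.i]
        if tok[0] != kind:
            raise SyntaxError(f"offset {tok[2]}: expected {kind!r}, got {tok[0]!r}")
        self.i += 1
        return tok

    def assignment(self):  # assignment : ID '=' concat EOF
        name = self.expect("ID")[1]
        self.expect("=")
        parts = self.concat()
        self.expect("EOF")
        return name, parts

    def concat(self):  # concat : operand ('&' operand)*
        parts = [self.operand()]
        while self.peek() == "&":
            self.expect("&")
            parts.append(self.operand())
        return parts

    def operand(self):  # operand : STRING | ref   (each ref becomes one parameter)
        if self.peek() == "STRING":
            text = self.expect("STRING")[1]
            return ("STR", text[1:-1].replace('""', '"'))
        start = self.toks[self.i][2]
        self.ref()
        return ("PARAM", self.src[start:self.toks[self.i - 1][3]])

    def ref(self):  # ref : ID ('.' ID)* ('(' (concat (',' concat)*)? ')')?
        self.expect("ID")
        while self.peek() == ".":
            self.expect(".")
            self.expect("ID")
        if self.peek() == "(":
            self.expect("(")
            if self.peek() != ")":
                self.concat()
                while self.peek() == ",":
                    self.expect(",")
                    self.concat()
            self.expect(")")


def to_parameterized(src: str) -> Tuple[str, List[str]]:
    """SQL text with ? placeholders, plus the parameter expressions in order."""
    _, parts = Parser(src).assignment()
    sql, params = "", []
    for i in range(len(parts)):
        kind, value = parts[i]
        if kind == "STR":
            sql += value
            continue
        # Assumed quote rule: "NAME = '" & X & "'" becomes "NAME = ?".
        nxt = parts[i + 1] if i + 1 < len(parts) else None
        if sql.endswith("'") and nxt and nxt[0] == "STR" and nxt[1].startswith("'"):
            sql = sql[:-1]
            parts[i + 1] = ("STR", nxt[1][1:])
        sql += "?"
        params.append(value)
    return sql, params
```

Applied to the `strsql` line above, it returns `SELECT * FROM TABLE_NAME WHERE FIRST_FIELD = ? AND SECOND_FIELD = ?` with parameters `["DOTTED.PARAM_VAR", "DOTTED.FUNC_CALL(DOTTED.FUNC_ARG)"]`. The sketch does not handle VB line continuations or `'` comments (which the deck's test cases cover), nor `LIKE '%...%'` patterns (one of the two FAILED cases).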
Grammar: 1. Options, tokens
Grammar: 2. Lexer/parser members
Grammar: 3. Top-level elements
Grammar: 4. End
Questions are Welcome! 31337