Lecture 7: Syntax Analysis III, Top-Down Parsing
Top-Down Parsing
• A top-down parser starts with the root of the parse tree, labeled with the start (goal) symbol of the grammar.
• To build a parse, it repeats the following steps until the fringe of the parse tree matches the input string:
  1. At a node labeled A, select a production A → α and construct the appropriate child for each symbol of α.
  2. When a terminal that doesn't match the input string is added to the fringe, backtrack. (Some grammars are backtrack-free; these are called predictive.)
  3. Find the next node to be expanded.
• The key is selecting the right production in step 1; the choice should be guided by the input string.
28-Jan-15, CS 346 Lecture 5
Recursive Descent Parsing
• The parse tree is constructed
  – from the top-level non-terminal,
  – trying productions in order, from left to right.
• Terminals are seen in order of appearance in the token stream.
• When a production fails, backtrack to try the other alternatives.
• Example:
  – Consider parsing the token stream ( int5 ), i.e. OPEN INT CLOSE, where int5 is an integer token with value 5.
  – The grammar is:
      E → T | T + E
      T → int | int * T | ( E )
Recursive Descent Parsing Algorithm
• TOKEN is the type of tokens.
  – In our case, let the tokens be INT, OPEN, CLOSE, PLUS, TIMES.
  – next points to the next input token (so *next is that token).
• Define boolean functions that check for a match of:
  – a given token terminal:
      bool term(TOKEN tok) { return *next++ == tok; }
  – the nth production of a particular non-terminal S:
      bool Sn() { … }
  – any of the productions of S, trying them all:
      bool S() { … }
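A minimal C sketch of the token type and term(), assuming an END sentinel (not on the slide) that marks the end of the input. It makes one detail explicit: term() advances next even when the match fails, which is why each S() must save next and restore it before trying another alternative. consumed() is a hypothetical helper added only to observe this.

```c
#include <assert.h>
#include <stdbool.h>

/* Token kinds from the slide; END is an added end-of-input sentinel (assumption). */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

TOKEN *next; /* points to the next input token, so *next is that token */

/* Match one terminal. next is advanced whether or not the match succeeds. */
bool term(TOKEN tok) { return *next++ == tok; }

/* Hypothetical helper: how many tokens have been consumed since start. */
long consumed(const TOKEN *start) {
    assert(next >= start);
    return (long)(next - start);
}
```

For the input INT PLUS INT, term(INT) succeeds and consumes one token; a following term(TIMES) fails but still consumes the PLUS.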
Recursive Descent Parsing Algorithm
Functions for non-terminal E [E → T | T + E]:
• For production E → T:
    bool E1() { return T(); }
• For production E → T + E:
    bool E2() { return T() && term(PLUS) && E(); }
• For all productions of E (with backtracking):
    bool E() { TOKEN *save = next; return (next = save, E1()) || (next = save, E2()); }
• save records where E started; each alternative restarts from there, because term() advances next even when a match fails.
Recursive Descent Parsing Algorithm
Functions for non-terminal T [T → int | int * T | ( E )]:
    bool T1() { return term(INT); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(OPEN) && E() && term(CLOSE); }
• For all productions of T (with backtracking):
    bool T() { TOKEN *save = next; return (next = save, T1()) || (next = save, T2()) || (next = save, T3()); }
Recursive Descent Parsing Algorithm
• To start the parser:
  – initialize next to point to the first token;
  – invoke E().
• The complete parser:
    TOKEN *next;
    bool E(); bool T();   /* forward declarations: E and T are mutually recursive */
    bool term(TOKEN tok) { return *next++ == tok; }
    bool E1() { return T(); }
    bool E2() { return T() && term(PLUS) && E(); }
    bool E() { TOKEN *save = next; return (next = save, E1()) || (next = save, E2()); }
    bool T1() { return term(INT); }
    bool T2() { return term(INT) && term(TIMES) && T(); }
    bool T3() { return term(OPEN) && E() && term(CLOSE); }
    bool T() { TOKEN *save = next; return (next = save, T1()) || (next = save, T2()) || (next = save, T3()); }
• Try parsing by hand: ( int )
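A self-contained C version of the parser above, as a sketch. It assumes the token enum plus an END sentinel, neither of which the slides define. parse() is a hypothetical wrapper that performs the two start-up steps and, as an added assumption, accepts only when E() stops exactly at END.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Slide tokens plus an END sentinel (assumption). */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

static TOKEN *next; /* next unread token */

bool E(void); /* E and T are mutually recursive */
bool T(void);

bool term(TOKEN tok) { return *next++ == tok; }

/* E -> T | T + E */
bool E1(void) { return T(); }
bool E2(void) { return T() && term(PLUS) && E(); }
bool E(void) {
    TOKEN *save = next;
    return (next = save, E1()) || (next = save, E2());
}

/* T -> int | int * T | ( E ) */
bool T1(void) { return term(INT); }
bool T2(void) { return term(INT) && term(TIMES) && T(); }
bool T3(void) { return term(OPEN) && E() && term(CLOSE); }
bool T(void) {
    TOKEN *save = next;
    return (next = save, T1()) || (next = save, T2()) || (next = save, T3());
}

/* Point next at the first token, invoke E(), and require that all input was used. */
bool parse(TOKEN *input) {
    assert(input != NULL);
    next = input;
    return E() && *next == END;
}
```

Because E1() and T1() are tried first and a success is never revisited, this version accepts ( int ), ( ( int ) ) and int, but rejects int * int and int + int, even though both are in the language.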
Limitations of RD Parser
• Grammar:
    E → T | T + E
    T → int | int * T | ( E )
• Input string: int * int
• Using the functions term, E1, E2, E, T1, T2, T3 and T defined above:
  – E() tries E1(), which calls T(); T() tries T1(), which matches int and succeeds.
  – T() therefore returns true, and so do E1() and E(), with * int still unread.
  – Since T1() succeeded, the parser never goes back to try T2(), which would have matched int * int; the parse does not cover the whole input, so the string is rejected.
Limitations
• If a production for non-terminal X succeeds,
  – the parser can't backtrack to try a different production for X later.
• General recursive-descent algorithms support such "full" backtracking.
  – They can handle any grammar that is not left recursive.
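One way to get full backtracking, sketched in C: each non-terminal function takes a start position and returns the set of all positions where a parse of that non-terminal could end, so a successful alternative never hides another one. This set-of-positions encoding, the names and the 64-bit masks (inputs of at most 62 tokens) are assumptions for illustration. It explores every alternative, so it is simple rather than fast, and like any recursive descent it still loops forever on a left-recursive grammar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

typedef uint64_t Set;                  /* bit k set: a parse can stop just before token k */
typedef Set (*NonTerminal)(int pos);

static const TOKEN *in;                /* input, terminated by END */

static Set E(int pos);
static Set T(int pos);

static Set at(int pos) { return (Set)1 << pos; }

/* Match tok at every position in s; return the positions just after it. */
static Set term(Set s, TOKEN tok) {
    Set r = 0;
    for (int k = 0; k < 63; k++)
        if ((s >> k & 1) && in[k] == tok)
            r |= at(k + 1);
    return r;
}

/* Run non-terminal f from every position in s; union all its end positions. */
static Set apply(Set s, NonTerminal f) {
    Set r = 0;
    for (int k = 0; k < 63; k++)
        if (s >> k & 1)
            r |= f(k);
    return r;
}

/* E -> T | T + E: keep the results of both alternatives. */
static Set E(int pos) {
    Set t = T(pos);
    return t | apply(term(t, PLUS), E);
}

/* T -> int | int * T | ( E ) */
static Set T(int pos) {
    Set i = term(at(pos), INT);
    Set paren = term(apply(term(at(pos), OPEN), E), CLOSE);
    return i | apply(term(i, TIMES), T) | paren;
}

/* Accept iff some parse of E ends exactly at END, which is input[n]. */
bool parse(const TOKEN *input, int n) {
    assert(n >= 0 && n < 63 && input[n] == END);
    in = input;
    return (E(0) >> n & 1) != 0;
}
```

With this version, int * int is accepted: T(0) reports both position 1 (after int) and position 3 (after int * int), and the caller picks the one that reaches END.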
Countermeasures
• The RD algorithm discussed here is not general,
  – but it is easy to implement by hand.
• It is sufficient for grammars where, for any non-terminal, at most one production can succeed.
• The example grammar can be rewritten to work with the presented algorithm
  – by left factoring.
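The slide names left factoring without showing the result. One possible left-factored version of the example grammar is E → T X, X → + E | ε, T → int Y | ( E ), Y → * T | ε (the names X and Y are hypothetical). Below, a C sketch in the same save/restore style, where an ε alternative is a branch that succeeds without consuming input and is tried after the non-ε one. For this grammar that order is enough, because + never follows X and * never follows Y.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Slide tokens plus an END sentinel (assumption). */
typedef enum { INT, OPEN, CLOSE, PLUS, TIMES, END } TOKEN;

static TOKEN *next;

bool E(void);
bool T(void);

bool term(TOKEN tok) { return *next++ == tok; }

/* X -> + E | epsilon */
bool X1(void) { return term(PLUS) && E(); }
bool X(void) {
    TOKEN *save = next;
    return (next = save, X1()) || (next = save, true); /* epsilon consumes nothing */
}

/* Y -> * T | epsilon */
bool Y1(void) { return term(TIMES) && T(); }
bool Y(void) {
    TOKEN *save = next;
    return (next = save, Y1()) || (next = save, true);
}

/* E -> T X */
bool E(void) { return T() && X(); }

/* T -> int Y | ( E ) */
bool T1(void) { return term(INT) && Y(); }
bool T2(void) { return term(OPEN) && E() && term(CLOSE); }
bool T(void) {
    TOKEN *save = next;
    return (next = save, T1()) || (next = save, T2());
}

/* Accept only if E() consumes everything up to END (assumption). */
bool parse(TOKEN *input) {
    assert(input != NULL);
    next = input;
    return E() && *next == END;
}
```

Unlike the original functions, this version accepts int * int and int + int, since the shared prefix int is now matched once before any choice is made.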
Left Recursive Grammar
• Grammar: S → S a
  – bool S1() { return S() && term(a); }
  – bool S() { return S1(); }
• S() goes into an infinite loop.
• A left-recursive grammar has a non-terminal S such that S ⇒+ S α for some α:
    S ⇒ S a ⇒ S a a ⇒ S a a a ⇒ … ⇒ S a … a
• Recursive descent does not work in such cases.
• Consider the grammar A → A α | β.
  – A generates all strings starting with a β and followed by any number of α's.
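A small C demonstration of why S() never terminates, assuming a hypothetical token A for the terminal a and a hypothetical recursion guard LIMIT. A plain recursive-descent parser has no such guard and would recurse until the stack overflows; here the guard trips before a single token has been consumed, because S() calls itself again at the same input position.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { A, END } TOKEN;   /* hypothetical: A stands for the terminal a */

#define LIMIT 1000               /* hypothetical recursion guard */

static TOKEN *next;
static int depth;                /* current nesting of S() calls */
static bool overflowed;          /* set when the guard trips */

bool term(TOKEN tok) { return *next++ == tok; }

bool S(void);

/* S -> S a: S1 calls S() before consuming anything */
bool S1(void) { return S() && term(A); }

bool S(void) {
    if (depth >= LIMIT) {        /* a plain parser would keep recursing here */
        overflowed = true;
        return false;
    }
    depth++;
    bool ok = S1();
    depth--;
    return ok;
}
```

On the input a a, S() returns false with the guard tripped and next still pointing at the first token.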
Eliminating Left-Recursion
• Direct left recursion:
    A → A α | β
  is rewritten as
    A → β A'
    A' → α A' | ε
  – A generates all strings that start with a β followed by any number of α's.
• More generally,
    A → A α1 | ... | A αn | β1 | ... | βm
  is rewritten as
    A → β1 A' | ... | βm A'
    A' → α1 A' | ... | αn A' | ε
  – All strings derived from A start with one of β1, ..., βm and continue with any number of instances of α1, ..., αn.
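A C sketch of the general rewrite rule above, assuming single-character grammar symbols, production bodies written as strings ("" for ε) and a caller-supplied fresh letter standing in for A'. eliminate_direct is a hypothetical name. It assumes there is no alternative of the form A → A (a cycle) and that out has room for n + 1 productions; if A has no left-recursive alternative, it still emits the harmless A' → ε.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXLEN 32 /* max characters per production string, e.g. "A->bZ" (assumption) */

/* Rewrite A -> A a1 | ... | A an | b1 | ... | bm into
   A -> b1 A' | ... | bm A'  and  A' -> a1 A' | ... | an A' | epsilon.
   a is A, fresh is the letter used for A'; each production is written to out
   as "X->body", an empty body meaning epsilon. Returns the number written. */
int eliminate_direct(char a, char fresh, const char *alts[], int n,
                     char out[][MAXLEN]) {
    int count = 0;
    for (int i = 0; i < n; i++)          /* A -> beta  becomes  A -> beta A' */
        if (alts[i][0] != a)
            snprintf(out[count++], MAXLEN, "%c->%s%c", a, alts[i], fresh);
    for (int i = 0; i < n; i++)          /* A -> A alpha  becomes  A' -> alpha A' */
        if (alts[i][0] == a) {
            assert(alts[i][1] != '\0');  /* A -> A would be a cycle */
            snprintf(out[count++], MAXLEN, "%c->%s%c", fresh, alts[i] + 1, fresh);
        }
    snprintf(out[count++], MAXLEN, "%c->", fresh); /* A' -> epsilon */
    return count;
}
```

For example, with Z as A', A → A a | b becomes A → b Z and Z → a Z | ε.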
Eliminating Left-Recursion
• Indirect left recursion:
    S → A α | δ
    A → S β
  – The grammar is also left recursive, because S ⇒ A α ⇒ S β α, i.e. S ⇒+ S β α.
• Algorithm:
    1. Arrange the non-terminals in some order A1, ..., An
    2. for (i in 1..n) {
    3.   for (j in 1..i-1) {
    4.     replace each production of the form Ai → Aj γ by the productions Ai → δ1 γ | δ2 γ | ... | δk γ, where Aj → δ1 | δ2 | ... | δk are all the current Aj-productions
    5.   }
    6.   eliminate the immediate left recursion among the Ai-productions
    7. }
• The algorithm is guaranteed to work if the grammar has no cycles (derivations of the form A ⇒+ A) and no ε-productions (A → ε). Cycles, like ε-productions, can be eliminated systematically from a grammar.
Eliminating Left-Recursion
• Example:
    S → A a | b
    A → A c | S d | ε
  – The grammar is also left recursive, because S ⇒ A a ⇒ S d a, i.e. S ⇒+ S d a.
  – (The ε-production A → ε violates the condition stated above, but it happens to be harmless in this example.)
• In the first iteration (i = 1), the outer loop (lines 2 to 7) eliminates any immediate left recursion among the A1-productions. Any remaining A1-productions of the form A1 → Al α must therefore have l > 1.
• After the (i−1)st iteration of the outer loop, every non-terminal Ak with k < i is clean, i.e. any production Ak → Al α must have l > k.
• At the ith iteration, the inner loop (lines 3 to 5) progressively raises the lower limit in any production Ai → Am α until m ≥ i.
• Line 6, which eliminates the immediate left recursion among the Ai-productions, then forces m to be greater than i.
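The whole algorithm (lines 1 to 7) can be sketched in C under the same assumptions as before: single-character symbols, bodies as strings with "" for ε, and fixed-size arrays. Rule, add_alt, eliminate_left_recursion and the fresh_names convention are all hypothetical. New A' rules are appended after the original non-terminals, so the loops only visit A1, ..., An.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXNT 8    /* max non-terminals, including fresh ones (assumption) */
#define MAXALT 16  /* max alternatives per non-terminal (assumption) */
#define MAXLEN 32  /* max symbols per body (assumption) */

/* One non-terminal and its alternatives; "" is epsilon. */
typedef struct {
    char name;
    int n;
    char alt[MAXALT][MAXLEN];
} Rule;

/* Append the body a followed by b to r. */
static void add_alt(Rule *r, const char *a, const char *b) {
    assert(r->n < MAXALT && strlen(a) + strlen(b) < MAXLEN);
    snprintf(r->alt[r->n++], MAXLEN, "%s%s", a, b);
}

/* Line 6: eliminate immediate left recursion in *r, putting A' in *fresh.
   If there is none, *r is left unchanged and fresh->n is 0. */
static void eliminate_direct(Rule *r, Rule *fresh, char fresh_name) {
    char suffix[2] = { fresh_name, '\0' };
    Rule out = { r->name, 0, {{0}} };
    fresh->name = fresh_name;
    fresh->n = 0;
    for (int k = 0; k < r->n; k++) {
        if (r->alt[k][0] == r->name)
            add_alt(fresh, r->alt[k] + 1, suffix); /* A -> A alpha: A' -> alpha A' */
        else
            add_alt(&out, r->alt[k], suffix);      /* A -> beta: A -> beta A' */
    }
    if (fresh->n == 0)
        return;
    add_alt(fresh, "", "");                        /* A' -> epsilon */
    *r = out;
}

/* Lines 1-7. rules[0..*count-1] are A1..An in the chosen order;
   fresh_names[i] is the letter to use for Ai'. */
void eliminate_left_recursion(Rule rules[], int *count, const char *fresh_names) {
    int n = *count;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < i; j++) {              /* lines 3-5 */
            Rule cur = rules[i];
            rules[i].n = 0;
            for (int k = 0; k < cur.n; k++) {
                if (cur.alt[k][0] == rules[j].name) {
                    /* Ai -> Aj gamma becomes Ai -> delta gamma for each Aj -> delta */
                    for (int d = 0; d < rules[j].n; d++)
                        add_alt(&rules[i], rules[j].alt[d], cur.alt[k] + 1);
                } else {
                    add_alt(&rules[i], cur.alt[k], "");
                }
            }
        }
        Rule fresh = { 0 };
        eliminate_direct(&rules[i], &fresh, fresh_names[i]);
        if (fresh.n > 0) {
            assert(*count < MAXNT);
            rules[(*count)++] = fresh;
        }
    }
}
```

On the example above, ordering S before A and using Z for A', this yields S → A a | b, A → b d Z | Z, Z → c Z | a d Z | ε.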