Chapter 2
Chang Chi-Chung
2008.03 rev.1
A Simple Syntax-Directed Translator
This chapter contains introductory material to
Chapters 3 to 8
To create a syntax-directed translator that maps
infix arithmetic expressions into postfix
expressions.
Building a simple compiler involves:
Defining the syntax of a programming language
Develop a source code parser: for our compiler
we will use predictive parsing
Implementing syntax directed translation to
generate intermediate code
A Code Fragment To Be Translated
To extend syntax-directed translator to map code fragments into three-
address code. See appendix A.
1: i = i + 1
2: t1 = a [ i ]
{ 3: if t1 < v goto 1
int i; int j; 4: j = j -1
float[100] a; float v; float x; 5: t2 = a [ j ]
while (true) { 6: if t2 > v goto 4
do i = i + 1; while ( a[i] < v ); 7: ifFalse i >= j goto 9
do j = j – 1; while ( a[j] > v ); 8: goto 14
if ( i>= j ) break; 9: x = a [ i ]
x = a[i]; a[i] = a[j]; a[j] = x; 10: t3 = a [ j ]
} 11: a [ i ] = t3
} 12: a [ j ] = x
13: goto 1
14:
A Model of a Compiler Front End
Source Lexical Token Syntax Intermediate Three-address
program analyzer stream Parser tree Code code
Generator
Character
Stream
Symbol
Table
Two Forms of Intermediate Code
Abstract syntax trees Tree-Address instructions
do-while
body > 1: i = i + 1
2: t1 = a [ i ]
3: if t1 < v goto 1
assign [] v
i + a i
i 1
Syntax Definition
Using Context-free grammar (CFG)
BNF: Backus-Naur Form
Context-free grammar has four components:
A set of tokens (terminal symbols)
A set of nonterminals
A set of productions
A designated start symbol
Example of CFG
G = <T, N, P, S>
T = { +,-,0,1,2,3,4,5,6,7,8,9 }
N = { list, digit }
P=
list list + digit
list list – digit
list digit
digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
S = list
Derivations
The set of all strings (sequences of tokens)
generated by the CFG using derivation
Begin with the start symbol
Repeatedly replace a nonterminal symbol in the
current sentential form with one of the right-hand
sides of a production for that nonterminal
Example of the Derivations
list Production
list + digit list list + digit
list - digit + digit list list – digit
digit - digit + digit
list digit
9 - digit + digit
9 - 5 + digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
9-5+2
Leftmost derivation
replaces the leftmost nonterminal (underlined) in each step.
Rightmost derivation
replaces the rightmost nonterminal in each step.
Parser Trees
Given a CFG, a parse tree according to the grammar is a tree
with following propertes.
The root of the tree is labeled by the start symbol
Each leaf of the tree is labeled by a terminal (=token) or
Each interior node is labeled by a nonterminal
If A X1 X2 … Xn is a production, then node A has immediate
children X1, X2, …, Xn where Xi is a (non)terminal or ( denotes
the empty string)
Example A
A XYZ
X Y Z
Example of the Parser Tree
Parse tree of the string 9-5+2 using grammar G
list
list digit
list digit
digit
The sequence of
9 - 5 + 2 leafs is called the
yield of the parse tree
Ambiguity
Consider the following context-free grammar
G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>
P = string string + string | string - string | 0 | 1 | … | 9
This grammar is ambiguous, because more
than one parse tree represents the string 9-
5+2
Ambiguity (Cont’d)
string string
string string string
string string string string string
9 - 5 + 2 9 - 5 + 2
Associativity of Operators
Left-associative
If an operand with an operator on both sides of it, then it
belongs to the operator to its left.
string a+b+c has the same meaning as (a+b)+c
Left-associative operators have left-recursive productions
left left + term | term
Right-associative
If an operand with an operator on both sides of it, then it
belongs to the operator to its right.
string a=b=c has the same meaning as a=(b=c)
Right-associative operators have right-recursive productions
right term = right | term
Associativity of Operators (cont’d)
list right
list digit letter right
list digit letter right
digit letter
a + b + c a = b = c
left-associative right-associative
Precedence of Operators
String 9+5*2 has the same meaning as 9+(5*2)
* has higher precedence than +
Constructs a grammar for arithmetic
expressions with precedence of operators.
left-associative : + - (expr)
left-associative:* / (term)
Step 1: Step 3:
factor digit | ( expr ) expr expr + term
| expr – term
| term
Step 2: Step 4:
term term * factor expr expr + term | expr – term | term
| term / factor term term * factor | term / factor | factor
| factor factor digit | ( expr )
An Example: Syntax of Statements
The grammar is a subset of Java statements.
This approach prevents the build-up of semicolons
after statements such as if- and while-, which end
with nested substatements.
stmt id = expression ;
| if ( expression ) stmt
| if ( expression ) stmt else stmt
| while ( expression ) stmt
| do stmt while ( expression ) ;
| { stmts }
stmts stmts stmt
|
Syntax-Directed Translation
Syntax-Directed translation is done by attaching rules
or program fragments to productions in a grammar.
Translate infix expressions into postfix notation. ( in
this chapter )
Infix: 9 – 5 + 2
Postfix: 9 5 – 2 +
An Example
expr expr1 + term
The pseudo-code of the translation
translate expr1 ;
translate term ;
handle + ;
Syntax-Directed Translation (Cont’d)
Two concepts (approaches) related to
Syntax-Directed Translation.
Synthesized Attributes
Syntax-directed definition
Build up a translation by attaching strings (semantic
rules) as attributes to the nodes in the parse tree.
Translation Schemes
Syntax-directed translation
Build up a translation by program fragments which are
called semantic actions and embedded within production
bodies.
Syntax-directed definition
The syntax-directed definition associates
With each grammar symbol (terminals and nonterminals), a
set of attributes.
With each production, a set of semantic rules for computing
the values of the attributes associated with the symbols
appearing in the production.
An attribute is said to be
Synthesized
if its value at a parse-tree node is determined from attribute
values at its children and at the node itself.
Inherited
if its value at a parse-tree node is determined from attribute
values at the node itself, its parent, and its siblings in the parse
tree.
An Example: Synthesized Attributes
An annotated parse tree
Suppose a node N in a parse tree is labeled by
grammar symbol X.
The X.a is denoted the value of attribute a of X at
node N.
expr.t = “95-2+”
expr.t = “95-” term.t = “2”
expr.t = “9” term.t = “5”
term.t = “9”
9 - 5 + 2
Semantic Rules
Production Semantic Rules
expr expr1 + term expr.t = expr1.t || term.t || ‘+’
expr expr1 - term expr.t = expr1.t || term.t || ‘-’
expr term expr.t = term.t
term 0 term.t = ‘0’
term 1 term.t = ‘1’
… …
term 9 term.t = ‘9’
|| is the operator for string concatenation in semantic rule.
Depth-First Traversals
Tree traversals
Breadth-First
Depth-First
Preorder: N L R
Inorder: L N R
Postorder: L R N
Depth-First Traversals: Postorder、From left to right
procedure visit(node N)
{
for ( each child C of N, from left to right )
{
visit(C);
}
evaluate semantic rules at node N;
}
Example: Depth-First Traversals
expr.t = 95-2+
expr.t = 95- term.t = 2
expr.t = 9 term.t = 5
term.t = 9
9 - 5 + 2
Note: all attributes are the synthesized type
Translation Schemes
A translation scheme is a CFG embedded
with semantic actions
Example
rest + term { print(“+”) } rest
Embedded Semantic Action
rest
+ term { print(“+”) } rest
An Example: Translation Scheme
expr
expr + term { print(‘+’) }
expr - term { print(‘-’) } 2 { print(‘2’) }
term 5 { print(‘5’) }
expr expr + term { print(‘+’) }
expr expr – term { print(‘-’) }
9 { print(‘9’) } expr term
term 0 { print(‘0’) }
term 1 { print(‘1’) }
…
term 9 { print(‘9’) }
Parsing
The process of determining if a string of
terminals (tokens) can be generated by a
grammar.
Time complexity:
For any CFG there is a parser that takes at most O(n3)
time to parse a string of n terminals.
Linear algorithms suffice to parse essentially all
languages that arise in practice.
Two kinds of methods
Top-down: constructs a parse tree from root to leaves
Bottom-up: constructs a parse tree from leaves to root
Top-Down Parsing
Recursive descent parsing is a top-down method
of syntax analysis in which a set of recursive
procedures is used to process the input.
One procedure is associated with each nonterminal of a
grammar.
If a nonterminal has multiple productions, each production
is implemented in a branch of a selection statement based
on input lookahead information
Predictive parsing
A special form of recursive descent parsing
The lookahead symbol unambiguously determines the flow
of control through the procedure body for each nonterminal.
An Example: Top-Down Parsing
stmt expr ;
| if ( expr ) stmt
| for ( optexpr ; optexpr ; optexpr ) stmt
| other
optexpr
| expr
stmt
for ( optexpr ; optexpr ; optexpr ) stmt
ε expr expr other
void stmt() {
Pseudocode For a switch ( lookahead ) {
case expr:
match(expr); match(‘;’); break;
Predictive Parser case if:
match(if); match(‘(‘);
match(expr); match(‘)’);
stmt(); break;
case for:
match(for); match(‘(‘);
optexpr(); match(‘;’);
stmt expr ; optexpr(); match(‘;’);
| if ( expr ) stmt optexpr(); match(‘)’);
| for ( optexpr ; optexpr ; optexpr ) stmt stmt(); break;
case other:
| other
match(other); break;
default:
report(“syntax error”);
}
Use ε-Productions }
void optexpr() {
optexpr | expr if ( lookahead == expr ) match(expr);
}
void match(terminal t) {
if ( lookahead == t )
lookahead = nextTerminal;
else
report(“syntax error”);
}
Example: Predictive Parsing
Parse LL(1)
Tree stmt
for ( optexpr ; optexpr ; optexpr ) stmt
match(for) match(‘(‘) optexpr()match(‘;‘)
optexpr()match(‘;‘) optexpr()
match(‘)‘) stmt()
Input
for ( ; expr ; expr ) other
lookahead
FIRST
FIRST() is the set of terminals that appear
as the first symbols of one or more strings
generated from
is Sentential Form
Example
FIRST(stmt) = { expr, if, for, other }
FIRST(expr ;) = { expr }
stmt expr ;
| if ( expr ) stmt
| for ( optexpr ; optexpr ; optexpr ) stmt
| other
Examples: First
type simple
| ^ id
| array [ simple ] of type
simple integer
| char
| num dotdot num
FIRST(simple) = { integer, char, num }
FIRST(^ id) = { ^ }
FIRST(type) = { integer, char, num, ^, array }
Designing a Predictive Parser
A predictive parser is a program consisting of a
procedure for every nonterminal.
The procedure for nonterminal A
It decides which A-production to use by examining
the lookahead symbol.
Left Factor
Left Recursion
ε Production
Mimics the body of the chosen production.
Applying translation scheme
Construct a predictive parser, ignoring the actions.
Copy the actions from the translation scheme into
the parser
Left Factor
Left Factor
One production for nonterminal A starts with the
same symbols.
Example:
stmt if ( expr ) stmt
| if ( expr ) stmt else stmt
Use Left Factoring to fix it
stmt if ( expr ) stmt rest
rest else stmt | ε
Left Recursion
Left Recursive
A production for nonterminal A starts with a self
reference.
A Aα | β
An Example:
expr expr + term | term
Rewrite the left recursive to right recursive by
using the following rules.
A βR
R αR | ε
Example: Left and Right Recursive
A A
… R
A R
A …
A R
R
β α α …. α β α α …. α ε
left recursive right recursive
Abstract and Concrete Syntax
+
- 2
expr
9 5
expr term
expr term
helper
term
9 - 5 + 2
Conclusion: Parsing and Translation Scheme
Give a CFG grammar G as below:
expr expr + term { print(‘+’) }
expr expr – term { print(‘-’) }
expr term
term 0 { print(‘0’) }
term 1 { print(‘1’) }
…
term 9 { print(‘9’) }
Semantic actions for translating into postfix notation.
Conclusion: Parsing and Translation Scheme
Step 1
To elimination left-recursion
Technique
A Aα | Aβ | γ
into
A γR
R αR | βR | ε
Use the rule to transforms G.
Conclusion: Parsing and Translation Scheme
Left-Recursion-elimination
expr term rest
rest + term { print(‘+’) } rest
| – term { print(‘-’) } rest
| ε
term 0 { print(‘0’) }
term 1 { print(‘1’) }
…
term 9 { print(‘9’) }
An Example: Left-Recursion-elimination
expr
term rest
9 { print(‘9’) } - term { print(‘-’) } rest
5 { print(‘5’) } + term { print(‘+’) } rest
2 { print(‘2’) } ε
expr term rest
rest + term { print(‘+’) } rest
| – term { print(‘-’) } rest
| ε
term 0 { print(‘0’) } | 1 { print(‘1’) } | … | 9 { print(‘9’) }
Conclusion: Parsing and Translation Scheme
void expr() {
Step 2 term(); rest();
}
Procedures for
void rest() {
Nonterminals. if ( lookahead == ‘+’ ) {
match(‘+’); term();
print(‘+’); rest();
}
else if ( lookahead == ‘-’ ) {
match(‘-’); term();
print(‘-’); rest();
}
else { } //do nothing with the input
}
void term() {
if ( lookahead is a digit ) {
t = lookahead; match(lookahead);
print(t);
}
else
report(“syntax error”);
}
Conclusion: Parsing and Translation Scheme
Step 3
Simplifying the Translator
void rest() {
void rest() { while ( true ) {
if ( lookahead == ‘+’ ) { if ( lookahead == ‘+’ ) {
match(‘+’); term(); match(‘+’); term();
print(‘+’); rest(); print(‘+’); continue;
} }
else if (lookahead == ‘-’) { else if (lookahead == ‘-’) {
match(‘-’); term(); match(‘-’); term();
print(‘-’); rest(); print(‘-’); continue;
} }
else { } break;
}
}
Conclusion: Parsing and Translation Scheme
Complete
import java.io.*;
class Parser {
static int lookahead;
public Parser() throws IOException {
lookahead = System.in.read();
}
void expr() { void term() throws IOException {
term(); if (Character.isDigit((char)lookahead){
while ( true ) { System.out.write((char)lookahead);
if ( lookahead == ‘+’ ) { match(lookahead);
match(‘+’); term(); }
System.out.write(‘+’); else throw new Error(“syntax error”);
continue; }
}
else if (lookahead == ‘-’) { void match(int t) throws IOException {
match(‘-’); term(); if ( lookahead == t )
System.out.write(‘-’); lookahead = System.in.read();
continue; else throw new Error(“syntax error”);
} }
else return; }
}