Parsing techniques, notations, methods of parsing in compiler design

Presentation for the purpose of students reference . Its prepared on the basis of Bharathidasan University MCA Program in the academic Year 2023-24 onwards. By G.Nithya Assistant Professor Sri Sarada Niketan college of science for women , Karur

REDUCTION (LEFTMOST) RIGHTMOST DERIVATION Handles: A handle of a string is a substring that matches the right side of a production, and whose reduction to the non-terminal on the left side of the production represents one step along the reverse of a rightmost derivation. Example: Consider the grammar: E → E+E E → E*E E → (E) E → id And the input string id1+id2*id3 The rightmost derivation is : E → E+E → E+E*E → E+E*id3 → E+id2*id3 → id1+id2*id3 In the above derivation the underlined substrings are called handles. Handle pruning: A rightmost derivation in reverse can be obtained by “handle pruning”. th abbcde (A → b) S → aABe aAbcde (A → Abc) → aAde aAde (B → d) → aAbcde aABe (S → aABe) → abbcde S BOTTOM-UP PARSING Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing. A general type of bottom-up parser is a shift-reduce parser. SHIFT-REDUCE PARSING Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).

Stack implementation of shift-reduce parsing : Actions in shift-reduce parser:  shift – The next input symbol is shifted onto the top of the stack.  reduce – The parser replaces the handle within a stack with a non- terminal.  accept – The parser announces successful completion of parsing.  erro r – The parser discovers that a syntax error has occurred and calls an error recovery routine. Conflicts in shift-reduce parsing: There are two conflicts that occur in shift shift-reduce parsing: 1.Shift-reduce conflict: The parser cannot decide whether to shift or to reduce. 2.Reduce-reduce conflict: The parser cannot decide which of several reductions to make. 1. Shift-reduce conflict: Example: Consider the grammar: E→E+E | E*E | id and input id+id*id Stack Input Action $ id1+id2*id3 $ shift $ id1 +id2*id3 $ reduce by E→id $ E +id2*id3 $ shift $ E+ id2*id3 $ shift $ E+id2 *id3 $ reduce by E→id $ E+E *id3 $ shift $ E+E* id3 $ shift $ E+E*id3 $ reduce by E→id $ E+E*E $ reduce by E→ E *E $ E+E $ reduce by E→ E+E $ E $ accept

2. Reduce-reduce conflict: Consider the grammar: M → R+R | R+c | R R → c and input c+c Stack Input Action Stack Input Action $ E+E *id $ Reduce by E→E+E $E+E *id $ Shift $ E *id $ Shift $E+E* id $ Shift $ E* id $ Shift $E+E*id $ Reduce by E→id $ E*id $ Reduce by E→id $E+E*E $ Reduce by E→E*E $ E*E $ Reduce by E→E*E $E+E $ Reduce by E→E*E $ E $E Stack Input Action Stack Input Action $ c+c $ Shift $ c+c $ Shift $ c +c $ Reduce by R→c $ c +c $ Reduce by R→c $ R +c $ Shift $ R +c $ Shift $ R+ c $ Shift $ R+ c $ Shift $ R+c $ Reduce by R→c $ R+c $ Reduce by M→R+c $ R+R $ Reduce by M→R+R $ M $ $ M $

Viable prefixes: α is a viable prefix of the grammar if there is w such that αw is a right sentinel form. The set of prefixes of right sentinel forms that can appear on the stack of a shift- reduce parser are called viable prefixes. The set of viable prefixes is a regular language. OPERATOR-PRECEDENCE PARSING An efficient way of constructing shift-reduce parser is called operator-precedence parsing. Operator precedence parser can be constructed from a grammar called Operator- grammar. These grammars have the property that no production on right side is or has two adjacent non ɛ - terminals. Example: Consider the grammar: E → EAE | (E) | -E | id A → + | - | * | / | ↑ Since the right side EAE has three consecutive non-terminals, the grammar can be written as follows: E → E+E | E-E | E*E | E/E | E↑E | -E | id Operator precedence relations: There are three disjoint precedence relations namely < . - less than = - equal to . > - greater than The relations give the following meaning: a < . b – a yields precedence to b a = b – a has the same precedence as b a . > b – a takes precedence over b Rules for binary operations: 1.If operator θ1 has higher precedence than operator θ2, then make θ1 . > θ2 and θ2 < . θ1 2.If operators θ1 and θ2, are of equal precedence, then make θ1 . > θ2 and θ2 . > θ1 if operators are left associative θ1 < . θ2 and θ2 < . θ1 if right associative 3.Make the following for all operators θ: θ < . id , id . > θ θ < . ( , ( < . θ ) . > θ , θ . > ) θ . > $ , $ < . θ

Also make ( = ) , ( < . ( , ) . > ) , ( < . id , id . > ) , $ < . id , id . > $ , $ < . ( , ) . > $ Example: Operator-precedence relations for the grammar E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id is given in the following table assuming 1. ↑ is of highest precedence and right-associative 2. * and / are of next higher precedence and left-associative, and 3. + and - are of lowest precedence and left-associative Note that the blanks in the table denote error entries. TABLE : Operator-precedence relations Operator precedence parsing algorithm: Input : An input string w and a table of precedence relations. Output : If w is well formed, a skeletal parse tree ,with a placeholder non-terminal E labeling all interior nodes; otherwise, an error indication. Method : Initially the stack contains $ and the input buffer the string w $. To parse, we execute the following program : (1)Set ip to point to the first symbol of w$; (2)repeat forever (3) if $ is on top of the stack and ip points to $ then (4) return else begin let a be the topmost terminal symbol on the stack and let b be the symbol pointed to by ip; if a <. b or a = b then begin push b onto the stack; advance ip to the next input symbol; end; (5 ) (6 ) (7 ) (8 ) + - * / ↑ id ( ) $ + . > . > <. <. <. <. <. . > . > - . > . > <. <. <. <. <. . > . > * . > . > . > . > <. <. <. . > . > / . > . > . > . > <. <. <. . > . > ↑ . > . > . > . > <. <. <. . > . > id . > . > . > . > . > ∙ . > . > ( <. <. <. <. <. <. <. = ) . > . > . > . > . > . > . > $ <. <. <. <. <. <. <.

(9) (10 ) (11 ) (12 ) else if a . > b then repeat pop the stack until the top stack terminal is related by <. to the terminal most recently popped else error( ) end / *reduce* / (13 ) Stack implementation of operator precedence parsing: Operator precedence parsing uses a stack and precedence relation table for its implementation of above algorithm. It is a shift-reduce parsing containing all four actions shift, reduce, accept and error. The initial configuration of an operator precedence parsing is STAC K $ INPU T w $ where w is the input string to be parsed. Example: Consider the grammar E → E+E | E-E | E*E | E/E | E↑E | (E) | id. Input string is id+id*id .The implementation is as follows: Advantages of operator precedence parsing: 1.It is easy to implement. 2.Once an operator precedence relation is made between all pairs of terminals of a grammar , the grammar can be ignored. The grammar is not referred anymore during implementation. Disadvantages of operator precedence parsing: 3.It is hard to handle tokens like the minus sign (-) which has two different precedence. 4.Only a small class of grammar can be parsed using operator-precedence parser. STACK INPUT COMMENT $ <∙ id+id*id $ shift id $ id ∙> +id*id $ pop the top of the stack id $ <∙ +id*id $ shift + $ + <∙ id*id $ shift id $ +id ∙> *id $ pop id $ + <∙ *id $ shift * $ + * <∙ id $ shift id $ + * id ∙> $ pop id $ + * ∙> $ pop * $ + ∙> $ pop + $ $ accept

LR PARSERS An efficient bottom-up syntax analysis technique that can be used to parse a large class of CFG is called LR(k) parsing. The ‘L’ is for left-to-right scanning of the input, the ‘R’ for constructing a rightmost derivation in reverse, and the ‘k’ for the number of input symbols. When ‘k’ is omitted, it is assumed to be 1. Advantages of LR parsing: It recognizes virtually all programming language constructs for which CFG can be written. It is an efficient non-backtracking shift-reduce parsing method. A grammar that can be parsed using LR method is a proper superset of a grammar that can be parsed with predictive parser. It detects a syntactic error as soon as possible. Drawbacks of LR method: It is too much of work to construct a LR parser by hand for a programming language grammar. A specialized tool, called a LR parser generator, is needed. Example: YACC. Types of LR parsing method: 1.SLR- Simple LR  Easiest to implement, least powerful. 2.CLR- Canonical LR  Most powerful, most expensive. 3.LALR- Look-Ahead LR  Intermediate in size and cost between the other two methods. The LR parsing algorithm: The schematic form of an LR parser is as follows: INPU T OUTPU T STAC K LR parsing program actio n got o Sm Xm Sm-1 Xm-1 … S0 a1 … ai … an $

It consists of : an input, an output, a stack, a driver program, and a parsing table that has two parts (action and goto). The driver program is the same for all LR parser. The parsing program reads characters from an input buffer one at a time. The program uses a stack to store a string of the form s0X1s1X2s2…Xmsm, where sm is on top. Each Xi is a grammar symbol and each si is a state. The parsing table consists of two parts : action and goto functions. Action : The parsing program determines sm, the state currently on top of stack, and ai, the current input symbol. It then consults action[sm,ai] in the action table which can have one of four values : 1.shift s, where s is a state, 2.reduce by a grammar production A → β, 3.accept, and 4.error. Goto : The function goto takes a state and grammar symbol as arguments and produces a state. LR Parsing algorithm: Input: An input string w and an LR parsing table with functions action and goto for grammar G. Output: If w is in L(G), a bottom-up-parse for w; otherwise, an error indication. Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then executes the following program : set ip to point to the first input symbol of w$; repeat forever begin let s be the state on top of the stack and a the symbol pointed to by ip; if action[s, a] = shift s’ then begin push a then s’ on top of the stack; advance ip to the next input symbol end else if action[s, a] = reduce A→β then begin pop 2* | β | symbols off the stack; let s’ be the state now on top of the stack; push A then goto[s’, A] on top of the stack; output the production A→ β end else if action[s, a] = accept then return else error( ) end

CONSTRUCTING SLR(1) PARSING TABLE: To perform SLR parsing, take grammar as input and do the following: 1.Find LR(0) items. 2.Completing the closure. 3.Compute goto(I,X), where, I is set of items and X is grammar symbol. LR(0) items: An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. For example, production A → XYZ yields the four items : A → . XYZ A → X . YZ A → XY . Z A → XYZ . Closure operation: If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by the two rules: 4.Initially, every item in I is added to closure(I). 5.If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ to I , if it is not already there. We apply this rule until no more new items can be added to closure(I). Goto operation: Goto(I, X) is defined to be the closure of the set of all items [A→ αX . β] such that [A→ α . Xβ] is in I. Steps to construct SLR parsing table for grammar G are: 1. Augment G and produce G’ 2. Construct the canonical collection of set of items C for G’ 3. Construct the parsing action function action and goto using the following algorithm that requires FOLLOW(A) for each non-terminal of grammar. Algorithm for construction of SLR parsing table: Input : An augmented grammar G’ Output : The SLR parsing table functions action and goto for G’ Method : 6.Construct C = {I0, I1, …. In}, the collection of sets of LR(0) items for G’. 7.State i is constructed from Ii.. The parsing functions for state i are determined as follows: (a) If [A→α∙aβ] is in Ii and goto(Ii,a) = Ij, then set action[i,a] to “shift j”. Here a must be terminal. (b) If [A→α∙] is in Ii , then set action[i,a] to “reduce A→α” for all a in FOLLOW(A). (c) If [S’→S.] is in Ii, then set action[i,$] to “accept”.

3. The goto transitions for state i are constructed for all non-terminals A using the rule: If goto(Ii,A) = Ij, then goto[i,A] = j. 4. All entries not defined by rules (2) and (3) are made “error” 5. The initial state of the parser is the one constructed from the set of items containing [S’→.S]. Example for SLR parsing: Construct SLR parsing for the following grammar : G : E → E + T | T T → T * F | F F → (E) | id The given grammar is : Step 1 : Convert given grammar into augmented grammar. Augmented grammar : E’ → E E → E + T E → T T → T * F T → F F → (E) F → id Step 2 : Find LR (0) items. I0 : E’ → . E E → . E + T E → . T T → . T * F T → . F F → . (E) F → . id GOTO ( I0 , E) GOTO ( I4 , id ) I1 : E’ → E . E → E . + T I5 : F → id . G : E → E + T ------ (1) E →T ------ (2) T → T * F ------ (3) T → F ------ (4) F → (E) ------ (5) F → id ------ (6)

GOTO ( I6 , T ) GOTO ( I0 , T) I9 : E → E + T . T → T . * F I2 : E → T . T → T . * F GOTO ( I6 , F ) GOTO ( I0 , F) I3 : T → F . I3 : T → F . GOTO ( I6 , ( ) I4 : F → ( . E ) GOTO ( I0 , ( ) GOTO ( I6 , id) I4 : F → ( . E) E → . E + T E → . T T → . T * F T → . F F → . (E) F → . id I5 : F → id . GOTO ( I7 , F ) I10 : T → T * F . GOTO ( I7 , ( ) I4 : F → ( . E ) E → . E + T E → . T T → . T * F T → . F F → . (E) F → . id GOTO ( I0 , id ) I5 : F → id . GOTO ( I1 , + ) I6 : E → E + . T T → . T * F T → . F F → . (E) F → . id GOTO ( I7 , id ) I5 : F → id . GOTO ( I8 , ) ) GOTO ( I2 , * ) I11 : F → ( E ) . I7 : T → T * . F F → . (E) F → . id GOTO ( I8 , + ) I6 : E → E + . T T → . T * F T → . F F → . ( E ) F → . id GOTO ( I4 , E ) I8 : F → ( E . ) E → E . + T GOTO ( I4 , T) GOTO ( I9 , *) I2 : E →T . T → T . * F I7 : T → T * . F F → . ( E ) F → . id GOTO ( I4 , F) I3 : T → F .

GOTO ( I4 , ( ) I4 : F → ( . E) E → . E + T E → . T T → . T * F T → . F F → . (E) F → id FOLLOW (E) = { $ , ) , +) FOLLOW (T) = { $ , + , ) , * } FOOLOW (F) = { * , + , ) , $ } SLR parsing table: Blank entries are error entries. Stack implementation: Check whether the input id + id * id is valid or not. ACTION GOTO id + * ( ) $ E T F I0 s5 s4 1 2 3 I1 s6 ACC I2 r2 s7 r2 r2 I3 r4 r4 r4 r4 I4 s5 s4 8 2 3 I5 r6 r6 r6 r6 I6 s5 s4 9 3 I7 s5 s4 10 I8 s6 s11 I9 r1 s7 r1 r1 I10 r3 r3 r3 r3 I11 r5 r5 r5 r5

TYPE CHECKING A compiler must check that the source program follows both syntactic and semantic conventions of the source language. This checking, called static checking, detects and reports programming errors. Some examples of static checks: 1. Type checks – A compiler should report an error if an operator is applied to an incompatible operand. Example: If an array variable and function variable are added together. STACK INPUT ACTION 0 id + id * id $ GOTO ( I0 , id ) = s5 ; shift 0 id 5 + id * id $ GOTO ( I5 , + ) = r6 ; reduce by F→id 0 F 3 + id * id $ GOTO ( I0 , F ) = 3 GOTO ( I3 , + ) = r4 ; reduce by T → F 0 T 2 + id * id $ GOTO ( I0 , T ) = 2 GOTO ( I2 , + ) = r2 ; reduce by E → T 0 E 1 + id * id $ GOTO ( I0 , E ) = 1 GOTO ( I1 , + ) = s6 ; shift 0 E 1 + 6 id * id $ GOTO ( I6 , id ) = s5 ; shift 0 E 1 + 6 id 5 * id $ GOTO ( I5 , * ) = r6 ; reduce by F → id 0 E 1 + 6 F 3 * id $ GOTO ( I6 , F ) = 3 GOTO ( I3 , * ) = r4 ; reduce by T → F 0 E 1 + 6 T 9 * id $ GOTO ( I6 , T ) = 9 GOTO ( I9 , * ) = s7 ; shift 0 E 1 + 6 T 9 * 7 id $ GOTO ( I7 , id ) = s5 ; shift 0 E 1 + 6 T 9 * 7 id 5 $ GOTO ( I5 , $ ) = r6 ; reduce by F → id 0 E 1 + 6 T 9 * 7 F 10 $ GOTO ( I7 , F ) = 10 GOTO ( I10 , $ ) = r3 ; reduce by T → T * F 0 E 1 + 6 T 9 $ GOTO ( I6 , T ) = 9 GOTO ( I9 , $ ) = r1 ; reduce by E → E + T 0 E 1 $ GOTO ( I0 , E ) = 1 GOTO ( I1 , $ ) = accept

2. Flow-of-control checks – Statements that cause flow of control to leave a construct must have some place to which to transfer the flow of control. Example: An error occurs when an enclosing statement, such as break, does not exist in switch statement. Position of type checker token strea m synta x tree synta x tree intermediate representati on  A type checker verifies that the type of a construct matches that expected by its context. For example : arithmetic operator mod in Pascal requires integer operands, so a type checker verifies that the operands of mod have type integer.  Type information gathered by a type checker may be needed when code is generated. TYPE SYSTEMS The design of a type checker for a language is based on information about the syntactic constructs in the language, the notion of types, and the rules for assigning types to language constructs. For example : “ if both operands of the arithmetic operators of +,- and * are of type integer, then the result is of type integer ” Type Expressions  The type of a language construct will be denoted by a “type expression.”  A type expression is either a basic type or is formed by applying an operator called a type constructor to other type expressions.  The sets of basic types and constructors depend on the language to be checked. The following are the definitions of type expressions: 1. Basic types such as boolean, char, integer, real are type expressions. A special basic type, type_error , will signal an error during type checking; void denoting “the absence of a value” allows statements to be checked. 2. Since type expressions may be named, a type name is a type expression. 3. A type constructor applied to type expressions is a type expression. Constructors include: Arrays : If T is a type expression then array (I,T) is a type expression denoting the type of an array with elements of type T and index set I. parser type checker intermediate code generator

Records : The difference between a record and a product is that the fields of a record have names. The record type constructor will be applied to a tuple formed from field names and field types. For example: type row = record address: integer; lexeme: array[1..15] of char end; var table: array[1...101] of row; declares the type name row representing the type expression record((address X integer) X (lexeme X array(1..15,char))) and the variable table to be an array of records of this type. Pointers : If T is a type expression, then pointer(T) is a type expression denoting the type “pointer to an object of type T”. For example, var p: ↑ row declares variable p to have type pointer(row). Functions : A function in programming languages maps a domain type D to a range type R. The type of such function is denoted by the type expression D → R 4. Type expressions may contain variables whose values are type expressions. Tree representation for char x char → pointer (integer) → x pointe r cha r cha r intege r Type systems A type system is a collection of rules for assigning type expressions to the various parts of a program. A type checker implements a type system. It is specified in a syntax-directed manner. Different type systems may be used by different compilers or processors of the same language. Static and Dynamic Checking of Types  Checking done by a compiler is said to be static, while checking done when the target program runs is termed dynamic.  Any check can be done dynamically, if the target code carries the type of an element along with the value of that element.

Sound type system A sound type system eliminates the need for dynamic checking for type errors because it allows us to determine statically that these errors cannot occur when the target program runs. That is, if a sound type system assigns a type other than type_error to a program part, then type errors cannot occur when the target code for the program part is run. Strongly typed language A language is strongly typed if its compiler can guarantee that the programs it accepts will execute without type errors. Error Recovery  Since type checking has the potential for catching errors in program, it is desirable for type checker to recover from errors, so it can check the rest of the input.  Error handling has to be designed into the type system right from the start; the type checking rules must be prepared to cope with errors. SPECIFICATION OF A SIMPLE TYPE CHECKER Here, we specify a type checker for a simple language in which the type of each identifier must be declared before the identifier is used. The type checker is a translation scheme that synthesizes the type of each expression from the types of its subexpressions. The type checker can handle arrays, pointers, statements and functions. A Simple Language Consider the following grammar: P → D ; E D → D ; D | id : T T → char | integer | array [ num ] of T | ↑ T E → literal | num | id | E mod E | E [ E ] | E ↑ Translation scheme: P → D ; E D → D ; D D → id : T T → char T → integer T → ↑ T1 { addtype (id.entry , T.type) } { T.type : = char } { T.type : = integer } { T.type : = pointer(T1.type) } T → array [ num ] of T1 { T.type : = array ( 1… num.val , T1.type) } In the above language, → There are two basic types : char and integer ; → type_error is used to signal errors; → the prefix operator ↑ builds a pointer type. Example , ↑ integer leads to the type expression pointer ( integer ).

Type checking of expressions In the following rules, the attribute type for E gives the type expression assigned to the expression generated by E. 1. E → literal E → num { E.type : = char } { E.type : = integer } Here, constants represented by the tokens literal and num have type char and integer. 2. E → id { E.type : = lookup ( id.entry ) } lookup ( e ) is used to fetch the type saved in the symbol table entry pointed to by e. 3. E → E1 mod E2 { E.type : = if E1. type = integer and E2. type = integer then integer else type_error } The expression formed by applying the mod operator to two subexpressions of type integer has type integer; otherwise, its type is type_error. 4. E → E1 [ E2 ] { E.type : = if E2.type = integer and E1.type = array(s,t) then t else type_error } In an array reference E1 [ E2 ] , the index expression E2 must have type integer. The result is the element type t obtained from the type array(s,t) of E1. 5. E → E1 ↑ { E.type : = if E1.type = pointer (t) then t else type_error } The postfix operator ↑ yields the object pointed to by its operand. The type of E ↑ is the type t of the object pointed to by the pointer E. Type checking of statements Statements do not have values; hence the basic type void can be assigned to them. If an error is detected within a statement, then type_error is assigned. Translation scheme for checking the type of statements: 1. Assignment statement: S → id : = E { S.type : = if id.type = E.type then void else type_error } 2. Conditional statement: S → if E then S1 { S.type : = if E.type = boolean then S1.type else type_error } 3. While statement: S → while E do S1 { S.type : = if E.type = boolean then S1.type else type_error }

4. Sequence of statements: S → S1 ; S2 { S.type : = if S1.type = void and S1.type = void then void else type_error } Type checking of functions The rule for checking the type of a function application is : E → E1 ( E2) { E.type : = if E2.type = s and E1.type = s → t then t else type_error } SOURCE LANGUAGE ISSUES Procedures: A procedure definition is a declaration that associates an identifier with a statement. The identifier is the procedure name, and the statement is the procedure body. For example, the following is the definition of procedure named readarray : procedure readarray; var i : integer; begin for i : = 1 to 9 do read(a[i]) end; When a procedure name appears within an executable statement, the procedure is said to be called at that point. Activation trees: An activation tree is used to depict the way control enters and leaves activations. In an activation tree, 1.Each node represents an activation of a procedure. 2.The root represents the activation of the main program. 3.The node for a is the parent of the node for b if and only if control flows from activation a to b. 4.The node for a is to the left of the node for b if and only if the lifetime of a occurs before the lifetime of b. Control stack:  A control stack is used to keep track of live procedure activations. The idea is to push the node for an activation onto the control stack as the activation begins and to pop the node when the activation ends.  The contents of the control stack are related to paths to the root of the activation tree. When node n is at the top of control stack, the stack contains

The Scope of a Declaration: A declaration is a syntactic construct that associates information with a name. Declarations may be explicit, such as: var i : integer ; or they may be implicit. Example, any variable name starting with I is assumed to denote an integer. The portion of the program to which a declaration applies is called the scope of that declaration. Binding of names: Even if each name is declared once in a program, the same name may denote different data objects at run time. “Data object” corresponds to a storage location that holds values. The term environment refers to a function that maps a name to a storage location. The term state refers to a function that maps a storage location to the value held there. environment state name storage value When an environment associates storage location s with a name x, we say that x is bound to s. This association is referred to as a binding of x. STORAGE ORGANISATION The executing target program runs in its own logical address space in which each program value has a location. The management and organization of this logical address space is shared between the complier, operating system and target machine. The operating system maps the logical address into physical addresses, which are usually spread throughout memory. Typical subdivision of run-time memory: Code Static Data Stack free memory Heap

 Run-time storage comes in blocks, where a byte is the smallest unit of addressable memory. Four bytes form a machine word. Multibyte objects are stored in consecutive bytes and given the address of first byte.  The storage layout for data objects is strongly influenced by the addressing constraints of the target machine.  A character array of length 10 needs only enough bytes to hold 10 characters, a compiler may allocate 12 bytes to get alignment, leaving 2 bytes unused.  This unused space due to alignment considerations is referred to as padding.  The size of some program objects may be known at run time and may be placed in an area called static.  The dynamic areas used to maximize the utilization of space at run time are stack and heap. Activation records:  Procedure calls and returns are usually managed by a run time stack called the control stack.  Each live activation has an activation record on the control stack, with the root of the activation tree at the bottom, the latter activation has its record at the top of the stack.  The contents of the activation record vary with the language being implemented. The diagram below shows the contents of activation record.  Temporary values such as those arising from the evaluation of expressions.  Local data belonging to the procedure whose activation record this is.  A saved machine status, with information about the state of the machine just before the call to procedures.  An access link may be needed to locate data needed by the called procedure but found elsewhere.  A control link pointing to the activation record of the caller.

 Space for the return value of the called functions, if any. Again, not all called procedures return a value, and if one does, we may prefer to place that value in a register for efficiency.  The actual parameters used by the calling procedure. These are not placed in activation record but rather in registers, when possible, for greater efficiency. STORAGE ALLOCATION STRATEGIES The different storage allocation strategies are : 1. Static allocation – lays out storage for all data objects at compile time 2. Stack allocation – manages the run-time storage as a stack. 3. Heap allocation – allocates and deallocates storage as needed at run time from a data area known as heap. STATIC ALLOCATION  In static allocation, names are bound to storage as the program is compiled, so there is no need for a run-time support package.  Since the bindings do not change at run-time, everytime a procedure is activated, its names are bound to the same storage locations.  Therefore values of local names are retained across activations of a procedure. That is, when control returns to a procedure the values of the locals are the same as they were when control left the last time.  From the type of a name, the compiler decides the amount of storage for the name and decides where the activation records go. At compile time, we can fill in the addresses at which the target code can find the data it operates on. STACK ALLOCATION OF SPACE  All compilers for languages that use procedures, functions or methods as units of user- defined actions manage at least part of their run-time memory as a stack.  Each time a procedure is called , space for its local variables is pushed onto a stack, and when the procedure terminates, that space is popped off the stack. Calling sequences:  Procedures called are implemented in what is called as calling sequence, which consists of code that allocates an activation record on the stack and enters information into its fields.  A return sequence is similar to code to restore the state of machine so the calling procedure can continue its execution after the call.  The code in calling sequence is often divided between the calling procedure (caller) and the procedure it calls (callee).  When designing calling sequences and the layout of activation records, the following principles are helpful:

 Fixed length items are generally placed in the middle. Such items typically include the control link, the access link, and the machine status fields.  Items whose size may not be known early enough are placed at the end of the activation record. The most common example is dynamically sized array, where the value of one of the callee’s parameters determines the length of the array.  We must locate the top-of-stack pointer judiciously. A common approach is to have it point to the end of fixed-length fields in the activation record. Fixed-length data can then be accessed by fixed offsets, known to the intermediate-code generator, relative to the top-of-stack pointer. . . . Parameters and returned values caller’s activatio n record caller’ s responsibil ity callee’s activation recor d top_s p callee ’s responsibili ty Division of tasks between caller and callee  The calling sequence and its division between caller and callee are as follows.  The caller evaluates the actual parameters.  The caller stores a return address and the old value of top_sp into the callee’s activation record. The caller then increments the top_sp to the respective positions.  The callee saves the register values and other status information.  The callee initializes its local data and begins execution.  A suitable, corresponding return sequence is:  The callee places the return value next to the parameters.  Using the information in the machine-status field, the callee restores top_sp and other registers, and then branches to the return address that the caller placed in the status field.  Although top_sp has been decremented, the caller knows where the return value is, relative to the current value of top_sp; the caller therefore may use that value. control link links and saved status temporaries and local data Parameters and returned values control link links and saved status temporaries and local data

Variable length data on stack: The run-time memory management system must deal frequently with the allocation of space for objects, the sizes of which are not known at the compile time, but which are local to a procedure and thus may be allocated on the stack. The reason to prefer placing objects on the stack is that we avoid the expense of garbage collecting their space. The same scheme works for objects of any type if they are local to the procedure called and have a size that depends on the parameters of the call. . activatio n record for p pointe r to B pointe r to C array A arrays of p array B array C activation record for procedure q called by p top_s p arrays of q to p Access to dynamically allocated arrays  Procedure p has three local arrays, whose sizes cannot be determined at compile time. The storage for these arrays is not part of the activation record for p.  Access to the data is through two pointers, top and top- sp. Here the top marks the actual top of stack; it points the position at which the next activation record will begin.  The second top-sp is used to find local, fixed-length fields of the top activation record.  The code to reposition top and top-sp can be generated at compile time, in terms of sizes that will become control link pointer to A control link

HEAP ALLOCATION Stack allocation strategy cannot be used if either of the following is possible : 1.The values of local names must be retained when an activation ends. 2.A called activation outlives the caller.  Heap allocation parcels out pieces of contiguous storage, as needed for activation records or other objects.  Pieces may be deallocated in any order, so over the time the heap will consist of alternate areas that are free and in use.  The record for an activation of procedure r is retained when the activation ends.  Therefore, the record for the new activation q(1 , 9) cannot follow that for s physically.  If the retained activation record for r is deallocated, there will be free space in the heap between the activation records for s and q. Position in the activation tree Activation records in the heap Remarks s r q ( 1 , 9) s control link r control link q(1,9) control link Retained activation record for r

Parsing techniques, notations, methods of parsing in compiler design

More Related Content

Similar to Parsing techniques, notations, methods of parsing in compiler design

Recently uploaded

Parsing techniques, notations, methods of parsing in compiler design