1 Compiler Design (40-414) Compiler Design (40-414)  Main Text Book: Main Text Book: Compilers: Principles, Techniques & Tools, 2 Compilers: Principles, Techniques & Tools, 2nd nd ed., ed., Aho, Lam, Sethi, and Ullman, 2007 Aho, Lam, Sethi, and Ullman, 2007  Evaluation: Evaluation:  Midterm Exam 35%  Final Exam 35%  Assignments and Quizzes 10%  Project 20%
2 Compiler learning Compiler learning  Isn’t it an old discipline? Isn’t it an old discipline?  Yes, it is a well-established discipline  Algorithms, methods and techniques were developed in early stages of computer science  There are many compilers around, and  many tools to generate them automatically  So, why we need to learn it? So, why we need to learn it?  Although you may never write a full compiler  But the techniques we learn is useful in many tasks like:  writing an interpreter for a scripting language,  validation checking for forms, and  so on
3 Terminology Terminology  Compiler: Compiler:  a program that translates an executable program in a source language (usually high level) into an equivalent executable program in a target language (usually low level)  Interpreter: Interpreter:  a program that reads an executable program and produces the results of running that program  usually, this involves executing the source program in some fashion  Our course is mainly about compilers but many of Our course is mainly about compilers but many of the same issues arise in interpreters the same issues arise in interpreters
4 A Compiler A Compiler Compiler Source Program Target Program Errors Target Progtam Input Output
5 An Interpreter An Interpreter Interpreter Input Output  Translates line by line Translates line by line  Executes each translated line immediately Executes each translated line immediately  Execution is slower because translation is repeated Execution is slower because translation is repeated  But, usually give better error diagnostics than a compiler But, usually give better error diagnostics than a compiler Source Program
6 A Hybrid Compiler A Hybrid Compiler Translator Source Program Intermediate Program Errors Virtual Machine Input Output
7 Classifications of Compilers Classifications of Compilers  There are different types of Compilers: There are different types of Compilers: Single Pass Multiple Pass Construction Absolute (e.g., *.com) Relocateable (e.g., *.exe) Type of produced code
8 The Many The Many Phases Phases of a Compiler of a Compiler Source Program Lexical analyzer 1 Syntax Analyzer 2 Semantic Analyzer 3 Intermediate Code Generator 4 Code Optimizer 5 Code Generator Target Program Symbol-table Manager Error Handler Analyses Peephole Optimization 7 1, 2, 3, 4, 5 : Front-End 6, 7 : Back-End 6 Syntheses
9 Front-end, Back-end division Front-end, Back-end division  Front end maps legal code into IR Front end maps legal code into IR  Back end maps IR onto target machine Back end maps IR onto target machine  Simplifies retargeting Simplifies retargeting  Allows multiple front ends Allows multiple front ends Front end Source code Machine code errors IR Back end
10 Front end Front end  Scanner: Scanner:  Maps characters into tokens – the basic unit of syntax  x = x + y becomes <id, x> = <id, x> + <id, y>  Typical tokens: number, id, +, -, *, /, do, end  Eliminate white space (tabs, blanks, comments)  A key issue is speed so instead of using a tool like LEX it A key issue is speed so instead of using a tool like LEX it sometimes needed to write your own scanner sometimes needed to write your own scanner Scanner Source code Parse Tree errors tokens Parser
11 Front end Front end  Parser: Parser:  Recognize context-free syntax  Guide context-sensitive analysis  Construct IR  Produce meaningful error messages  Attempt error correction  There are parser generators like YACC which automates There are parser generators like YACC which automates much of the work much of the work Scanner Source code Parse Tree errors tokens Parser
12 Front end Front end  Context free grammars are used to represent Context free grammars are used to represent programming language syntaxes: programming language syntaxes: <expr> ::= <expr> <op> <term> | <term> <expr> ::= <expr> <op> <term> | <term> <term> ::= <number> | <id> <term> ::= <number> | <id> <op> ::= + | - <op> ::= + | -
13 Front end Front end  A parser tries to map a A parser tries to map a program to the syntactic program to the syntactic elements defined in the elements defined in the grammar grammar  A parse can be represented A parse can be represented by a tree called a parse or by a tree called a parse or syntax tree syntax tree
14 Front end Front end  A parse tree can be A parse tree can be represented more compactly represented more compactly referred to as Abstract Syntax referred to as Abstract Syntax Tree (AST) Tree (AST)  AST can be used as IR AST can be used as IR between front end and back between front end and back end end
15 Back end Back end  Translate IR into target machine code Translate IR into target machine code  Choose instructions for each IR operation Choose instructions for each IR operation  Decide what to keep in registers at each point Decide what to keep in registers at each point Instruction selection IR Machine code errors Register Allocation
16 Back end Back end  Produce compact fast code Produce compact fast code  Use available addressing modes Use available addressing modes Code Generation IR Machine code errors Peephole Optimization
17 Back end Back end  Limited resources Limited resources  Optimal allocation is difficult Optimal allocation is difficult Code Generation IR Machine code errors Peephole Optimization
18  Three Phases: Three Phases:  Lexical Analysis:  Left-to-right Scan to Identify Tokens token: sequence of chars having a collective meaning  Syntax Analysis:  Grouping of Tokens Into Meaningful Collection  Semantic Analysis:  Checking to ensure Correctness of Components The Analysis Task For Compilation
19 Phase 1. Lexical Analysis Easiest Analysis - Identify tokens which are the basic building blocks For Example: All are tokens Blanks, Line breaks, etc. are scanned out Position := initial + rate * 60 ; _______ __ _____ _ ___ _ __ _
20 Phase 2. Phase 2. Syntax Analysis Syntax Analysis or or Parsing Parsing For previous example, we would have Parse Tree: identifier identifier expression identifier expression number expression expression expression assignment statement position := + * 60 initial rate Nodes of tree are constructed using a grammar for the language
21 Phase 3. Semantic Analysis Phase 3. Semantic Analysis  Finds Semantic Errors Finds Semantic Errors  One of the Most Important One of the Most Important Activity in This Phase: Activity in This Phase:  Type Checking Type Checking - - Legality of Operands Legality of Operands position initial rate := + * 60 Syntax Tree position initial rate := + * inttoreal 60 Conversion Action
22 Supporting Phases/ Activities for Analysis  Symbol Table Creation / Maintenance Symbol Table Creation / Maintenance  Contains Info (storage, type, scope, args) on Each “Meaningful” Token, Typically Identifiers  Data Structure Created / Initialized During Lexical Analysis  Utilized / Updated During Later Analysis & Synthesis  Error Handling Error Handling  Detection of Different Errors Which Correspond to All Phases  What Happens When an Error Is Found?
23 The Synthesis Task For Compilation  Intermediate Code Generation Intermediate Code Generation  Abstract Machine Version of Code - Independent of Architecture  Easy to Produce and Do Final, Machine Dependent Code Generation  Code Optimization Code Optimization  Find More Efficient Ways to Execute Code  Replace Code With More Optimal Statements  Final Code Generation Final Code Generation  Generate Relocatable Machine Dependent Code  Peephole Optimization Peephole Optimization  With a Very Limited View Improves Produced Final Code
24 Reviewing the Entire Process Reviewing the Entire Process Errors position := initial + rate * 60 lexical analyzer syntax analyzer semantic analyzer intermediate code generator id1 := id2 + id3 * 60 := id1 id2 id3 + * 60 := id1 id2 id3 + * inttoreal 60 Symbol Table position .... initial …. rate….
25 Reviewing the Entire Process Reviewing the Entire Process Errors intermediate code generator code optimizer final code generator t1 := inttoreal(60) t2 := id3 * t1 temp3 := id2 + t2 id1 := t3 t1 := id3 * 60.0 id1 := id2 + t1 MOVF id3, R2 MULF #60.0, R2 MOVF id2, R1 ADDF R1, R2 MOVF R1, id1 position .... initial …. rate…. Symbol Table 3 address code

Compiler design computer science engineering.ppt

  • 1.
    1 Compiler Design (40-414) CompilerDesign (40-414)  Main Text Book: Main Text Book: Compilers: Principles, Techniques & Tools, 2 Compilers: Principles, Techniques & Tools, 2nd nd ed., ed., Aho, Lam, Sethi, and Ullman, 2007 Aho, Lam, Sethi, and Ullman, 2007  Evaluation: Evaluation:  Midterm Exam 35%  Final Exam 35%  Assignments and Quizzes 10%  Project 20%
  • 2.
    2 Compiler learning Compiler learning Isn’t it an old discipline? Isn’t it an old discipline?  Yes, it is a well-established discipline  Algorithms, methods and techniques were developed in early stages of computer science  There are many compilers around, and  many tools to generate them automatically  So, why we need to learn it? So, why we need to learn it?  Although you may never write a full compiler  But the techniques we learn is useful in many tasks like:  writing an interpreter for a scripting language,  validation checking for forms, and  so on
  • 3.
    3 Terminology Terminology  Compiler: Compiler:  aprogram that translates an executable program in a source language (usually high level) into an equivalent executable program in a target language (usually low level)  Interpreter: Interpreter:  a program that reads an executable program and produces the results of running that program  usually, this involves executing the source program in some fashion  Our course is mainly about compilers but many of Our course is mainly about compilers but many of the same issues arise in interpreters the same issues arise in interpreters
  • 4.
  • 5.
    5 An Interpreter An Interpreter Interpreter Input Output Translates line by line Translates line by line  Executes each translated line immediately Executes each translated line immediately  Execution is slower because translation is repeated Execution is slower because translation is repeated  But, usually give better error diagnostics than a compiler But, usually give better error diagnostics than a compiler Source Program
  • 6.
    6 A Hybrid Compiler AHybrid Compiler Translator Source Program Intermediate Program Errors Virtual Machine Input Output
  • 7.
    7 Classifications of Compilers Classificationsof Compilers  There are different types of Compilers: There are different types of Compilers: Single Pass Multiple Pass Construction Absolute (e.g., *.com) Relocateable (e.g., *.exe) Type of produced code
  • 8.
    8 The Many The ManyPhases Phases of a Compiler of a Compiler Source Program Lexical analyzer 1 Syntax Analyzer 2 Semantic Analyzer 3 Intermediate Code Generator 4 Code Optimizer 5 Code Generator Target Program Symbol-table Manager Error Handler Analyses Peephole Optimization 7 1, 2, 3, 4, 5 : Front-End 6, 7 : Back-End 6 Syntheses
  • 9.
    9 Front-end, Back-end division Front-end,Back-end division  Front end maps legal code into IR Front end maps legal code into IR  Back end maps IR onto target machine Back end maps IR onto target machine  Simplifies retargeting Simplifies retargeting  Allows multiple front ends Allows multiple front ends Front end Source code Machine code errors IR Back end
  • 10.
    10 Front end Front end Scanner: Scanner:  Maps characters into tokens – the basic unit of syntax  x = x + y becomes <id, x> = <id, x> + <id, y>  Typical tokens: number, id, +, -, *, /, do, end  Eliminate white space (tabs, blanks, comments)  A key issue is speed so instead of using a tool like LEX it A key issue is speed so instead of using a tool like LEX it sometimes needed to write your own scanner sometimes needed to write your own scanner Scanner Source code Parse Tree errors tokens Parser
  • 11.
    11 Front end Front end Parser: Parser:  Recognize context-free syntax  Guide context-sensitive analysis  Construct IR  Produce meaningful error messages  Attempt error correction  There are parser generators like YACC which automates There are parser generators like YACC which automates much of the work much of the work Scanner Source code Parse Tree errors tokens Parser
  • 12.
    12 Front end Front end Context free grammars are used to represent Context free grammars are used to represent programming language syntaxes: programming language syntaxes: <expr> ::= <expr> <op> <term> | <term> <expr> ::= <expr> <op> <term> | <term> <term> ::= <number> | <id> <term> ::= <number> | <id> <op> ::= + | - <op> ::= + | -
  • 13.
    13 Front end Front end A parser tries to map a A parser tries to map a program to the syntactic program to the syntactic elements defined in the elements defined in the grammar grammar  A parse can be represented A parse can be represented by a tree called a parse or by a tree called a parse or syntax tree syntax tree
  • 14.
    14 Front end Front end A parse tree can be A parse tree can be represented more compactly represented more compactly referred to as Abstract Syntax referred to as Abstract Syntax Tree (AST) Tree (AST)  AST can be used as IR AST can be used as IR between front end and back between front end and back end end
  • 15.
    15 Back end Back end Translate IR into target machine code Translate IR into target machine code  Choose instructions for each IR operation Choose instructions for each IR operation  Decide what to keep in registers at each point Decide what to keep in registers at each point Instruction selection IR Machine code errors Register Allocation
  • 16.
    16 Back end Back end Produce compact fast code Produce compact fast code  Use available addressing modes Use available addressing modes Code Generation IR Machine code errors Peephole Optimization
  • 17.
    17 Back end Back end Limited resources Limited resources  Optimal allocation is difficult Optimal allocation is difficult Code Generation IR Machine code errors Peephole Optimization
  • 18.
    18  Three Phases: ThreePhases:  Lexical Analysis:  Left-to-right Scan to Identify Tokens token: sequence of chars having a collective meaning  Syntax Analysis:  Grouping of Tokens Into Meaningful Collection  Semantic Analysis:  Checking to ensure Correctness of Components The Analysis Task For Compilation
  • 19.
    19 Phase 1. LexicalAnalysis Easiest Analysis - Identify tokens which are the basic building blocks For Example: All are tokens Blanks, Line breaks, etc. are scanned out Position := initial + rate * 60 ; _______ __ _____ _ ___ _ __ _
  • 20.
    20 Phase 2. Phase 2.Syntax Analysis Syntax Analysis or or Parsing Parsing For previous example, we would have Parse Tree: identifier identifier expression identifier expression number expression expression expression assignment statement position := + * 60 initial rate Nodes of tree are constructed using a grammar for the language
  • 21.
    21 Phase 3. SemanticAnalysis Phase 3. Semantic Analysis  Finds Semantic Errors Finds Semantic Errors  One of the Most Important One of the Most Important Activity in This Phase: Activity in This Phase:  Type Checking Type Checking - - Legality of Operands Legality of Operands position initial rate := + * 60 Syntax Tree position initial rate := + * inttoreal 60 Conversion Action
  • 22.
    22 Supporting Phases/ Activities forAnalysis  Symbol Table Creation / Maintenance Symbol Table Creation / Maintenance  Contains Info (storage, type, scope, args) on Each “Meaningful” Token, Typically Identifiers  Data Structure Created / Initialized During Lexical Analysis  Utilized / Updated During Later Analysis & Synthesis  Error Handling Error Handling  Detection of Different Errors Which Correspond to All Phases  What Happens When an Error Is Found?
  • 23.
    23 The Synthesis TaskFor Compilation  Intermediate Code Generation Intermediate Code Generation  Abstract Machine Version of Code - Independent of Architecture  Easy to Produce and Do Final, Machine Dependent Code Generation  Code Optimization Code Optimization  Find More Efficient Ways to Execute Code  Replace Code With More Optimal Statements  Final Code Generation Final Code Generation  Generate Relocatable Machine Dependent Code  Peephole Optimization Peephole Optimization  With a Very Limited View Improves Produced Final Code
  • 24.
    24 Reviewing the EntireProcess Reviewing the Entire Process Errors position := initial + rate * 60 lexical analyzer syntax analyzer semantic analyzer intermediate code generator id1 := id2 + id3 * 60 := id1 id2 id3 + * 60 := id1 id2 id3 + * inttoreal 60 Symbol Table position .... initial …. rate….
  • 25.
    25 Reviewing the EntireProcess Reviewing the Entire Process Errors intermediate code generator code optimizer final code generator t1 := inttoreal(60) t2 := id3 * t1 temp3 := id2 + t2 id1 := t3 t1 := id3 * 60.0 id1 := id2 + t1 MOVF id3, R2 MULF #60.0, R2 MOVF id2, R1 ADDF R1, R2 MOVF R1, id1 position .... initial …. rate…. Symbol Table 3 address code