Compiler Construction | Lecture 6 | Introduction to Static Analysis

Eelco Visser CS4200 Compiler Construction TU Delft October 2018 Lecture 6: Introduction to Static Analysis

This Lecture !2 Source Code Editor Parse Abstract Syntax Tree Type Check Check that names are used correctly and that expressions are well-typed Errors

Reading Material 3 The following papers add background, conceptual exposition, and examples to the material from the slides. Some notation and technical details have been changed; check the documentation.

!4 Type checkers are algorithms that check names and types in programs. This paper introduces the NaBL Name Binding Language, which supports the declarative deﬁnition of the name binding and scope rules of programming languages through the deﬁnition of scopes, declarations, references, and imports associated. http://dx.doi.org/10.1007/978-3-642-36089-3_18 SLE 2012 Background

!5 This paper introduces scope graphs as a language-independent representation for the binding information in programs. http://dx.doi.org/10.1007/978-3-662-46669-8_9 ESOP 2015 Best EAPLS paper at ETAPS 2015

!6 Separating type checking into constraint generation and constraint solving provides more declarative deﬁnition of type checkers. This paper introduces a constraint language integrating name resolution into constraint resolution through scope graph constraints. This is the basis for the design of the NaBL2 static semantics speciﬁcation language. https://doi.org/10.1145/2847538.2847543 PEPM 2016

!7 Documentation for NaBL2 at the metaborg.org website. http://www.metaborg.org/en/latest/source/langdev/meta/lang/nabl2/index.html

!8 This paper describes the next generation of the approach. Addresses (previously) open issues in expressiveness of scope graphs for type systems: - Structural types - Generic types Addresses open issue with staging of information in type systems. Introduces Statix DSL for deﬁnition of type systems. Prototype of Statix is available in Spoofax HEAD, but not ready for use in project yet. The future OOPSLA 2018 To appear

Dynamically Typed vs Statically Typed - Dynamic: type checking at run-time - Static: type checking at compile-time (before run-time) What does it mean to type check? - Type safety: guarantee absence of run-time type errors Why static type checking? - Avoid overhead of run-time type checking - Fail faster: ﬁnd (type) errors at compile time - Find all (type) errors: some errors may not be triggered by testing - But: not all errors can be found statically (e.g. array bounds checking) !10 Why Type Checking? Some Discussion Points

Context-Sensitive Properties 11

Homework Assignment: What is the Syntax of This Language? !12 <languages> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <language> <name>Stratego</name> <purpose>Transformation</purpose> <implementedin>SDF3</implementedin> <implementedin>Stratego</implementedin> <target>Java</target> </language> </languages> <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue>

Syntax of Book Catalogues !13 context-free syntax Document.Catalogue = [ <catalogue> [Book*] </catalogue> ] Book.Book = [ <book> [Title] [Author] [Publisher] </book> ] …Schema-speciﬁc syntax deﬁnition <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue>

A Generic Syntax of XML Documents !14 context-free start-symbols Document sorts Tag Word lexical syntax Tag = [a-zA-Z][a-zA-Z0-9]* Word = ~[<>]+ lexical restrictions Word -/- ~[<>] sorts Document Elem context-free syntax Document.Doc = Elem Elem.Node = [ <[Tag]> [Elem*] </[Tag]> ] Elem.Text = Word+ {longest-match} <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue> Doc( Node( "catalogue" , [ Node( "book" , [ Node("title", [Text(["Modern Compiler Implementation in ML"])], "title") , Node("author", [Text(["Andrew Appel"])], "author") , Node("publisher", [Text(["Cambridge"])], "publisher") ] , "book" ) , Node( "book" , [ Node("title", [Text(["Parsing Schemata"])], "title") , Node("author", [Text(["Klaas Sikkel"])], "author") , Node("publisher", [Text(["Springer"])], "publisher") ] , "book" ) ] , "catalogue" ) )

A Generic Syntax of XML Documents !15 context-free start-symbols Document sorts Tag Word lexical syntax Tag = [a-zA-Z][a-zA-Z0-9]* Word = ~[<>]+ lexical restrictions Word -/- ~[<>] sorts Document Elem context-free syntax Document.Doc = Elem Elem.Node = [ <[Tag]> [Elem*] </[Tag]> ] Elem.Text = Word+ {longest-match} <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue> What is the problem with this approach?

Generic Syntax is Too Liberal! !16 context-free start-symbols Document sorts Tag Word lexical syntax Tag = [a-zA-Z][a-zA-Z0-9]* Word = ~[<>]+ lexical restrictions Word -/- ~[<>] sorts Document Elem context-free syntax Document.Doc = Elem Elem.Node = [ <[Tag]> [Elem*] </[Tag]> ] Elem.Text = Word+ {longest-match} <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Year is not a valid element of book Only books in catalogue Book should have a title Closing tag is not consistent with starting tag

Context-free grammar is … context-free! - Cannot express alignment Languages have context-sensitive properties How can we have our cake and eat it too? - Generic (liberal) syntax - Forbid programs/documents that are not well-formed !17 Context-Sensitive Properties

Checking Context-Sensitive Properties 18

Generic (liberal) syntax - Allow more programs/documents Check properties on AST - Reject programs/documents that are not well-formed !19 Approach: Checking Context-Sensitive Properties

Doc( Node( "catalogue" , [ Node( "book" , [ Node("title", [Text(["Modern Compiler Implementation in ML"])], "title") , Node("author", [Text(["Andrew Appel"])], "author") , Node("publisher", [Text(["Cambridge"])], "publisher") , Node("year", [], "year") ] , "book" ) , Node( "language" , [ Node("name", [Text(["SDF3"])], "name") , Node("purpose", [Text(["Syntax Definition"])], "purpose") , Node("implementedin", [Text(["SDF3"])], "implementedin") ] , "language" ) , Node( "book" , [ Node("author", [Text(["Klaas Sikkel"])], "author") , Node("publisher", [Text(["Springer"])], "publisher") ] , "book" ) , Node("book", [], "koob") ] , "catalogue" ) ) Checking Context-Context-Sensitive Properties in Spoofax !20 rules // Analysis editor-analyze: (ast, path, project-path) -> (ast', error*, warning*, info*) with ast' := <id> ast ; error* := <collect-all(constraint-error)> ast' ; warning* := <collect-all(constraint-warning)> ast' ; info* := <collect-all(constraint-info)> ast' parse errors

Check Violations using Constraint-Error Rules !21 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> rules // Analysis editor-analyze: (ast, path, project-path) -> (ast', error*, warning*, info*) with ast' := <id> ast ; error* := <collect-all(constraint-error)> ast' ; warning* := <collect-all(constraint-warning)> ast' ; info* := <collect-all(constraint-info)> ast' Find all sub-terms that are not consistent with the context- sensitive rules collect-all: type-unifying generic traversal

Check Violations using Constraint-Error Rules !22 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Closing tag does not match starting tag rules constraint-error : Node(tag1, elems, tag2) -> (tag2, $[Closing tag does not match starting tag]) where <not(eq)>(tag1, tag2) Origin Error message

Check Violations using Constraint-Error Rules !23 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Book should have a title rules constraint-error : n@Node(tag@"book", elems, _) -> (tag, $[Book should have title]) where <not(has(|"title"))> elems has(|tag) = fetch(?Node(tag, _, _)) Containment checks

Check Violations using Constraint-Error Rules !24 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Catalogue can only have books rules constraint-error : Node("catalogue", elems, _) -> error where <filter(not-a-book)> elems => [error | _] not-a-book : Node(tag, _, _) -> (tag, $[Catalogue can only have books]) where <not(eq)> (tag, "book") Containment checks

Check Violations using Constraint-Error Rules !25 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Book can only have title, author, publisher rules constraint-error : Node("book", elems, _) -> error where <filter(not-a-book-elem)> elems => [error | _] not-a-book-elem : Node(tag, _, _) -> (tag, $[Book can only have title, author, publisher]) where <not(elem())> (tag, ["title", "author", "publisher"]) Containment checks

Generic (liberal) syntax - Allow more programs/documents - `permissive syntax’ Check properties on AST - Reject programs/documents that are not well-formed Advantage - Smaller syntax deﬁnition - Parser does not fail (so often) - Better error messages than parser can give !26 Approach: Checking Context-Sensitive Properties

How are programming languages different from XML? 27

XML checking - Tag consistency - Schema consistency - These are structural properties Programming languages - Type consistency (similar to schema?) - Name consistency: declarations and references should correspond - Name dependent type consistency: names carry types !28 Programming Languages vs XML

Scope !30 let var x : int := 0 + z var y : int := x + 1 var z : int := x + y + 1 in x + y + z end

Scope: Deﬁnition before Use !31 let var x : int := 0 + z // z not in scope var y : int := x + 1 var z : int := x + y + 1 in x + y + z end

Mutual Recursion !32 let function odd(x : int) : int = if x > 0 then even(x - 1) else false function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end let function odd(x : int) : int = if x > 0 then even(x - 1) else false var x : int function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end

Mutually Recursive Functions should be Adjacent !33 let function odd(x : int) : int = if x > 0 then even(x - 1) else false function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end let function odd(x : int) : int = if x > 0 then even(x - 1) else false var x : int function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end

Name Spaces !34 let type foo = int function foo(x : foo) : foo = 3 var foo : foo := foo(4) in foo(56) + foo end

Functions and Variables in Same Name Space !35 let type foo = int function foo(x : foo) : foo = 3 var foo : foo := foo(4) in foo(56) + foo // both refer to the variable foo end Functions and variables are in the same namespace

Type Dependent Name Resolution !36 let type point = {x : int, y : int} var origin : point := point { x = 1, y = 2 } in origin.x end

Type Dependent Name Resolution !37 let type point = {x : int, y : int} var origin : point := point { x = 1, y = 2 } in origin.x end Resolving origin.x requires the type of origin

Name Correspondence !38 let type point = {x : int, y : int} type errpoint = {x : int, x : int} var p : point var e : errpoint in p := point{ x = 3, y = 3, z = "a" } p := point{ x = 3 } end

Name Set Correspondence !39 let type point = {x : int, y : int} type errpoint = {x : int, x : int} var p : point var e : errpoint in p := point{ x = 3, y = 3, z = "a" } p := point{ x = 3 } end Field “y” not initialized Reference “z” not resolved Duplicate Declaration of Field “x”

Recursive Types !40 let type intlist = {hd : int, tl : intlist} type tree = {key : int, children : treelist} type treelist = {hd : tree, tl : treelist} var l : intlist var t : tree var tl : treelist in l := intlist { hd = 3, tl = l }; t := tree { key = 2, children = treelist { hd = tree{ key = 3, children = 3 }, tl = treelist{ } } }; t.children.hd.children := t.children end

Recursive Types !41 let type intlist = {hd : int, tl : intlist} type tree = {key : int, children : treelist} type treelist = {hd : tree, tl : treelist} var l : intlist var t : tree var tl : treelist in l := intlist { hd = 3, tl = l }; t := tree { key = 2, children = treelist { hd = tree{ key = 3, children = 3 }, tl = treelist{ } } }; t.children.hd.children := t.children end type mismatch Field "tl" not initialized Field "hd" not initialized

Intermezzo: Testing Static Analysis 42

Testing Name Resolution !43 test inner name [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type [[u]] = t var y: [[u]] := 0 in y := 42 end end ]] resolve #2 to #1 test outer name [[ let type t = u type [[u]] = int var x: [[u]] := 0 in x := 42 ; let type u = t var y: u := 0 in y := 42 end end ]] resolve #2 to #1

Testing Type Checking !44 test variable reference [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[x]] end end ]] run get-type to INT() test integer constant [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[42]] end end ]] run get-type to INT()

Testing Errors !45 test undefined variable [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[z]] end end ]] 1 error test type error [[ let type t = u type u = string var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[x]] end end ]] 1 error

Test Corner Cases !46 context-free superset language

Implementing a Type Checker with Rewrite Rules 47

Compute Type of Expression !48 rules type-check : Mod(e) -> t where <type-check> e => t type-check : String(_) -> STRING() type-check : Int(_) -> INT() type-check : Plus(e1, e2) -> INT() where <type-check> e1 => INT(); <type-check> e2 => INT() type-check : Times(e1, e2) -> INT() where <type-check> e1 => INT(); <type-check> e2 => INT() 1 + 2 * 3 Mod( Plus(Int("1"), Times(Int("2"), Int("3"))) ) INT()

Compute Type of Variable? !49 rules type-check : Let([VarDec(x, e)], e_body) -> t where <type-check> e => t_e; <type-check> e_body => t type-check : Var(x) -> t // ??? let var x := 1 in x + 1 end Mod( Let( [VarDec("x", Int("1"))] , [Plus(Var("x"), Int("1"))] ) ) ?

Type Checking Variable Bindings !50 rules type-check(|env) : Let([VarDec(x, e)], e_body) -> t where <type-check(|env)> e => t_e; <type-check(|[(x, t_e) | env])> e_body => t type-check(|env) : Var(x) -> t where <fetch(?(x, t))> env let var x := 1 in x + 1 end INT()Store association between variable and type in type environment Mod( Let( [VarDec("x", Int("1"))] , [Plus(Var("x"), Int("1"))] ) )

Pass Environment to Sub-Expressions !51 rules type-check : Mod(e) -> t where <type-check(|[])> e => t type-check(|env) : String(_) -> STRING() type-check(|env) : Int(_) -> INT() type-check(|env) : Plus(e1, e2) -> INT() where <type-check(|env)> e1 => INT(); <type-check(|env)> e2 => INT() type-check(|env) : Times(e1, e2) -> INT() where <type-check(|env)> e1 => INT(); <type-check(|env)> e2 => INT() let var x := 1 in x + 1 end INT() Mod( Let( [VarDec("x", Int("1"))] , [Plus(Var("x"), Int("1"))] ) )

Type checking ill-typed/named programs? - add rules for ‘bad’ cases More complicated name binding patterns? - Hoisting of variables in JavaScript functions - Mutually recursive bindings - Possible approaches ‣ Multiple traversals over program ‣ Defer checking until entire scope is processed ‣ … !52 But what about?

!53 Name Binding Complicates Type Checking

Language = Set of Trees !55 Fun AddArgDecl VarRefId Int “1” VarDeclId “x” TXINT “x” Tree is a convenient interface for transforming programs

Language = Set of Graphs !56 Fun AddArgDecl VarRefId Int “1” VarDeclId TXINT Edges from references to declarations

Fun AddArgDecl VarRefId Int “1” VarDeclId “x” TXINT “x” Language = Set of Graphs !57 Names are placeholders for edges in linear / tree representation

!58 What are the commonalities of name binding rules in programming languages?

Variables !60 let function fact(n : int) : int = if n < 1 then 1 else n * fact(n - 1) in fact(10) end

Function Calls !61 let function fact(n : int) : int = if n < 1 then 1 else n * fact(n - 1) in fact(10) end

!62 function prettyprint(tree: tree) : string = let var output := "" function write(s: string) = output := concat(output, s) function show(n: int, t: tree) = let function indent(s: string) = (write("n"); for i := 1 to n do write(" "); output := concat(output, s)) in if t = nil then indent(".") else (indent(t.key); show(n+1, t.left); show(n+1, t.right)) end in show(0, tree); output end Nested Scopes (Shadowing)

!63 let type point = { x : int, y : int } var origin := point { x = 1, y = 2 } in origin.x := 10; origin := nil end Type References

!64 let type point = { x : int, y : int } var origin := point { x = 1, y = 2 } in origin.x := 10; origin := nil end Record Fields

!65 let type point = { x : int, y : int } var origin := point { x = 1, y = 2 } in origin.x := 10; origin := nil end Type Dependent Name Resolution

Used in many different language artifacts - compiler, interpreter, semantics, IDE, refactoring Binding rules encoded in many different and ad-hoc ways - symbol tables, environments, substitutions No standard approach to formalization - there is no BNF for name binding No reuse of binding rules between artifacts - how do we know substitution respects binding rules? !66 Name Binding is Pervasive

What are the name binding rules of a language? - What are introductions of names? - What are uses of names? - What are the shadowing rules? - What are the name spaces? How can we deﬁne the name binding rules of a language? - What is the BNF for name binding? !67 Name Binding Rules

Declarative Speciﬁcation of Name Binding Rules 68

Representation - Standardized representation for <aspect> of programs - Independent of specific object language Specification Formalism - Language-specific declarative rules - Abstract from implementation concerns Language-Independent Interpretation - Formalism interpreted by language-independent algorithm - Multiple interpretations for different purposes - Reuse between implementations of different languages !69 Separation of Concerns in Definition of Programming Languages

Representation: (Abstract Syntax) Trees - Standardized representation for structure of programs - Basis for syntactic and semantic operations Formalism: Syntax Definition - Productions + Constructors + Templates + Disambiguation - Language-specific rules: structure of each language construct Language-Independent Interpretation - Well-formedness of abstract syntax trees ‣ provides declarative correctness criterion for parsing - Parsing algorithm !70 Separation of Concerns in Syntax Definition

Representation - To conduct and represent the results of name resolution Declarative Rules - To deﬁne name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !71 Wanted: Separation of Concerns in Name Binding

DSLs for specifying binding structure of a (target) language - Ott [Sewell+10] - Romeo [StansiferWand14] - Unbound [Weirich+11] - Cαml [Pottier06] - NaBL [Konat+12] Generate code to do resolution and record results What is the semantics of such a name binding language? !72 Name Binding Languages

Attempt: NaBL Name Binding Language !73 binding rules // variables Param(t, x) : defines Variable x of type t Let(bs, e) : scopes Variable Bind(t, x, e) : defines Variable x of type t Var(x) : refers to Variable x Declarative Name Binding and Scope Rules Gabriël D. P. Konat, Lennart C. L. Kats, Guido Wachsmuth, Eelco Visser SLE 2012 Declarative speciﬁcation Abstracts from implementation Incremental name resolution But: How to explain it to Coq? What is the semantics of NaBL?

Attempt: NaBL Name Binding Language !74 Declarative Name Binding and Scope Rules Gabriël D. P. Konat, Lennart C. L. Kats, Guido Wachsmuth, Eelco Visser SLE 2012 Especially: What is the semantics of imports? binding rules // classes Class(c, _, _, _) : defines Class c of type ClassT(c) scopes Field, Method, Variable Extends(c) : imports Field, Method from Class c ClassT(c) : refers to Class c New(c) : refers to Class c

What is semantics of binding language? - the meaning of a binding speciﬁcation for language L should be given by - a function from L programs to their “resolution structures” So we need - a (uniform, language-independent) method for describing such resolution structures ... - … that can be used to compute the resolution of each program identiﬁer - (or to verify that a claimed resolution is valid) That supports - Handle broad range of language binding features ... - … using minimal number of constructs - Make resolution structure language-independent - Handle named collections of names (e.g. modules, classes, etc.) within the theory - Allow description of programs with resolution errors !75 Approach

Name Resolution in Scope Graphs 76

Representation - ? Declarative Rules - To deﬁne name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !77 Separation of Concerns in Name Binding Scope Graphs

!78 let function fact(n : int) : int = if n < 1 then 1 else n * fact(n - 1) in fact(10) end fact S1 fact S2n n fact nn Scope GraphProgram Name Resolution

A Calculus for Name Resolution !79 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D Path in scope graph connects reference to declaration Scopes, References, Declarations, Parents, Imports Neron, Tolmach, Visser, Wachsmuth A Theory of Name Resolution ESOP 2015

Simple Scopes !80 Reference Declaration Scope Reference Step Declaration Step def y1 = x2 + 1 def x1 = 5 S0 def y1 = x2 + 1 def x1 = 5 Sx Sx R xS xS D S0 y1 x1S0 y1 x1S0x2 R y1 x1S0x2 R D y1 x1S0x2 S x x

Lexical Scoping !81 S1 S0 Parent def x1 = z2 5 def z1 = fun y1 { x2 + y2 } S’S S’S P z1 x1S0z2 S1 y1y2 x2 z1 x1S0z2 S1 y1y2 x2 z1 x1S0z2 S1 y1y2 x2 z1 x1S0z2 R S1 y1y2 x2 z1 x1S0z2 R D S1 y1y2 x2 z1 x1S0z2 R S1 y1y2 x2 z1 x1S0z2 R P S1 y1y2 x2 z1 x1S0z2 R P D S’S Parent Step

Imports !82 Associated scope Import S0 SB SA module A1 { def z1 = 5 } module B1 { import A2 def x1 = 1 + z2 } A1 SA z1 B1 SB z2 S0 A2 x1 S R1 R2 SR A1 SA z1 B1 SB z2 S0 A2 x1 A1 SA z1 B1 SB z2 S0 A2 x1 A1 SA z1 B1 SB z2 S0 A2 x1 R A1 SA z1 B1 SB z2 S0 A2 x1 R R A1 SA z1 B1 SB z2 S0 A2 x1 R R P A1 SA z1 B1 SB z2 S0 A2 x1 R R P D A1 SA z1 B1 SB z2 S0 A2 x1 I(A2 )R R P D A1 SA z1 B1 SB z2 S0 A2 x1 I(A2 )R R D P D S R1 R2 SR SRS I(R1) Import Step

Qualiﬁed Names !83 module N1 { def s1 = 5 } module M1 { def x1 = 1 + N2.s2 } S0 N1 SN s2 S0 N2 R D R I(N2 ) D X1 s1 N1 SN s2 S0 N2 R D X1 s1 N1 SN s2 S0 N2 X1 s1

A Calculus for Name Resolution !84 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D Reachability of declarations from references through scope graph edges How about ambiguities? References with multiple paths

A Calculus for Name Resolution !85 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D I(_).p’ < P.p D < I(_).p’ D < P.p s.p < s.p’ p < p’ Visibility Well formed path: R.P*.I(_)*.D Reachability

Ambiguous Resolutions !86 match t with | A x | B x => … z1 x2 x1S0x3 z1 x2 x1S0x3 R z1 x2 x1S0x3 R D z1 x2 x1S0x3 R D z1 x2 x1S0x3 R D D S0def x1 = 5 def x2 = 3 def z1 = x3 + 1

Shadowing !87 S0 S1 S2 D < P.p s.p < s.p’ p < p’ S1 S2 x1 y1 y2 x2 z1 x3S0z2 def x3 = z2 5 7 def z1 = fun x1 { fun y1 { x2 + y2 } } S1 S2 x1 y1 y2 x2 z1 x3S0z2 R P P D S1 S2 x1 y1 y2 x2 z1 x3S0z2 D P R R P P D R.P.D < R.P.P.D

Imports shadow Parents !88 I(_).p’ < P.p R.I(A2).D < R.P.D A1 SA z1 B1 SB z2 S0 A2 x1 z3 A1 SA z1 B1 SB z2 S0 A2 x1 I(A2)R D z3 A1 SA z1 B1 SB z2 S0 A2 x1 I(A2)R D P D z3 R S0def z3 = 2 module A1 { def z1 = 5 } module B1 { import A2 def x1 = 1 + z2 } SA SB

Imports vs. Includes !89 S0def z3 = 2 module A1 { def z1 = 5 } import A2 def x1 = 1 + z2 SA A1 SA z1 z2 S0 A2 x1 I(A2) R D z3 R D D < I(_).p’ R.D < R.I(A2).D A1 SA z1 z2 S0 A2 x1 R z3 D A1 SA z1 z2 S0 A2 x1z3 S0def z3 = 2 module A1 { def z1 = 5 } include A2 def x1 = 1 + z2 SA X

Import Parents !90 def s1 = 5 module N1 { } def x1 = 1 + N2.s2 S0 SN def s1 = 5 module N1 { } def x1 = 1 + N2.s2 S0 SN Well formed path: R.P*.I(_)*.D N1 SN s2 S0 N2 X1 s1 R I(N2 ) D P N1 SN s2 S0 N2 X1 s1

Transitive vs. Non-Transitive !91 With transitive imports, a well formed path is R.P*.I(_)*.D With non-transitive imports, a well formed path is R.P*.I(_)?.D A1 SA z1 B1 SB S0 A2 C1 SCz2 x1 B2 A1 SA z1 B1 SB S0 A2 I(A2) D C1 SCz2 I(B2) R x1 B2 ?? module A1 { def z1 = 5 } module B1 { import A2 } module C1 { import B2 def x1 = 1 + z2 } SA SB SC

A Calculus for Name Resolution !92 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D I(_).p’ < P.p D < I(_).p’ D < P.p s.p < s.p’ p < p’ Visibility Well formed path: R.P*.I(_)*.D Reachability

Visibility Policies !93 Lexical scope L := {P} E := P⇤ D < P Non-transitive imports L := {P, I} E := P⇤ · I? D < P, D < I, I < P Transitive imports L := {P, TI} E := P⇤ · TI⇤ D < P, D < TI, TI < P Transitive Includes L := {P, Inc} E := P⇤ · Inc⇤ D < P, Inc < P Transitive includes and imports, and non-transitive imports L := {P, Inc, TI, I} E := P⇤ · (Inc | TI)⇤ · I? D < P, D < TI, TI < P, Inc < P, D < I, I < P, Figure 10. Example reachability and visibility policies by instan- R[I](xR Envre [I, S](S EnvL re [I, S](S EnvD re [I, S](S Envl re [I, S](S

More Examples 94 From TUD-SERG-2015-001

Let Bindings !95 1 2 b2a1 c3 a4 b6 c12a10 b11 c5 b9 a7 c8 def a1 = 0 def b2 = 1 def c3 = 2 letpar a4 = c5 b6 = a7 c8 = b9 in a10+b11+c12 1 b2a1 c3 a4 b6 c12a10 b11 c5 b9 a7 c8 2 def a1 = 0 def b2 = 1 def c3 = 2 letrec a4 = c5 b6 = a7 c8 = b9 in a10+b11+c12 1 b2a1 c3 a4 b6 c12a10 b11 c5 b9 a7 c8 2 4 3 def a1 = 0 def b2 = 1 def c3 = 2 let a4 = c5 b6 = a7 c8 = b9 in a10+b11+c12

Deﬁnition before Use / Use before Deﬁnition !96 class C1 { int a2 = b3; int b4; void m5 (int a6) { int c7 = a8 + b9; int b10 = b11 + c12; } int c12; } 0 C1 1 2 a2 b4 c12 b3 m5 a6 3 4 c7 b10 b9 a8 b11 c12

Inheritance !97 class C1 { int f2 = 42; } class D3 extends C4 { int g5 = f6; } class E7 extends D8 { int f9 = g10; int h11 = f12; } 32 1 C4 C1 4D3 E7 D8 f2 g5 f6 f9 g10 f12 h11

Java Packages !98 package p3; class D4 {} package p1; class C2 {} 1 p3p1 2 p1 p3 3 D4C2

Java Import !99 package p1; imports r2.*; imports q3.E4; public class C5 {} class D6 {} 4 p1 D6C5 3 2r2 1 p1 E4 q3

C# Namespaces and Partial Classes !100 namespace N1 { using M2; partial class C3 { int f4; } } namespace N5 { partial class C6 { int m7() { return f8; } } } 1 3 6 4 7 8 C3 C6 N1 N5 f4 m7 N1 N5 C3 C6 f8 2 M2 5

Static analysis - Properties that can be checked of the (‘static’) program text - Decide which properties to encode in syntax deﬁnition vs checker - Strict vs liberal syntax Context-sensitive properties - Some properties are intrinsically context-sensitive - In particular: names Declarative speciﬁcation of name binding rules - Common (language agnostic) understanding of name binding  !102 Static Analysis

Representation: Scope Graphs - Standardized representation for lexical scoping structure of programs - Path in scope graph relates reference to declaration - Basis for syntactic and semantic operations Formalism: Name Binding Constraints - References + Declarations + Scopes + Reachability + Visibility - Language-speciﬁc rules map AST to constraints Language-Independent Interpretation - Resolution calculus: correctness of path with respect to scope graph - Name resolution algorithm - Alpha equivalence - Mapping from graph to tree (to text) - Refactorings - And many other applications !103 A Theory of Name Resolution

We have modeled a large set of example binding patterns - definition before use - different let binding flavors - recursive modules - imports and includes - qualified names - class inheritance - partial classes Next goal: fully model some real languages - In progress: Go, Rust, TypeScript - Java, ML !104 Validation

Scope graph semantics for binding languages [OOPSLA18] - starting with NaBL - or rather: a redesign of NaBL based on scope graphs Dynamic analogs to static scope graphs [ECOOP16] - how does scope graph relate to memory at run-time? Supporting mechanized language meta-theory [POPL18] - relating static and dynamic bindings Resolution-sensitive program transformations - renaming, refactoring, substitution, … !105 Ongoing/Future Work

Representation - To conduct and represent the results of name resolution Declarative Rules - To deﬁne name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !107 Separation of Concerns in Name Binding

Representation - ? Declarative Rules - To deﬁne name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !108 Separation of Concerns in Name Binding Scope Graphs

Representation - ? Declarative Rules - ? Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !109 Separation of Concerns in Name Binding Scope & Type Constraint Rules [PEPM16] Scope Graphs

Next: Constraint-Based Type Checkers !110 Program AST Parse Constraints Generate Constraints Solution Resolve Constraints Language Speciﬁc Language Independent

!111 Except where otherwise noted, this work is licensed under

Compiler Construction | Lecture 6 | Introduction to Static Analysis

More Related Content

What's hot

Similar to Compiler Construction | Lecture 6 | Introduction to Static Analysis

More from Eelco Visser

Recently uploaded

Compiler Construction | Lecture 6 | Introduction to Static Analysis