Eelco Visser CS4200 Compiler Construction TU Delft October 2018 Lecture 6: Introduction to Static Analysis
This Lecture !2 Source Code Editor Parse Abstract Syntax Tree Type Check Check that names are used correctly and that expressions are well-typed Errors
Reading Material 3 The following papers add background, conceptual exposition, and examples to the material from the slides. Some notation and technical details have been changed; check the documentation.
!4 Type checkers are algorithms that check names and types in programs. This paper introduces the NaBL Name Binding Language, which supports the declarative definition of the name binding and scope rules of programming languages through the definition of scopes, declarations, references, and imports associated. http://dx.doi.org/10.1007/978-3-642-36089-3_18 SLE 2012 Background
!5 This paper introduces scope graphs as a language-independent representation for the binding information in programs. http://dx.doi.org/10.1007/978-3-662-46669-8_9 ESOP 2015 Best EAPLS paper at ETAPS 2015
!6 Separating type checking into constraint generation and constraint solving provides more declarative definition of type checkers. This paper introduces a constraint language integrating name resolution into constraint resolution through scope graph constraints. This is the basis for the design of the NaBL2 static semantics specification language. https://doi.org/10.1145/2847538.2847543 PEPM 2016
!7 Documentation for NaBL2 at the metaborg.org website. http://www.metaborg.org/en/latest/source/langdev/meta/lang/nabl2/index.html
!8 This paper describes the next generation of the approach. Addresses (previously) open issues in expressiveness of scope graphs for type systems: - Structural types - Generic types Addresses open issue with staging of information in type systems. Introduces Statix DSL for definition of type systems. Prototype of Statix is available in Spoofax HEAD, but not ready for use in project yet. The future OOPSLA 2018 To appear
Why Type Checking? 9
Dynamically Typed vs Statically Typed - Dynamic: type checking at run-time - Static: type checking at compile-time (before run-time) What does it mean to type check? - Type safety: guarantee absence of run-time type errors Why static type checking? - Avoid overhead of run-time type checking - Fail faster: find (type) errors at compile time - Find all (type) errors: some errors may not be triggered by testing - But: not all errors can be found statically (e.g. array bounds checking) !10 Why Type Checking? Some Discussion Points
Context-Sensitive Properties 11
Homework Assignment: What is the Syntax of This Language? !12 <languages> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <language> <name>Stratego</name> <purpose>Transformation</purpose> <implementedin>SDF3</implementedin> <implementedin>Stratego</implementedin> <target>Java</target> </language> </languages> <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue>
Syntax of Book Catalogues !13 context-free syntax Document.Catalogue = [ <catalogue> [Book*] </catalogue> ] Book.Book = [ <book> [Title] [Author] [Publisher] </book> ] …Schema-specific syntax definition <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue>
A Generic Syntax of XML Documents !14 context-free start-symbols Document sorts Tag Word lexical syntax Tag = [a-zA-Z][a-zA-Z0-9]* Word = ~[<>]+ lexical restrictions Word -/- ~[<>] sorts Document Elem context-free syntax Document.Doc = Elem Elem.Node = [ <[Tag]> [Elem*] </[Tag]> ] Elem.Text = Word+ {longest-match} <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue> Doc( Node( "catalogue" , [ Node( "book" , [ Node("title", [Text(["Modern Compiler Implementation in ML"])], "title") , Node("author", [Text(["Andrew Appel"])], "author") , Node("publisher", [Text(["Cambridge"])], "publisher") ] , "book" ) , Node( "book" , [ Node("title", [Text(["Parsing Schemata"])], "title") , Node("author", [Text(["Klaas Sikkel"])], "author") , Node("publisher", [Text(["Springer"])], "publisher") ] , "book" ) ] , "catalogue" ) )
A Generic Syntax of XML Documents !15 context-free start-symbols Document sorts Tag Word lexical syntax Tag = [a-zA-Z][a-zA-Z0-9]* Word = ~[<>]+ lexical restrictions Word -/- ~[<>] sorts Document Elem context-free syntax Document.Doc = Elem Elem.Node = [ <[Tag]> [Elem*] </[Tag]> ] Elem.Text = Word+ {longest-match} <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue> What is the problem with this approach?
Generic Syntax is Too Liberal! !16 context-free start-symbols Document sorts Tag Word lexical syntax Tag = [a-zA-Z][a-zA-Z0-9]* Word = ~[<>]+ lexical restrictions Word -/- ~[<>] sorts Document Elem context-free syntax Document.Doc = Elem Elem.Node = [ <[Tag]> [Elem*] </[Tag]> ] Elem.Text = Word+ {longest-match} <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Year is not a valid element of book Only books in catalogue Book should have a title Closing tag is not consistent with starting tag
Context-free grammar is … context-free! - Cannot express alignment Languages have context-sensitive properties How can we have our cake and eat it too? - Generic (liberal) syntax - Forbid programs/documents that are not well-formed !17 Context-Sensitive Properties
Checking Context-Sensitive Properties 18
Generic (liberal) syntax - Allow more programs/documents Check properties on AST - Reject programs/documents that are not well-formed !19 Approach: Checking Context-Sensitive Properties
Doc( Node( "catalogue" , [ Node( "book" , [ Node("title", [Text(["Modern Compiler Implementation in ML"])], "title") , Node("author", [Text(["Andrew Appel"])], "author") , Node("publisher", [Text(["Cambridge"])], "publisher") , Node("year", [], "year") ] , "book" ) , Node( "language" , [ Node("name", [Text(["SDF3"])], "name") , Node("purpose", [Text(["Syntax Definition"])], "purpose") , Node("implementedin", [Text(["SDF3"])], "implementedin") ] , "language" ) , Node( "book" , [ Node("author", [Text(["Klaas Sikkel"])], "author") , Node("publisher", [Text(["Springer"])], "publisher") ] , "book" ) , Node("book", [], "koob") ] , "catalogue" ) ) Checking Context-Context-Sensitive Properties in Spoofax !20 rules // Analysis editor-analyze: (ast, path, project-path) -> (ast', error*, warning*, info*) with ast' := <id> ast ; error* := <collect-all(constraint-error)> ast' ; warning* := <collect-all(constraint-warning)> ast' ; info* := <collect-all(constraint-info)> ast' parse errors
Check Violations using Constraint-Error Rules !21 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> rules // Analysis editor-analyze: (ast, path, project-path) -> (ast', error*, warning*, info*) with ast' := <id> ast ; error* := <collect-all(constraint-error)> ast' ; warning* := <collect-all(constraint-warning)> ast' ; info* := <collect-all(constraint-info)> ast' Find all sub-terms that are not consistent with the context- sensitive rules collect-all: type-unifying generic traversal
Check Violations using Constraint-Error Rules !22 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Closing tag does not match starting tag rules constraint-error : Node(tag1, elems, tag2) -> (tag2, $[Closing tag does not match starting tag]) where <not(eq)>(tag1, tag2) Origin Error message
Check Violations using Constraint-Error Rules !23 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Book should have a title rules constraint-error : n@Node(tag@"book", elems, _) -> (tag, $[Book should have title]) where <not(has(|"title"))> elems has(|tag) = fetch(?Node(tag, _, _)) Containment checks
Check Violations using Constraint-Error Rules !24 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Catalogue can only have books rules constraint-error : Node("catalogue", elems, _) -> error where <filter(not-a-book)> elems => [error | _] not-a-book : Node(tag, _, _) -> (tag, $[Catalogue can only have books]) where <not(eq)> (tag, "book") Containment checks
Check Violations using Constraint-Error Rules !25 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Book can only have title, author, publisher rules constraint-error : Node("book", elems, _) -> error where <filter(not-a-book-elem)> elems => [error | _] not-a-book-elem : Node(tag, _, _) -> (tag, $[Book can only have title, author, publisher]) where <not(elem())> (tag, ["title", "author", "publisher"]) Containment checks
Generic (liberal) syntax - Allow more programs/documents - `permissive syntax’ Check properties on AST - Reject programs/documents that are not well-formed Advantage - Smaller syntax definition - Parser does not fail (so often) - Better error messages than parser can give !26 Approach: Checking Context-Sensitive Properties
How are programming languages different from XML? 27
XML checking - Tag consistency - Schema consistency - These are structural properties Programming languages - Type consistency (similar to schema?) - Name consistency: declarations and references should correspond - Name dependent type consistency: names carry types !28 Programming Languages vs XML
Static Analysis for Tiger 29
Scope !30 let var x : int := 0 + z var y : int := x + 1 var z : int := x + y + 1 in x + y + z end
Scope: Definition before Use !31 let var x : int := 0 + z // z not in scope var y : int := x + 1 var z : int := x + y + 1 in x + y + z end
Mutual Recursion !32 let function odd(x : int) : int = if x > 0 then even(x - 1) else false function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end let function odd(x : int) : int = if x > 0 then even(x - 1) else false var x : int function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end
Mutually Recursive Functions should be Adjacent !33 let function odd(x : int) : int = if x > 0 then even(x - 1) else false function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end let function odd(x : int) : int = if x > 0 then even(x - 1) else false var x : int function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end
Name Spaces !34 let type foo = int function foo(x : foo) : foo = 3 var foo : foo := foo(4) in foo(56) + foo end
Functions and Variables in Same Name Space !35 let type foo = int function foo(x : foo) : foo = 3 var foo : foo := foo(4) in foo(56) + foo // both refer to the variable foo end Functions and variables are in the same namespace
Type Dependent Name Resolution !36 let type point = {x : int, y : int} var origin : point := point { x = 1, y = 2 } in origin.x end
Type Dependent Name Resolution !37 let type point = {x : int, y : int} var origin : point := point { x = 1, y = 2 } in origin.x end Resolving origin.x requires the type of origin
Name Correspondence !38 let type point = {x : int, y : int} type errpoint = {x : int, x : int} var p : point var e : errpoint in p := point{ x = 3, y = 3, z = "a" } p := point{ x = 3 } end
Name Set Correspondence !39 let type point = {x : int, y : int} type errpoint = {x : int, x : int} var p : point var e : errpoint in p := point{ x = 3, y = 3, z = "a" } p := point{ x = 3 } end Field “y” not initialized Reference “z” not resolved Duplicate Declaration of Field “x”
Recursive Types !40 let type intlist = {hd : int, tl : intlist} type tree = {key : int, children : treelist} type treelist = {hd : tree, tl : treelist} var l : intlist var t : tree var tl : treelist in l := intlist { hd = 3, tl = l }; t := tree { key = 2, children = treelist { hd = tree{ key = 3, children = 3 }, tl = treelist{ } } }; t.children.hd.children := t.children end
Recursive Types !41 let type intlist = {hd : int, tl : intlist} type tree = {key : int, children : treelist} type treelist = {hd : tree, tl : treelist} var l : intlist var t : tree var tl : treelist in l := intlist { hd = 3, tl = l }; t := tree { key = 2, children = treelist { hd = tree{ key = 3, children = 3 }, tl = treelist{ } } }; t.children.hd.children := t.children end type mismatch Field "tl" not initialized Field "hd" not initialized
Intermezzo: Testing Static Analysis 42
Testing Name Resolution !43 test inner name [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type [[u]] = t var y: [[u]] := 0 in y := 42 end end ]] resolve #2 to #1 test outer name [[ let type t = u type [[u]] = int var x: [[u]] := 0 in x := 42 ; let type u = t var y: u := 0 in y := 42 end end ]] resolve #2 to #1
Testing Type Checking !44 test variable reference [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[x]] end end ]] run get-type to INT() test integer constant [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[42]] end end ]] run get-type to INT()
Testing Errors !45 test undefined variable [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[z]] end end ]] 1 error test type error [[ let type t = u type u = string var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[x]] end end ]] 1 error
Test Corner Cases !46 context-free superset language
Implementing a Type Checker with Rewrite Rules 47
Compute Type of Expression !48 rules type-check : Mod(e) -> t where <type-check> e => t type-check : String(_) -> STRING() type-check : Int(_) -> INT() type-check : Plus(e1, e2) -> INT() where <type-check> e1 => INT(); <type-check> e2 => INT() type-check : Times(e1, e2) -> INT() where <type-check> e1 => INT(); <type-check> e2 => INT() 1 + 2 * 3 Mod( Plus(Int("1"), Times(Int("2"), Int("3"))) ) INT()
Compute Type of Variable? !49 rules type-check : Let([VarDec(x, e)], e_body) -> t where <type-check> e => t_e; <type-check> e_body => t type-check : Var(x) -> t // ??? let var x := 1 in x + 1 end Mod( Let( [VarDec("x", Int("1"))] , [Plus(Var("x"), Int("1"))] ) ) ?
Type Checking Variable Bindings !50 rules type-check(|env) : Let([VarDec(x, e)], e_body) -> t where <type-check(|env)> e => t_e; <type-check(|[(x, t_e) | env])> e_body => t type-check(|env) : Var(x) -> t where <fetch(?(x, t))> env let var x := 1 in x + 1 end INT()Store association between variable and type in type environment Mod( Let( [VarDec("x", Int("1"))] , [Plus(Var("x"), Int("1"))] ) )
Pass Environment to Sub-Expressions !51 rules type-check : Mod(e) -> t where <type-check(|[])> e => t type-check(|env) : String(_) -> STRING() type-check(|env) : Int(_) -> INT() type-check(|env) : Plus(e1, e2) -> INT() where <type-check(|env)> e1 => INT(); <type-check(|env)> e2 => INT() type-check(|env) : Times(e1, e2) -> INT() where <type-check(|env)> e1 => INT(); <type-check(|env)> e2 => INT() let var x := 1 in x + 1 end INT() Mod( Let( [VarDec("x", Int("1"))] , [Plus(Var("x"), Int("1"))] ) )
Type checking ill-typed/named programs? - add rules for ‘bad’ cases More complicated name binding patterns? - Hoisting of variables in JavaScript functions - Mutually recursive bindings - Possible approaches ‣ Multiple traversals over program ‣ Defer checking until entire scope is processed ‣ … !52 But what about?
!53 Name Binding Complicates Type Checking
Name Binding 54
Language = Set of Trees !55 Fun AddArgDecl VarRefId Int “1” VarDeclId “x” TXINT “x” Tree is a convenient interface for transforming programs
Language = Set of Graphs !56 Fun AddArgDecl VarRefId Int “1” VarDeclId TXINT Edges from references to declarations
Fun AddArgDecl VarRefId Int “1” VarDeclId “x” TXINT “x” Language = Set of Graphs !57 Names are placeholders for edges in linear / tree representation
!58 What are the commonalities of name binding rules in programming languages?
Name Binding Patterns 59
Variables !60 let function fact(n : int) : int = if n < 1 then 1 else n * fact(n - 1) in fact(10) end
Function Calls !61 let function fact(n : int) : int = if n < 1 then 1 else n * fact(n - 1) in fact(10) end
!62 function prettyprint(tree: tree) : string = let var output := "" function write(s: string) = output := concat(output, s) function show(n: int, t: tree) = let function indent(s: string) = (write("n"); for i := 1 to n do write(" "); output := concat(output, s)) in if t = nil then indent(".") else (indent(t.key); show(n+1, t.left); show(n+1, t.right)) end in show(0, tree); output end Nested Scopes (Shadowing)
!63 let type point = { x : int, y : int } var origin := point { x = 1, y = 2 } in origin.x := 10; origin := nil end Type References
!64 let type point = { x : int, y : int } var origin := point { x = 1, y = 2 } in origin.x := 10; origin := nil end Record Fields
!65 let type point = { x : int, y : int } var origin := point { x = 1, y = 2 } in origin.x := 10; origin := nil end Type Dependent Name Resolution
Used in many different language artifacts - compiler, interpreter, semantics, IDE, refactoring Binding rules encoded in many different and ad-hoc ways - symbol tables, environments, substitutions No standard approach to formalization - there is no BNF for name binding No reuse of binding rules between artifacts - how do we know substitution respects binding rules? !66 Name Binding is Pervasive
What are the name binding rules of a language? - What are introductions of names? - What are uses of names? - What are the shadowing rules? - What are the name spaces? How can we define the name binding rules of a language? - What is the BNF for name binding? !67 Name Binding Rules
Declarative Specification of Name Binding Rules 68
Representation - Standardized representation for <aspect> of programs - Independent of specific object language Specification Formalism - Language-specific declarative rules - Abstract from implementation concerns Language-Independent Interpretation - Formalism interpreted by language-independent algorithm - Multiple interpretations for different purposes - Reuse between implementations of different languages !69 Separation of Concerns in Definition of Programming Languages
Representation: (Abstract Syntax) Trees - Standardized representation for structure of programs - Basis for syntactic and semantic operations Formalism: Syntax Definition - Productions + Constructors + Templates + Disambiguation - Language-specific rules: structure of each language construct Language-Independent Interpretation - Well-formedness of abstract syntax trees ‣ provides declarative correctness criterion for parsing - Parsing algorithm !70 Separation of Concerns in Syntax Definition
Representation - To conduct and represent the results of name resolution Declarative Rules - To define name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !71 Wanted: Separation of Concerns in Name Binding
DSLs for specifying binding structure of a (target) language - Ott [Sewell+10] - Romeo [StansiferWand14] - Unbound [Weirich+11] - Cαml [Pottier06] - NaBL [Konat+12] Generate code to do resolution and record results What is the semantics of such a name binding language? !72 Name Binding Languages
Attempt: NaBL Name Binding Language !73 binding rules // variables Param(t, x) : defines Variable x of type t Let(bs, e) : scopes Variable Bind(t, x, e) : defines Variable x of type t Var(x) : refers to Variable x Declarative Name Binding and Scope Rules Gabriël D. P. Konat, Lennart C. L. Kats, Guido Wachsmuth, Eelco Visser SLE 2012 Declarative specification Abstracts from implementation Incremental name resolution But: How to explain it to Coq? What is the semantics of NaBL?
Attempt: NaBL Name Binding Language !74 Declarative Name Binding and Scope Rules Gabriël D. P. Konat, Lennart C. L. Kats, Guido Wachsmuth, Eelco Visser SLE 2012 Especially: What is the semantics of imports? binding rules // classes Class(c, _, _, _) : defines Class c of type ClassT(c) scopes Field, Method, Variable Extends(c) : imports Field, Method from Class c ClassT(c) : refers to Class c New(c) : refers to Class c
What is semantics of binding language? - the meaning of a binding specification for language L should be given by - a function from L programs to their “resolution structures” So we need - a (uniform, language-independent) method for describing such resolution structures ... - … that can be used to compute the resolution of each program identifier - (or to verify that a claimed resolution is valid) That supports - Handle broad range of language binding features ... - … using minimal number of constructs - Make resolution structure language-independent - Handle named collections of names (e.g. modules, classes, etc.) within the theory - Allow description of programs with resolution errors !75 Approach
Name Resolution in Scope Graphs 76
Representation - ? Declarative Rules - To define name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !77 Separation of Concerns in Name Binding Scope Graphs
!78 let function fact(n : int) : int = if n < 1 then 1 else n * fact(n - 1) in fact(10) end fact S1 fact S2n n fact nn Scope GraphProgram Name Resolution
A Calculus for Name Resolution !79 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D Path in scope graph connects reference to declaration Scopes, References, Declarations, Parents, Imports Neron, Tolmach, Visser, Wachsmuth A Theory of Name Resolution ESOP 2015
Simple Scopes !80 Reference Declaration Scope Reference Step Declaration Step def y1 = x2 + 1 def x1 = 5 S0 def y1 = x2 + 1 def x1 = 5 Sx Sx R xS xS D S0 y1 x1S0 y1 x1S0x2 R y1 x1S0x2 R D y1 x1S0x2 S x x
Lexical Scoping !81 S1 S0 Parent def x1 = z2 5 def z1 = fun y1 { x2 + y2 } S’S S’S P z1 x1S0z2 S1 y1y2 x2 z1 x1S0z2 S1 y1y2 x2 z1 x1S0z2 S1 y1y2 x2 z1 x1S0z2 R S1 y1y2 x2 z1 x1S0z2 R D S1 y1y2 x2 z1 x1S0z2 R S1 y1y2 x2 z1 x1S0z2 R P S1 y1y2 x2 z1 x1S0z2 R P D S’S Parent Step
Imports !82 Associated scope Import S0 SB SA module A1 { def z1 = 5 } module B1 { import A2 def x1 = 1 + z2 } A1 SA z1 B1 SB z2 S0 A2 x1 S R1 R2 SR A1 SA z1 B1 SB z2 S0 A2 x1 A1 SA z1 B1 SB z2 S0 A2 x1 A1 SA z1 B1 SB z2 S0 A2 x1 R A1 SA z1 B1 SB z2 S0 A2 x1 R R A1 SA z1 B1 SB z2 S0 A2 x1 R R P A1 SA z1 B1 SB z2 S0 A2 x1 R R P D A1 SA z1 B1 SB z2 S0 A2 x1 I(A2 )R R P D A1 SA z1 B1 SB z2 S0 A2 x1 I(A2 )R R D P D S R1 R2 SR SRS I(R1) Import Step
Qualified Names !83 module N1 { def s1 = 5 } module M1 { def x1 = 1 + N2.s2 } S0 N1 SN s2 S0 N2 R D R I(N2 ) D X1 s1 N1 SN s2 S0 N2 R D X1 s1 N1 SN s2 S0 N2 X1 s1
A Calculus for Name Resolution !84 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D Reachability of declarations from references through scope graph edges How about ambiguities? References with multiple paths
A Calculus for Name Resolution !85 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D I(_).p’ < P.p D < I(_).p’ D < P.p s.p < s.p’ p < p’ Visibility Well formed path: R.P*.I(_)*.D Reachability
Ambiguous Resolutions !86 match t with | A x | B x => … z1 x2 x1S0x3 z1 x2 x1S0x3 R z1 x2 x1S0x3 R D z1 x2 x1S0x3 R D z1 x2 x1S0x3 R D D S0def x1 = 5 def x2 = 3 def z1 = x3 + 1
Shadowing !87 S0 S1 S2 D < P.p s.p < s.p’ p < p’ S1 S2 x1 y1 y2 x2 z1 x3S0z2 def x3 = z2 5 7 def z1 = fun x1 { fun y1 { x2 + y2 } } S1 S2 x1 y1 y2 x2 z1 x3S0z2 R P P D S1 S2 x1 y1 y2 x2 z1 x3S0z2 D P R R P P D R.P.D < R.P.P.D
Imports shadow Parents !88 I(_).p’ < P.p R.I(A2).D < R.P.D A1 SA z1 B1 SB z2 S0 A2 x1 z3 A1 SA z1 B1 SB z2 S0 A2 x1 I(A2)R D z3 A1 SA z1 B1 SB z2 S0 A2 x1 I(A2)R D P D z3 R S0def z3 = 2 module A1 { def z1 = 5 } module B1 { import A2 def x1 = 1 + z2 } SA SB
Imports vs. Includes !89 S0def z3 = 2 module A1 { def z1 = 5 } import A2 def x1 = 1 + z2 SA A1 SA z1 z2 S0 A2 x1 I(A2) R D z3 R D D < I(_).p’ R.D < R.I(A2).D A1 SA z1 z2 S0 A2 x1 R z3 D A1 SA z1 z2 S0 A2 x1z3 S0def z3 = 2 module A1 { def z1 = 5 } include A2 def x1 = 1 + z2 SA X
Import Parents !90 def s1 = 5 module N1 { } def x1 = 1 + N2.s2 S0 SN def s1 = 5 module N1 { } def x1 = 1 + N2.s2 S0 SN Well formed path: R.P*.I(_)*.D N1 SN s2 S0 N2 X1 s1 R I(N2 ) D P N1 SN s2 S0 N2 X1 s1
Transitive vs. Non-Transitive !91 With transitive imports, a well formed path is R.P*.I(_)*.D With non-transitive imports, a well formed path is R.P*.I(_)?.D A1 SA z1 B1 SB S0 A2 C1 SCz2 x1 B2 A1 SA z1 B1 SB S0 A2 I(A2) D C1 SCz2 I(B2) R x1 B2 ?? module A1 { def z1 = 5 } module B1 { import A2 } module C1 { import B2 def x1 = 1 + z2 } SA SB SC
A Calculus for Name Resolution !92 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D I(_).p’ < P.p D < I(_).p’ D < P.p s.p < s.p’ p < p’ Visibility Well formed path: R.P*.I(_)*.D Reachability
Visibility Policies !93 Lexical scope L := {P} E := P⇤ D < P Non-transitive imports L := {P, I} E := P⇤ · I? D < P, D < I, I < P Transitive imports L := {P, TI} E := P⇤ · TI⇤ D < P, D < TI, TI < P Transitive Includes L := {P, Inc} E := P⇤ · Inc⇤ D < P, Inc < P Transitive includes and imports, and non-transitive imports L := {P, Inc, TI, I} E := P⇤ · (Inc | TI)⇤ · I? D < P, D < TI, TI < P, Inc < P, D < I, I < P, Figure 10. Example reachability and visibility policies by instan- R[I](xR Envre [I, S](S EnvL re [I, S](S EnvD re [I, S](S Envl re [I, S](S
More Examples 94 From TUD-SERG-2015-001
Let Bindings !95 1 2 b2a1 c3 a4 b6 c12a10 b11 c5 b9 a7 c8 def a1 = 0 def b2 = 1 def c3 = 2 letpar a4 = c5 b6 = a7 c8 = b9 in a10+b11+c12 1 b2a1 c3 a4 b6 c12a10 b11 c5 b9 a7 c8 2 def a1 = 0 def b2 = 1 def c3 = 2 letrec a4 = c5 b6 = a7 c8 = b9 in a10+b11+c12 1 b2a1 c3 a4 b6 c12a10 b11 c5 b9 a7 c8 2 4 3 def a1 = 0 def b2 = 1 def c3 = 2 let a4 = c5 b6 = a7 c8 = b9 in a10+b11+c12
Definition before Use / Use before Definition !96 class C1 { int a2 = b3; int b4; void m5 (int a6) { int c7 = a8 + b9; int b10 = b11 + c12; } int c12; } 0 C1 1 2 a2 b4 c12 b3 m5 a6 3 4 c7 b10 b9 a8 b11 c12
Inheritance !97 class C1 { int f2 = 42; } class D3 extends C4 { int g5 = f6; } class E7 extends D8 { int f9 = g10; int h11 = f12; } 32 1 C4 C1 4D3 E7 D8 f2 g5 f6 f9 g10 f12 h11
Java Packages !98 package p3; class D4 {} package p1; class C2 {} 1 p3p1 2 p1 p3 3 D4C2
Java Import !99 package p1; imports r2.*; imports q3.E4; public class C5 {} class D6 {} 4 p1 D6C5 3 2r2 1 p1 E4 q3
C# Namespaces and Partial Classes !100 namespace N1 { using M2; partial class C3 { int f4; } } namespace N5 { partial class C6 { int m7() { return f8; } } } 1 3 6 4 7 8 C3 C6 N1 N5 f4 m7 N1 N5 C3 C6 f8 2 M2 5
Summary 101
Static analysis - Properties that can be checked of the (‘static’) program text - Decide which properties to encode in syntax definition vs checker - Strict vs liberal syntax Context-sensitive properties - Some properties are intrinsically context-sensitive - In particular: names Declarative specification of name binding rules - Common (language agnostic) understanding of name binding
 !102 Static Analysis
Representation: Scope Graphs - Standardized representation for lexical scoping structure of programs - Path in scope graph relates reference to declaration - Basis for syntactic and semantic operations Formalism: Name Binding Constraints - References + Declarations + Scopes + Reachability + Visibility - Language-specific rules map AST to constraints Language-Independent Interpretation - Resolution calculus: correctness of path with respect to scope graph - Name resolution algorithm - Alpha equivalence - Mapping from graph to tree (to text) - Refactorings - And many other applications !103 A Theory of Name Resolution
We have modeled a large set of example binding patterns - definition before use - different let binding flavors - recursive modules - imports and includes - qualified names - class inheritance - partial classes Next goal: fully model some real languages - In progress: Go, Rust, TypeScript - Java, ML !104 Validation
Scope graph semantics for binding languages [OOPSLA18] - starting with NaBL - or rather: a redesign of NaBL based on scope graphs Dynamic analogs to static scope graphs [ECOOP16] - how does scope graph relate to memory at run-time? Supporting mechanized language meta-theory [POPL18] - relating static and dynamic bindings Resolution-sensitive program transformations - renaming, refactoring, substitution, … !105 Ongoing/Future Work
Next 106
Representation - To conduct and represent the results of name resolution Declarative Rules - To define name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !107 Separation of Concerns in Name Binding
Representation - ? Declarative Rules - To define name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !108 Separation of Concerns in Name Binding Scope Graphs
Representation - ? Declarative Rules - ? Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !109 Separation of Concerns in Name Binding Scope & Type Constraint Rules [PEPM16] Scope Graphs
Next: Constraint-Based Type Checkers !110 Program AST Parse Constraints Generate Constraints Solution Resolve Constraints Language Specific Language Independent
!111 Except where otherwise noted, this work is licensed under

Compiler Construction | Lecture 6 | Introduction to Static Analysis

  • 1.
    Eelco Visser CS4200 CompilerConstruction TU Delft October 2018 Lecture 6: Introduction to Static Analysis
  • 2.
    This Lecture !2 Source Code Editor Parse Abstract Syntax Tree Type Check Checkthat names are used correctly and that expressions are well-typed Errors
  • 3.
    Reading Material 3 The followingpapers add background, conceptual exposition, and examples to the material from the slides. Some notation and technical details have been changed; check the documentation.
  • 4.
    !4 Type checkers arealgorithms that check names and types in programs. This paper introduces the NaBL Name Binding Language, which supports the declarative definition of the name binding and scope rules of programming languages through the definition of scopes, declarations, references, and imports associated. http://dx.doi.org/10.1007/978-3-642-36089-3_18 SLE 2012 Background
  • 5.
    !5 This paper introducesscope graphs as a language-independent representation for the binding information in programs. http://dx.doi.org/10.1007/978-3-662-46669-8_9 ESOP 2015 Best EAPLS paper at ETAPS 2015
  • 6.
    !6 Separating type checkinginto constraint generation and constraint solving provides more declarative definition of type checkers. This paper introduces a constraint language integrating name resolution into constraint resolution through scope graph constraints. This is the basis for the design of the NaBL2 static semantics specification language. https://doi.org/10.1145/2847538.2847543 PEPM 2016
  • 7.
    !7 Documentation for NaBL2at the metaborg.org website. http://www.metaborg.org/en/latest/source/langdev/meta/lang/nabl2/index.html
  • 8.
    !8 This paper describesthe next generation of the approach. Addresses (previously) open issues in expressiveness of scope graphs for type systems: - Structural types - Generic types Addresses open issue with staging of information in type systems. Introduces Statix DSL for definition of type systems. Prototype of Statix is available in Spoofax HEAD, but not ready for use in project yet. The future OOPSLA 2018 To appear
  • 9.
  • 10.
    Dynamically Typed vsStatically Typed - Dynamic: type checking at run-time - Static: type checking at compile-time (before run-time) What does it mean to type check? - Type safety: guarantee absence of run-time type errors Why static type checking? - Avoid overhead of run-time type checking - Fail faster: find (type) errors at compile time - Find all (type) errors: some errors may not be triggered by testing - But: not all errors can be found statically (e.g. array bounds checking) !10 Why Type Checking? Some Discussion Points
  • 11.
  • 12.
    Homework Assignment: Whatis the Syntax of This Language? !12 <languages> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <language> <name>Stratego</name> <purpose>Transformation</purpose> <implementedin>SDF3</implementedin> <implementedin>Stratego</implementedin> <target>Java</target> </language> </languages> <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue>
  • 13.
    Syntax of BookCatalogues !13 context-free syntax Document.Catalogue = [ <catalogue> [Book*] </catalogue> ] Book.Book = [ <book> [Title] [Author] [Publisher] </book> ] …Schema-specific syntax definition <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue>
  • 14.
    A Generic Syntaxof XML Documents !14 context-free start-symbols Document sorts Tag Word lexical syntax Tag = [a-zA-Z][a-zA-Z0-9]* Word = ~[<>]+ lexical restrictions Word -/- ~[<>] sorts Document Elem context-free syntax Document.Doc = Elem Elem.Node = [ <[Tag]> [Elem*] </[Tag]> ] Elem.Text = Word+ {longest-match} <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue> Doc( Node( "catalogue" , [ Node( "book" , [ Node("title", [Text(["Modern Compiler Implementation in ML"])], "title") , Node("author", [Text(["Andrew Appel"])], "author") , Node("publisher", [Text(["Cambridge"])], "publisher") ] , "book" ) , Node( "book" , [ Node("title", [Text(["Parsing Schemata"])], "title") , Node("author", [Text(["Klaas Sikkel"])], "author") , Node("publisher", [Text(["Springer"])], "publisher") ] , "book" ) ] , "catalogue" ) )
  • 15.
    A Generic Syntaxof XML Documents !15 context-free start-symbols Document sorts Tag Word lexical syntax Tag = [a-zA-Z][a-zA-Z0-9]* Word = ~[<>]+ lexical restrictions Word -/- ~[<>] sorts Document Elem context-free syntax Document.Doc = Elem Elem.Node = [ <[Tag]> [Elem*] </[Tag]> ] Elem.Text = Word+ {longest-match} <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> </book> <book> <title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> </catalogue> What is the problem with this approach?
  • 16.
    Generic Syntax isToo Liberal! !16 context-free start-symbols Document sorts Tag Word lexical syntax Tag = [a-zA-Z][a-zA-Z0-9]* Word = ~[<>]+ lexical restrictions Word -/- ~[<>] sorts Document Elem context-free syntax Document.Doc = Elem Elem.Node = [ <[Tag]> [Elem*] </[Tag]> ] Elem.Text = Word+ {longest-match} <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Year is not a valid element of book Only books in catalogue Book should have a title Closing tag is not consistent with starting tag
  • 17.
    Context-free grammar is… context-free! - Cannot express alignment Languages have context-sensitive properties How can we have our cake and eat it too? - Generic (liberal) syntax - Forbid programs/documents that are not well-formed !17 Context-Sensitive Properties
  • 18.
  • 19.
    Generic (liberal) syntax -Allow more programs/documents Check properties on AST - Reject programs/documents that are not well-formed !19 Approach: Checking Context-Sensitive Properties
  • 20.
    Doc( Node( "catalogue" , [ Node( "book" ,[ Node("title", [Text(["Modern Compiler Implementation in ML"])], "title") , Node("author", [Text(["Andrew Appel"])], "author") , Node("publisher", [Text(["Cambridge"])], "publisher") , Node("year", [], "year") ] , "book" ) , Node( "language" , [ Node("name", [Text(["SDF3"])], "name") , Node("purpose", [Text(["Syntax Definition"])], "purpose") , Node("implementedin", [Text(["SDF3"])], "implementedin") ] , "language" ) , Node( "book" , [ Node("author", [Text(["Klaas Sikkel"])], "author") , Node("publisher", [Text(["Springer"])], "publisher") ] , "book" ) , Node("book", [], "koob") ] , "catalogue" ) ) Checking Context-Context-Sensitive Properties in Spoofax !20 rules // Analysis editor-analyze: (ast, path, project-path) -> (ast', error*, warning*, info*) with ast' := <id> ast ; error* := <collect-all(constraint-error)> ast' ; warning* := <collect-all(constraint-warning)> ast' ; info* := <collect-all(constraint-info)> ast' parse errors
  • 21.
    Check Violations usingConstraint-Error Rules !21 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> rules // Analysis editor-analyze: (ast, path, project-path) -> (ast', error*, warning*, info*) with ast' := <id> ast ; error* := <collect-all(constraint-error)> ast' ; warning* := <collect-all(constraint-warning)> ast' ; info* := <collect-all(constraint-info)> ast' Find all sub-terms that are not consistent with the context- sensitive rules collect-all: type-unifying generic traversal
  • 22.
    Check Violations usingConstraint-Error Rules !22 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Closing tag does not match starting tag rules constraint-error : Node(tag1, elems, tag2) -> (tag2, $[Closing tag does not match starting tag]) where <not(eq)>(tag1, tag2) Origin Error message
  • 23.
    Check Violations usingConstraint-Error Rules !23 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Book should have a title rules constraint-error : n@Node(tag@"book", elems, _) -> (tag, $[Book should have title]) where <not(has(|"title"))> elems has(|tag) = fetch(?Node(tag, _, _)) Containment checks
  • 24.
    Check Violations usingConstraint-Error Rules !24 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Catalogue can only have books rules constraint-error : Node("catalogue", elems, _) -> error where <filter(not-a-book)> elems => [error | _] not-a-book : Node(tag, _, _) -> (tag, $[Catalogue can only have books]) where <not(eq)> (tag, "book") Containment checks
  • 25.
    Check Violations usingConstraint-Error Rules !25 <catalogue> <book> <title>Modern Compiler Implementation in ML</title> <author>Andrew Appel</author> <publisher>Cambridge</publisher> <year></year> </book> <language> <name>SDF3</name> <purpose>Syntax Definition</purpose> <implementedin>SDF3</implementedin> </language> <book> //<title>Parsing Schemata</title> <author>Klaas Sikkel</author> <publisher>Springer</publisher> </book> <book> </koob> </catalogue> Book can only have title, author, publisher rules constraint-error : Node("book", elems, _) -> error where <filter(not-a-book-elem)> elems => [error | _] not-a-book-elem : Node(tag, _, _) -> (tag, $[Book can only have title, author, publisher]) where <not(elem())> (tag, ["title", "author", "publisher"]) Containment checks
  • 26.
    Generic (liberal) syntax -Allow more programs/documents - `permissive syntax’ Check properties on AST - Reject programs/documents that are not well-formed Advantage - Smaller syntax definition - Parser does not fail (so often) - Better error messages than parser can give !26 Approach: Checking Context-Sensitive Properties
  • 27.
    How are programming languagesdifferent from XML? 27
  • 28.
    XML checking - Tagconsistency - Schema consistency - These are structural properties Programming languages - Type consistency (similar to schema?) - Name consistency: declarations and references should correspond - Name dependent type consistency: names carry types !28 Programming Languages vs XML
  • 29.
  • 30.
    Scope !30 let var x :int := 0 + z var y : int := x + 1 var z : int := x + y + 1 in x + y + z end
  • 31.
    Scope: Definition beforeUse !31 let var x : int := 0 + z // z not in scope var y : int := x + 1 var z : int := x + y + 1 in x + y + z end
  • 32.
    Mutual Recursion !32 let function odd(x: int) : int = if x > 0 then even(x - 1) else false function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end let function odd(x : int) : int = if x > 0 then even(x - 1) else false var x : int function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end
  • 33.
    Mutually Recursive Functionsshould be Adjacent !33 let function odd(x : int) : int = if x > 0 then even(x - 1) else false function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end let function odd(x : int) : int = if x > 0 then even(x - 1) else false var x : int function even(x : int) : int = if x > 0 then odd(x - 1) else true in even(34) end
  • 34.
    Name Spaces !34 let type foo= int function foo(x : foo) : foo = 3 var foo : foo := foo(4) in foo(56) + foo end
  • 35.
    Functions and Variablesin Same Name Space !35 let type foo = int function foo(x : foo) : foo = 3 var foo : foo := foo(4) in foo(56) + foo // both refer to the variable foo end Functions and variables are in the same namespace
  • 36.
    Type Dependent NameResolution !36 let type point = {x : int, y : int} var origin : point := point { x = 1, y = 2 } in origin.x end
  • 37.
    Type Dependent NameResolution !37 let type point = {x : int, y : int} var origin : point := point { x = 1, y = 2 } in origin.x end Resolving origin.x requires the type of origin
  • 38.
    Name Correspondence !38 let type point= {x : int, y : int} type errpoint = {x : int, x : int} var p : point var e : errpoint in p := point{ x = 3, y = 3, z = "a" } p := point{ x = 3 } end
  • 39.
    Name Set Correspondence !39 let typepoint = {x : int, y : int} type errpoint = {x : int, x : int} var p : point var e : errpoint in p := point{ x = 3, y = 3, z = "a" } p := point{ x = 3 } end Field “y” not initialized Reference “z” not resolved Duplicate Declaration of Field “x”
  • 40.
    Recursive Types !40 let type intlist= {hd : int, tl : intlist} type tree = {key : int, children : treelist} type treelist = {hd : tree, tl : treelist} var l : intlist var t : tree var tl : treelist in l := intlist { hd = 3, tl = l }; t := tree { key = 2, children = treelist { hd = tree{ key = 3, children = 3 }, tl = treelist{ } } }; t.children.hd.children := t.children end
  • 41.
    Recursive Types !41 let type intlist= {hd : int, tl : intlist} type tree = {key : int, children : treelist} type treelist = {hd : tree, tl : treelist} var l : intlist var t : tree var tl : treelist in l := intlist { hd = 3, tl = l }; t := tree { key = 2, children = treelist { hd = tree{ key = 3, children = 3 }, tl = treelist{ } } }; t.children.hd.children := t.children end type mismatch Field "tl" not initialized Field "hd" not initialized
  • 42.
  • 43.
    Testing Name Resolution !43 testinner name [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type [[u]] = t var y: [[u]] := 0 in y := 42 end end ]] resolve #2 to #1 test outer name [[ let type t = u type [[u]] = int var x: [[u]] := 0 in x := 42 ; let type u = t var y: u := 0 in y := 42 end end ]] resolve #2 to #1
  • 44.
    Testing Type Checking !44 testvariable reference [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[x]] end end ]] run get-type to INT() test integer constant [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[42]] end end ]] run get-type to INT()
  • 45.
    Testing Errors !45 test undefinedvariable [[ let type t = u type u = int var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[z]] end end ]] 1 error test type error [[ let type t = u type u = string var x: u := 0 in x := 42 ; let type u = t var y: u := 0 in y := [[x]] end end ]] 1 error
  • 46.
  • 47.
    Implementing a Type Checkerwith Rewrite Rules 47
  • 48.
    Compute Type ofExpression !48 rules type-check : Mod(e) -> t where <type-check> e => t type-check : String(_) -> STRING() type-check : Int(_) -> INT() type-check : Plus(e1, e2) -> INT() where <type-check> e1 => INT(); <type-check> e2 => INT() type-check : Times(e1, e2) -> INT() where <type-check> e1 => INT(); <type-check> e2 => INT() 1 + 2 * 3 Mod( Plus(Int("1"), Times(Int("2"), Int("3"))) ) INT()
  • 49.
    Compute Type ofVariable? !49 rules type-check : Let([VarDec(x, e)], e_body) -> t where <type-check> e => t_e; <type-check> e_body => t type-check : Var(x) -> t // ??? let var x := 1 in x + 1 end Mod( Let( [VarDec("x", Int("1"))] , [Plus(Var("x"), Int("1"))] ) ) ?
  • 50.
    Type Checking VariableBindings !50 rules type-check(|env) : Let([VarDec(x, e)], e_body) -> t where <type-check(|env)> e => t_e; <type-check(|[(x, t_e) | env])> e_body => t type-check(|env) : Var(x) -> t where <fetch(?(x, t))> env let var x := 1 in x + 1 end INT()Store association between variable and type in type environment Mod( Let( [VarDec("x", Int("1"))] , [Plus(Var("x"), Int("1"))] ) )
  • 51.
    Pass Environment toSub-Expressions !51 rules type-check : Mod(e) -> t where <type-check(|[])> e => t type-check(|env) : String(_) -> STRING() type-check(|env) : Int(_) -> INT() type-check(|env) : Plus(e1, e2) -> INT() where <type-check(|env)> e1 => INT(); <type-check(|env)> e2 => INT() type-check(|env) : Times(e1, e2) -> INT() where <type-check(|env)> e1 => INT(); <type-check(|env)> e2 => INT() let var x := 1 in x + 1 end INT() Mod( Let( [VarDec("x", Int("1"))] , [Plus(Var("x"), Int("1"))] ) )
  • 52.
    Type checking ill-typed/namedprograms? - add rules for ‘bad’ cases More complicated name binding patterns? - Hoisting of variables in JavaScript functions - Mutually recursive bindings - Possible approaches ‣ Multiple traversals over program ‣ Defer checking until entire scope is processed ‣ … !52 But what about?
  • 53.
  • 54.
  • 55.
    Language = Setof Trees !55 Fun AddArgDecl VarRefId Int “1” VarDeclId “x” TXINT “x” Tree is a convenient interface for transforming programs
  • 56.
    Language = Setof Graphs !56 Fun AddArgDecl VarRefId Int “1” VarDeclId TXINT Edges from references to declarations
  • 57.
    Fun AddArgDecl VarRefId Int “1” VarDeclId “x” TXINT “x” Language =Set of Graphs !57 Names are placeholders for edges in linear / tree representation
  • 58.
    !58 What are thecommonalities of name binding rules in programming languages?
  • 59.
  • 60.
    Variables !60 let function fact(n: int) : int = if n < 1 then 1 else n * fact(n - 1) in fact(10) end
  • 61.
    Function Calls !61 let functionfact(n : int) : int = if n < 1 then 1 else n * fact(n - 1) in fact(10) end
  • 62.
    !62 function prettyprint(tree: tree): string = let var output := "" function write(s: string) = output := concat(output, s) function show(n: int, t: tree) = let function indent(s: string) = (write("n"); for i := 1 to n do write(" "); output := concat(output, s)) in if t = nil then indent(".") else (indent(t.key); show(n+1, t.left); show(n+1, t.right)) end in show(0, tree); output end Nested Scopes (Shadowing)
  • 63.
    !63 let type point ={ x : int, y : int } var origin := point { x = 1, y = 2 } in origin.x := 10; origin := nil end Type References
  • 64.
    !64 let type point ={ x : int, y : int } var origin := point { x = 1, y = 2 } in origin.x := 10; origin := nil end Record Fields
  • 65.
    !65 let type point ={ x : int, y : int } var origin := point { x = 1, y = 2 } in origin.x := 10; origin := nil end Type Dependent Name Resolution
  • 66.
    Used in manydifferent language artifacts - compiler, interpreter, semantics, IDE, refactoring Binding rules encoded in many different and ad-hoc ways - symbol tables, environments, substitutions No standard approach to formalization - there is no BNF for name binding No reuse of binding rules between artifacts - how do we know substitution respects binding rules? !66 Name Binding is Pervasive
  • 67.
    What are thename binding rules of a language? - What are introductions of names? - What are uses of names? - What are the shadowing rules? - What are the name spaces? How can we define the name binding rules of a language? - What is the BNF for name binding? !67 Name Binding Rules
  • 68.
  • 69.
    Representation - Standardized representationfor <aspect> of programs - Independent of specific object language Specification Formalism - Language-specific declarative rules - Abstract from implementation concerns Language-Independent Interpretation - Formalism interpreted by language-independent algorithm - Multiple interpretations for different purposes - Reuse between implementations of different languages !69 Separation of Concerns in Definition of Programming Languages
  • 70.
    Representation: (Abstract Syntax)Trees - Standardized representation for structure of programs - Basis for syntactic and semantic operations Formalism: Syntax Definition - Productions + Constructors + Templates + Disambiguation - Language-specific rules: structure of each language construct Language-Independent Interpretation - Well-formedness of abstract syntax trees ‣ provides declarative correctness criterion for parsing - Parsing algorithm !70 Separation of Concerns in Syntax Definition
  • 71.
    Representation - To conductand represent the results of name resolution Declarative Rules - To define name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !71 Wanted: Separation of Concerns in Name Binding
  • 72.
    DSLs for specifyingbinding structure of a (target) language - Ott [Sewell+10] - Romeo [StansiferWand14] - Unbound [Weirich+11] - Cαml [Pottier06] - NaBL [Konat+12] Generate code to do resolution and record results What is the semantics of such a name binding language? !72 Name Binding Languages
  • 73.
    Attempt: NaBL NameBinding Language !73 binding rules // variables Param(t, x) : defines Variable x of type t Let(bs, e) : scopes Variable Bind(t, x, e) : defines Variable x of type t Var(x) : refers to Variable x Declarative Name Binding and Scope Rules Gabriël D. P. Konat, Lennart C. L. Kats, Guido Wachsmuth, Eelco Visser SLE 2012 Declarative specification Abstracts from implementation Incremental name resolution But: How to explain it to Coq? What is the semantics of NaBL?
  • 74.
    Attempt: NaBL NameBinding Language !74 Declarative Name Binding and Scope Rules Gabriël D. P. Konat, Lennart C. L. Kats, Guido Wachsmuth, Eelco Visser SLE 2012 Especially: What is the semantics of imports? binding rules // classes Class(c, _, _, _) : defines Class c of type ClassT(c) scopes Field, Method, Variable Extends(c) : imports Field, Method from Class c ClassT(c) : refers to Class c New(c) : refers to Class c
  • 75.
    What is semanticsof binding language? - the meaning of a binding specification for language L should be given by - a function from L programs to their “resolution structures” So we need - a (uniform, language-independent) method for describing such resolution structures ... - … that can be used to compute the resolution of each program identifier - (or to verify that a claimed resolution is valid) That supports - Handle broad range of language binding features ... - … using minimal number of constructs - Make resolution structure language-independent - Handle named collections of names (e.g. modules, classes, etc.) within the theory - Allow description of programs with resolution errors !75 Approach
  • 76.
  • 77.
    Representation - ? Declarative Rules -To define name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !77 Separation of Concerns in Name Binding Scope Graphs
  • 78.
    !78 let function fact(n: int) : int = if n < 1 then 1 else n * fact(n - 1) in fact(10) end fact S1 fact S2n n fact nn Scope GraphProgram Name Resolution
  • 79.
    A Calculus forName Resolution !79 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D Path in scope graph connects reference to declaration Scopes, References, Declarations, Parents, Imports Neron, Tolmach, Visser, Wachsmuth A Theory of Name Resolution ESOP 2015
  • 80.
    Simple Scopes !80 Reference Declaration Scope Reference StepDeclaration Step def y1 = x2 + 1 def x1 = 5 S0 def y1 = x2 + 1 def x1 = 5 Sx Sx R xS xS D S0 y1 x1S0 y1 x1S0x2 R y1 x1S0x2 R D y1 x1S0x2 S x x
  • 81.
    Lexical Scoping !81 S1 S0 Parent def x1= z2 5 def z1 = fun y1 { x2 + y2 } S’S S’S P z1 x1S0z2 S1 y1y2 x2 z1 x1S0z2 S1 y1y2 x2 z1 x1S0z2 S1 y1y2 x2 z1 x1S0z2 R S1 y1y2 x2 z1 x1S0z2 R D S1 y1y2 x2 z1 x1S0z2 R S1 y1y2 x2 z1 x1S0z2 R P S1 y1y2 x2 z1 x1S0z2 R P D S’S Parent Step
  • 82.
    Imports !82 Associated scope Import S0 SB SA module A1{ def z1 = 5 } module B1 { import A2 def x1 = 1 + z2 } A1 SA z1 B1 SB z2 S0 A2 x1 S R1 R2 SR A1 SA z1 B1 SB z2 S0 A2 x1 A1 SA z1 B1 SB z2 S0 A2 x1 A1 SA z1 B1 SB z2 S0 A2 x1 R A1 SA z1 B1 SB z2 S0 A2 x1 R R A1 SA z1 B1 SB z2 S0 A2 x1 R R P A1 SA z1 B1 SB z2 S0 A2 x1 R R P D A1 SA z1 B1 SB z2 S0 A2 x1 I(A2 )R R P D A1 SA z1 B1 SB z2 S0 A2 x1 I(A2 )R R D P D S R1 R2 SR SRS I(R1) Import Step
  • 83.
    Qualified Names !83 module N1{ def s1 = 5 } module M1 { def x1 = 1 + N2.s2 } S0 N1 SN s2 S0 N2 R D R I(N2 ) D X1 s1 N1 SN s2 S0 N2 R D X1 s1 N1 SN s2 S0 N2 X1 s1
  • 84.
    A Calculus forName Resolution !84 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D Reachability of declarations from references through scope graph edges How about ambiguities? References with multiple paths
  • 85.
    A Calculus forName Resolution !85 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D I(_).p’ < P.p D < I(_).p’ D < P.p s.p < s.p’ p < p’ Visibility Well formed path: R.P*.I(_)*.D Reachability
  • 86.
    Ambiguous Resolutions !86 match twith | A x | B x => … z1 x2 x1S0x3 z1 x2 x1S0x3 R z1 x2 x1S0x3 R D z1 x2 x1S0x3 R D z1 x2 x1S0x3 R D D S0def x1 = 5 def x2 = 3 def z1 = x3 + 1
  • 87.
    Shadowing !87 S0 S1 S2 D < P.p s.p< s.p’ p < p’ S1 S2 x1 y1 y2 x2 z1 x3S0z2 def x3 = z2 5 7 def z1 = fun x1 { fun y1 { x2 + y2 } } S1 S2 x1 y1 y2 x2 z1 x3S0z2 R P P D S1 S2 x1 y1 y2 x2 z1 x3S0z2 D P R R P P D R.P.D < R.P.P.D
  • 88.
    Imports shadow Parents !88 I(_).p’< P.p R.I(A2).D < R.P.D A1 SA z1 B1 SB z2 S0 A2 x1 z3 A1 SA z1 B1 SB z2 S0 A2 x1 I(A2)R D z3 A1 SA z1 B1 SB z2 S0 A2 x1 I(A2)R D P D z3 R S0def z3 = 2 module A1 { def z1 = 5 } module B1 { import A2 def x1 = 1 + z2 } SA SB
  • 89.
    Imports vs. Includes !89 S0defz3 = 2 module A1 { def z1 = 5 } import A2 def x1 = 1 + z2 SA A1 SA z1 z2 S0 A2 x1 I(A2) R D z3 R D D < I(_).p’ R.D < R.I(A2).D A1 SA z1 z2 S0 A2 x1 R z3 D A1 SA z1 z2 S0 A2 x1z3 S0def z3 = 2 module A1 { def z1 = 5 } include A2 def x1 = 1 + z2 SA X
  • 90.
    Import Parents !90 def s1= 5 module N1 { } def x1 = 1 + N2.s2 S0 SN def s1 = 5 module N1 { } def x1 = 1 + N2.s2 S0 SN Well formed path: R.P*.I(_)*.D N1 SN s2 S0 N2 X1 s1 R I(N2 ) D P N1 SN s2 S0 N2 X1 s1
  • 91.
    Transitive vs. Non-Transitive !91 Withtransitive imports, a well formed path is R.P*.I(_)*.D With non-transitive imports, a well formed path is R.P*.I(_)?.D A1 SA z1 B1 SB S0 A2 C1 SCz2 x1 B2 A1 SA z1 B1 SB S0 A2 I(A2) D C1 SCz2 I(B2) R x1 B2 ?? module A1 { def z1 = 5 } module B1 { import A2 } module C1 { import B2 def x1 = 1 + z2 } SA SB SC
  • 92.
    A Calculus forName Resolution !92 S R1 R2 SR SRS I(R1 ) S’S S’S P Sx Sx R xS xS D I(_).p’ < P.p D < I(_).p’ D < P.p s.p < s.p’ p < p’ Visibility Well formed path: R.P*.I(_)*.D Reachability
  • 93.
    Visibility Policies !93 Lexical scope L:= {P} E := P⇤ D < P Non-transitive imports L := {P, I} E := P⇤ · I? D < P, D < I, I < P Transitive imports L := {P, TI} E := P⇤ · TI⇤ D < P, D < TI, TI < P Transitive Includes L := {P, Inc} E := P⇤ · Inc⇤ D < P, Inc < P Transitive includes and imports, and non-transitive imports L := {P, Inc, TI, I} E := P⇤ · (Inc | TI)⇤ · I? D < P, D < TI, TI < P, Inc < P, D < I, I < P, Figure 10. Example reachability and visibility policies by instan- R[I](xR Envre [I, S](S EnvL re [I, S](S EnvD re [I, S](S Envl re [I, S](S
  • 94.
  • 95.
    Let Bindings !95 1 2 b2a1 c3 a4 b6 c12a10b11 c5 b9 a7 c8 def a1 = 0 def b2 = 1 def c3 = 2 letpar a4 = c5 b6 = a7 c8 = b9 in a10+b11+c12 1 b2a1 c3 a4 b6 c12a10 b11 c5 b9 a7 c8 2 def a1 = 0 def b2 = 1 def c3 = 2 letrec a4 = c5 b6 = a7 c8 = b9 in a10+b11+c12 1 b2a1 c3 a4 b6 c12a10 b11 c5 b9 a7 c8 2 4 3 def a1 = 0 def b2 = 1 def c3 = 2 let a4 = c5 b6 = a7 c8 = b9 in a10+b11+c12
  • 96.
    Definition before Use/ Use before Definition !96 class C1 { int a2 = b3; int b4; void m5 (int a6) { int c7 = a8 + b9; int b10 = b11 + c12; } int c12; } 0 C1 1 2 a2 b4 c12 b3 m5 a6 3 4 c7 b10 b9 a8 b11 c12
  • 97.
    Inheritance !97 class C1 { intf2 = 42; } class D3 extends C4 { int g5 = f6; } class E7 extends D8 { int f9 = g10; int h11 = f12; } 32 1 C4 C1 4D3 E7 D8 f2 g5 f6 f9 g10 f12 h11
  • 98.
    Java Packages !98 package p3; classD4 {} package p1; class C2 {} 1 p3p1 2 p1 p3 3 D4C2
  • 99.
    Java Import !99 package p1; importsr2.*; imports q3.E4; public class C5 {} class D6 {} 4 p1 D6C5 3 2r2 1 p1 E4 q3
  • 100.
    C# Namespaces andPartial Classes !100 namespace N1 { using M2; partial class C3 { int f4; } } namespace N5 { partial class C6 { int m7() { return f8; } } } 1 3 6 4 7 8 C3 C6 N1 N5 f4 m7 N1 N5 C3 C6 f8 2 M2 5
  • 101.
  • 102.
    Static analysis - Propertiesthat can be checked of the (‘static’) program text - Decide which properties to encode in syntax definition vs checker - Strict vs liberal syntax Context-sensitive properties - Some properties are intrinsically context-sensitive - In particular: names Declarative specification of name binding rules - Common (language agnostic) understanding of name binding
 !102 Static Analysis
  • 103.
    Representation: Scope Graphs -Standardized representation for lexical scoping structure of programs - Path in scope graph relates reference to declaration - Basis for syntactic and semantic operations Formalism: Name Binding Constraints - References + Declarations + Scopes + Reachability + Visibility - Language-specific rules map AST to constraints Language-Independent Interpretation - Resolution calculus: correctness of path with respect to scope graph - Name resolution algorithm - Alpha equivalence - Mapping from graph to tree (to text) - Refactorings - And many other applications !103 A Theory of Name Resolution
  • 104.
    We have modeleda large set of example binding patterns - definition before use - different let binding flavors - recursive modules - imports and includes - qualified names - class inheritance - partial classes Next goal: fully model some real languages - In progress: Go, Rust, TypeScript - Java, ML !104 Validation
  • 105.
    Scope graph semanticsfor binding languages [OOPSLA18] - starting with NaBL - or rather: a redesign of NaBL based on scope graphs Dynamic analogs to static scope graphs [ECOOP16] - how does scope graph relate to memory at run-time? Supporting mechanized language meta-theory [POPL18] - relating static and dynamic bindings Resolution-sensitive program transformations - renaming, refactoring, substitution, … !105 Ongoing/Future Work
  • 106.
  • 107.
    Representation - To conductand represent the results of name resolution Declarative Rules - To define name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !107 Separation of Concerns in Name Binding
  • 108.
    Representation - ? Declarative Rules -To define name binding rules of a language Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !108 Separation of Concerns in Name Binding Scope Graphs
  • 109.
    Representation - ? Declarative Rules -? Language-Independent Tooling - Name resolution - Code completion - Refactoring - … !109 Separation of Concerns in Name Binding Scope & Type Constraint Rules [PEPM16] Scope Graphs
  • 110.
    Next: Constraint-Based TypeCheckers !110 Program AST Parse Constraints Generate Constraints Solution Resolve Constraints Language Specific Language Independent
  • 111.
    !111 Except where otherwisenoted, this work is licensed under