LAMBDA ZERO λ₀
LAMBDA ZERO ▸ Minimalist pure lazy functional programming language. ▸ Desugars to the untyped lambda calculus plus optimization of natural numbers and arithmetic. ▸ Python and Agda-inspired syntax. ▸ Bootstrap interpreter is under 2400 lines of C. ▸ Self-interpreting. ▸ Self-type-inferencing (Hindley-Milner with ADTs).
CODE EXAMPLES n ! ≔ n ⦊ 0 ↦ 1; ↑ n′ ↦ n ⋅ n′ ! sort ≔ [] ↦ []; n ∷ ns ↦ sort(ns ¦ (≤ n)) ⧺ [n] ⧺ sort(ns ¦ (> n)) primes ≔ p(2 …) where p ≔ [] ↦ []; n ∷ ns ↦ n ∷ p(ns ¦ (% n ≠ 0))
MOTIVATION ▸ Lambda Zero is an attempt to discover the true foundation of mathematics. ▸ If you try to fully understand anything in mathematics, you end up getting into the foundations, which generally means you end up in set theory. ▸ “The language of set theory can be used to define nearly all mathematical objects.” (Wikipedia)
MOTIVATION ▸ Kuratowski's definition of an ordered pair (the "accepted definition" according to Wikipedia): ▸ (a, b) := {{a}, {a, b}} ▸ The right is more asymmetric than the left… ▸ This seems unnatural, like a hack rather than the truth. ▸ The lambda calculus captures both sets and tuples without anything that seems like a hack.
WHY NOT AGDA? ▸ Agda is about 120,000 lines of code. ▸ Agda is written in Haskell… ▸ GHC (Haskell) is 140,000 lines of code (as of 2012). ▸ GHC is written in Haskell… ▸ This doesn't look like a small elegant core of mathematics.
KOLMOGOROV COMPLEXITY ▸ Loosely speaking, Kolmogorov complexity of an object is the length of the shortest program that produces the object as output. ▸ Similarly we can define the complexity of a program as the length of the shortest program that produces the same output. ▸ We can also define the complexity of a language to be the complexity of its interpreter. ▸ All interpreter implementations of the same language have the same complexity by definition.
THE COMPLEXITY COMPASS ▸ In designing a language, if you keep the interpreter close to the minimum complexity, you can use the complexity as a compass to point you toward the mathematical truth. ▸ If you get too far from the minimum, the compass signal can get lost in the noise of the non-mathematical complexity. ▸ Lambda Zero is written in C, which I think of as portable assembly code.
LESSON LEARNED ▸ On a couple occasions I tried to take a shortcut or do something potentially clever and useful despite what the math was suggesting. ▸ Each time, I eventually felt that I had to go back and change it. ▸ Example: Can we represent a triple in terms of pairs? Then both of these destructurings are valid, which decreases the helpfulness of type checking (amongst other issues): ▸ (a, b, c) ≔ triple ▸ (a, b) ≔ triple
DISTRACTIONS ▸ Performance and I/O are distractions from this goal. ▸ Performance is not a property of a language. ▸ No language is faster or slower than any other language. ▸ Compromising the language for the sake of performance makes the language objectively worse. ▸ Examples: Haskell's seq, builtin data types. ▸ I/O does not affect Turing completeness.
COMPONENTS ▸ Bootstrap interpreter: can be interpreted without knowledge of types; type annotations are ignored. ▸ ~2400 lines of C; 65KB statically linked and stripped. ▸ "Hello world" statically linked and stripped with gcc on linux is 671KB. ▸ Self interpreter: parses and saves type annotations for type checker. ▸ ~2400 lines of Lambda Zero. ▸ Self type inferencer. ▸ ~300 lines of Lambda Zero. ▸ Future: Self compiler, Lambda One (dependent types, implicit calculus of constructions, requires knowledge of types to interpret).
ADVANTAGES OF LAZY EVALUATION ▸ Automatically avoids computation that is never needed. ▸ Allows simple and consistent representation of infinite data. ▸ Allows increased modularity. ▸ Example: sqrt function can be split into a digit generator and a function that truncates the digits at a desired point.
ADVANTAGES OF STRICT EVALUATION ▸ Easier to implement efficiently. ▸ Easier to reason about performance. ▸ Easier to understand behavior in impure languages.
LAZY EVALUATION ▸ If we only care about making the ideal language (not performance etc)… ▸ And if we assume the language is purely functional… ▸ Then a lazy evaluated language is strictly better than a strictly evaluated language. ▸ Lambda Zero uses the Lazy Krivine Machine.
PARSING ▸ There are no statements in the grammar; only expressions. ▸ Shift reduce parser, table-driven. The table associates a shift function and a reduce function to each operator. ▸ Shift and reduce functions can do arbitrary modifications to the stack and their arguments, but this is not often needed. ▸ Uses an algorithm very similar to Dijkstra's shunting yard algorithm but with a combined stack for operators and operands which makes it more powerful. ▸ For example, newline is defined to be an infix operator: right- associative application with some special cases.
UNICODE ▸ Unicode makes it easier to write math directly in the language. ▸ Every operator has both an ASCII and a unicode representation (sometimes they are the same). ▸ You can type the ASCII representation and the editor will conceal with the unicode form once you leave that line. ▸ Lisp/Agda style lexing: only a few basic characters are considered delimiters so you have to put spaces between operators and operands.
SEMANTIC INDENTATION ▸ Just like python. ▸ Very easy to implement: the number of spaces after a newline is recorded in the newline operator and the more spaces, the higher the precedence of the operator.
OPERATORS ▸ = for equality ▸ ≔ := for definitions ▸ : for type annotations ▸ :: for prepend ▸ ↦ -> for lambdas ▸ => for function types ▸ ∘ <> for composition ▸ . for high precedence pipelining ▸ ⦊ |> for low precedence pipelining ▸ ; for in-line newline ▸ -- monus (truncating subtraction of naturals) ▸ // floor division, lower precedence than * ▸ ⊓ pair type constructor ▸ ⊔ either type constructor ▸ ? maybe type constructor ▸ ↑ successor ▸ ° function exponentiation ▸ ¦ |: filter ▸ ~ relation ▸ ! factorial ▸ ` flip
PATTERN MATCHING length := [] -> 0; _ :: xs’ -> 1 + length(xs’) define length [] -> 0 _ :: xs’ -> 1 + length(xs’) define length(xs) match xs case [] 0 case _ :: xs' 1 + length(xs') define length(xs) with xs as _ :: xs' 1 + length(xs') 0
SYNTAX DECLARATIONS ▸ infix, infixL, infixR, prefix, postfix, interfix, alias ▸ interfix is like infix but the definition is always straight application ▸ Lexically scoped ▸ Chaining: syntax(if) ≔ prefix(13) syntax(then) ≔ interfix(if _) syntax(else) ≔ interfix(if _ then _) if a then b else c if a then b c if a b c
SPACE OPERATOR ▸ Function application syntax: ▸ Haskell, Agda, Idris, ML: a b c d ▸ Math, mainstream languages: a(b, c, d) ▸ Math notation gives a visual indication of the asymmetry between applicand and arguments. ▸ In math and physics, space typically means multiplication. ▸ Let the user decide: the space operator can be user-defined. ▸ Space is special though; it has some built-in erasure rules.
ADT NOTATION ▸ Lambda Zero's ADT notation is a generalization of set- builder notation where the right side of the bar can be inlined on the left side. ⊥ ⩴ {} 𝕌 ⩴ {()} a ⊓ b ::= {(first ∈ a, second ∈ b)} 𝔹 ⩴ {False, True} (a)? ⩴ {Void, Just(_ ∈ a)} a ⊔ b ⩴ {Left(_ ∈ a), Right(_ ∈ b)} ℕ ⩴ {0, ↑(_ ∈ ℕ)} ℤ ⩴ {+(_ ∈ ℕ), -(_ ∈ ℕ)} ℚ ⩴ {(numerator ∈ ℤ) / (denominator ∈ ℤ)} List(a) ⩴ {[], (_head ∈ a) ∷ (_tail ∈ List(a))}
ADT ENCODING ▸ Implemented using the Scott encoding, which makes pattern matching trivial: you just treat the instance as a function and pass all the case handlers as arguments. (a, b) ~ f ↦ f(a, b) [] ~ n ↦ p ↦ n [a] ~ n ↦ p ↦ p(a, [])
ADT IMPLEMENTATION ▸ For each pattern in the ADT, a constructor and deconstructor is defined. ▸ The deconstructor has a name that is inaccessible to the user. ▸ constructor_n : p1 => ... => pm => A ▸ deconstructor_n : (reconstructor : p1 => ... => pm => B) => (fallback : A => B) => A => B ▸ If A was constructed with the constructor that corresponds to the deconstructor, then the parameters are passed to the reconstructor, otherwise A is passed to the fallback function. ▸ Deconstructors allow default cases and with/as syntax.
NATURAL NUMBER OPTIMIZATION ▸ Natural numbers are defined within the language by an ADT just like all other data types. ▸ Naturals can be pattern matched and act just like any other ADT. ▸ But the interpreter secretly represents naturals as machine integers and dynamically converts them to their Scott encodings on demand (whenever they are pattern matched). ℕ ⩴ {0, ↑(_ ∈ ℕ)}
DATA STRUCTURES ▸ Only two non-trivial data structures were required to implement everything so far: ▸ 1. Table: AA tree (simplified red-black tree) ▸ 2. Array: Random access list ▸ O(1) head ▸ O(1) tail ▸ O(1) prepend ▸ O(log(n)) lookup
MORE PURE THAN PURE ▸ Haskell is considered a pure functional language because IO is represented as actions with the IO monad. ▸ But this IO monad is still a relic in the language corresponding to the impure world. ▸ Lambda Zero has nothing IO-related in the language, so in some sense it’s even more pure. ▸ The main function takes a string parameter that is mapped to stdin and the return value is a string that gets mapped to stdout. main(input) ≔ input
LAZY IO ▸ This restricted form of IO is surprisingly capable due to the lazy evaluation of the IO streams. ▸ Bytes are read from stdin only when the program accesses the corresponding element of the input string. ▸ Bytes are sent to stdout as soon as they are computed even if the remainder of the output is not evaluated. ▸ This makes interactive IO possible.

Lambda Zero Programming Language Introduction

  • 1.
  • 2.
    LAMBDA ZERO ▸ Minimalistpure lazy functional programming language. ▸ Desugars to the untyped lambda calculus plus optimization of natural numbers and arithmetic. ▸ Python and Agda-inspired syntax. ▸ Bootstrap interpreter is under 2400 lines of C. ▸ Self-interpreting. ▸ Self-type-inferencing (Hindley-Milner with ADTs).
  • 3.
    CODE EXAMPLES n !≔ n ⦊ 0 ↦ 1; ↑ n′ ↦ n ⋅ n′ ! sort ≔ [] ↦ []; n ∷ ns ↦ sort(ns ¦ (≤ n)) ⧺ [n] ⧺ sort(ns ¦ (> n)) primes ≔ p(2 …) where p ≔ [] ↦ []; n ∷ ns ↦ n ∷ p(ns ¦ (% n ≠ 0))
  • 4.
    MOTIVATION ▸ Lambda Zerois an attempt to discover the true foundation of mathematics. ▸ If you try to fully understand anything in mathematics, you end up getting into the foundations, which generally means you end up in set theory. ▸ “The language of set theory can be used to define nearly all mathematical objects.” (Wikipedia)
  • 5.
    MOTIVATION ▸ Kuratowski's definitionof an ordered pair (the "accepted definition" according to Wikipedia): ▸ (a, b) := {{a}, {a, b}} ▸ The right is more asymmetric than the left… ▸ This seems unnatural, like a hack rather than the truth. ▸ The lambda calculus captures both sets and tuples without anything that seems like a hack.
  • 6.
    WHY NOT AGDA? ▸Agda is about 120,000 lines of code. ▸ Agda is written in Haskell… ▸ GHC (Haskell) is 140,000 lines of code (as of 2012). ▸ GHC is written in Haskell… ▸ This doesn't look like a small elegant core of mathematics.
  • 7.
    KOLMOGOROV COMPLEXITY ▸ Looselyspeaking, Kolmogorov complexity of an object is the length of the shortest program that produces the object as output. ▸ Similarly we can define the complexity of a program as the length of the shortest program that produces the same output. ▸ We can also define the complexity of a language to be the complexity of its interpreter. ▸ All interpreter implementations of the same language have the same complexity by definition.
  • 8.
    THE COMPLEXITY COMPASS ▸In designing a language, if you keep the interpreter close to the minimum complexity, you can use the complexity as a compass to point you toward the mathematical truth. ▸ If you get too far from the minimum, the compass signal can get lost in the noise of the non-mathematical complexity. ▸ Lambda Zero is written in C, which I think of as portable assembly code.
  • 9.
    LESSON LEARNED ▸ Ona couple occasions I tried to take a shortcut or do something potentially clever and useful despite what the math was suggesting. ▸ Each time, I eventually felt that I had to go back and change it. ▸ Example: Can we represent a triple in terms of pairs? Then both of these destructurings are valid, which decreases the helpfulness of type checking (amongst other issues): ▸ (a, b, c) ≔ triple ▸ (a, b) ≔ triple
  • 10.
    DISTRACTIONS ▸ Performance andI/O are distractions from this goal. ▸ Performance is not a property of a language. ▸ No language is faster or slower than any other language. ▸ Compromising the language for the sake of performance makes the language objectively worse. ▸ Examples: Haskell's seq, builtin data types. ▸ I/O does not affect Turing completeness.
  • 11.
    COMPONENTS ▸ Bootstrap interpreter:can be interpreted without knowledge of types; type annotations are ignored. ▸ ~2400 lines of C; 65KB statically linked and stripped. ▸ "Hello world" statically linked and stripped with gcc on linux is 671KB. ▸ Self interpreter: parses and saves type annotations for type checker. ▸ ~2400 lines of Lambda Zero. ▸ Self type inferencer. ▸ ~300 lines of Lambda Zero. ▸ Future: Self compiler, Lambda One (dependent types, implicit calculus of constructions, requires knowledge of types to interpret).
  • 12.
    ADVANTAGES OF LAZYEVALUATION ▸ Automatically avoids computation that is never needed. ▸ Allows simple and consistent representation of infinite data. ▸ Allows increased modularity. ▸ Example: sqrt function can be split into a digit generator and a function that truncates the digits at a desired point.
  • 13.
    ADVANTAGES OF STRICTEVALUATION ▸ Easier to implement efficiently. ▸ Easier to reason about performance. ▸ Easier to understand behavior in impure languages.
  • 14.
    LAZY EVALUATION ▸ Ifwe only care about making the ideal language (not performance etc)… ▸ And if we assume the language is purely functional… ▸ Then a lazy evaluated language is strictly better than a strictly evaluated language. ▸ Lambda Zero uses the Lazy Krivine Machine.
  • 15.
    PARSING ▸ There areno statements in the grammar; only expressions. ▸ Shift reduce parser, table-driven. The table associates a shift function and a reduce function to each operator. ▸ Shift and reduce functions can do arbitrary modifications to the stack and their arguments, but this is not often needed. ▸ Uses an algorithm very similar to Dijkstra's shunting yard algorithm but with a combined stack for operators and operands which makes it more powerful. ▸ For example, newline is defined to be an infix operator: right- associative application with some special cases.
  • 16.
    UNICODE ▸ Unicode makesit easier to write math directly in the language. ▸ Every operator has both an ASCII and a unicode representation (sometimes they are the same). ▸ You can type the ASCII representation and the editor will conceal with the unicode form once you leave that line. ▸ Lisp/Agda style lexing: only a few basic characters are considered delimiters so you have to put spaces between operators and operands.
  • 17.
    SEMANTIC INDENTATION ▸ Justlike python. ▸ Very easy to implement: the number of spaces after a newline is recorded in the newline operator and the more spaces, the higher the precedence of the operator.
  • 18.
    OPERATORS ▸ = forequality ▸ ≔ := for definitions ▸ : for type annotations ▸ :: for prepend ▸ ↦ -> for lambdas ▸ => for function types ▸ ∘ <> for composition ▸ . for high precedence pipelining ▸ ⦊ |> for low precedence pipelining ▸ ; for in-line newline ▸ -- monus (truncating subtraction of naturals) ▸ // floor division, lower precedence than * ▸ ⊓ pair type constructor ▸ ⊔ either type constructor ▸ ? maybe type constructor ▸ ↑ successor ▸ ° function exponentiation ▸ ¦ |: filter ▸ ~ relation ▸ ! factorial ▸ ` flip
  • 19.
    PATTERN MATCHING length :=[] -> 0; _ :: xs’ -> 1 + length(xs’) define length [] -> 0 _ :: xs’ -> 1 + length(xs’) define length(xs) match xs case [] 0 case _ :: xs' 1 + length(xs') define length(xs) with xs as _ :: xs' 1 + length(xs') 0
  • 20.
    SYNTAX DECLARATIONS ▸ infix,infixL, infixR, prefix, postfix, interfix, alias ▸ interfix is like infix but the definition is always straight application ▸ Lexically scoped ▸ Chaining: syntax(if) ≔ prefix(13) syntax(then) ≔ interfix(if _) syntax(else) ≔ interfix(if _ then _) if a then b else c if a then b c if a b c
  • 21.
    SPACE OPERATOR ▸ Functionapplication syntax: ▸ Haskell, Agda, Idris, ML: a b c d ▸ Math, mainstream languages: a(b, c, d) ▸ Math notation gives a visual indication of the asymmetry between applicand and arguments. ▸ In math and physics, space typically means multiplication. ▸ Let the user decide: the space operator can be user-defined. ▸ Space is special though; it has some built-in erasure rules.
  • 22.
    ADT NOTATION ▸ LambdaZero's ADT notation is a generalization of set- builder notation where the right side of the bar can be inlined on the left side. ⊥ ⩴ {} 𝕌 ⩴ {()} a ⊓ b ::= {(first ∈ a, second ∈ b)} 𝔹 ⩴ {False, True} (a)? ⩴ {Void, Just(_ ∈ a)} a ⊔ b ⩴ {Left(_ ∈ a), Right(_ ∈ b)} ℕ ⩴ {0, ↑(_ ∈ ℕ)} ℤ ⩴ {+(_ ∈ ℕ), -(_ ∈ ℕ)} ℚ ⩴ {(numerator ∈ ℤ) / (denominator ∈ ℤ)} List(a) ⩴ {[], (_head ∈ a) ∷ (_tail ∈ List(a))}
  • 23.
    ADT ENCODING ▸ Implementedusing the Scott encoding, which makes pattern matching trivial: you just treat the instance as a function and pass all the case handlers as arguments. (a, b) ~ f ↦ f(a, b) [] ~ n ↦ p ↦ n [a] ~ n ↦ p ↦ p(a, [])
  • 24.
    ADT IMPLEMENTATION ▸ Foreach pattern in the ADT, a constructor and deconstructor is defined. ▸ The deconstructor has a name that is inaccessible to the user. ▸ constructor_n : p1 => ... => pm => A ▸ deconstructor_n : (reconstructor : p1 => ... => pm => B) => (fallback : A => B) => A => B ▸ If A was constructed with the constructor that corresponds to the deconstructor, then the parameters are passed to the reconstructor, otherwise A is passed to the fallback function. ▸ Deconstructors allow default cases and with/as syntax.
  • 25.
    NATURAL NUMBER OPTIMIZATION ▸Natural numbers are defined within the language by an ADT just like all other data types. ▸ Naturals can be pattern matched and act just like any other ADT. ▸ But the interpreter secretly represents naturals as machine integers and dynamically converts them to their Scott encodings on demand (whenever they are pattern matched). ℕ ⩴ {0, ↑(_ ∈ ℕ)}
  • 26.
    DATA STRUCTURES ▸ Onlytwo non-trivial data structures were required to implement everything so far: ▸ 1. Table: AA tree (simplified red-black tree) ▸ 2. Array: Random access list ▸ O(1) head ▸ O(1) tail ▸ O(1) prepend ▸ O(log(n)) lookup
  • 27.
    MORE PURE THANPURE ▸ Haskell is considered a pure functional language because IO is represented as actions with the IO monad. ▸ But this IO monad is still a relic in the language corresponding to the impure world. ▸ Lambda Zero has nothing IO-related in the language, so in some sense it’s even more pure. ▸ The main function takes a string parameter that is mapped to stdin and the return value is a string that gets mapped to stdout. main(input) ≔ input
  • 28.
    LAZY IO ▸ Thisrestricted form of IO is surprisingly capable due to the lazy evaluation of the IO streams. ▸ Bytes are read from stdin only when the program accesses the corresponding element of the input string. ▸ Bytes are sent to stdout as soon as they are computed even if the remainder of the output is not evaluated. ▸ This makes interactive IO possible.