Alan Jeffrey, 1967–2024

My friend Alan Jeffrey passed away earlier this year. I described his professional life at a Celebration in Oxford on 2nd November 2024. This post is a slightly revised version of what I said.

Edinburgh, 1983–1987

I’ve known Alan for over 40 years—my longest-standing friend. We met at the University of Edinburgh in 1983, officially as computer science freshers together, but really through the clubs for science fiction and for role-playing games. Alan was only 16: like many in Scotland, he skipped the final school year for an earlier start at university. It surely helped that his school had no computers, so he wasted no time in transferring to a university that did. His brother David says that it also helped that he would then be able to get into the student union bars.

Oxford, 1987–1991

After Edinburgh, Alan and I wound up together again as freshers at the University of Oxford. We didn’t coordinate this; we independently and simultaneously applied to the same DPhil programme (Oxford’s name for the PhD). We were officemates for those 4 years, and shared a terraced hovel on St Mary’s Road in bohemian East Oxford with three other students for most of that time. He was clever, funny, kind, and serially passionate about all sorts of things. It was a privilege and a pleasure to have known him.

Alan had a career that spanned academia and industry, and he excelled at both. He described himself as a “semanticist”: using mathematics instead of English for precise descriptions of programming languages. He had already set out in that direction with his undergraduate project on concurrency under Robin Milner at Edinburgh; and he continued to work on concurrency for his DPhil under Bill Roscoe at Oxford, graduating in 1992.

Chalmers, 1991–1992

Alan spent the last year of his DPhil as a postdoc working for K V S Prasad at Chalmers University in Sweden. While there, he was assigned to host fellow Edinburgh alumnus Carolyn Brown visiting for an interview; Carolyn came bearing a bottle of malt whisky, as one does, which she and Alan proceeded to polish off together that evening.

Sussex, 1992–1999

Carolyn’s interview was successful; but by the time she arrived at Chalmers, Alan had left for a second postdoc under Matthew Hennessy at the University of Sussex. They worked together again when Carolyn was in turn hired as a lecturer at Sussex. In particular, they showed in 1994 that “string diagrams”—due to Roger Penrose and Richard Feynman in physics—provide a “fully abstract” calculus for hardware circuits, meaning that everything true of the diagrams is true of the hardware, and vice versa. This work foreshadowed what is now a hot topic in the field of Applied Category Theory.

Matthew essentially left Alan to his own devices: as Matthew put it, “something I was very happy with as he was an exceptional researcher”. Alan was soon promoted to a lectureship himself. He collaborated closely with Julian Rathke, then Matthew’s PhD student and later postdoc, on the Full Abstraction Factory project, developing a bunch more full abstraction results for concurrent and object-oriented languages. That fruitful collaboration continued even after Alan left Sussex.

DePaul, 1999–2004

Alan presented a paper A Fully Abstract Semantics for a Nondeterministic Functional Language with Monadic Types at the 1995 conference on Mathematical Foundations of Programming Semantics in New Orleans. I believe that this is where he met Karen Bernstein, who also had a paper. One thing led to another, and Alan took a one-year visiting position at DePaul University in Chicago in 1998, then formally left Sussex in 1999 for a regular Associate Professor position at DePaul. He lived in Chicago for the rest of his life.

Alan established the Foundations of Programming Languages research group at DePaul, attracting Radha Jagadeesan from Loyola, James Riely from Sussex, and Corin Pitcher from Oxford, working among other things on “relaxed memory”—modern processors don’t actually read and write their multiple levels of memory in the order you tell them to, when they can find quicker ways to do things concurrently and sometimes out of order.

James remembers showing Alan his first paper on relaxed memory, co-authored with Radha. Alan thought their approach was an “appalling idea”; the proper way was to use “event structures”, an idea from the 1980s. This turned in 2016 into a co-authored paper at LICS (Alan’s favourite conference), and what James considers his own best ever talk—an on-stage reenactment of the to and fro of their collaboration, sadly not recorded for posterity.

James was Alan’s most frequent collaborator over the years, with 14 joint papers. Their modus operandi was that, having identified a problem together, Alan would go off by himself and do some Alanny things, eventually converging on a solution and choosing an order of exposition, tight and coherent; that accounted for about 40% of the life of a paper. But then came the various tweaks, extensions, corrections… Alan would never look at the paper again, and would be surprised years later to learn what was actually in it. Even so, Alan was always easy to work with: interested only in the truth, although it had to be beautiful. He had a curious mix of modesty and egocentricity: always convinced he was right (and usually right that he was right). But he had no patience for boring stuff, especially university admin.

While at DePaul, Alan also had a significant collaboration with Andy Gordon from Microsoft on verifying security protocols. Their 2001 paper Authenticity by Typing for Security Protocols won a Test Of Time Award this year at the Symposium on Computer Security Foundations, “given to outstanding papers with enduring significance and impact”—recognition which happily Alan lived to see.

Bell Labs, 2004–2015

After the dot-com crash in 2000, things got more difficult at DePaul, and Alan left in 2004 for Bell Labs, nominally as a member of technical staff in Naperville but actually part of a security group based at HQ in Murray Hill NJ. He worked on XPath, “a modal logic for XML”, with Michael Benedikt, now my databases colleague at Oxford. They bonded because only Alan and Michael lived in Chicago rather than the suburbs. Michael had shown Alan a recent award-winning paper, in which Alan quickly spotted an error in a proof—an “obvious” and unproven lemma that turned out to be false—which led to their first paper together.

(A recurring pattern. Andy Gordon described Alan’s “uncanny ability to find bugs in arguments”: he found a type unsoundness bug in a released draft specification for Java, and ended up joining the standards committee to help fix it. And as a PhD examiner he “shockingly” found a subtle bug that unpicked the argument of half of the dissertation, necessitating major corrections: it took a brave student to invite Alan as examiner—or a very confident one.)

Michael describes Alan as an “awesome developer”. They once had an intern; not long after the intern left, Alan discarded their code and rewrote it from scratch. Alan was unusual in being able to combine Euro “abstract nonsense” and US engineering. Glenn Bruns, another Bell Labs colleague, said: “I think Alan was the only person I’ve met who could do theory and also low-level hackery”.

At Bell Labs Alan also worked with Peter Danielsen on the Web InterFace Language, WIFL for short: a way of embedding API descriptions in HTML. Peter recalls: “We spent a few months working together on the conceptual model. In the early stages of software development, however, Alan looked at what I’d written and said, ‘I wouldn’t do it that way at all!’, throwing it all away and starting over. The result was much better; and he inadvertently taught me a new way to think in JavaScript (including putting //Sigh… comments before unavoidable tedious code).”

Mozilla Research, 2015–2020

The Bell Labs group dissolved in 2015, and Alan moved to Mozilla Research as a staff research engineer to work on Servo, a new web rendering engine in the under-construction programming language Rust.

For one of Alan’s projects at Mozilla, he took a highly under-specified part of the HTML specification about how web links and the back and forward browser buttons should interact, created a formal model in Agda based on the existing specification, identified gaps in it as well as ways that major browsers did not match the model, then wrote it all up as a paper. Alan’s manager Josh Matthews recalls that the editors of the HTML standard were taken aback to have Alan’s work suddenly dropped in their laps, but quickly appreciated how much more confidently they could make changes based on it.

Josh also recalled: “Similarly, any time other members of the team would talk about some aspect of the browser engine being safe as long as some property was upheld by the programmer, Alan would get twitchy. He had a soft spot for bad situations that looked like they could be made impossible by clever enough application of static types.”

In 2017 Alan made a rather surprising switch to working on augmented reality for the web, partly driven by internal politics at Mozilla. He took the lead on integrating Servo into the Magic Leap headset; the existing browser was clunky, the only way to interact with pages being via an awkward touchpad on the hand controller. This was not good enough for Alan: after implementing the same behaviour for Servo and finding it frustrating, he had several email exchanges with the Magic Leap developers, figured out how to access some interfaces that weren’t technically public but also were not actually private, and soon proudly showed off a more natural laser-pointer-style means of interacting with pages in augmented reality in Servo—to much acclaim from the team and testers.

Roblox, 2020–2024

Then in 2020, Mozilla’s funding stream got a lot more constrained, and Alan moved to the game platform company Roblox. Alan was a principal software engineer, and the language owner of Luau, “a fast, small, safe, gradually typed embeddable scripting language derived from Lua”, working on making the language easier to use, teach, and learn. Roblox supports more than two million “content creators”, mostly kids, creating millions of games a year; Alan’s goal was to empower them to build larger games with more characters.

The Luau product manager Bryan Nealer says that “people loved Alan”. Roblox colleagues appreciated his technical contributions: “Alan was meticulous in what he built and wrote at Roblox. He would stress not only the substance of his work, but also the presentation. His attention to detail inspired the rest of us!”; “One of the many wonderful things Alan did for us was to be the guy who could read the most abstruse academic research imaginable and translate it into something simple, useful, interesting, and even fun.” They also appreciated the more personal contributions: Alan led an internal paper reading group, meeting monthly to study some paper on programming or networking, but he also established the Roblox Book Club: “He was always thoughtful when discussing books, and challenged us to think about the text more deeply. He also had an encyclopedic knowledge of scifi. He recommended Iain M. Banks’s The Culture series to me, which has become my favorite scifi series. I think about him every time I pick up one of those books.”

Envoi

From my own perspective, one of the most impressive things about Alan is that he was impossible to pigeonhole: like Dr Who, he was continually regenerating. He explained to me that he got bored quickly with one area, and moved on to another. As well as his academic abilities, he was a talented and natural cartoonist: I still have a couple of the tiny fanzine comics he produced as a student.

Of course he did some serious science for his DPhil and later career: but he also took a strong interest in typography and typesetting. He digitized some beautiful Japanese crests for the chapter title pages of his DPhil dissertation. Alan dragged me into typography with him, a distraction I have enjoyed ever since. Among other projects, Alan and I produced a font containing some extra symbols so that we could use them in our papers, and named it St Mary’s Road after our Oxford digs. And Alan produced a full blackboard bold font, complete with lowercase letters and punctuation: you can see some of it in the order of service. But Alan was not satisfied with merely creating these things; he went to all the trouble of packaging them up properly and getting them included in standard software distributions, so that they would be available for everyone: Alan loved to build things for people to use. These two fonts are still in regular use 35 years later, and I’m sure they will be reminding us of him for a long time to come.

Richard Bird, 1943–2022

My mentor, colleague, and friend Richard Bird died in April 2022 after a long battle with cancer. I wrote an obituary of him for The Guardian, his favoured newspaper; this post is a hybrid of that obituary and a eulogy I delivered at his funeral.

Richard was born in 1943 in London. His parents Kay and Jack were landlords of a series of pubs in South London. Richard attended St Olave’s Grammar School, then matriculated to read mathematics at Cambridge in 1960. After graduation, he had a brief spell working in sales for International Computers and Tabulators, the mainframe computer company that eventually became ICL. He was evidently effective in this role, being as personable as he was; but it involved flying around the country in small planes, which Richard did not enjoy at all. No amount of money or approval could persuade him to stay, so he left after a year to do postgraduate study at the University of London Institute of Computer Science.

Richard met Norma Lapworth at a friend’s 21st birthday party. Norma trained as an ecologist, later becoming a teacher, education advisor, and Ofsted schools inspector. Richard and Norma married in 1967.

After a year lecturing in Vancouver in 1971–72, Richard and Norma returned to the UK for Richard to take up a lectureship at the University of Reading. He was given the time finally to complete his PhD thesis, on Computational Complexity on Register Machines, in 1973. But Reading was a rather linguistically oriented department, which did not suit Richard’s mathematical temperament. So he leapt at the opportunity to join Tony Hoare’s expanding Programming Research Group at Oxford University Computing Laboratory in 1983. He was appointed as a University Lecturer in Computation, with a non-tutorial fellowship at St Cross College, assuming responsibility as Director of the MSc in Computation. He moved to a Tutorial Fellowship at Lincoln College in 1988. Richard stayed at the PRG for the remainder of his career, being promoted to Reader in Computation in 1990 and then to Professor in 1996, serving as Director of OUCL from 1998 to 2003, and finally retiring in 2008 after 25 years.

Richard’s research area was functional programming—an approach to computer programming centred around obeying the conventions of traditional mathematics. To Richard, it was self-evident that programs are mathematical entities, and that you can manipulate them in precisely the same way as you do quadratic equations. No immediate need for consideration of the capabilities of computer hardware; no entanglement in the vagaries of specific programming languages. Just algebra, pure and simple.

His PhD thesis in the early 1970s was already concerned with program transformation: the rules by which one program is shown to be equivalent to another. He wasn’t so interested in the program you actually end up with, so much as the process by which you got there, and how you know you ended up in the right place, and what alternative directions you didn’t follow en route. Specifically, he was interested in the calculus of equivalences with which you could conduct equational reasoning on programs, just as in high-school algebra.

In 1980 he was invited to participate in IFIP Working Group 2.1 on Algorithmic Languages and Calculi. This was the group that had designed the seminal programming language Algol 68, and that continues to research notations and techniques for designing and describing algorithms today. Richard had a very fruitful collaboration in WG2.1 with Lambert Meertens from CWI in Amsterdam, developing what came to be known as the Bird–Meertens Formalism, or Squiggol to its friends. This was a concise and somewhat cryptic algebraic program notation with a powerful accompanying corpus of theorems, supporting the equational reasoning style that Richard sought.

Richard is known and respected around the world for his elegant writing. He published about 100 scientific papers in his life—not especially prolific for a scientist, but every one of those papers was very well polished. He also wrote or co-wrote seven books, starting with Programs and Machines (1976), which featured among other constructions a machine called NoRMa, the Number-Theoretic Register Machine.

I won’t go through all seven books, much as I would like to; I will just pick out a few highlights. Richard’s second book, Introduction to Functional Programming (1988, with Phil Wadler), set out his vision for teaching functional programming. Many people love this book. Graham Hutton wrote to me to say:

Richard set the standards for elegance and clarity, and the original Bird & Wadler is my all-time favourite book. It still stands up today as one of the best programming books ever written—every page is a gem.

Richard’s next book, Algebra of Programming (1996, with Oege de Moor), was much more research-oriented, extending his Squiggol agenda from functions to relations. Tim Sears tweeted:

This book blew my mind regarding what the relationship could be between math, programming and systems. I never met Richard, but his work has had a profound impact on me.

Richard’s fifth book, Pearls of Functional Algorithm Design (2010), collected and repolished some of the Functional Pearls he had published over the years. These Pearls emulated Jon Bentley’s famous Programming Pearls column in CACM in the 1980s. Bentley explained:

Just as natural pearls grow from grains of sand that have irritated oysters, these programming pearls have grown from real problems that have irritated programmers. The programs are fun, and they teach important programming techniques and fundamental design principles.

Richard established the Functional Pearls column, and was its founding editor from the very first issue of the Journal of Functional Programming in 1991 until he retired in 2008. He characterized Functional Pearls as “polished, elegant, instructive, entertaining”. He had high standards for them: he advised reviewers to

stop reading when you get bored, or the material gets too complicated, or too much specialist knowledge is needed, or the writing is bad.

He gave a keynote lecture about Pearls in 2006, where he observed that “most of them need more time in the oyster”.

Functional Pearls are the epitome of Richard’s writing style. Erik Meijer called him “the poet laureate of functional programming”. Greg Fitzgerald tweeted that

When life got hard, I’d read his pearls to settle my mind. His proofs read like poetry.

Ed Kmett commented that Richard’s pearls book

is the only real clear exposition of how to think like a functional programmer that I’ve ever seen.

I was honoured to be Richard’s co-author on his final book, Algorithm Design with Haskell, written during his illness and published in 2020. Working with him was always a joy.

Despite the global influence of his publications, Richard always saw teaching as his main duty, and research as something he did for relaxation. Andrew Ker thanked Richard for the way he had taught him as a first-year undergraduate—a way which Andrew has passed on to his own students, and which he now sees them repeating in turn. Andrew wrote:

I’ve come to see that passing on our knowledge to the next generation is the most important thing we do, even more than generating new knowledge through research. And the ideas and attitudes persist through the generations. Your influence and example extends far beyond what you know, and countless students are grateful for it.

My own favourite Richard story is a teaching one. Richard was giving a tutorial about algorithm analysis: how many steps does this program take—for example, something about processing hierarchical tree-shaped data? You’d expect a question about counting steps to have an answer in whole numbers; but in fact, the answer often involves logarithms. One student asked why this was. Richard thought for a moment, then said “logs come from trees”. The students broke out in laughter, and it took Richard a while to work out why. In fact, I think this was Richard’s own favourite Richard story too, because I know that he continued to tell it in tutorials for years afterwards.

I want to close by mentioning Richard’s openness, considerateness, and egalitarianism. The departmental secretaries always looked forward to him coming in, because he treated them with the same friendliness that he treated the most eminent of his colleagues. Several once-overawed students have written to me to express gratitude for his welcome—“no Herr Professor Doktor Doktor here, we are all colleagues”—and for helping them to find their own way.

Richard’s collaborator Lambert Meertens says:

His lectures were a joy to attend, because of their clarity and the entertaining delivery. But I’ll remember him at least as much for his kindness, his generosity, his warm interest in people’s wellbeing—expressed not only in words, but also deeds.

Cezar Ionescu remembers being apprehensive about first meeting Richard at WG2.1—you should never meet your heroes—but says he

was exactly the way I had imagined him to be: very sharp, funny, wise. We talked about opera and Shakespeare, and fought (and won!) a hard battle for tap water with the hotel staff.

My own involvement with Richard started when I applied in 1987 to do a DPhil at Oxford, on a project that turned out to have finished by the time I arrived. I rattled around the Lab for a year, not even seeing my official supervisor. But Richard invited me to join his weekly Problem Solving Club. He took me under his wing, and I have never looked back. He became my DPhil supervisor, then some years later hired me into my current position, and I am privileged to have been able to call him my colleague and my friend. I basically owe him my whole career, and I have no words sufficient to convey my gratitude to him. I will miss him greatly.

The Genuine Sieve of Eratosthenes

The “Sieve of Eratosthenes” is often used as a nice example of the expressive power of streams (infinite lazy lists) and list comprehensions:

\displaystyle \begin{array}{@{}l} \mathit{primes} = \mathit{sieve}\;[2.\,.] \;\textbf{where} \\ \qquad \mathit{sieve}\;(p:\mathit{xs}) = p : \mathit{sieve}\;[ x \mid x \leftarrow \mathit{xs}, x \mathbin{\mathrm{mod}} p \ne 0] \end{array}

That is, {\mathit{sieve}} takes a stream of candidate primes; the head {p} of this stream is a prime, and the remaining primes are obtained by removing all multiples of {p} from the candidates and sieving what’s left. (It’s also a nice unfold.)
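In plain executable Haskell, the same program reads:

    primes :: [Integer]
    primes = sieve [2..]
      where sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

so that take 5 primes yields [2,3,5,7,11].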

Unfortunately, as Melissa O’Neill observes, this nifty program is not the true Sieve of Eratosthenes! The problem is that for each prime {p}, every remaining candidate {x} is tested for divisibility. O’Neill calls this bogus common version “trial division”, and argues that the Genuine Sieve of Eratosthenes should eliminate every multiple of {p} without reconsidering all the candidates in between. That is, only {{}^{1\!}/_{\!2}} of the natural numbers are touched when eliminating multiples of 2, less than {{}^{1\!}/_{\!3}} of the remaining candidates for multiples of 3, and so on. As an additional optimization, it suffices to eliminate multiples of {p} starting with {p^2}, since by that point all composite numbers with a smaller nontrivial factor will already have been eliminated.

O’Neill’s paper presents a purely functional implementation of the Genuine Sieve of Eratosthenes. The tricky bit is keeping track of all the eliminations when generating an unbounded stream of primes, since obviously you can’t eliminate all the multiples of one prime before producing the next prime. Her solution is to maintain a priority queue of iterators; indeed, the main argument of her paper is that functional programmers are often too quick to use lists, when other data structures such as priority queues might be more appropriate.
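O’Neill’s actual code uses a bespoke priority queue; as a rough sketch of the idea (my simplification, not her implementation), one can use Data.Map as the queue, mapping each upcoming composite to the primes responsible for eliminating it:

    import qualified Data.Map as Map

    -- a sketch of the priority-queue idea: the map sends each not-yet-reached
    -- composite to the list of primes that will eliminate it
    primes :: [Integer]
    primes = sieve [2..] Map.empty
      where
        sieve (x:xs) table = case Map.lookup x table of
          Nothing -> x : sieve xs (Map.insert (x * x) [x] table)       -- x is prime: start crossing off at x^2
          Just ps -> sieve xs (foldr reinsert (Map.delete x table) ps) -- x is composite: reschedule its witnesses
            where reinsert p = Map.insertWith (++) (x + p) [p]

Each composite is now touched only once per relevant prime factor, rather than every candidate being tested against every prime.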

O’Neill’s paper was published as a Functional Pearl in the Journal of Functional Programming, when Richard Bird was the handling editor for Pearls. The paper includes an epilogue that presents a purely list-based implementation of the Genuine Sieve, contributed by Bird during the editing process. And the same program pops up in Bird’s textbook Thinking Functionally with Haskell (TFWH). This post is about Bird’s program.

The Genuine Sieve, using lists

Bird’s program appears in §9.2 of TFWH. It deals with lists, but these will be infinite, sorted, duplicate-free streams, and you should think of them as representing infinite sets, in this case sets of natural numbers. In particular, there will be no empty or partial lists, only properly infinite ones.

Now, the prime numbers are what you get by eliminating the composite numbers from the “plural” naturals (those greater than one), and the composite numbers are the proper multiples of the primes—so the program is cleverly cyclic:

\displaystyle \begin{array}{@{}l@{\;}l} \multicolumn{2}{@{}l}{\mathit{primes}, \mathit{composites} :: [\mathit{Integer}]} \\ \mathit{primes} & = 2 : ([3.\,.] \mathbin{\backslash\!\backslash} \mathit{composites}) \\ \mathit{composites} & = \mathit{mergeAll}\;[ \mathit{multiples}\;p \mid p \leftarrow \mathit{primes} ] \\ \end{array}

We’ll come back in a minute to {\mathit{mergeAll}}, which takes the union of a set of sets; but {(\backslash\!\backslash)} is the obvious implementation of list difference of sorted streams (hence, representing set difference):

\displaystyle \begin{array}{@{}l@{\;}l@{\;}l} \multicolumn{3}{@{}l}{(\mathbin{\backslash\!\backslash}) :: \mathit{Ord}\;a \Rightarrow [a] \rightarrow [a] \rightarrow [a]} \\ (x:\mathit{xs}) \mathbin{\backslash\!\backslash} (y:\mathit{ys}) & \mid x < y & = x:(\mathit{xs} \mathbin{\backslash\!\backslash} (y:\mathit{ys})) \\ & \mid x == y & = \mathit{xs} \mathbin{\backslash\!\backslash} \mathit{ys} \\ & \mid x > y & = (x:\mathit{xs}) \mathbin{\backslash\!\backslash} \mathit{ys} \end{array}

and {\mathit{multiples}\;p} generates the multiples of {p} starting with {p^2}:

\displaystyle \begin{array}{@{}l} \mathit{multiples}\;p = \mathit{map}\; (p\times)\;[p.\,.] \end{array}

Thus, the composites are obtained by merging together the infinite stream of infinite streams {[ [ 4, 6.\,.], [ 9, 12.\,.], [ 25, 30.\,.], \dots ]}. You might think that you could have defined instead {\mathit{primes} = [2.\,.] \mathbin{\backslash\!\backslash} \mathit{composites}}, but that doesn’t work: it won’t compute the first prime without first computing some composites, and you can’t compute any composites without at least the first prime, so this definition is unproductive. Somewhat surprisingly, it suffices to “prime the pump” (so to speak) just with 2, and everything else flows freely from there.

Returning to {\mathit{mergeAll}}, here is the obvious implementation of {\mathit{merge}}, which merges two sorted duplicate-free streams into one (hence, representing set union):

\displaystyle \begin{array}{@{}l@{\;}l@{\;}l} \multicolumn{3}{@{}l}{\mathit{merge} :: \mathit{Ord}\;a \Rightarrow [a] \rightarrow [a] \rightarrow [a]} \\ \mathit{merge}\;(x:\mathit{xs})\;(y:\mathit{ys}) & \mid x < y & = x : \mathit{merge}\;\mathit{xs}\;(y:\mathit{ys}) \\ & \mid x == y & = x : \mathit{merge}\;\mathit{xs}\;\mathit{ys} \\ & \mid x > y & = y : \mathit{merge}\;(x:\mathit{xs})\;\mathit{ys} \end{array}

Then {\mathit{mergeAll}} is basically a stream fold with {\mathit{merge}}. You might think you could define this simply by {\mathit{mergeAll}\;(\mathit{xs}:\mathit{xss}) = \mathit{merge}\;\mathit{xs}\;(\mathit{mergeAll}\;\mathit{xss})}, but again this is unproductive. After all, you can’t merge the infinite stream of sorted streams {[ [5,6.\,.], [4,5.\,.], [3,4.\,.], \dots ]} into a single sorted stream, because there is no least element. Instead, we have to exploit the fact that we have a sorted stream of sorted streams; then the binary merge can exploit the fact that the head of the left stream is the head of the result, without even examining the right stream. So, we define:

\displaystyle \begin{array}{@{}l} \mathit{mergeAll} :: \mathit{Ord}\;a \Rightarrow [[a]] \rightarrow [a] \\ \mathit{mergeAll}\;(\mathit{xs}:\mathit{xss}) = \mathit{xmerge}\;\mathit{xs}\;(\mathit{mergeAll}\;\mathit{xss}) \medskip \\ \mathit{xmerge} :: \mathit{Ord}\;a \Rightarrow [a] \rightarrow [a] \rightarrow [a] \\ \mathit{xmerge}\;(x:\mathit{xs})\;\mathit{ys} = x : \mathit{merge}\;\mathit{xs}\;\mathit{ys} \end{array}

This program is now productive, and {\mathit{primes}} yields the infinite sequence of prime numbers, using the genuine algorithm of Eratosthenes.
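For completeness, here is the whole development transcribed into plain Haskell (defining our own (\\) rather than importing Data.List’s; the non-exhaustive patterns are deliberate, since we deal only with properly infinite lists):

    primes, composites :: [Integer]
    primes     = 2 : ([3..] \\ composites)
    composites = mergeAll [multiples p | p <- primes]

    (\\) :: Ord a => [a] -> [a] -> [a]    -- list difference of sorted streams
    (x:xs) \\ (y:ys)
      | x < y  = x : (xs \\ (y:ys))
      | x == y = xs \\ ys
      | x > y  = (x:xs) \\ ys

    multiples :: Integer -> [Integer]     -- multiples of p, starting from p^2
    multiples p = map (p *) [p..]

    merge :: Ord a => [a] -> [a] -> [a]   -- union of sorted streams
    merge (x:xs) (y:ys)
      | x < y  = x : merge xs (y:ys)
      | x == y = x : merge xs ys
      | x > y  = y : merge (x:xs) ys

    mergeAll :: Ord a => [[a]] -> [a]     -- union of a sorted stream of sorted streams
    mergeAll (xs:xss) = xmerge xs (mergeAll xss)

    xmerge :: Ord a => [a] -> [a] -> [a]  -- emit the head before inspecting the right stream
    xmerge (x:xs) ys = x : merge xs ys

Here take 10 primes yields [2,3,5,7,11,13,17,19,23,29].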

Approximation Lemma

Bird uses the cyclic program as an illustration of the Approximation Lemma. This states that for infinite lists {\mathit{xs}}, {\mathit{ys}},

\displaystyle (\mathit{xs} = \mathit{ys}) \quad \Leftrightarrow \quad (\forall n \in \mathbb{N} \mathbin{.} \mathit{approx}\;n\;\mathit{xs} = \mathit{approx}\;n\;\mathit{ys})

where

\displaystyle \begin{array}{@{}l@{\;}l@{\;}l} \multicolumn{3}{@{}l}{\mathit{approx} :: \mathit{Integer} \rightarrow [a] \rightarrow [a]} \\ \mathit{approx}\;(n+1)\;[\,] & = & [\,] \\ \mathit{approx}\;(n+1)\;(x:\mathit{xs}) & = & x : \mathit{approx}\;n\;\mathit{xs} \end{array}

Note that {\mathit{approx}\;0\;\mathit{xs}} is undefined; the function {\mathit{approx}\;n} preserves the outermost {n} constructors of a list, but then chops off anything deeper and replaces it with {\bot} (undefined). So, the lemma states that to prove two infinite lists equal, it suffices to prove equal all their finite approximations.
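Modern Haskell no longer supports the n+1 patterns used above; a direct transcription with guards:

    approx :: Integer -> [a] -> [a]
    approx n []     | n > 0 = []
    approx n (x:xs) | n > 0 = x : approx (n - 1) xs
    -- approx 0 xs matches no equation, so it is undefined, as required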

Then to prove that {\mathit{primes}} does indeed produce the prime numbers, it suffices to prove that

\displaystyle \mathit{approx}\;n\;\mathit{primes} = p_1 : \cdots : p_n : \bot

for all {n}, where {p_j} is the {j}th prime (so {p_1=2}). Bird therefore defines

\displaystyle \mathit{prs}\;n = \mathit{approx}\;n\;\mathit{primes}

and claims that

\displaystyle \begin{array}{@{}ll} \mathit{prs}\;n & = \mathit{approx}\;n\;(2 : ([3.\,.] \mathbin{\backslash\!\backslash} \mathit{crs}\;n)) \\ \mathit{crs}\;n & = \mathit{mergeAll}\;[\mathit{multiples}\;p \mid p \leftarrow \mathit{prs}\;n] \end{array}

To prove the claim, he observes that it suffices for {\mathit{crs}\;n} to be well defined at least up to the first composite number greater than {p_{n+1}}, because then {\mathit{crs}\;n} delivers enough composite numbers to supply {\mathit{prs}\;(n+1)}, which will in turn supply {\mathit{crs}\;(n+1)}, and so on. It is a “non-trivial result in Number Theory” (in fact, a consequence of Bertrand’s Postulate) that {p_{n+1} < p_n^2}; therefore it suffices that

\displaystyle \mathit{crs}\;n = c_1 : \cdots : c_m : \bot

where {c_j} is the {j}th composite number (so {c_1=4}) and {c_m = p_n^2}. Completing the proof is set as Exercise 9.I of TFWH, and Answer 9.I gives a hint about using induction to show that {\mathit{crs}\;(n+1)} is the result of merging {\mathit{crs}\;n} with {\mathit{multiples}\;p_{n+1}}. (Incidentally, there is a typo in TFWH: both the body of the chapter and the exercise and its solution have “{m = p_n^2}” instead of “{c_m = p_n^2}”.)

Unfortunately, the hint in Answer 9.I is simply wrong. For example, it implies that {\mathit{crs}\;2} (which equals {4:6:8:9:\bot}) could be constructed from {\mathit{crs}\;1} (which equals {4:\bot}) and {\mathit{multiples}\;3} (which equals {[9,12.\,.]}); but this means that the 6 and 8 would have to come out of nowhere. Nevertheless, the claim in Exercise 9.I is valid. What should the hint for the proof have been?

Fixing the proof

The problem is made tricky by the cyclicity. Tom Schrijvers suggested to me that the cycle could be broken by defining

\displaystyle \mathit{crsOf}\;\mathit{qs} = \mathit{mergeAll}\;[\mathit{multiples}\;p \mid p \leftarrow \mathit{qs}]

so that {\mathit{crs}\;n = \mathit{crsOf}\;(\mathit{prs}\;n)}; this allows us to consider {\mathit{crsOf}} in isolation from {\mathit{prs}}. In fact, I claim (following a suggestion from Geraint Jones and Guillaume Boisseau at our Algebra of Programming research group) that if {\mathit{qs} = q_1:\dots:q_n:\bot} where {1 < q_1 < \cdots < q_n}, then {\mathit{crsOf}\;\mathit{qs}} has a well-defined prefix consisting of all the composite numbers {c} such that {c \le q_n^2} and {c} is a multiple of some {q_i} with {c \ge q_i^2}, in ascending order, then becomes undefined. That’s not so difficult to see from the definition: {\mathit{multiples}\;q_i} contains the multiples of {q_i} starting from {q_i^2}; all these streams get merged; but the result becomes undefined (in {\mathit{merge}}) once we pass the head {q_n^2} of the last stream. In particular, {\mathit{crsOf}\;(\mathit{prs}\;n)} has a well-defined prefix consisting of all the composites up to {p_n^2}, then becomes undefined, as required.

However, I must say that I still find this argument unsatisfactory: I have no equational proof about the behaviour of {\mathit{crsOf}}. In fact, I believe that the Approximation Lemma is insufficient to provide such a proof: {\mathit{approx}} works along the grain of {\mathit{prs}}, but against the grain of {\mathit{crs}}. I believe we want something instead like an “Approx While Lemma”, where {\mathit{approxWhile}} is to {\mathit{approx}} as {\mathit{takeWhile}} is to {\mathit{take}}:

\displaystyle \begin{array}{@{}l} \mathit{approxWhile} :: (a \rightarrow \mathit{Bool}) \rightarrow [a] \rightarrow [a] \\ \mathit{approxWhile}\;p\;(x:\mathit{xs}) \mid p\;x = x : \mathit{approxWhile}\;p\;\mathit{xs} \end{array}

Thus, {\mathit{approxWhile}\;p\;\mathit{xs}} retains elements of {\mathit{xs}} as long as they satisfy {p}, but becomes undefined as soon as they don’t (or when the list runs out). Then for all sorted, duplicate-free streams {\mathit{xs},\mathit{ys}} and unbounded streams {\mathit{bs}} (that is, {\forall n \in \mathbb{N} \mathbin{.} \exists b \in \mathit{bs} \mathbin{.} b > n}),

\displaystyle (\mathit{xs} = \mathit{ys}) \quad \Leftrightarrow \quad (\forall b \in \mathit{bs} \mathbin{.} \mathit{approxWhile}\;(\le b)\;\mathit{xs} = \mathit{approxWhile}\;(\le b)\;\mathit{ys})

The point is that {\mathit{approxWhile}} works conveniently for {\mathit{merge}} and {(\mathbin{\backslash\!\backslash})} and hence for {\mathit{crs}} too. I am now working on a proof by equational reasoning of the correctness (especially the productivity) of the Genuine Sieve of Eratosthenes…
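For reference, approxWhile transcribes directly, and can be used to inspect finite approximations of the sieve’s output; a sketch:

    approxWhile :: (a -> Bool) -> [a] -> [a]
    approxWhile p (x:xs) | p x = x : approxWhile p xs
    -- undefined as soon as p fails, or when the list runs out

    -- for example, approxWhile (<= 10) primes = 2 : 3 : 5 : 7 : undefined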

How to design co-programs

I recently attended the Matthias Felleisen Half-Time Show, a symposium held in Boston on 3rd November in celebration of Matthias’s 60th birthday. I was honoured to be able to give a talk there; this post is a record of what I (wished I had) said.

Matthias Felleisen

Matthias is known for many contributions to the field of Programming Languages. He received the SIGPLAN Programming Languages Achievement Award in 2012, the citation for which states:

He introduced evaluation contexts as a notation for specifying operational semantics, and progress-and-preservation proofs of type safety, both of which are used in scores of research papers each year, often without citation. His other contributions include small-step operational semantics for control and state, A-normal form, delimited continuations, mixin classes and mixin modules, a fully-abstract semantics for Sequential PCF, web programming techniques, higher-order contracts with blame, and static typing for dynamic languages.

Absent from this list, perhaps because it wasn’t brought back into the collective memory until 2013, is a very early presentation of the idea of using effects and handlers for extensible language design.

However, perhaps most prominent among Matthias’s contributions to our field is a long series of projects on teaching introductory programming, from TeachScheme! through Program By Design to a latest incarnation in the form of the How to Design Programs textbook (“HtDP”), co-authored with Robby Findler, Matthew Flatt, and Shriram Krishnamurthi, and now in its second edition. The HtDP book is my focus in this post; I have access only to the First Edition, but the text of the Second Edition is online. I make no apologies for using Haskell syntax in my examples, but at least that gave Matthias something to shout at!

Design Recipes

One key aspect of HtDP is the emphasis on design recipes for solving programming tasks. A design recipe is a template for the solution to a problem: a contract for the function, analogous to a type signature (but HtDP takes an untyped approach, so this signature is informal); a statement of purpose; a function header; example inputs and outputs; and a skeleton of the function body. Following the design recipe entails completing the template—filling in a particular contract, etc—then fleshing out the function body from its skeleton, and finally testing the resulting program against the initial examples.

The primary strategy for problem solving in the book is via analysis of the structure of the input. When the input is composite, like a record, the skeleton should name the available fields as likely ingredients of the solution. When the input has “mixed data”, such as a union type, the skeleton should enumerate the alternatives, leading to a case analysis in the solution. When the input is of a recursive type, the skeleton encapsulates structural recursion—a case analysis between the base case and the inductive case, the latter case entailing recursive calls.

So, the design recipe for structural recursion looks like this:

Phase: Data Analysis and Design
Goal: to formulate a data definition
Activity: develop a data definition for mixed data with at least two alternatives; one alternative must not refer to the definition; explicitly identify all self-references in the data definition

Phase: Contract, Purpose and Header
Goal: to name the function; to specify its classes of input data and its class of output data; to describe its purpose; to formulate a header
Activity: name the function, the classes of input data, the class of output data, and specify its purpose:

;; name : in1 in2 … -> out
;; to compute … from x1 …
(define (name x1 x2 …) …)

Phase: Examples
Goal: to characterize the input-output relationship via examples
Activity: create examples of the input-output relationship; make sure there is at least one example per subclass

Phase: Template
Goal: to formulate an outline
Activity: develop a cond-expression with one clause per alternative; add selector expressions to each clause; annotate the body with natural recursions; test that the self-references in this template and the data definition match!

Phase: Body
Goal: to define the function
Activity: formulate a Scheme expression for each simple cond-line; explain for all other cond-clauses what each natural recursion computes according to the purpose statement

Phase: Test
Goal: to discover mistakes (“typos” and logic)
Activity: apply the function to the inputs of the examples; check that the outputs are as predicted

The motivating example for structural recursion is Insertion Sort: recursing on the tail of a non-empty list and inserting the head into the sorted subresult.

\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{insertsort} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{insertsort}\;[\,] &= [\,] \\ \mathit{insertsort}\;(a:x) &= \mathit{insert}\;a\;(\mathit{insertsort}\;x) \medskip \\ \multicolumn{2}{@{}l}{\mathit{insert} :: \mathit{Integer} \rightarrow [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{insert}\;b\;[\,] &= [b] \\ \mathit{insert}\;b\;(a:x) \\ \qquad \mid b \le a &= b : a : x \\ \qquad \mid b > a &= a : \mathit{insert}\;b\;x \end{array}

A secondary, more advanced, strategy is to use generative recursion, otherwise known as divide-and-conquer. The skeleton in this design recipe incorporates a test for triviality; in the non-trivial cases, it splits the problem into subproblems, recursively solves the subproblems, and assembles the subresults into an overall result. The motivating example for generative recursion is QuickSort (but not the fast in-place version): dividing a non-empty input list into two parts using the head as the pivot, recursively sorting both parts, and concatenating the results with the pivot in the middle.

\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{quicksort} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{quicksort}\;[\,] & = [\,] \\ \mathit{quicksort}\;(a:x) &= \mathit{quicksort}\;y \mathbin{{+}\!\!\!{+}} [a] \mathbin{{+}\!\!\!{+}} \mathit{quicksort}\;z \\ & \qquad \mathbf{where}\; \begin{array}[t]{@{}ll} y &= [ b \mid b \leftarrow x, b \le a ] \\ z &= [ b \mid b \leftarrow x, b > a ] \end{array} \end{array}

As far as I can see, no program structures other than structural recursion and generative recursion are considered in HtDP. (Other design recipes are considered, in particular accumulating parameters and imperative features. But these do not determine the gross structure of the resulting program. In fact, I believe that the imperative recipe has been dropped in the Second Edition.)

Co-programs

My thesis is that HtDP has missed an opportunity to reinforce its core message, that data structure determines program structure. Specifically, I believe that the next design recipe to consider after structural recursion, in which the shape of the program is determined by the shape of the input, should be structural corecursion, in which the shape of the program is determined instead by the shape of the output.

More concretely, a function that generates “mixed output”—whether that is a union type, or simply a boolean—might be defined by case analysis over the output. A function that generates a record might be composed of subprograms that generate each of the fields of that record. A function that generates a recursive data structure from some input data might be defined with a case analysis as to whether the result is trivial, and for non-trivial cases with recursive calls to generate substructures of the result. HtDP should present explicit design recipes to address these possibilities, as it does for program structures determined by the input data.

For an example of mixed output, consider a program that may fail, such as division, guarded so as to return an alternative value in that case:

\displaystyle \begin{array}{@{}lll} \multicolumn{3}{@{}l}{\mathit{safeDiv} :: \mathit{Integer} \rightarrow \mathit{Integer} \rightarrow \mathsf{Maybe}\;\mathit{Integer}} \\ \mathit{safeDiv}\;x\;y & \mid y == 0 & = \mathit{Nothing} \\ & \mid \mathbf{otherwise} & = \mathit{Just}\;(x \mathbin{\underline{\smash{\mathit{div}}}} y) \end{array}

The program performs a case analysis, and of course the analysis depends on the input data; but the analysis is not determined by the structure of the input, only its value. So a better explanation of the program structure is that it is determined by the structure of the output data.

For an example of generating composite output, consider the problem of extracting a date, represented as a record:

\displaystyle \mathbf{data}\;\mathit{Date} = \mathit{Date} \{ \mathit{day} :: \mathit{Day}, \mathit{month} :: \mathit{Month}, \mathit{year} :: \mathit{Year} \}

from a formatted string. The function is naturally structured to match the output type:

\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{readDate} :: \mathit{String} \rightarrow \mathit{Date}} \\ \mathit{readDate}\;s & = \mathit{Date}\;\{ \mathit{day} = d, \mathit{month} = m, \mathit{year} = y \} \\ & \qquad \mathbf{where}\; \begin{array}[t]{@{}ll} d & = ... s ... \\ m & = ... s ... \\ y & = ... s ... \end{array} \end{array}
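For concreteness, here is one hypothetical completion, assuming (my assumption, not part of the original) that the input has the fixed format “DD/MM/YYYY” and that the field types are synonyms for Int:

    type Day = Int; type Month = Int; type Year = Int
    data Date = Date { day :: Day, month :: Month, year :: Year } deriving Show

    -- hypothetical: assumes the fixed input format "DD/MM/YYYY"
    readDate :: String -> Date
    readDate s = Date { day = d, month = m, year = y }
      where
        d = read (take 2 s)            -- characters 0..1
        m = read (take 2 (drop 3 s))   -- characters 3..4
        y = read (drop 6 s)            -- characters 6..9

Each field generator is a separate subprogram, matching the shape of the output record.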

For an example of corecursion, consider the problem of “zipping” together two input lists to a list of pairs—taking {[1,2,3]} and {[4,5,6]} to {[(1,4),(2,5),(3,6)]}, and for simplicity let’s say pruning the result to the length of the shorter input. One can again solve the problem by case analysis on the input, but the fact that there are two inputs makes that a bit awkward—whether to do case analysis on one list in favour of the other, or to analyse both in parallel. For example, here is the outcome of case analysis on the first input, followed if that is non-empty by a case analysis on the second input:

\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{zip} :: [\alpha] \rightarrow [\beta] \rightarrow [(\alpha,\beta)]} \\ \mathit{zip}\;x\;y &= \begin{array}[t]{@{}l} \mathbf{if}\; \mathit{null}\;x \;\mathbf{then}\; [\,] \;\mathbf{else} \\ \qquad \mathbf{if}\; \mathit{null}\;y \;\mathbf{then}\; [\,] \;\mathbf{else} \\ \qquad \qquad (\mathit{head}\;x,\mathit{head}\;y) : \mathit{zip}\;(\mathit{tail}\;x)\;(\mathit{tail}\;y) \end{array} \end{array}

Case analysis on both inputs would lead to four cases rather than three, which would not be an improvement. One can instead solve the problem by case analysis on the output—and it is arguably more natural to do so, because there is only one output rather than two. When is the output empty? (When either input is empty.) If it isn’t empty, what is the head of the output? (The pair of input heads.) And from what data is the tail of the output recursively constructed? (The pair of input tails.)

\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{zip} :: [\alpha] \rightarrow [\beta] \rightarrow [(\alpha,\beta)]} \\ \mathit{zip}\;x\;y &= \begin{array}[t]{@{}l} \mathbf{if}\; \mathit{null}\;x \lor \mathit{null}\;y \;\mathbf{then}\; [\,] \;\mathbf{else} \\ \qquad (\mathit{head}\;x,\mathit{head}\;y) : \mathit{zip}\;(\mathit{tail}\;x)\;(\mathit{tail}\;y) \end{array} \end{array}

And whereas Insertion Sort is a structural recursion over the input list, inserting elements one by one into a sorted intermediate result, Selection Sort is a structural corecursion towards the output list, repeatedly extracting the minimum remaining element as the next element of the output: When is the output empty? (When the input is empty.) If the output isn’t empty, what is its head? (The minimum of the input.) And from what data is the tail recursively generated? (The input without this minimum element.)

\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{selectSort} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{selectSort}\;x & = \mathbf{if}\; \mathit{null}\;x \;\mathbf{then}\; [\,] \;\mathbf{else}\; a : \mathit{selectSort}\;(x \mathbin{\backslash\!\backslash} [a]) \\ & \qquad \mathbf{where}\; a = \mathit{minimum}\;x \end{array}

(here, {x \mathbin{\backslash\!\backslash} y} denotes list {x} with the elements of list {y} removed).
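In Haskell, this output-directed pattern is exactly what Data.List.unfoldr captures: the step function says whether the output is empty, and if not, what its head is and from what seed the tail is generated. As a sketch, both zip and selectSort are instances (primed names are mine, to avoid clashing with the versions above):

    import Data.List (delete, unfoldr)

    zip' :: [a] -> [b] -> [(a, b)]
    zip' xs ys = unfoldr step (xs, ys)
      where
        step (a:as, b:bs) = Just ((a, b), (as, bs))  -- head of output, seed for the tail
        step _            = Nothing                  -- either input empty: output empty

    selectSort' :: [Integer] -> [Integer]
    selectSort' = unfoldr step
      where
        step [] = Nothing
        step x  = Just (a, delete a x)               -- extract the minimum, recurse on the rest
          where a = minimum x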

I’m not alone in making this assertion. Norman Ramsey wrote a nice paper On Teaching HtDP; his experience led him to the lesson that

Last, and rarely, you could design a function’s template around the introduction form for the result type. When I teach [HtDP] again, I will make my students aware of this decision point in the construction of a function’s template: should they use elimination forms, function composition, or an introduction form? They should use elimination forms usually, function composition sometimes, and an introduction form rarely.

He also elaborates on test coverage:

Check functional examples to be sure every choice of input is represented. Check functional examples to be sure every choice of output is represented. This activity is especially valuable for functions returning Booleans.

(his emphasis). We should pay attention to output data structure as well as to input data structure.

Generative recursion, aka divide-and-conquer

Only once this dual form of program structure has been explored should students be encouraged to move on to generative recursion, because this exploits both structural recursion and structural corecursion. For example, the QuickSort algorithm that is used as the main motivating example is really structured as a corecursion {\mathit{build}} to construct an intermediate tree, followed by structural recursion {\mathit{flatten}} over that tree to produce the resulting list:

\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{quicksort} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{quicksort} & = \mathit{flatten} \cdot \mathit{build} \medskip \\ \multicolumn{2}{@{}l}{\mathbf{data}\;\mathit{Tree} = \mathit{Empty} \mid \mathit{Node}\;\mathit{Tree}\;\mathit{Integer}\;\mathit{Tree}} \medskip\\ \multicolumn{2}{@{}l}{\mathit{build} :: [\mathit{Integer}] \rightarrow \mathit{Tree}} \\ \mathit{build}\;[\,] & = \mathit{Empty} \\ \mathit{build}\;(a:x) &= \mathit{Node}\;(\mathit{build}\;y)\;a\;(\mathit{build}\;z) \\ & \qquad \mathbf{where}\; \begin{array}[t]{@{}ll} y &= [ b \mid b \leftarrow x, b \le a ] \\ z &= [ b \mid b \leftarrow x, b > a ] \end{array} \medskip \\ \multicolumn{2}{@{}l}{\mathit{flatten} :: \mathit{Tree} \rightarrow [\mathit{Integer}]} \\ \mathit{flatten}\;\mathit{Empty} & = [\,] \\ \mathit{flatten}\;(\mathit{Node}\;t\;a\;u) &= \mathit{flatten}\;t \mathbin{{+}\!\!\!{+}} [a] \mathbin{{+}\!\!\!{+}} \mathit{flatten}\;u \end{array}

The structure of both functions {\mathit{build}} and {\mathit{flatten}} is determined by the structure of the intermediate {\mathit{Tree}} datatype, the first as structural corecursion and the second as structural recursion. A similar explanation applies to any divide-and-conquer algorithm; for example, MergeSort is another divide-and-conquer sorting algorithm, with the same intermediate tree shape, but this time with a simple splitting phase and all the comparisons in the recombining phase:

\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{mergesort} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{mergesort} & = \mathit{mergeAll} \cdot \mathit{splitUp} \medskip \\ \multicolumn{2}{@{}l}{\mathit{splitUp} :: [\mathit{Integer}] \rightarrow \mathit{Tree}} \\ \mathit{splitUp}\;[\,] & = \mathit{Empty} \\ \mathit{splitUp}\;(a:x) &= \mathit{Node}\;(\mathit{splitUp}\;y)\;a\;(\mathit{splitUp}\;z) \\ & \qquad \mathbf{where}\; (y,z) = \mathit{halve}\;x \medskip \\ \multicolumn{2}{@{}l}{\mathit{halve} :: [\mathit{Integer}] \rightarrow ([\mathit{Integer}],[\mathit{Integer}])} \\ \mathit{halve}\;[\,] & = ( [\,], [\,]) \\ \mathit{halve}\;[a] & = ( [a], [\,]) \\ \mathit{halve}\;(a:b:x) & = (a:y,b:z) \;\mathbf{where}\; (y,z) = \mathit{halve}\;x \medskip \\ \multicolumn{2}{@{}l}{\mathit{mergeAll} :: \mathit{Tree} \rightarrow [\mathit{Integer}]} \\ \mathit{mergeAll}\;\mathit{Empty} & = [\,] \\ \mathit{mergeAll}\;(\mathit{Node}\;t\;a\;u) &= \mathit{merge}\;(\mathit{mergeAll}\;t)\;(\mathit{merge}\;[a]\;(\mathit{mergeAll}\;u)) \medskip \\ \multicolumn{2}{@{}l}{\mathit{merge} :: [\mathit{Integer}] \rightarrow [\mathit{Integer}] \rightarrow [\mathit{Integer}]} \\ \mathit{merge}\;[\,]\;y & = y \\ \mathit{merge}\;x\;[\,] & = x \\ \mathit{merge}\;(a:x)\;(b:y) & = \begin{array}[t]{@{}l@{}l@{}l} \mathbf{if}\; a \le b\; & \mathbf{then}\; & a : \mathit{merge}\;x\;(b:y) \\ & \mathbf{else}\; & b : \mathit{merge}\;(a:x)\;y \end{array} \end{array}

(Choosing this particular tree type is a bit clunky for Merge Sort, because of the two calls to {\mathit{merge}} required in {\mathit{mergeAll}}. It would be neater to use non-empty externally labelled binary trees, with elements at the leaves and none at the branches:

\displaystyle \begin{array}{@{}l} \mathbf{data}\;\mathit{Tree}_2 = \mathit{Tip}\;\mathit{Integer} \mid \mathit{Bin}\;\mathit{Tree}_2\;\mathit{Tree}_2 \end{array}

Then you define the main function to work only for non-empty lists, and provide a separate case for sorting the empty list. This clunkiness is a learning opportunity: to realise the problem, come up with a fix (no node labels in the tree), rearrange the furniture accordingly, then replay the development and compare the results.)

Having identified the two parts, structural recursion and structural corecursion, they may be studied separately; separation of concerns is a crucial lesson in introductory programming. Moreover, the parts may be put together in different ways. The divide-and-conquer pattern is known in the MPC community as a hylomorphism, an unfold to generate a call tree followed by a fold to consume that tree. As the QuickSort example suggests, the tree can always be deforested—it is a virtual data structure. But the converse pattern, of a fold from some structured input to some intermediate value, followed by an unfold to a different structured output, is also interesting—you can see this as a change of structured representation, so I called it a metamorphism. One simple application is to convert a number from an input base (a sequence of digits in that base), via an intermediate representation (the represented number), to an output base (a different sequence of digits). More interesting applications include encoding and data compression algorithms, such as arithmetic coding.
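As a small sketch of the simplest metamorphism, here is base conversion: a fold consuming digits in base b (most significant first), followed by an unfold producing digits in base c (assuming a non-negative value; function names are mine):

    import Data.List (unfoldr)

    fromBase :: Integer -> [Integer] -> Integer        -- the fold: digits to number
    fromBase b = foldl (\n d -> n * b + d) 0

    toBase :: Integer -> Integer -> [Integer]          -- the unfold: number to digits
    toBase c 0 = [0]
    toBase c n = reverse (unfoldr step n)
      where
        step 0 = Nothing
        step m = Just (m `mod` c, m `div` c)

    convert :: Integer -> Integer -> [Integer] -> [Integer]
    convert b c = toBase c . fromBase b                -- e.g. convert 10 2 [1,3] = [1,1,0,1]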

Laziness

Although I have used Haskell as a notation, nothing above depends on laziness; it would all work as well in ML or Scheme. It is true that the mathematical structures underlying structural recursion and structural corecursion are prettier when you admit infinite data structures—the final coalgebra of the base functor for lists is the datatype of finite and infinite lists, and without admitting the infinite ones some recursive definitions have no solution. But that sophistication is beyond the scope of introductory programming, and it suffices to restrict attention to finite data structures.

HtDP already stipulates a termination argument in the design recipe for generative recursion; the same kind of argument should be required for structural corecursion (and is easy to make for the sorting examples given above). Of course, structural recursion over finite data structures is necessarily terminating. Laziness is unnecessary for co-programming.

Structured programming

HtDP actually draws on a long tradition of relating data structure and program structure, which is a theme dear to my own heart (and indeed, the motto of this blog). Tony Hoare wrote in his Notes on Data Structuring in 1972:

There are certain close analogies between the methods used for structuring data and the methods for structuring a program which processes that data.

Fred Brooks put it pithily in The Mythical Man-Month in 1975:

Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

which was modernized by Eric Raymond in his 1997 essay The Cathedral and the Bazaar to:

Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.

HtDP credits Jackson Structured Programming as partial inspiration for the design recipe approach. As Jackson wrote, also in 1975:

The central theme of this book has been the relationship between data and program structures. The data provides a model of the problem environment, and by basing our program structures on data structures we ensure that our programs will be intelligible and easy to maintain.

and

The structure of a program must be based on the structures of all of the data it processes.

(my emphasis). In a retrospective lecture in 2001, he clarified:

program structure should be dictated by the structure of its input and output data streams

—the JSP approach was designed for processing sequential streams of input records to similar streams of output records, and the essence of the approach is to identify the structures of each of the data files (input and output) in terms of sequences, selections, and iterations (that is, as regular languages), to refine them all to a common structure that matches all of them simultaneously, and to use that common data structure as the program structure. So even way back in 1975 it was clear that we need to pay attention to the structure of output data as well as to that of input data.

Ad-hockery

HtDP apologizes that generative recursion (which it identifies with the field of algorithm design)

is much more of an ad hoc activity than the data-driven design of structurally recursive functions. Indeed, it is almost better to call it inventing an algorithm than designing one. Inventing an algorithm requires a new insight—a “eureka”.

It goes on to suggest that mere programmers cannot generally be expected to have such algorithmic insights:

In practice, new complex algorithms are often developed by mathematicians and mathematical computer scientists; programmers, though, must th[o]roughly understand the underlying ideas so that they can invent the simple algorithms on their own and communicate with scientists about the others.

I say that this defeatism is a consequence of not following through on the core message, that data structure determines program structure. QuickSort and MergeSort are not ad hoc. Admittedly, they do require some insight in order to identify structure that is not present in either the input or the output. But having identified that structure, there is no further mystery, and no ad-hockery required.

Folds on lists

The previous post turned out to be rather complicated. In this one, I return to much simpler matters: {\mathit{foldr}} and {\mathit{foldl}} on lists, and list homomorphisms with an associative binary operator. For simplicity, I’m only going to be discussing finite lists.

Fold right

Recall that {\mathit{foldr}} is defined as follows:

\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{foldr} :: (\alpha \rightarrow \beta \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow \beta} \\ \mathit{foldr}\;(\oplus)\;e\;[\,] & = e \\ \mathit{foldr}\;(\oplus)\;e\;(a:x) & = a \oplus \mathit{foldr}\;(\oplus)\;e\;x \end{array}

It satisfies the following universal property, which gives necessary and sufficient conditions for a function {h} to be expressible as a {\mathit{foldr}}:

\displaystyle \begin{array}{@{}l} h = \mathit{foldr}\;(\oplus)\;e \quad\Longleftrightarrow\quad h\;[\,] = e \;\land\; h\;(a:x) = a \oplus (h\;x) \end{array}
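
For example (a small illustration of my own, not a new result), {\mathit{sum}} and {\mathit{map}\;f} each satisfy the two equations on the right, and so each is a {\mathit{foldr}}:

-- sum:   h [] = 0  and h (a:x) = a + h x,    so sum   = foldr (+) 0
sum' :: Num a => [a] -> a
sum' = foldr (+) 0

-- map f: h [] = [] and h (a:x) = f a : h x,  so map f = foldr ((:) . f) []
map' :: (a -> b) -> [a] -> [b]
map' f = foldr ((:) . f) []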

As one consequence of the universal property, we get a fusion theorem, which states sufficient conditions for fusing a following function {g} into a {\mathit{foldr}}:

\displaystyle \begin{array}{@{}l} g \cdot \mathit{foldr}\;(\oplus)\;e = \mathit{foldr}\; (\otimes)\;e' \quad \Longleftarrow\quad g\;e = e' \;\land\; g\;(a \oplus b) = a \otimes g\;b \end{array}

(on finite lists—infinite lists require an additional strictness condition). Fusion is an equivalence if {\mathit{foldr}\;(\oplus)\;e} is surjective. If {\mathit{foldr}\;(\oplus)\;e} is not surjective, it’s an equivalence on the range of that fold: the left-hand side implies

\displaystyle g\;(a \oplus b) = a \otimes g\;b

only for {b} of the form {\mathit{foldr}\;(\oplus)\;e\;x} for some {x}, not for all {b}.
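
Here is a worked instance of fusion (my own toy example): fusing the doubling function {g\;b = 2 \times b} into {\mathit{sum} = \mathit{foldr}\;(+)\;0}. The side conditions hold with {e' = 0} and {a \otimes b = 2 \times a + b}, since {2 \times (a + b) = 2 \times a + 2 \times b}:

-- (*2) . foldr (+) 0 = foldr (\a b -> 2*a + b) 0
doubleSum :: [Integer] -> Integer
doubleSum = foldr (\a b -> 2*a + b) 0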

As a second consequence of the universal property (or alternatively, as a consequence of the free theorem of the type of {\mathit{foldr}}), we have map–fold fusion:

\displaystyle \mathit{foldr}\;(\oplus)\;e \cdot \mathit{map}\;g = \mathit{foldr}\;((\oplus) \cdot g)\;e

The code golf “{((\oplus) \cdot g)}” is a concise if opaque way of writing the binary operator pointfree: {((\oplus) \cdot g)\;a\;b = g\;a \oplus b}.
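
In code, map–fold fusion lets one compute, say, the sum of squares in a single traversal (again an illustration of mine):

-- foldr (+) 0 . map (^2) = foldr ((+) . (^2)) 0,
-- where ((+) . (^2)) a b = a^2 + b
sumSquares :: [Integer] -> Integer
sumSquares = foldr ((+) . (^2)) 0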

Fold left

Similarly, {\mathit{foldl}} is defined as follows:

\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{foldl} :: (\beta \rightarrow \alpha \rightarrow \beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow \beta} \\ \mathit{foldl}\;(\oplus)\;e\;[\,] & = e \\ \mathit{foldl}\;(\oplus)\;e\;(a:x) & = \mathit{foldl}\;(\oplus)\;(e \oplus a)\;x \end{array}

({\mathit{foldl}} is tail recursive, so it is unproductive on an infinite list.)

{\mathit{foldl}} also enjoys a universal property, although it’s not so well known as that for {\mathit{foldr}}. Because of the varying accumulating parameter {e}, the universal property entails abstracting that argument on the left in favour of a universal quantification on the right:

\displaystyle \begin{array}{@{}l} h = \mathit{foldl}\;(\oplus) \quad\Longleftrightarrow\quad \forall b \;.\; h\;b\;[\,] = b \;\land\; h\;b\;(a:x) = h\;(b \oplus a)\;x \end{array}

(For the proof, see Exercise 16 in my paper with Richard Bird on arithmetic coding.)
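
As an illustration (mine, not from the paper), take {h\;b\;x = \mathit{reverse}\;x \mathbin{{+}\!\!\!{+}} b}. Then {h\;b\;[\,] = b} and {h\;b\;(a:x) = \mathit{reverse}\;x \mathbin{{+}\!\!\!{+}} (a:b) = h\;(a:b)\;x}, so the universal property gives {h = \mathit{foldl}\;(\mathit{flip}\;(:))}, and in particular:

-- reverse as a foldl, justified by the universal property above
reverse' :: [a] -> [a]
reverse' = foldl (flip (:)) []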

From the universal property, it is straightforward to prove a map–{\mathit{foldl}} fusion theorem:

\displaystyle \mathit{foldl}\;(\oplus)\;e \cdot \mathit{map}\;f = \mathit{foldl}\;((\cdot f) \cdot (\oplus))\;e

Note that {((\cdot f) \cdot (\oplus))\;b\;a = b \oplus f\;a}. There is also a fusion theorem:

\displaystyle (\forall e \;.\; f \cdot \mathit{foldl}\;(\oplus)\;e = \mathit{foldl}\;(\otimes)\;(f\;e)) \quad\Longleftarrow\quad f\;(b \oplus a) = f\;b \otimes a

This is easy to prove by induction. (I would expect it also to be a consequence of the universal property, but I don’t see how to make that go through.)

Duality theorems

Of course, {\mathit{foldr}} and {\mathit{foldl}} are closely related. §3.5.1 of Bird and Wadler’s classic text presents a First Duality Theorem:

\displaystyle \mathit{foldr}\;(\oplus)\;e\;x = \mathit{foldl}\;(\oplus)\;e\;x

when {\oplus} is associative with neutral element {e}; a more general Second Duality Theorem:

\displaystyle \mathit{foldr}\;(\oplus)\;e\;x = \mathit{foldl}\;(\otimes)\;e\;x

when {\oplus} associates with {\otimes} (that is, {a \oplus (b \otimes c) = (a \oplus b) \otimes c}) and {a \oplus e = e \otimes a} (for all {a,b,c}, but fixed {e}); and a Third Duality Theorem:

\displaystyle \mathit{foldr}\;(\oplus)\;e\;x = \mathit{foldl}\;(\mathit{flip}\;(\oplus))\;e\;(\mathit{reverse}\;x)

(again, all three only for finite {x}).

The First Duality Theorem is a specialization of the Second, when {{\oplus} = {\otimes}} (but curiously, the specialized Second is also slightly stronger than the First as stated: apparently all we require is that {a \oplus e = e \oplus a}, not that both equal {a}; for example, we still have {\mathit{foldr}\;(+)\;1 = \mathit{foldl}\;(+)\;1}, even though {1} is not the neutral element for {+}).
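
All three theorems are easy to check experimentally on finite lists. Here is a small sketch, assuming the QuickCheck library (run each property with quickCheck in GHCi):

import Test.QuickCheck

-- First Duality: (++) is associative with neutral element []
prop_first :: [[Int]] -> Bool
prop_first xss = foldr (++) [] xss == foldl (++) [] xss

-- Second Duality: counting elements from either end
prop_second :: [Int] -> Bool
prop_second xs = foldr (\_ n -> n + 1) 0 xs == foldl (\n _ -> n + 1) (0 :: Int) xs

-- Third Duality: an arbitrary operator, applied to the reversed list
prop_third :: [Int] -> Bool
prop_third xs = foldr (-) 0 xs == foldl (flip (-)) 0 (reverse xs)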

Characterization

“When is a function a fold?” Evidently, if function {h} on lists is injective, then it can be written as a {\mathit{foldr}}: in that case (at least, classically speaking), {h} has a post-inverse—a function {g} such that {g \cdot h = \mathit{id}}—so:

\displaystyle \begin{array}{@{}ll} & h\;(a:x) \\ = & \qquad \{ g \cdot h = \mathit{id} \} \\ & h\;(a : g\;(h\;x)) \\ = & \qquad \{ \mbox{let } a \oplus b = h\;(a : g\;b) \} \\ & a \oplus h\;x \end{array}

and so, letting {e = h\;[\,]}, we have

\displaystyle h = \mathit{foldr}\;(\oplus)\;e

But also evidently, injectivity is not necessary for a function to be a fold; after all, {\mathit{sum}} is a fold, but not the least bit injective. Is there a simple condition that is both sufficient and necessary for a function to be a fold? There is! The condition is that lists equivalent under {h} remain equivalent when extended by an element:

\displaystyle h\;x = h\;x' \quad\Longrightarrow\quad h\;(a:x) = h\;(a:x')

We say that {h} is leftwards. (The kernel {\mathrm{ker}\;h} of a function {h} is a relation on its domain, namely the set of pairs {(x,x')} such that {h\;x = h\;x'}; this condition is equivalent to {\mathrm{ker}\;h \subseteq \mathrm{ker}\;(h \cdot (a:))} for every {a}.) Clearly, leftwardsness is necessary: if {h = \mathit{foldr}\;(\oplus)\;e} and {h\;x = h\;x'}, then

\displaystyle \begin{array}{@{}ll} & h\;(a:x) \\ = & \qquad \{ h \mbox{ as } \mathit{foldr} \} \\ & a \oplus h\;x \\ = & \qquad \{ \mbox{assumption} \} \\ & a \oplus h\;x' \\ = & \qquad \{ h \mbox{ as } \mathit{foldr} \mbox{ again} \} \\ & h\;(a:x') \end{array}

Moreover, leftwardsness is sufficient. For, suppose that {h} is leftwards; then pick a function {g} such that, when {b} is in the range of {h}, we get {g\;b = x} for some {x} such that {h\;x = b} (and it doesn’t matter what value {g} returns for {b} outside the range of {h}), and then define

\displaystyle a \oplus b = h\;(a : g\;b)

This is a proper definition of {\oplus}, on account of leftwardsness, in the sense that it doesn’t matter which value {x} we pick for {g\;b}, as long as indeed {h\;x = b}: any other value {x'} that also satisfies {h\;x' = b} entails the same outcome {h\;(a:x') = h\;(a:x)} for {a \oplus b}. Intuitively, it is not necessary to completely invert {h} (as we did in the injective case), provided that {h} preserves enough distinctions. For example, for {h = \mathit{sum}} (for which indeed {\mathit{sum}\;x = \mathit{sum}\;x'} implies {\mathit{sum}\;(a:x) = \mathit{sum}\;(a:x')}), we could pick {g\;b = [b]}. In particular, {h\;x} is obviously in the range of {h}; then {g\;(h\;x)} is chosen to be some {x'} such that {h\;x' = h\;x}, and so by construction {h\;(g\;(h\;x)) = h\;x}—in other words, {g} acts as a kind of partial inverse of {h}. So we get:

\displaystyle \begin{array}{@{}ll} & a \oplus h\;x \\ = & \qquad \{ \mbox{definition of } \oplus \mbox{; let } x' = g\;(h\;x) \} \\ & h\;(a : x') \\ = & \qquad \{ h\;x' = h\;x \mbox{; assumption} \} \\ & h\;(a:x) \\ \end{array}

and therefore {h = \mathit{foldr}\;(\oplus)\;e} where {e = h\;[\,]}.
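
The construction in this proof transcribes directly into Haskell (a sketch of mine; asFoldr is a made-up name). Given leftwards {h} and a suitable {g}, we recover {h} as a fold; for {h = \mathit{sum}}, picking {g\;b = [b]} gives {a \oplus b = \mathit{sum}\;[a,b] = a + b}:

-- Rebuild a leftwards function h as a foldr, given a partial inverse g
-- with h (g b) = b for every b in the range of h.
asFoldr :: ([a] -> b) -> (b -> [a]) -> ([a] -> b)
asFoldr h g = foldr oplus (h [])
  where oplus a b = h (a : g b)

sumAgain :: [Integer] -> Integer
sumAgain = asFoldr sum (\b -> [b])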

Monoids and list homomorphisms

A list homomorphism is a fold after a map in a monoid. That is, define

\displaystyle \begin{array}{@{}ll} \multicolumn{2}{@{}l}{\mathit{hom} :: (\beta\rightarrow\beta\rightarrow\beta) \rightarrow (\alpha\rightarrow\beta) \rightarrow \beta \rightarrow [\alpha] \rightarrow \beta} \\ \mathit{hom}\;(\odot)\;f\;e\;[\,] & = e \\ \mathit{hom}\;(\odot)\;f\;e\;[a] & = f\;a \\ \mathit{hom}\;(\odot)\;f\;e\;(x \mathbin{{+}\!\!\!{+}} y) & = \mathit{hom}\;(\odot)\;f\;e\;x \;\odot\; \mathit{hom}\;(\odot)\;f\;e\;y \end{array}

provided that {\odot} and {e} form a monoid. (One may verify that this condition is sufficient for the equations to completely define the function; moreover, it is almost necessary—{\odot} and {e} should form a monoid on the range of {\mathit{hom}\;(\odot)\;f\;e}.) The Haskell library {\mathit{Data.Foldable}} defines an analogous method, whose list instance is

\displaystyle \mathit{foldMap} :: \mathit{Monoid}\;\beta \Rightarrow (\alpha \rightarrow \beta) \rightarrow [\alpha] \rightarrow \beta

—the binary operator {\odot} and initial value {e} are determined implicitly by the {\mathit{Monoid}} instance rather than being passed explicitly.
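
For instance (standard library usage, not specific to this post), one can sum via the Sum monoid and conjoin via the All monoid:

import Data.Monoid (Sum(..), All(..))

-- monoid: (+) with neutral element 0
total :: [Integer] -> Integer
total = getSum . foldMap Sum

-- monoid: (&&) with neutral element True
allEven :: [Integer] -> Bool
allEven = getAll . foldMap (All . even)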

Richard Bird’s Introduction to the Theory of Lists states implicitly what might be called the First Homomorphism Theorem, that any homomorphism consists of a reduction after a map (in fact, a consequence of the free theorem of the type of {\mathit{hom}}):

\displaystyle \mathit{hom}\;(\odot)\;f\;e = \mathit{hom}\;(\odot)\;\mathit{id}\;e \cdot \mathit{map}\;f

The same paper explicitly states, as Lemma 4 (“Specialization”), a Second Homomorphism Theorem: any homomorphism can be evaluated right-to-left or left-to-right, among other orders:

\displaystyle \mathit{hom}\;(\odot)\;f\;e = \mathit{foldr}\;(\lambda\;a\;b \;.\; f\;a \odot b)\;e = \mathit{foldl}\;(\lambda\;b\;a \;.\; b \odot f\;a)\;e

I wrote up the Third Homomorphism Theorem, which is the converse of the Second Homomorphism Theorem: any function that can be written both as a {\mathit{foldr}} and as a {\mathit{foldl}} is a homomorphism. I learned this theorem from David Skillicorn, but it was conjectured earlier by Richard Bird and proved by Lambert Meertens during a train journey in the Netherlands—as the story goes, Lambert returned from a bathroom break with the idea for the proof. Formally, for given {h}, if there exist {\oplus}, {\otimes}, and {e} such that

\displaystyle h = \mathit{foldr}\;(\oplus)\;e = \mathit{foldl}\;(\otimes)\;e

then there also exist {f} and an associative {\odot} such that

\displaystyle h = \mathit{hom}\;(\odot)\;f\;e

Recall that {h} is “leftwards” if

\displaystyle h\;x = h\;x' \Longrightarrow h\;(a:x) = h\;(a:x')

and that the leftwards functions are precisely the {\mathit{foldr}}s. Dually, {h} is rightwards if

\displaystyle h\;x = h\;x' \Longrightarrow h\;(x \mathbin{{+}\!\!\!{+}} [a]) = h\;(x' \mathbin{{+}\!\!\!{+}} [a])

and (as one can show) the rightwards functions are precisely the {\mathit{foldl}}s. So the Specialization Theorem states that every homomorphism is both leftwards and rightwards, and the Third Homomorphism Theorem states that every function that is both leftwards and rightwards is necessarily a homomorphism. To prove the latter, suppose that {h} is both leftwards and rightwards; then pick a function {g} such that, for {b} in the range of {h}, we get {g\;b = x} for some {x} such that {h\;x = b}, and then define

\displaystyle b \odot c = h\;(g\;b \mathbin{{+}\!\!\!{+}} g\;c)

As before, this is a proper definition of {\odot}: assuming leftwardsness and rightwardsness, the result does not depend on the representatives chosen for {g}. By construction, {g} again satisfies {h\;(g\;(h\;x)) = h\;x} for any {x}, and so we have:

\displaystyle \begin{array}{@{}ll} & h\;x \odot h\;y \\ = & \qquad \{ \mbox{definition of } \odot \} \\ & h\;(g\;(h\;x) \mathbin{{+}\!\!\!{+}} g\;(h\;y)) \\ = & \qquad \{ h \mbox{ is rightwards; } h\;(g\;(h\;x)) = h\;x \} \\ & h\;(x \mathbin{{+}\!\!\!{+}} g\;(h\;y)) \\ = & \qquad \{ h \mbox{ is leftwards; } h\;(g\;(h\;y)) = h\;y \} \\ & h\;(x \mathbin{{+}\!\!\!{+}} y) \end{array}

Moreover, one can show that {\odot} is associative, and {e} its neutral element (at least on the range of {h}).

For example, sorting a list can obviously be done as a {\mathit{foldr}}, using insertion sort; by the Third Duality Theorem, it can therefore also be done as a {\mathit{foldl}} on the reverse of the list; and because the order of the input is irrelevant, the reverse can be omitted. The Third Homomorphism Theorem then implies that there exists an associative binary operator {\odot} such that {\mathit{sort}\;(x \mathbin{{+}\!\!\!{+}} y) = \mathit{sort}\;x \odot \mathit{sort}\;y}. It also gives a specification of {\odot}, given a suitable partial inverse {g} of {\mathit{sort}}—in this case, {g = \mathit{id}} suffices, because {\mathit{sort}} is idempotent. The only characterization of {\odot} arising directly from the proof involves repeatedly inserting the elements of one argument into the other, which does not exploit sortedness of the first argument. But from this inefficient characterization, one may derive the more efficient implementation that merges two sorted sequences, and hence obtain merge sort overall.
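
Here is a sketch of the resulting program (standard merge sort, included only to make the derivation concrete):

-- The associative operator promised by the Third Homomorphism Theorem,
-- specialized to combining two already-sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

msort :: Ord a => [a] -> [a]
msort []  = []
msort [a] = [a]
msort x   = merge (msort y) (msort z)
  where (y, z) = splitAt (length x `div` 2) x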

The Trick

Here is another relationship between the directed folds ({\mathit{foldr}} and {\mathit{foldl}}) and list homomorphisms—any directed fold can be implemented as a homomorphism, followed by a final function application to a starting value:

\displaystyle \mathit{foldr}\;(\oplus)\;e\;x = \mathit{hom}\;(\cdot)\;(\oplus)\;\mathit{id}\;x\;e

Roughly speaking, this is because application and composition are related:

\displaystyle a \oplus (b \oplus (c \oplus e)) = (a\oplus) \mathbin{\$} (b\oplus) \mathbin{\$} (c\oplus) \mathbin{\$} e = ((a\oplus) \cdot (b\oplus) \cdot (c\oplus))\;e

(here, {\$ } is right-associating, loose-binding function application). To be more precise:

\displaystyle \begin{array}{@{}ll} & \mathit{foldr}\;(\oplus)\;e \\ = & \qquad \{ \mbox{map-fold fusion: } (\$)\;(a\oplus)\;b = a \oplus b \} \\ & \mathit{foldr}\;(\$)\;e \cdot \mathit{map}\;(\oplus) \\ = & \qquad \{ \mbox{fusion: } (\$ e)\;\mathit{id} = e \mbox{ and } (\$ e)\;((a\oplus) \cdot f) = (a\oplus)\;(f\;e) = (\$)\;(a\oplus)\;((\$ e)\;f) \} \\ & (\$ e) \cdot \mathit{foldr}\;(\cdot)\;\mathit{id} \cdot \mathit{map}\;(\oplus) \\ = & \qquad \{ (\cdot) \mbox{ and } \mathit{id} \mbox{ form a monoid} \} \\ & (\$ e) \cdot \mathit{hom}\;(\cdot)\;(\oplus)\;\mathit{id} \end{array}
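
In Haskell, The Trick is available off the shelf via the Endo monoid from Data.Monoid, whose binary operator is composition and whose neutral element is id (a sketch; foldrViaHom is my name):

import Data.Monoid (Endo(..))

-- foldr op e = ($ e) . appEndo . foldMap (Endo . op)
foldrViaHom :: (a -> b -> b) -> b -> [a] -> b
foldrViaHom op e x = appEndo (foldMap (Endo . op) x) e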

This result has many applications. For example, it's the essence of parallel recognizers for regular languages. Language recognition looks at first like an inherently sequential process: a recognizer over an alphabet {\Sigma} can be represented as a finite state machine over states {S}, with a state transition function of type {S \times \Sigma \rightarrow S}, and such a function is clearly not associative. But by mapping this function over the sequence of symbols, we get a sequence of {S \rightarrow S} functions, which can then be combined by composition. Composition is of course associative, so it can be computed in parallel, in {\mathrm{log}\;n} steps on {n} processors. Better yet, each such {S \rightarrow S} function can be represented as an array (since {S} is finite), so the composition of any number of such functions takes a fixed amount of space to represent and a fixed amount of time to apply; the {\mathrm{log}\;n} parallel steps therefore take only {\mathrm{log}\;n} time.
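
Here is a small sequential sketch of that idea (the names are mine; the parallelism is only indicated in the comments):

type State = Int

-- Map each symbol to its State -> State transition, then combine the
-- transitions by composition. Because (.) is associative, the foldr
-- below could instead be a parallel reduction in log n steps.
recognizes :: (State -> Char -> State) -> State -> [State] -> String -> Bool
recognizes delta start finals w = foldr (.) id (reverse steps) start `elem` finals
  where steps = map (\c s -> delta s c) w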

Similarly for carry-lookahead addition circuits. Binary addition of two {n}-bit numbers with an initial carry-in proceeds from right to left, adding bit by bit and taking care of carries, producing an {n}-bit result and a carry-out; this too appears inherently sequential. But there aren’t many options for the pair of input bits at a particular position: two {1}s will always generate an outgoing carry, two {0}s will always kill an incoming carry, and a {1} and {0} in either order will propagate an incoming carry to an outgoing one. Whether there is a carry-in at a particular bit position is computed by a {\mathit{foldr}} of the bits to the right of that position, zipped together, starting from the initial carry-in, using the binary operator {\oplus} defined by

\displaystyle \begin{array}{@{}ll} (1,1) \oplus b & = 1 \\ (0,0) \oplus b & = 0 \\ (x,y) \oplus b & = b \end{array}

Again, applying {\oplus} to a bit-pair makes a bit-to-bit function; these functions are to be composed, and function composition is associative. Better, such functions have small finite representations as arrays (indeed, we need a domain of only three elements, {G, K, P}; {G} and {K} are both left zeroes of composition, and {P} is neutral). Better still, we can compute the carry-in at all positions using a {\mathit{scanr}}, which for an associative binary operator can also be performed in parallel in {\mathrm{log}\;n} steps on {n} processors.
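
A sketch of that representation (my names; bit-pairs are given most significant first, and the initial carry-in feeds in at the right):

data Carry = G | K | P deriving (Eq, Show)   -- generate, kill, propagate

-- Classify a pair of input bits by its effect on an incoming carry.
classify :: (Int, Int) -> Carry
classify (1,1) = G
classify (0,0) = K
classify _     = P

-- Composition of carry functions: G and K are left zeroes, P is neutral.
(<.>) :: Carry -> Carry -> Carry
P <.> c = c
g <.> _ = g

-- Carry-in at every position, via a scanr over the associative <.>;
-- this is the scan that parallelizes to log n steps on n processors.
carryIns :: [(Int, Int)] -> Int -> [Int]
carryIns bits c0 = map apply (tail (scanr (<.>) P (map classify bits)))
  where apply G = 1
        apply K = 0
        apply P = c0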

The Trick seems to be related to Cayley’s Theorem, on monoids rather than groups: any monoid is equivalent to a monoid of endomorphisms. That is, corresponding to every monoid {(M,{\oplus},e)}, with {M} a set, and {(\oplus) :: M \rightarrow M \rightarrow M} an associative binary operator with neutral element {e :: M}, there is another monoid {(N,(\cdot),\mathit{id})} on the carrier {N = \{ (x\oplus) \mid x \in M \}} of endomorphisms of the form {(x\oplus)}; the mappings {(\oplus) :: M \rightarrow N} and {(\$ e) :: N \rightarrow M} are both monoid homomorphisms, and are each other’s inverse. (Cayley’s Theorem is what happens to the Yoneda Embedding when specialized to the one-object category representing a monoid.) So any list homomorphism corresponds to a list homomorphism with endomorphisms as the carrier, followed by a single final function application:

\displaystyle \begin{array}{@{}ll} & \mathit{hom}\;(\oplus)\;f\;e \\ = & \qquad \{ \mbox{Second Homomorphism Theorem} \} \\ & \mathit{foldr}\;((\oplus) \cdot f)\;e \\ = & \qquad \{ \mbox{The Trick} \} \\ & (\$ e) \cdot \mathit{hom}\;(\cdot)\;((\oplus) \cdot f)\;\mathit{id} \end{array}

Note that {(\oplus) \cdot f} takes a list element {a} to the endomorphism {(f\,a\, \oplus)}. The change of representation from {M} to {N} is the same as in The Trick, and is also what underlies Hughes's novel representation of lists; but I can't quite put my finger on what these both have to do with Cayley and Yoneda.
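
For what it's worth, Hughes's representation is exactly this Cayley embedding for the list monoid {([\alpha], \mathbin{{+}\!\!\!{+}}, [\,])}. A sketch (these two-line definitions are standard, though the names are mine):

type DList a = [a] -> [a]

-- the embedding, mapping x to the endomorphism (x ++)
rep :: [a] -> DList a
rep = (++)

-- the projection, applying the endomorphism to the neutral element []
abs' :: DList a -> [a]
abs' f = f []

-- rep (x ++ y) = rep x . rep y, so concatenation becomes composition,
-- and each append costs O(1) until abs' is finally applied.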