This repository contains the prototype compiler for my master's thesis.
It is a proof-of-concept implementation of a modified version of the C-- language, showcasing type inference in the context of systems programming and inference-guided automatic resource management.
Requirements:
- cabal (version 3.2+; preferably version 3.6.2, acquired through ghcup)
- ghc (versions 8.8.4 to 9.0.2; preferably version 8.10.7, acquired through ghcup)
- llc (version 12+; preferably version 12)
- Two modified cabal packages, included as git submodules in the vendor directory:
  - llvm-hs-pure from https://github.com/jiriklepl/llvm-hs, a modified version of https://github.com/llvm-hs/llvm-hs
  - llvm-hs-pretty from https://github.com/jiriklepl/llvm-hs-pretty, a modified version of https://github.com/llvm-hs/llvm-hs-pretty
To get the repository:
    git clone https://github.com/jiriklepl/masters-thesis-code.git
    cd masters-thesis-code

To build the project:

    git submodule init
    git submodule update
    cabal build --only-dependencies Compiler
    cabal build Compiler
    # development documentation:
    cabal haddock Compiler

For general use:

    cabal run Compiler -- [options...] input_file

For /dev/stdout output, use:

    cabal run Compiler -- -o - input_file

The usage of the compiler is explained by the standard --help option:

    cabal run Compiler -- -h

To run the test examples in the examples/ folder:

    ./run_examples.sh

Some of the examples test the compiler on invalid input, which causes several error messages to be printed during the run.
The source code is documented with Haddock. You can build the HTML documentation by running:
    cabal haddock Compiler

This command is quite verbose, but its output ends with the location of the index file of the generated documentation (it should be somewhere in the dist-newstyle folder, most likely in dist-newstyle/build/<arch>/<ghc-version>/Compiler-0.1.0.0/x/Compiler/doc/html/Compiler/Compiler/index.html).
When reading the source files of the program, we suggest using Haskell Language Server (HLS), which parses the documentation comments and makes the documentation more easily accessible.
The project contains many modules of varying significance documented with Haddock; the main ones are:
- CMM.Pipeline: contains the high-level logic of the compiler and wrappers for the main phases of the compiler pipeline
- CMM.Lexer: contains the definition of the tokenization phase
- CMM.Parser: contains the definition of the parsing phase
- CMM.AST: contains the definitions of the various abstract syntax tree (AST) nodes used as a representation of the program
- CMM.AST.Flattener: defines the function flatten, which flattens the given AST
- CMM.AST.Blockifier: defines the function blockify, which blockifies the procedures in the given AST, annotating each statement with a block annotation (which assigns the statement to its corresponding basic block). It also produces the BlockifierState, which is refined by CMM.FlowAnalysis
- CMM.FlowAnalysis: defines the flow analysis for a given procedure. It is invoked by CMM.AST.Blockifier and refines its BlockifierState
- CMM.Inference.Preprocess: contains the definition of the function preprocess, which performs the inference preprocessing phase: it elaborates the AST and generates the constraints that represent the type semantics of the program
- CMM.Inference: defines the function reduce, which performs the inference pipeline on the given set of constraints (facts), producing an InferencerState that can interpret the types of the elaborated AST
- The data types used by the inference are defined in: CMM.Inference.Type, CMM.Inference.TypeCompl, CMM.Inference.TypeKind, CMM.Inference.Properties, CMM.Inference.Fact, CMM.Inference.DataKind, CMM.Inference.Constness
- CMM.Monomorphize: contains the definition of the function monomorphize, which monomorphizes the given program represented by the elaborated AST. It uses the InferencerState to interpret each type
- CMM.FillElabs, CMM.Mangle: these two modules define the postprocessing of the monomorphized code: filling in concrete types in place of type variables, and name-mangling of the monomorphic copies of polymorphic top-level definitions
- CMM.Translator: contains the definition of the function translate, which performs the translation phase (emission of the LLVM assembly) on an elaborated and blockified AST. It uses the BlockifierState and the InferencerState to interpret the control flow and the types, respectively
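The modules above form a chain of phases, each consuming the output of the previous one. The following Haskell sketch illustrates only that overall shape, under loud assumptions: the types Token, Ast and Llvm and the functions tokenize, parse, translate and pipeline are hypothetical toys, not the definitions from CMM.Pipeline, and only three phases are modelled. Errors are threaded through Either, so the first failing phase stops the run; this is one plausible way for invalid input to end compilation with an error message, and the real compiler may propagate errors differently. The snippet has no main and can be loaded into GHCi.

```haskell
import Control.Monad ((>=>))

-- Hypothetical stand-ins for the real phase results; the actual compiler
-- works with CMM.AST nodes, BlockifierState, InferencerState, etc.
type Token = String

newtype Ast = Ast [[Token]] -- toy program: a list of statements
  deriving (Eq, Show)

newtype Llvm = Llvm [String] -- toy "output": one line per statement
  deriving (Eq, Show)

-- A phase either fails with a message or passes its result on.
type Phase a b = a -> Either String b

-- Toy lexer: whitespace-separated tokens; empty input is rejected.
tokenize :: Phase String [Token]
tokenize src = case words src of
  [] -> Left "lexer: empty input"
  toks -> Right toks

-- Toy parser: split the token stream at ";" and reject an
-- unterminated trailing statement.
parse :: Phase [Token] Ast
parse = go []
  where
    go acc [] = Right (Ast (reverse acc))
    go acc toks = case break (== ";") toks of
      (stmt, _ : rest) -> go (stmt : acc) rest
      (_, []) -> Left "parser: missing ';'"

-- Toy translation: render each statement back as a line of text.
translate :: Phase Ast Llvm
translate (Ast stmts) = Right (Llvm [unwords stmt | stmt <- stmts])

-- Kleisli composition: the run stops at the first phase returning Left.
pipeline :: Phase String Llvm
pipeline = tokenize >=> parse >=> translate
```

For example, pipeline "x = 1 ; y = 2 ;" evaluates to Right (Llvm ["x = 1","y = 2"]), while pipeline "x = 1" evaluates to Left "parser: missing ';'" and the translation phase never runs.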
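To make the block annotation produced by blockify more concrete, here is a minimal sketch of assigning statements to basic blocks with the classic leader rule: a label always starts a new block, and a jump or a return ends the current one. The Stmt type and this blockify function are hypothetical simplifications; the real CMM.AST.Blockifier works on the CMM.AST types, handles more statement forms, and also builds the BlockifierState that CMM.FlowAnalysis later refines. The snippet has no main and can be loaded into GHCi.

```haskell
-- Hypothetical toy statements; NOT the CMM.AST node types.
data Stmt
  = Label String -- jump target: always starts a new basic block
  | Assign String Int
  | Goto String -- unconditional jump: ends the current block
  | Return -- also ends the current block
  deriving (Eq, Show)

-- Annotate each statement with the index of its basic block
-- (hypothetical sketch of the idea behind blockify).
blockify :: [Stmt] -> [(Int, Stmt)]
blockify = go 0 Nothing
  where
    -- fresh: the next unused block index
    -- open:  the block currently being filled, if any
    go :: Int -> Maybe Int -> [Stmt] -> [(Int, Stmt)]
    go _ _ [] = []
    go fresh open (s : ss) = (b, s) : go fresh' open' ss
      where
        (b, fresh') = case (s, open) of
          (Label _, _) -> (fresh, fresh + 1) -- a label opens a new block
          (_, Just cur) -> (cur, fresh) -- continue the open block
          (_, Nothing) -> (fresh, fresh + 1) -- procedure start or code after a jump
        -- a jump or return closes the block it belongs to
        open' = if endsBlock s then Nothing else Just b

    endsBlock :: Stmt -> Bool
    endsBlock (Goto _) = True
    endsBlock Return = True
    endsBlock _ = False
```

For the statements Label "entry", Assign "x" 1, Goto "loop", Label "loop", Assign "x" 2, Return, the computed block indices are 0, 0, 0, 1, 1, 1. A flow analysis can then derive the edges between blocks from the Goto targets and from fall-through into labels.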
The LLVM.* modules in the vendor/* directories come from the modified versions of the llvm-hs-pure and llvm-hs-pretty packages.
The source files are formatted by hindent and checked by hlint (which should not produce any hints). The source should compile without warnings (tested with ghc-8.10.7). The script ./run_examples.sh should compile all the example files, and llc should successfully process the generated LLVM assembly.