This project implements a machine learning approach to learn and suggest formatting conventions for Coq code, based on the paper "Learning to Format Coq Code Using Language Models" by Nie et al.
The project uses two approaches to learn Coq code formatting:
- N-gram model: A simple statistical model based on token sequences (a rough sketch follows this list)
- Neural model: A bidirectional LSTM network for more sophisticated formatting predictions
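
The actual n-gram implementation lives in `src/ngram_model.py` and is not reproduced here. As a rough illustration of the idea, a majority-vote spacing predictor over token contexts might look like the sketch below; the class name `SpacingNgramModel`, the spacing-label encoding, and the backoff scheme are illustrative assumptions, not the repository's actual API.

```python
from collections import Counter, defaultdict

class SpacingNgramModel:
    """Hypothetical sketch: predict the whitespace that follows a token
    from the preceding token context, by majority vote over counts."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # context tuple -> spacing label counts

    def train(self, tokens, spacings):
        # tokens[i] is a token string; spacings[i] is the whitespace
        # observed right after it in the training file (e.g. "", " ", "\n").
        for i, spacing in enumerate(spacings):
            context = tuple(tokens[max(0, i - self.n + 1): i + 1])
            self.counts[context][spacing] += 1

    def predict(self, context):
        # Back off to shorter contexts until one was seen in training.
        context = tuple(context[-self.n:])
        while context:
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
            context = context[1:]
        return " "  # unseen context: default to a single space

model = SpacingNgramModel(n=3)
model.train(["Lemma", "addn0", ":", "forall"], [" ", " ", " ", " "])
print(repr(model.predict(["Lemma", "addn0"])))  # -> ' '
```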
Key features:

- Tokenization of Coq source code (see the tokenization sketch after this list)
- Space prediction between tokens
- Support for different coding styles (standard Coq and SSReflect/MathComp)
- Dataset statistics and analysis tools
- Model evaluation metrics
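
To make the tokenization and space-prediction features concrete, the sketch below extracts parallel lists of tokens and the whitespace that follows each one from Coq source, using a crude regex lexer. The pattern and function name are illustrative assumptions; the actual `src/preprocessor.py` presumably handles Coq syntax (comments, strings, notations) more carefully.

```python
import re

# Crude, illustrative token pattern: identifiers, numbers, a few
# multi-character operators, then any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_']*|\d+|:=|=>|<-|->|[^\sA-Za-z0-9]")

def tokenize_with_spacing(source):
    """Return parallel lists: tokens and the whitespace following each."""
    tokens, spacings = [], []
    pos = 0
    for match in TOKEN_RE.finditer(source):
        if tokens:
            spacings.append(source[pos:match.start()])  # gap after previous token
        tokens.append(match.group())
        pos = match.end()
    if tokens:
        spacings.append(source[pos:])  # whitespace after the last token
    return tokens, spacings

tokens, spacings = tokenize_with_spacing("Lemma addn0 : forall n, n + 0 = n.\n")
print(list(zip(tokens, spacings))[:4])
# [('Lemma', ' '), ('addn0', ' '), (':', ' '), ('forall', ' ')]
```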
Project structure:

```
coq-format-learning/
├── README.md
├── requirements.txt
├── data/
│   ├── sample_input/          # Sample Coq files
│   │   ├── simple.v           # Standard Coq style
│   │   └── mathcomp_style.v   # MathComp style
│   └── processed/             # Processed token data
├── src/
│   ├── preprocessor.py        # Token extraction
│   ├── ngram_model.py         # N-gram model
│   ├── neural_model.py        # Neural model
│   └── utils.py               # Helper functions
└── train.py                   # Training script
```

Installation:

- Clone the repository:
```bash
git clone https://github.com/Shift-Happens/coq-format-learning.git
cd coq-format-learning
```

- Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:
```bash
pip install -r requirements.txt
```

Usage:

- Train models on sample data:
```bash
python train.py
```

- Use specific files for training:
```bash
python train.py --input data/sample_input/simple.v
```

Evaluation: model quality has not been assessed yet.
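
A natural metric for this task would be exact-match accuracy over token gaps: the fraction of gaps where the predicted whitespace equals the whitespace in the original file. A minimal sketch, assuming the `predict(context)` interface from the hypothetical n-gram sketch above:

```python
def spacing_accuracy(model, tokens, gold_spacings, n=3):
    """Exact-match accuracy of predicted whitespace over all token gaps.

    Assumes a model with the predict(context) interface from the
    hypothetical SpacingNgramModel sketch above; the repository's
    actual evaluation code may differ.
    """
    correct = 0
    for i, gold in enumerate(gold_spacings):
        context = tokens[max(0, i - n + 1): i + 1]
        if model.predict(context) == gold:
            correct += 1
    return correct / max(1, len(gold_spacings))
```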
Contributions are welcome! Here are some ways you can contribute:
- Add support for more Coq formatting features
- Improve the neural model architecture
- Add more sample Coq files
- Improve documentation
- Report issues
This project is licensed under the MIT License.
If you use this code in your research, please cite:
```bibtex
@article{nie2020learning,
  title={Learning to Format Coq Code Using Language Models},
  author={Nie, Pengyu and Palmskog, Karl and Li, Junyi Jessy and Gligoric, Milos},
  journal={arXiv preprint arXiv:2006.16743},
  year={2020}
}
```

This project is based on the research paper by Nie et al. from The University of Texas at Austin and KTH Royal Institute of Technology.