This project implements a machine learning approach to learn and suggest formatting conventions for Coq code, based on the paper "Learning to Format Coq Code Using Language Models" by Nie et al.
The project uses two approaches to learn Coq code formatting:
- N-gram model: A simple statistical model based on token sequences (a rough sketch follows this list)
- Neural model: A bidirectional LSTM network for more sophisticated formatting predictions
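
The actual n-gram implementation lives in `src/ngram_model.py` and is not reproduced here. As a rough illustration of the idea, a majority-vote spacing predictor over token contexts might look like the sketch below; the class name `SpacingNgramModel`, the spacing-label encoding, and the backoff scheme are illustrative assumptions, not the repository's actual API.

```python
from collections import Counter, defaultdict

class SpacingNgramModel:
    """Hypothetical sketch: predict the whitespace that follows a token
    from the preceding token context, by majority vote over counts."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # context tuple -> spacing label counts

    def train(self, tokens, spacings):
        # tokens[i] is a token string; spacings[i] is the whitespace
        # observed right after it in the training file (e.g. "", " ", "\n").
        for i, spacing in enumerate(spacings):
            context = tuple(tokens[max(0, i - self.n + 1): i + 1])
            self.counts[context][spacing] += 1

    def predict(self, context):
        # Back off to shorter contexts until one was seen in training.
        context = tuple(context[-self.n:])
        while context:
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
            context = context[1:]
        return " "  # unseen context: default to a single space

model = SpacingNgramModel(n=3)
model.train(["Lemma", "addn0", ":", "forall"], [" ", " ", " ", " "])
print(repr(model.predict(["Lemma", "addn0"])))  # -> ' '
```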
Key features:

- Tokenization of Coq source code (see the tokenization sketch after this list)
- Space prediction between tokens
- Support for different coding styles (standard Coq and SSReflect/MathComp)
- Dataset statistics and analysis tools
- Model evaluation metrics
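
To make the tokenization and space-prediction features concrete, the sketch below extracts parallel lists of tokens and the whitespace that follows each one from Coq source, using a crude regex lexer. The pattern and function name are illustrative assumptions; the actual `src/preprocessor.py` presumably handles Coq syntax (comments, strings, notations) more carefully.

```python
import re

# Crude, illustrative token pattern: identifiers, numbers, a few
# multi-character operators, then any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_']*|\d+|:=|=>|<-|->|[^\sA-Za-z0-9]")

def tokenize_with_spacing(source):
    """Return parallel lists: tokens and the whitespace following each."""
    tokens, spacings = [], []
    pos = 0
    for match in TOKEN_RE.finditer(source):
        if tokens:
            spacings.append(source[pos:match.start()])  # gap after previous token
        tokens.append(match.group())
        pos = match.end()
    if tokens:
        spacings.append(source[pos:])  # whitespace after the last token
    return tokens, spacings

tokens, spacings = tokenize_with_spacing("Lemma addn0 : forall n, n + 0 = n.\n")
print(list(zip(tokens, spacings))[:4])
# [('Lemma', ' '), ('addn0', ' '), (':', ' '), ('forall', ' ')]
```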
Project structure:

```
coq-format-learning/
├── README.md
├── requirements.txt
├── data/
│   ├── sample_input/          # Sample Coq files
│   │   ├── simple.v           # Standard Coq style
│   │   └── mathcomp_style.v   # MathComp style
│   └── processed/             # Processed token data
├── src/
│   ├── preprocessor.py        # Token extraction
│   ├── ngram_model.py         # N-gram model
│   ├── neural_model.py        # Neural model
│   └── utils.py               # Helper functions
└── train.py                   # Training script
```

Installation:

- Clone the repository:
```bash
git clone https://github.com/Shift-Happens/coq-format-learning.git
cd coq-format-learning
```

- Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:
```bash
pip install -r requirements.txt
```

Usage:

- Train models on sample data:
```bash
python train.py
```

- Use specific files for training:
```bash
python train.py --input data/sample_input/simple.v
```

Evaluation: model quality has not been assessed yet.
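
A natural metric for this task would be exact-match accuracy over token gaps: the fraction of gaps where the predicted whitespace equals the whitespace in the original file. A minimal sketch, assuming the `predict(context)` interface from the hypothetical n-gram sketch above:

```python
def spacing_accuracy(model, tokens, gold_spacings, n=3):
    """Exact-match accuracy of predicted whitespace over all token gaps.

    Assumes a model with the predict(context) interface from the
    hypothetical SpacingNgramModel sketch above; the repository's
    actual evaluation code may differ.
    """
    correct = 0
    for i, gold in enumerate(gold_spacings):
        context = tokens[max(0, i - n + 1): i + 1]
        if model.predict(context) == gold:
            correct += 1
    return correct / max(1, len(gold_spacings))
```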
Contributions are welcome! Here are some ways you can contribute:
- Add support for more Coq formatting features
- Improve the neural model architecture
- Add more sample Coq files
- Improve documentation
- Report issues
This project is licensed under the MIT License.
If you use this code in your research, please cite:
```bibtex
@article{nie2020learning,
  title={Learning to Format Coq Code Using Language Models},
  author={Nie, Pengyu and Palmskog, Karl and Li, Junyi Jessy and Gligoric, Milos},
  journal={arXiv preprint arXiv:2006.16743},
  year={2020}
}
```

This project is based on the research paper by Nie et al. from The University of Texas at Austin and KTH Royal Institute of Technology.