Code, data, and model for our ACL 2023 paper Text-to-SQL Error Correction with Language Models of Code.
- Installation
- Data
- Preprocessing
- Training
- Evaluation
- Citation
Please run the following commands to create a conda environment in Python 3.9 with the required packages.

    conda create -n sqledit python=3.9 pip
    conda activate sqledit
    pip install -r requirements.txt

Please first download the original Spider dataset from this link and unzip it in the data/ folder.

    unzip spider.zip -d data/

Then, please download our synthesized SQL error correction data from this link and put the files in the data/ folder as well.
The data/ folder should be organized as follows:

    .
    ├─── data
    │    ├─── spider
    │    ├─── ...
    │    ├─── spider-dev-bridge.json
    │    ├─── spider-dev-codet5.json
    │    ├─── spider-dev-smbop.json
    │    ├─── spider-train-bridge.json
    │    ├─── spider-train-codet5.json
    │    ├─── spider-train-smbop.json
    │    ├─── sqledit_dev_gold.sql
    │    ...

To preprocess the data (shown here for the SmBoP base parser), run:

    python run.py --preproc --use_content --query_type pydict --edit_type program --base_parser smbop

To train a model, create a model/ folder and fine-tune from the CodeT5 base checkpoint:

    mkdir model
    python run.py --train --load_checkpoint Salesforce/codet5-base --save_checkpoint model/codet5-sqledit --seed 42 --gpu 0

To evaluate the trained model, run:

    python run.py --eval --load_checkpoint model/codet5-sqledit --gpu 0

You may download our pre-trained model checkpoints from this link. The download includes our CodeT5-PyDict+Program model trained for each of the three text-to-SQL base parsers in our paper.
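Before preprocessing, it can help to confirm that the data/ folder matches the layout listed earlier. The following is a minimal sketch rather than part of the repository: the function name `missing_data_entries` is hypothetical, and it checks only the entries named explicitly in the layout (the `...` entries are not checked).

```python
from pathlib import Path

# File names copied from the data/ layout in this README.
# Entries shown as "..." in the layout are intentionally not checked.
EXPECTED_FILES = [
    "spider-dev-bridge.json",
    "spider-dev-codet5.json",
    "spider-dev-smbop.json",
    "spider-train-bridge.json",
    "spider-train-codet5.json",
    "spider-train-smbop.json",
    "sqledit_dev_gold.sql",
]


def missing_data_entries(data_dir="data"):
    """Return the expected entries that are absent from data_dir.

    Hypothetical helper for illustration; not provided by the repository.
    """
    root = Path(data_dir)
    missing = []
    # The unzipped Spider dataset is expected as a sub-folder.
    if not (root / "spider").is_dir():
        missing.append("spider/")
    # The synthesized error correction files sit directly under data/.
    for name in EXPECTED_FILES:
        if not (root / name).is_file():
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_data_entries()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("data/ contains all listed entries.")
```

Run from the repository root, the script reports any listed entry that is not yet in place.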

    @inproceedings{chen-etal-2023-sqledit,
        title = "Text-to-SQL Error Correction with Language Models of Code",
        author = "Chen, Ziru and Chen, Shijie and White, Michael and Mooney, Raymond and Payani, Ali and Srinivasa, Jayanth and Su, Yu and Sun, Huan",
        booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
        year = "2023",
        address = "Toronto, Canada",
        publisher = "Association for Computational Linguistics",
        url = "https://arxiv.org/abs/2305.13073"
    }