Auto-SQL-Correction

Code, data, and model for our ACL 2023 paper Text-to-SQL Error Correction with Language Models of Code.

Table of Contents

  1. Installation
  2. Data
  3. Preprocessing
  4. Training
  5. Evaluation
  6. Model Checkpoints
  7. Citation

Installation

Please run the following commands to create a conda environment with Python 3.9 and install the required packages.

conda create -n sqledit python=3.9 pip
conda activate sqledit
pip install -r requirements.txt

Data

First, please download the original Spider dataset from this link and unzip it into the data/ folder.

unzip spider.zip -d data/

Then, please download our synthesized SQL error correction data from this link and place the files in the data/ folder as well.

The data/ folder should be organized as follows:

.
├───  data
│    ├───  spider
│    │    ├───  ...
│    ├───  spider-dev-bridge.json
│    ├───  spider-dev-codet5.json
│    ├───  spider-dev-smbop.json
│    ├───  spider-train-bridge.json
│    ├───  spider-train-codet5.json
│    ├───  spider-train-smbop.json
│    ├───  sqledit_dev_gold.sql
│   ...
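
Before preprocessing, it can help to confirm that the layout above is in place. The following is a minimal sketch of such a check, not a script shipped with this repository: the function name `check_sqledit_data` is hypothetical, and it only looks for the entries explicitly listed in the tree (anything hidden behind `...` is not checked).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): verify that the entries
# listed in the tree above exist under a data directory (default: data/).
check_sqledit_data() {
    local data_dir="${1:-data}"
    local missing=0
    local split parser

    # The unzipped Spider dataset should be a directory.
    if [ ! -d "$data_dir/spider" ]; then
        echo "missing directory: $data_dir/spider"
        missing=1
    fi

    # Synthesized error correction data: one dev/train pair per base parser.
    for split in dev train; do
        for parser in bridge codet5 smbop; do
            if [ ! -f "$data_dir/spider-$split-$parser.json" ]; then
                echo "missing file: $data_dir/spider-$split-$parser.json"
                missing=1
            fi
        done
    done

    # Gold SQL file listed in the tree.
    if [ ! -f "$data_dir/sqledit_dev_gold.sql" ]; then
        echo "missing file: $data_dir/sqledit_dev_gold.sql"
        missing=1
    fi

    # Exit status 0 means every listed entry was found.
    return "$missing"
}
```

Calling `check_sqledit_data data` from the repository root prints one line per missing entry and returns a non-zero status if anything is absent.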

Preprocessing

python run.py --preproc --use_content --query_type pydict --edit_type program --base_parser smbop

Training

mkdir model
python run.py --train --load_checkpoint Salesforce/codet5-base --save_checkpoint model/codet5-sqledit --seed 42 --gpu 0

Evaluation

python run.py --eval --load_checkpoint model/codet5-sqledit --gpu 0
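
The preprocessing, training, and evaluation commands above are meant to be run in that order. As a rough sketch, they could be chained in a small wrapper like the one below. The wrapper itself is an assumption and not part of this repository: `run_step`, `run_pipeline`, `DRY_RUN`, and `CHECKPOINT` are hypothetical names, and `mkdir -p` stands in for `mkdir model` so that reruns do not fail. The `run.py` flags are copied verbatim from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this repository) that chains the three
# README commands. DRY_RUN and CHECKPOINT are illustrative variable names.

run_step() {
    # Print the command; execute it only when DRY_RUN is not set to 1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

run_pipeline() {
    local checkpoint="${CHECKPOINT:-model/codet5-sqledit}"

    # 1. Preprocessing (flags as given in the Preprocessing section).
    run_step python run.py --preproc --use_content --query_type pydict \
        --edit_type program --base_parser smbop || return 1

    # 2. Training: create the output folder, then fine-tune from CodeT5-base.
    run_step mkdir -p "$(dirname "$checkpoint")" || return 1
    run_step python run.py --train --load_checkpoint Salesforce/codet5-base \
        --save_checkpoint "$checkpoint" --seed 42 --gpu 0 || return 1

    # 3. Evaluation of the checkpoint saved in step 2.
    run_step python run.py --eval --load_checkpoint "$checkpoint" --gpu 0
}
```

Running `DRY_RUN=1 run_pipeline` only prints the four commands, which is a cheap way to review them before committing GPU time; plain `run_pipeline` executes them and stops at the first failing step.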

Model Checkpoints

You may download our pre-trained model checkpoints from this link. The download includes our CodeT5-PyDict+Program model trained for each of the three text-to-SQL base parsers in our paper.

Citation

@inproceedings{chen-etal-2023-sqledit,
    title = "Text-to-SQL Error Correction with Language Models of Code",
    author = "Chen, Ziru  and
      Chen, Shijie  and
      White, Michael  and
      Mooney, Raymond  and
      Payani, Ali  and
      Srinivasa, Jayanth  and
      Su, Yu  and
      Sun, Huan",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2305.13073"
}