Code and data for Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction (ECML-PKDD 22) [paper pdf]
The key idea is to turn the regression KD problem into a distributional matching problem.
We further incorporate a correlational alignment on the predicted distributions to enhance performance.
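For intuition, here is a minimal PyTorch sketch of what such a combined objective could look like. The tensor shapes, the exact form of the correlation term, and the weights `alpha`/`beta` are illustrative assumptions, not the repo's implementation (see the training scripts for that):

```python
import torch
import torch.nn.functional as F

def gaussian_kl(mu_s, sigma_s, mu_t, sigma_t):
    # Closed-form KL( N(mu_s, sigma_s^2) || N(mu_t, sigma_t^2) ), element-wise.
    return (torch.log(sigma_t / sigma_s)
            + (sigma_s ** 2 + (mu_s - mu_t) ** 2) / (2.0 * sigma_t ** 2)
            - 0.5)

def batch_corr(x):
    # Pairwise Pearson correlation between predictions in a batch.
    # x: (batch, horizon) tensor of predicted means.
    x = x - x.mean(dim=-1, keepdim=True)
    x = F.normalize(x, dim=-1)  # unit-norm rows, so the dot product is the correlation
    return x @ x.t()

def dckd_loss(mu_s, sigma_s, mu_t, sigma_t, alpha=1.0, beta=1.0):
    # Distributional matching: pull the student's Gaussian towards the teacher's.
    l_dist = gaussian_kl(mu_s, sigma_s, mu_t.detach(), sigma_t.detach()).mean()
    # Correlation alignment: match the correlation structure of the predictions.
    l_corr = F.mse_loss(batch_corr(mu_s), batch_corr(mu_t.detach()))
    return alpha * l_dist + beta * l_corr
```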
We recommend setting up the running environment via conda:
```sh
conda create -n dckd python=3.7
conda activate dckd
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
```
We collect the trading data of TPX500; the original data for all tickers can be downloaded from Google Drive.
Unzip `tpx500.zip` under the project root directory, then check the `topix500` directory to see the raw data.
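As a quick sanity check, a few lines of Python can list the unzipped files and peek at one ticker. This assumes one file per ticker in a CSV-like format; the actual file names and columns depend on the dump:

```python
from pathlib import Path
import pandas as pd  # assumption: the raw files are CSV-like tables

root = Path("topix500")
files = sorted(root.iterdir())
print(f"{len(files)} raw files, e.g. {[f.name for f in files[:3]]}")

# Inspect one ticker's trading records; the columns depend on the data dump.
print(pd.read_csv(files[0]).head())
```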
For distillation, we first train a large teacher model with DeepAR on the whole dataset.
The training can be started by executing:
```sh
sh train_teacher.sh
```
Check `ar_kd_teacher.py` for the corresponding parameter settings, such as the number of model layers.
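As a rough mental model of the teacher (the layer sizes and head names below are illustrative, not the script's defaults), a DeepAR-style network emits the parameters of a Gaussian over the next-step volume and is trained with the Gaussian negative log-likelihood:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianARTeacher(nn.Module):
    """Illustrative DeepAR-style teacher: an LSTM whose output parameterizes
    a Gaussian over the trading volume at every step."""
    def __init__(self, input_size=1, hidden_size=128, num_layers=3):
        super().__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.mu_head = nn.Linear(hidden_size, 1)
        self.sigma_head = nn.Linear(hidden_size, 1)

    def forward(self, x):  # x: (batch, seq_len, input_size)
        h, _ = self.rnn(x)
        mu = self.mu_head(h).squeeze(-1)
        sigma = F.softplus(self.sigma_head(h)).squeeze(-1) + 1e-6  # keep sigma > 0
        return mu, sigma

# Training maximizes the Gaussian likelihood of the observed volumes, e.g.:
# loss = F.gaussian_nll_loss(mu, target, sigma ** 2)
```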
After training, the best teacher model will be saved at `teacher_ckpt`, and we can use it to train the student later.
Specify the teacher path in `run_kd.sh` and execute the script to train the student model:
```sh
sh run_kd.sh
```
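In sketch form, the student update combines its own likelihood on the ground truth with the distillation signal from the frozen teacher. The checkpoint file name, the data loader, and `dckd_loss` from the sketch above are assumptions, not the script's actual variable names:

```python
import torch
import torch.nn.functional as F

# Load and freeze the trained teacher (checkpoint name is assumed).
teacher.load_state_dict(torch.load("teacher_ckpt/best.pt"))
teacher.eval()

for x, y in train_loader:
    with torch.no_grad():
        mu_t, sigma_t = teacher(x)                        # frozen teacher predictions
    mu_s, sigma_s = student(x)
    nll = F.gaussian_nll_loss(mu_s, y, sigma_s ** 2)      # fit the ground truth
    loss = nll + dckd_loss(mu_s, sigma_s, mu_t, sigma_t)  # add distillation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```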
We thank Zhiyuan Zhang for providing the code base.
If you find this repo and the data helpful, please kindly cite our paper:
```bibtex
@article{Li2022DistributionalCK,
  title={Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction},
  author={Lei Li and Zhiyuan Zhang and Ruihan Bao and Keiko Harimoto and Xu Sun},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.07232}
}
```