NeuDep is a tool to detect binary memory dependencies.
First, create the conda environment:
conda create -n neudep python=3.9 scipy scikit-learn requests
Then activate it:
conda activate neudep
Then install the latest PyTorch (this assumes you have a GPU):
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
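Before continuing, it can help to confirm that the environment actually sees PyTorch and a CUDA device. The following is a minimal sketch, not part of the NeuDep repository; it assumes `python` resolves to the interpreter of the active conda environment, and `TORCH_STATUS` is a hypothetical variable used only for illustration.

```shell
# Sanity check (illustrative only): is PyTorch importable, and is CUDA visible?
# Assumes `python` is the interpreter of the activated neudep environment.
if python -c 'import torch' 2>/dev/null; then
    # torch.cuda.is_available() reports whether PyTorch can use a CUDA device
    python -c 'import torch; print("torch", torch.__version__, "cuda available:", torch.cuda.is_available())'
    TORCH_STATUS="installed"
else
    echo "PyTorch is not importable in the active environment" >&2
    TORCH_STATUS="missing"
fi
```

If the check reports that CUDA is not available, the installed cudatoolkit version may not match the local driver.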
Enter the neudep root directory (e.g., path/to/neudep) and install neudep:
pip install --editable .
Install PyArrow for large datasets:
pip install pyarrow
For faster training, install NVIDIA's apex library:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
--global-option="--deprecated_fused_adam" --global-option="--xentropy" \
--global-option="--fast_multihead_attn" ./
checkpoints
- stores the pretrained and finetuned checkpoints. We provide the pretrained model here:
command
- scripts to pretrain and finetune the model. Hyperparameters are set in the training scripts themselves; e.g., command/finetune/table3/finetune_table3_all.sh sets MAX_SENTENCES=32, which means the batch size is 32.
fairseq
- implementation of model architecture, preprocessing pipeline, and training task loss
data-src
- preprocessed dataset for the model to binarize. The pretraining data is here, and the finetuning data is here
data-bin
- stores the binarized dataset used for actual training
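To illustrate how a hyperparameter such as MAX_SENTENCES in the training scripts relates to the batch size, here is a minimal sketch. It assumes the script forwards the value to fairseq's `--max-sentences` option (the per-GPU batch size); `UPDATE_FREQ`, `NUM_GPUS`, `EFFECTIVE_BATCH`, and `TRAIN_FLAGS` are hypothetical names introduced for illustration, and the actual scripts under command/ may be organized differently.

```shell
# Hypothetical excerpt; only MAX_SENTENCES=32 comes from the training script.
MAX_SENTENCES=32   # sequences per batch (per GPU in fairseq)
UPDATE_FREQ=1      # assumed: gradient accumulation steps
NUM_GPUS=1         # assumed: number of GPUs used for training

# Effective batch size = per-GPU batch * accumulation steps * number of GPUs
EFFECTIVE_BATCH=$((MAX_SENTENCES * UPDATE_FREQ * NUM_GPUS))
echo "effective batch size: ${EFFECTIVE_BATCH}"

# Flags as they might be appended to a fairseq-train invocation
TRAIN_FLAGS="--max-sentences ${MAX_SENTENCES} --update-freq ${UPDATE_FREQ}"
echo "fairseq-train ... ${TRAIN_FLAGS}"
```

If GPU memory is limited, lowering MAX_SENTENCES while raising the accumulation steps keeps the effective batch size unchanged.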