This repo contains code for our paper on learning production functions from supply chain networks (under review). Our code extends the TGB and TGB_Baselines repositories.
To set up the environment, run:

```
pip install -r requirements.txt
pip install -e ./TGB/
```
The repository is organized as follows:
- `register_data`: convert raw data into standardized transaction-level data, construct hypergraphs
- `TGB/examples/linkproppred/general`: run experiments
- `TGB/modules`: model architectures and training modules
- `TGB/tgb/datasets`: pre- and post-processed datasets
- `TGB/tgb/linkproppred`: dataset, logging, evaluation, and negative sampling frameworks
- `TGB/tgb/utils`: constants and utilities for experiments
We run experiments on two real-world supply chain datasets and three synthetic datasets from our simulator, SupplySim: a standard setting with high exogenous supply (“SS-std”), a setting with shocks to exogenous supply (“SS-shocks”), and a setting with missing transactions (“SS-missing”). We release the synthetic datasets in this repository, but we cannot release the real-world datasets due to their proprietary nature.
SupplySim is implemented in `./TGB/modules/synthetic_data.py`. This file contains code to generate static graphs, exogenous supply and demand schedules, and time-varying transactions; see Section 3.2 and Appendix B.2 for details. Once the transactions are generated, create a new directory called `{DATASET_NAME}` in `./TGB/tgb/datasets/` and save the transactions as `{DATASET_NAME}_transactions.csv`.
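As a concrete illustration of the expected layout (a toy stand-in, not SupplySim itself), the sketch below writes a small transactions table to the location the pipeline expects. The column names mirror the transaction-level fields listed in the next section and are our assumption about the exact schema; check `synthetic_data.py` for the real generator and output format.

```python
import os
import pandas as pd

DATASET_NAME = "my_synthetic"  # hypothetical dataset name

# Toy transactions table; SupplySim produces the real one.
transactions = pd.DataFrame({
    "time":     [0, 0, 1],                          # timestep of each transaction
    "supplier": ["firm_A", "firm_B", "firm_A"],
    "buyer":    ["firm_B", "firm_C", "firm_C"],
    "hs_code":  ["8541", "8542", "8541"],           # product code; column name is an assumption
    "amount":   [10.0, 4.0, 7.5],
})

# Save in the standardized location and file name expected by the pipeline.
out_dir = os.path.join("TGB", "tgb", "datasets", DATASET_NAME)
os.makedirs(out_dir, exist_ok=True)
transactions.to_csv(os.path.join(out_dir, f"{DATASET_NAME}_transactions.csv"), index=False)
```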
We also built pipelines to work with real-world datasets (Tesla and SEM). See Section 3.1 and Appendix B.1 for details on data sources.
First, create a directory named `{DATASET_NAME}` in the `./TGB/tgb/datasets/` directory and place the raw CSV files inside. Make sure the raw CSV files contain transaction-level information, including but not limited to time, supplier, buyer, product HS code, and amount (measured in quantity, weight, price, etc.).
Next, transform the raw data into the standardized format `{DATASET_NAME}_transactions.csv` by running the following command in the root directory:

```
python ./register_data/preprocess_{DATASET_NAME}.py --ARGS
```

This preprocessing script differs slightly across datasets based on dataset properties.
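Although each `preprocess_*.py` script is dataset-specific, the core step is the same: map the raw columns onto the standardized transaction-level fields. A minimal sketch, assuming hypothetical raw file and column names (`raw.csv`, `date`, `seller_id`, etc.):

```python
import pandas as pd

DATASET_NAME = "my_dataset"  # hypothetical
raw = pd.read_csv(f"TGB/tgb/datasets/{DATASET_NAME}/raw.csv")  # raw file name is an assumption

# Rename dataset-specific columns to the standardized transaction-level fields.
std = raw.rename(columns={
    "date": "time",            # hypothetical raw column names on the left
    "seller_id": "supplier",
    "buyer_id": "buyer",
    "product_code": "hs_code",
    "quantity": "amount",
})[["time", "supplier", "buyer", "hs_code", "amount"]]

std.to_csv(f"TGB/tgb/datasets/{DATASET_NAME}/{DATASET_NAME}_transactions.csv", index=False)
```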
Finally, transform the transactions data, saved as `{DATASET_NAME}_transactions.csv` for both synthetic and real-world data, into the format expected by model experiments (e.g., represent it as a hypergraph, perform negative sampling). To do this, run the following command in the root directory:

```
python ./register_data/register_hypergraph.py --ARGS
```

This command generates files including the edgelist, sampled negatives for the train/val/test splits, and supplementary metadata (e.g., a mapping from node IDs to firm and product names).
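Conceptually, each transaction row becomes a hyperedge over (supplier, product, buyer), and negatives can be formed by corrupting part of a true hyperedge. A minimal sketch of that idea follows; the exact sampling scheme used by `register_hypergraph.py` may differ, so treat this as illustration only.

```python
import random

def make_hyperedges(transactions):
    """Each transaction row -> a (time, supplier, product, buyer) hyperedge."""
    return [(t["time"], t["supplier"], t["hs_code"], t["buyer"]) for t in transactions]

def sample_negative(edge, firms, products, rng=random):
    """Corrupt one slot of a true hyperedge; illustrative, not the repo's exact scheme."""
    time, supplier, product, buyer = edge
    slot = rng.choice(["supplier", "product", "buyer"])
    if slot == "supplier":
        supplier = rng.choice([f for f in firms if f != supplier])
    elif slot == "product":
        product = rng.choice([p for p in products if p != product])
    else:
        buyer = rng.choice([f for f in firms if f != buyer])
    return (time, supplier, product, buyer)
```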
The repository supports a wide variety of training setups; here, we list the commands for running the model experiments described in the submission.
`./TGB/modules/prod_learning.py` contains code to test the inventory module and different baseline methods at learning production functions. In the paper, we report three baselines (temporal correlations, PMI, and node2vec) and report results when learning attention directly and when using product embeddings (see Table 1). For example, running

```python
compare_methods_on_data('sem', ['corr', 'pmi', 'node2vec', 'inventory'], save_results=True)
```

produces results on the SEM dataset for those four methods and saves them.
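To invoke this from a script, something like the following should work; the import path is an assumption based on the directory layout above, so adjust it (or your `PYTHONPATH`) as needed.

```python
# Run from the repo root; import path follows the layout above and is an assumption.
from TGB.modules.prod_learning import compare_methods_on_data

compare_methods_on_data('sem', ['corr', 'pmi', 'node2vec', 'inventory'], save_results=True)
```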
For these experiments, `cd` into the `./TGB/examples/linkproppred/general/` directory. In the commands below, remember to specify your dataset of interest and, where applicable, use the hyperparameters reported in Table 4 of Appendix C.2.
To run EdgeBank, use the following command:

```
python test_hyper_edgebank.py
```
To run SC-TGN, use the following command:

```
python model_experiments.py --train_with_fixed_samples --model tgnpl --memory_name tgnpl --emb_name attn --ARGS
```
Other variations of the above (static, graph transformer, and SC-TGN (id)) replace the `--memory_name` and `--emb_name` arguments as follows:
- static: `--memory_name static --emb_name id`
- graph transformer: `--memory_name static --emb_name attn`
- SC-TGN (id): `--memory_name tgnpl --emb_name id`
The model is comparable with the original TGN (Rossi et al., 2020) if one removes `--train_with_fixed_samples` and adds the arguments `--init_memory_not_learnable`, `--update_penalty 0`, and `--skip_amount` to the above command (see the comparison of SC-TGN vs. TGN in Appendix A.2).
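Putting those pieces together, a TGN-comparable run would look roughly like the following (all flags are taken from the description above; `--ARGS` stands in for your remaining options):

```
python model_experiments.py --model tgnpl --memory_name tgnpl --emb_name attn \
    --init_memory_not_learnable --update_penalty 0 --skip_amount --ARGS
```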
To run SC-GraphMixer, use the following command:

```
python model_experiments.py --train_with_fixed_samples --model graphmixer --ARGS
```
To include the inventory module, simply add the flag `--use_inventory`. There are also a number of other optional parameters related to the inventory module. For example, we find that providing initial attention weights via `--att_weights` helps with model training. To run an experiment where the inventory module's attention weights are set to the ground-truth production functions, provide the functions via `--prod_graph {PROD_GRAPH_FILE}.pkl` and use `--fix_inventory` so that the attention weights are not updated during training (see Appendix C.2 and Table 6).
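For instance, combining these flags with the SC-TGN command above, a run with the inventory module fixed to ground-truth production functions would look roughly like this (`{PROD_GRAPH_FILE}` is your own production-functions pickle):

```
python model_experiments.py --train_with_fixed_samples --model tgnpl --memory_name tgnpl --emb_name attn \
    --use_inventory --prod_graph {PROD_GRAPH_FILE}.pkl --fix_inventory --ARGS
```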