We fine-tuned DeBERTa on Mind2Web data to classify whether an element is relevant to a given intent. To reproduce the results, run:
python train.py
The outputs will be saved by default to deberta_results. The logs can be visualized with TensorBoard:
tensorboard --logdir deberta_results
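As a quick sanity check of what the fine-tuned checkpoint does, the sketch below scores a single (intent, element) pair. The checkpoint path, the pairing format, and the index of the "relevant" label are assumptions, not necessarily the exact setup in train.py.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned checkpoint (path is an assumption; adjust as needed).
tokenizer = AutoTokenizer.from_pretrained("deberta_results")
model = AutoModelForSequenceClassification.from_pretrained("deberta_results").eval()

intent = "Book a one-way flight from Pittsburgh to New York"
element = "<button id='search-flights'>Search flights</button>"

# Encode the (intent, element) pair and read off the probability of the
# "relevant" class (assumed to be label index 1).
inputs = tokenizer(intent, element, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(f"relevance score: {probs[0, 1].item():.3f}")
```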
First, clone the WebArena repo and set it up following its instructions. Then replace its run.py with the run.py in the deberta_api directory.
# Install the relevant FastAPI dependencies and host the DeBERTa API on a machine with a GPU
./start_api.sh
# Test that the API works
python deberta_inference.py
# Run WebArena + DeBERTa on a sampled list of tasks with the following script
./parallel_run.sh
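For reference, below is a minimal sketch of the kind of endpoint start_api.sh is assumed to launch: a FastAPI route that scores candidate elements against an intent with the fine-tuned classifier. The route name, request fields, and checkpoint path are illustrative and may differ from the actual API in deberta_api.

```python
from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

app = FastAPI()
device = "cuda" if torch.cuda.is_available() else "cpu"
# Checkpoint path is an assumption; adjust to your local layout.
tokenizer = AutoTokenizer.from_pretrained("deberta_results")
model = AutoModelForSequenceClassification.from_pretrained("deberta_results").to(device).eval()

class RankRequest(BaseModel):
    intent: str
    elements: List[str]

@app.post("/rank")  # route name is an assumption
def rank(req: RankRequest):
    # Score every candidate element against the intent and return them ranked.
    inputs = tokenizer([req.intent] * len(req.elements), req.elements,
                       truncation=True, padding=True, return_tensors="pt").to(device)
    with torch.no_grad():
        scores = torch.softmax(model(**inputs).logits, dim=-1)[:, 1].tolist()
    order = sorted(range(len(req.elements)), key=lambda i: -scores[i])
    return {"ranked_elements": [req.elements[i] for i in order],
            "scores": [scores[i] for i in order]}
```

A server like this is typically launched with uvicorn; start_api.sh presumably wraps a similar step.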
# Clone the ColBERT repo on a machine with a GPU
git clone https://github.com/stanford-futuredata/ColBERT.git
# Run mind2web_recalls.py inside the ColBERT repo
python mind2web_recalls.py
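The script reports retrieval recall on Mind2Web. As a hedged sketch of the metric it is assumed to compute, the snippet below checks whether the ground-truth element appears among the top-k candidates retrieved for each task (function and variable names are illustrative):

```python
from typing import Dict, List

def recall_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """ranked maps a task id to candidate element ids ordered by retrieval score;
    gold maps the same task id to the ground-truth element id."""
    hits = sum(1 for tid, answer in gold.items() if answer in ranked.get(tid, [])[:k])
    return hits / len(gold)

# Toy example (the real script retrieves the candidates with ColBERT).
ranked = {"task_1": ["elem_3", "elem_7", "elem_1"], "task_2": ["elem_2", "elem_9"]}
gold = {"task_1": "elem_7", "task_2": "elem_4"}
print(recall_at_k(ranked, gold, k=3))  # 0.5
```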
Follow the official WebArena instructions to set up the environment.
Then, to reproduce the GPT-3.5 baseline, run:
./gpt3.5_test.sh
python logfile_to_csv.py
The default arguments host LLaMA-2-70B at half precision (fp16) on a server with 4 GPUs. To launch the server, run:
bash launch_llama70b_server.sh
After the server is running, we can make POST requests to it, as per the instructions in lti-llm-deployment.
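For illustration, such a request might look like the snippet below. The host, port, route, and payload fields are placeholders; use the actual request schema documented in lti-llm-deployment.

```python
import requests

# Query the hosted model over HTTP (endpoint and fields are assumptions).
response = requests.post(
    "http://localhost:8080/generate",
    json={"prompt": "Click the search button.", "max_new_tokens": 128},
    timeout=120,
)
response.raise_for_status()
print(response.json())
```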