We fine-tuned DeBERTa on Mind2Web data to classify whether an element is relevant to a given intent. To reproduce the results, run:
python train.py
The outputs will be saved by default to deberta_results. The logs can be visualized with TensorBoard:
tensorboard --logdir deberta_results
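As a quick sanity check of what the fine-tuned checkpoint does, the sketch below scores a single (intent, element) pair. The checkpoint path, the pairing format, and the index of the "relevant" label are assumptions, not necessarily the exact setup in train.py.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned checkpoint (path is an assumption; adjust as needed).
tokenizer = AutoTokenizer.from_pretrained("deberta_results")
model = AutoModelForSequenceClassification.from_pretrained("deberta_results").eval()

intent = "Book a one-way flight from Pittsburgh to New York"
element = "<button id='search-flights'>Search flights</button>"

# Encode the (intent, element) pair and read off the probability of the
# "relevant" class (assumed to be label index 1).
inputs = tokenizer(intent, element, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(f"relevance score: {probs[0, 1].item():.3f}")
```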
First, clone the WebArena repo and set it up following its instructions. Then replace its run.py with the run.py in the deberta_api directory.
# Install the relevant FastAPI dependencies and host the DeBERTa API on a machine with a GPU
./start_api.sh
# Test that the API works
python deberta_inference.py
# Run WebArena + DeBERTa on a sampled list of tasks with the following script
./parallel_run.sh
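For reference, below is a minimal sketch of the kind of endpoint start_api.sh is assumed to launch: a FastAPI route that scores candidate elements against an intent with the fine-tuned classifier. The route name, request fields, and checkpoint path are illustrative and may differ from the actual API in deberta_api.

```python
from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

app = FastAPI()
device = "cuda" if torch.cuda.is_available() else "cpu"
# Checkpoint path is an assumption; adjust to your local layout.
tokenizer = AutoTokenizer.from_pretrained("deberta_results")
model = AutoModelForSequenceClassification.from_pretrained("deberta_results").to(device).eval()

class RankRequest(BaseModel):
    intent: str
    elements: List[str]

@app.post("/rank")  # route name is an assumption
def rank(req: RankRequest):
    # Score every candidate element against the intent and return them ranked.
    inputs = tokenizer([req.intent] * len(req.elements), req.elements,
                       truncation=True, padding=True, return_tensors="pt").to(device)
    with torch.no_grad():
        scores = torch.softmax(model(**inputs).logits, dim=-1)[:, 1].tolist()
    order = sorted(range(len(req.elements)), key=lambda i: -scores[i])
    return {"ranked_elements": [req.elements[i] for i in order],
            "scores": [scores[i] for i in order]}
```

A server like this is typically launched with uvicorn; start_api.sh presumably wraps a similar step.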
# Clone the ColBERT repo on a machine with a GPU
git clone https://github.com/stanford-futuredata/ColBERT.git
# Run mind2web_recalls.py inside the ColBERT repo
python mind2web_recalls.py
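The script reports retrieval recall on Mind2Web. As a hedged sketch of the metric it is assumed to compute, the snippet below checks whether the ground-truth element appears among the top-k candidates retrieved for each task (function and variable names are illustrative):

```python
from typing import Dict, List

def recall_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """ranked maps a task id to candidate element ids ordered by retrieval score;
    gold maps the same task id to the ground-truth element id."""
    hits = sum(1 for tid, answer in gold.items() if answer in ranked.get(tid, [])[:k])
    return hits / len(gold)

# Toy example (the real script retrieves the candidates with ColBERT).
ranked = {"task_1": ["elem_3", "elem_7", "elem_1"], "task_2": ["elem_2", "elem_9"]}
gold = {"task_1": "elem_7", "task_2": "elem_4"}
print(recall_at_k(ranked, gold, k=3))  # 0.5
```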
Follow the official WebArena instructions to set up the environment.
Then, to reproduce the GPT-3.5 baseline, run:
./gpt3.5_test.sh
python logfile_to_csv.py
The default arguments host LLaMA-2-70B at half precision (fp16) on a server with 4 GPUs. To launch the server, run:
bash launch_llama70b_server.sh
After the server is running, we can make POST requests to it, as per the instructions in lti-llm-deployment.
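For illustration, such a request might look like the snippet below. The host, port, route, and payload fields are placeholders; use the actual request schema documented in lti-llm-deployment.

```python
import requests

# Query the hosted model over HTTP (endpoint and fields are assumptions).
response = requests.post(
    "http://localhost:8080/generate",
    json={"prompt": "Click the search button.", "max_new_tokens": 128},
    timeout=120,
)
response.raise_for_status()
print(response.json())
```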