We collected human trajectories on 179 tasks and the recording files are here.
We sample one task from each template or templates that share similar task semantic. Each file is named as <task_id>.zip
, and the corresponding template id can be found in the task config file. The trajectories are presented as playwright trace files. You can view the concrete HTML, network traffic etc by playwright show-trace <example_idx>.zip
.
Human task success rate: 78.24%
The results on the release v2 can be found in this folder. It contains
- text-bison-001 + CoT + UA Hint
- GPT3.5-turbo-0613-16k + Direct + UA Hint
- GPT3.5-turbo-0613-16k + Direct
- GPT3.5-turbo-0613-16k + CoT + UA Hint
- GPT3.5-turbo-0613-16k + CoT
- GPT4-0613 + CoT
The results on the release v1 can be found in this folder. It contains
- GPT4-0613 + CoT
- GPT3.5-turbo-0613 + CoT
- GPT3.5-turbo-0613 + Direct
Once you unzip the file with unzip <file_name>.zip
, you will see a list of render_*.html
, a log file merge_log.txt
recording whether an example failed or passed and a trace
folder containing the playwright
recording of the executions.
Each file render the execution trace of the correponding example with (1) the accessibility tree observations, (2) the raw prediction from the agent and (3) the parsed action. We also provide the correponding screenshot of each observation.
To extract specific information from the html, you could use the following code snippet:
from bs4 import BeautifulSoup
with open("render_<id>.html", 'r') as f:
content = f.read()
soup = BeautifulSoup(content, 'html.parser')
# get the observations
observations = soup.find_all("div", {"class": "state_obv"})
# urls
urls = soup.find_all("h3", {"class": "url"})
# get the raw predictions (e.g, let's think step-by-step ....)
raw_predictions = soup.find_all("div", {"class": "raw_parsed_prediction"})
# get the action object
actions = soup.find_all("div", {"class": "action_object"})
The zip files are generated automatically with playwright. You can view the concrete HTML, network traffic etc by playwright show-trace <example_idx>.zip
. You will see something like this: