Commit

init
boyugou committed Oct 30, 2024
1 parent b6c63e1 commit 4ea6dae
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion add_paper_here.md
@@ -310,7 +310,7 @@
- 📅 Date: January 1, 2024
- 📑 Publisher: ICML 2024
- 💻 Env: [Web]
-- 🔑 Key: [framework], [dataset], [benchmark], [generalist web agent], [grounding], [seeact]
+- 🔑 Key: [framework], [dataset], [benchmark], [grounding], [seeact], [multimodal-mind2web]
- 📖 TLDR: This paper explores the capability of GPT-4V(ision), a multimodal model, as a web agent that can perform tasks across various websites by following natural language instructions. It introduces the **SEEACT** framework, enabling GPT-4V to navigate, interpret, and interact with elements on websites. Evaluated using the **Mind2Web** benchmark and an online test environment, the framework demonstrates high performance on complex web tasks by integrating grounding strategies like element attributes and image annotations to improve HTML element targeting. However, grounding remains challenging, presenting opportunities for further improvement.

- [SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents](https://arxiv.org/abs/2401.10935)