This repo covers a variety of papers related to GUI Agents, such as:
- Datasets
- Benchmarks
- Models
- Agent frameworks
- Vision, language, and multimodal foundation models (with explicit GUI support)
- Works in general domains that are widely used by GUI Agents (e.g., Set-of-Mark (SoM) prompting)
Web | Mobile | Desktop | GUI | Misc |
---|---|---|---|---|
(Misc: Papers for general topics that have important applications in GUI agents.)
{{insert_keyword_groups_here}}
{{insert_author_groups_here}}
Papers
{{insert_all_papers_here}}
Please fork and update:
🤖 You can use this GPT to quickly search for a paper and automatically get a formatted entry by entering the paper's name. Alternatively, you can simply leave a comment in an issue.
Format example and explanation
- [title](paper link)
- List authors directly without a "key" identifier (e.g., author1, author2)
- 🏛️ Institutions: List the institutions concisely, using abbreviations where possible (e.g., OSU for a university name).
- 📅 Date: e.g., Oct 30, 2024
- 📑 Publisher: e.g., ICLR 2025
- 💻 Env: Indicate the research environment within brackets, such as [Web], [Mobile], or [Desktop]. Use [GUI] if the research spans multiple environments. Use [Misc] if the research addresses general domains.
- 🔑 Key: Label each keyword within brackets, e.g., [model], [framework], [dataset], [benchmark].
- 📖 TLDR: A brief summary of the paper.
Regarding the 🔑 Key:
Key | Definition |
---|---|
model | The paper trains a new model. |
framework | The paper proposes a new agent framework. |
dataset | The paper creates and publishes a new (training) dataset. |
benchmark | The paper establishes a new benchmark (also add "dataset" if there is a new training set). |
primary studies | List the main focus or innovation in the study. |
Abbreviations | Include commonly used abbreviations associated with the paper (model names, framework names, etc.). |
For missing information, use "Unknown."