Skip to content

Commit

Permalink
Merge pull request #5 from YuanDaoze/main
Browse files Browse the repository at this point in the history
Update the paper lists
  • Loading branch information
boyugou authored Dec 20, 2024
2 parents 7edb7d4 + 37b8ea1 commit d61aaba
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,16 @@ This repo covers a variety of papers related to GUI Agents, such as:
<details open>
<summary>Papers</summary>

- [OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use](https://github.com/OS-Agent-Survey/OS-Agent-Survey/blob/main/paper.pdf)
- Xueyu Hu, Tao Xiong, Biao Yi, Zishu Wei, Ruixuan Xiao, Yurun Chen, Jiasheng Ye, Meiling Tao, Xiangxin Zhou, Ziyu Zhao, Yuhuai Li, Shengze Xu, Shawn Wang, Xinchen Xu, Shuofei Qiao , Kun Kuang, Tieyong Zeng, Liang Wang, Jiwei Li, Yuchen Eleanor Jiang, Wangchunshu Zhou, Guoyin Wang, Keting Yin, Zhou Zhao, Hongxia Yang, Fan Wu, Shengyu Zhang, Fei Wu
- 🏛️ Institutions: Zhejiang University, Fudan University, OPPO AI Center, University of Chinese Academy of Sciences, Institute of Automation, Chinese Academy of Sciences, The Chinese University of Hong Kong, Tsinghua University, 01.AI, The Hong Kong Polytechnic University, Shanghai Jiao Tong University
- 📅 Date: December 20, 2024
- 📑 Publisher: https://os-agent-survey.github.io/
- 💻 Env: [GUI]
- 🔑 Key: [survey]
- 📖 TLDR: This paper conducts a comprehensive survey on OS Agents, which are (M)LLM-based agents that operate on computing devices (e.g., computers and mobile phones) by interacting with the environments and interfaces (e.g., Graphical User Interfaces (GUI)) provided by operating systems (OS) to automate tasks and processes. The survey begins by elucidating the fundamentals of OS Agents, exploring their key components including the environment, observation space, and action space, and outlining essential capabilities such as understanding, planning, and grounding. Methodologies for constructing OS Agents are examined, with a focus on domain-specific foundation models and agent frameworks. A detailed review of evaluation protocols and benchmarks highlights how OS Agents are assessed across diverse tasks. Finally, current challenges are discussed, and promising directions for future research are identified, including safety and privacy, personalization and self-evolution.


- [GUI Agents: A Survey](https://arxiv.org/abs/2412.13501)
- Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, Sungchul Kim, Ruiyi Zhang, Tong Yu, Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Thien Huu Nguyen, Trung Bui, Tianyi Zhou, Ryan A. Rossi, Franck Dernoncourt
- 🏛️ Institutions: UMD, SUNY Buffalo, University of Oregon, Adobe Research, Meta AI, University of Rochester, UCSD, CMU, Dolby Labs, Intel AI Research, UNSW
Expand Down

0 comments on commit d61aaba

Please sign in to comment.