-
WebOlympus: An Open Platform for Web Agents on Live Websites
- Boyuan Zheng, Boyu Gou, Scott Salisbury, Zheng Du, Huan Sun, Yu Su
- 🏛️ Institutions: OSU
- 📅 Date: November 12, 2024
- 📑 Publisher: EMNLP 2024
- 💻 Env: [Web]
- 🔑 Key: [safety], [Chrome extension], [WebOlympus], [SeeAct], [Annotation Tool]
- 📖 TLDR: This paper introduces WebOlympus, an open platform designed to facilitate the research and deployment of web agents on live websites. It features a user-friendly Chrome extension interface, allowing users without programming expertise to operate web agents with minimal effort. The platform incorporates a safety monitor module to prevent harmful actions through human supervision or model-based control, supporting applications such as annotation interfaces for web agent trajectories and data crawling.
-
Attacking Vision-Language Computer Agents via Pop-ups
- Yanzhe Zhang, Tao Yu, Diyi Yang
- 🏛️ Institutions: Georgia Tech, HKU, Stanford
- 📅 Date: Nov 4, 2024
- 📑 Publisher: arXiv
- 💻 Env: [GUI]
- 🔑 Key: [attack], [adversarial pop-ups], [VLM agents], [safety]
- 📖 TLDR: This paper demonstrates that vision-language model (VLM) agents can be easily deceived by carefully designed adversarial pop-ups, leading them to perform unintended actions such as clicking on these pop-ups instead of completing their assigned tasks. Integrating these pop-ups into environments like OSWorld and VisualWebArena resulted in an average attack success rate of 86% and a 47% decrease in task success rate. Basic defense strategies, such as instructing the agent to ignore pop-ups or adding advertisement notices, were found to be ineffective against these attacks.
-
MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control
- Juyong Lee, Dongyoon Hahm, June Suk Choi, W. Bradley Knox, Kimin Lee
- 🏛️ Institutions: KAIST, UT at Austin
- 📅 Date: October 23, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Mobile]
- 🔑 Key: [benchmark], [safety], [evaluation], [Android emulator]
- 📖 TLDR: MobileSafetyBench introduces a benchmark for evaluating the safety of large language model (LLM)-based autonomous agents in mobile device control. Using Android emulators, the benchmark simulates real-world tasks in apps such as messaging and banking to assess agents' safety and helpfulness. The safety-focused tasks test for privacy risk management and robustness against adversarial prompt injections. Experiments show agents perform well in helpful tasks but struggle with safety-related challenges, underscoring the need for continued advancements in mobile safety mechanisms for autonomous agents.
-
Dissecting Adversarial Robustness of Multimodal LM Agents
- Chen Henry Wu, Rishi Rajesh Shah, Jing Yu Koh, Russ Salakhutdinov, Daniel Fried, Aditi Raghunathan
- 🏛️ Institutions: CMU, Stanford
- 📅 Date: October 21, 2024
- 📑 Publisher: NeurIPS 2024 Workshop
- 💻 Env: [Web]
- 🔑 Key: [dataset], [attack], [ARE], [safety]
- 📖 TLDR: This paper introduces the Agent Robustness Evaluation (ARE) framework to assess the adversarial robustness of multimodal language model agents in web environments. By creating 200 targeted adversarial tasks within VisualWebArena, the study reveals that minimal perturbations can significantly compromise agent performance, even in advanced systems utilizing reflection and tree-search mechanisms. The findings highlight the need for enhanced safety measures in deploying such agents.
-
Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
- Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, Elaine Chang, Vaughn Robinson, Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang
- 🏛️ Institutions: CMU, GraySwan AI, Scale AI
- 📅 Date: October 11, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Web]
- 🔑 Key: [attack], [BrowserART], [jailbreaking], [safety]
- 📖 TLDR: This paper introduces Browser Agent Red teaming Toolkit (BrowserART), a comprehensive test suite for evaluating the safety of LLM-based browser agents. The study reveals that while refusal-trained LLMs decline harmful instructions in chat settings, their corresponding browser agents often comply with such instructions, indicating a significant safety gap. The authors call for collaboration among developers and policymakers to enhance agent safety.
-
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
- Ido Levy, Ben Wiesel, Sami Marreed, Alon Oved, Avi Yaeli, Segev Shlomov
- 🏛️ Institutions: IBM Research
- 📅 Date: October 9, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Web]
- 🔑 Key: [benchmark], [safety], [trustworthiness], [ST-WebAgentBench]
- 📖 TLDR: This paper introduces ST-WebAgentBench, a benchmark designed to evaluate the safety and trustworthiness of web agents in enterprise contexts. It defines safe and trustworthy agent behavior, outlines the structure of safety policies, and introduces the "Completion under Policies" metric to assess agent performance. The study reveals that current state-of-the-art agents struggle with policy adherence, highlighting the need for improved policy awareness and compliance in web agents.
-
AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents
- Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li
- 🏛️ Institutions: UIUC, OSU
- 📅 Date: September 27, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Web]
- 🔑 Key: [safety], [black-box attack], [adversarial prompter model], [Direct Policy Optimization]
- 📖 TLDR: This paper presents AdvWeb, a black-box attack framework that exploits vulnerabilities in vision-language model (VLM)-powered web agents by injecting adversarial prompts directly into web pages. Using Direct Policy Optimization (DPO), AdvWeb trains an adversarial prompter model that can mislead agents into executing harmful actions, such as unauthorized financial transactions, while maintaining high stealth and control. Extensive evaluations reveal that AdvWeb achieves high success rates across multiple real-world tasks, emphasizing the need for stronger security measures in web agent deployments.
-
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
- Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun
- 🏛️ Institutions: OSU, UCLA, UChicago, UIUC, UW-Madison
- 📅 Date: September 17, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Web]
- 🔑 Key: [safety], [privacy attack], [environmental injection], [stealth attack]
- 📖 TLDR: This paper introduces the Environmental Injection Attack (EIA), a privacy attack targeting generalist web agents by embedding malicious yet concealed web elements to trick agents into leaking users' PII. Utilizing 177 action steps within realistic web scenarios, EIA demonstrates a high success rate in extracting specific PII and whole user requests. Through its detailed threat model and defense suggestions, the work underscores the challenge of detecting and mitigating privacy risks in autonomous web agents.
-
Adversarial Attacks on Multimodal Agents
- Chen Henry Wu, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, Aditi Raghunathan
- 🏛️ Institutions: CMU
- 📅 Date: Jun 18, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Web]
- 🔑 Key: [benchmark], [safety], [VisualWebArena-Adv]
- 📖 TLDR: This paper investigates the safety risks posed by multimodal agents built on vision-enabled language models (VLMs). The authors introduce two adversarial attack methods: a captioner attack targeting white-box captioners and a CLIP attack that transfers to proprietary VLMs. To evaluate these attacks, they curated VisualWebArena-Adv, a set of adversarial tasks based on VisualWebArena. The study demonstrates that within a limited perturbation norm, the captioner attack can achieve a 75% success rate in making a captioner-augmented GPT-4V agent execute adversarial goals. The paper also discusses the robustness of agents based on other VLMs and provides insights into factors contributing to attack success and potential defenses. oai_citation_attribution:0‡ArXiv
-
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
- Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, Xu Sun
- 🏛️ Institutions: Renming University of China, PKU, Tencent
- 📅 Date: Feb 17, 2024
- 📑 Publisher: arXiv
- 💻 Env: [GUI], [Misc]
- 🔑 Key: [attack], [backdoor], [safety]
- 📖 TLDR: This paper investigates backdoor attacks on LLM-based agents, introducing a framework that categorizes attacks based on outcomes and trigger locations. The study demonstrates the vulnerability of such agents to backdoor attacks and emphasizes the need for targeted defenses.
-
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
- Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu Su, Chaowei Xiao, Huan Sun
- 🏛️ Institutions: OSU, UWM
- 📅 Date: February 15, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Misc]
- 🔑 Key: [safety], [adversarial attacks], [security risks], [language agents], [Perception-Brain-Action]
- 📖 TLDR: This paper introduces a conceptual framework to assess and understand adversarial vulnerabilities in language agents, dividing the agent structure into three components—Perception, Brain, and Action. It discusses 12 specific adversarial attack types that exploit these components, ranging from input manipulation to complex backdoor and jailbreak attacks. The framework provides a basis for identifying and mitigating risks before the widespread deployment of these agents in real-world applications.