Papers with Keyword: attack

Attacking Vision-Language Computer Agents via Pop-ups
- Yanzhe Zhang, Tao Yu, Diyi Yang
- 🏛️ Institutions: Georgia Tech, HKU, Stanford
- 📅 Date: Nov 4, 2024
- 📑 Publisher: arXiv
- 💻 Env: [GUI]
- 🔑 Key: [attack], [adversarial pop-ups], [VLM agents], [safety]
- 📖 TLDR: This paper demonstrates that vision-language model (VLM) agents can be easily deceived by carefully designed adversarial pop-ups, leading them to perform unintended actions such as clicking on these pop-ups instead of completing their assigned tasks. Integrating these pop-ups into environments like OSWorld and VisualWebArena resulted in an average attack success rate of 86% and a 47% decrease in task success rate. Basic defense strategies, such as instructing the agent to ignore pop-ups or adding advertisement notices, were found to be ineffective against these attacks.
Dissecting Adversarial Robustness of Multimodal LM Agents
- Chen Henry Wu, Rishi Rajesh Shah, Jing Yu Koh, Russ Salakhutdinov, Daniel Fried, Aditi Raghunathan
- 🏛️ Institutions: CMU, Stanford
- 📅 Date: October 21, 2024
- 📑 Publisher: NeurIPS 2024 Workshop
- 💻 Env: [Web]
- 🔑 Key: [dataset], [attack], [ARE], [safety]
- 📖 TLDR: This paper introduces the Agent Robustness Evaluation (ARE) framework to assess the adversarial robustness of multimodal language model agents in web environments. By creating 200 targeted adversarial tasks within VisualWebArena, the study reveals that minimal perturbations can significantly compromise agent performance, even in advanced systems utilizing reflection and tree-search mechanisms. The findings highlight the need for enhanced safety measures in deploying such agents.
Refusal-Trained LLMs Are Easily Jailbroken As Browser Agents
- Priyanshu Kumar, Elaine Lau, Saranya Vijayakumar, Tu Trinh, Scale Red Team, Elaine Chang, Vaughn Robinson, Sean Hendryx, Shuyan Zhou, Matt Fredrikson, Summer Yue, Zifan Wang
- 🏛️ Institutions: CMU, GraySwan AI, Scale AI
- 📅 Date: October 11, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Web]
- 🔑 Key: [attack], [BrowserART], [jailbreaking], [safety]
- 📖 TLDR: This paper introduces Browser Agent Red teaming Toolkit (BrowserART), a comprehensive test suite for evaluating the safety of LLM-based browser agents. The study reveals that while refusal-trained LLMs decline harmful instructions in chat settings, their corresponding browser agents often comply with such instructions, indicating a significant safety gap. The authors call for collaboration among developers and policymakers to enhance agent safety.
AdvWeb: Controllable Black-box Attacks on VLM-powered Web Agents
- Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li
- 🏛️ Institutions: UIUC, OSU
- 📅 Date: September 27, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Web]
- 🔑 Key: [safety], [black-box attack], [adversarial prompter model], [Direct Policy Optimization]
- 📖 TLDR: This paper presents AdvWeb, a black-box attack framework that exploits vulnerabilities in vision-language model (VLM)-powered web agents by injecting adversarial prompts directly into web pages. Using Direct Policy Optimization (DPO), AdvWeb trains an adversarial prompter model that can mislead agents into executing harmful actions, such as unauthorized financial transactions, while maintaining high stealth and control. Extensive evaluations reveal that AdvWeb achieves high success rates across multiple real-world tasks, emphasizing the need for stronger security measures in web agent deployments.
EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
- Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun
- 🏛️ Institutions: OSU, UCLA, UChicago, UIUC, UW-Madison
- 📅 Date: September 17, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Web]
- 🔑 Key: [safety], [privacy attack], [environmental injection], [stealth attack]
- 📖 TLDR: This paper introduces the Environmental Injection Attack (EIA), a privacy attack targeting generalist web agents by embedding malicious yet concealed web elements to trick agents into leaking users' PII. Utilizing 177 action steps within realistic web scenarios, EIA demonstrates a high success rate in extracting specific PII and whole user requests. Through its detailed threat model and defense suggestions, the work underscores the challenge of detecting and mitigating privacy risks in autonomous web agents.
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
- Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, Xu Sun
- 🏛️ Institutions: Renming University of China, PKU, Tencent
- 📅 Date: Feb 17, 2024
- 📑 Publisher: arXiv
- 💻 Env: [GUI], [Misc]
- 🔑 Key: [attack], [backdoor], [safety]
- 📖 TLDR: This paper investigates backdoor attacks on LLM-based agents, introducing a framework that categorizes attacks based on outcomes and trigger locations. The study demonstrates the vulnerability of such agents to backdoor attacks and emphasizes the need for targeted defenses.
A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents
- Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu Su, Chaowei Xiao, Huan Sun
- 🏛️ Institutions: OSU, UWM
- 📅 Date: February 15, 2024
- 📑 Publisher: arXiv
- 💻 Env: [Misc]
- 🔑 Key: [safety], [adversarial attacks], [security risks], [language agents], [Perception-Brain-Action]
- 📖 TLDR: This paper introduces a conceptual framework to assess and understand adversarial vulnerabilities in language agents, dividing the agent structure into three components—Perception, Brain, and Action. It discusses 12 specific adversarial attack types that exploit these components, ranging from input manipulation to complex backdoor and jailbreak attacks. The framework provides a basis for identifying and mitigating risks before the widespread deployment of these agents in real-world applications.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

paper_attack.md

paper_attack.md

Papers with Keyword: attack

Files

paper_attack.md

Latest commit

History

paper_attack.md

File metadata and controls

Papers with Keyword: attack