Yi Zeng YiZeng623

🏔️

@ Menlo Park

RS intern @ Meta AI | Ph.D. @ Virginia Tech | M.S. @ UCSD | Previous Intern @ Sony AI

80 followers · 43 following

San Diego
11:00 (UTC -08:00)
https://www.yi-zeng.com/
@EasonZeng623

Pinned

LLM-Tuning-Safety/LLMs-Finetuning-Safety Public

We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs.

Python · 246 stars · 29 forks
CHATS-lab/persuasive_jailbreaker Public

Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!

HTML · 261 stars · 20 forks
reds-lab/Narcissus Public

The official implementation of the CCS'23 paper, Narcissus clean-label backdoor attack -- only takes THREE images to poison a face recognition dataset in a clean-label way and achieves a 99.89% att…

Python · 105 stars · 12 forks
I-BAU Public

Official implementation of the ICLR 2022 paper "Adversarial Unlearning of Backdoors via Implicit Hypergradient"

Jupyter Notebook · 51 stars · 13 forks
frequency-backdoor Public

ICCV 2021. We find that most existing triggers of backdoor attacks in deep learning contain severe artifacts in the frequency domain. This repo explores how we can use these artifacts to develop strong…

Jupyter Notebook · 42 stars · 6 forks
reds-lab/Meta-Sift Public

The official implementation of the USENIX Security'23 paper "Meta-Sift" -- ten minutes or less to find a clean subset of size 1000 or larger in a poisoned dataset.

Python · 18 stars · 4 forks