You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content

This is the official implementation of our paper "You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content".

Environment Setup

conda env create --file environment.yaml &&
conda activate toxic_prompt

Datasets

The datasets used in this paper are in the parsed_dataset folder.

# Task 1:
Dataset: ["HateXplain", "USElectionHate20", "HateCheck", "SBIC.v2", "measuring-hate-speech"]

# Task 2:
Dataset: ["TSD"]

# Task 3:
Dataset: ["Parallel", "Paradetox"]
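When scripting runs over several datasets, the mapping above can be kept as a small lookup table. The following is a minimal sketch, not part of the repository: the names `TASK_DATASETS`, `TASK_SCRIPTS`, and `build_command` are hypothetical, the script names and flags are copied from the example commands in this README, and it is an assumption that every dataset of a task accepts the same flags as the documented example.

```python
import shlex

# Task-to-dataset mapping, copied from the list above.
TASK_DATASETS = {
    1: ["HateXplain", "USElectionHate20", "HateCheck", "SBIC.v2", "measuring-hate-speech"],
    2: ["TSD"],
    3: ["Parallel", "Paradetox"],
}

# Script used for each task, as named in this README.
TASK_SCRIPTS = {
    1: "1_toxicity_classification.py",
    2: "2_and_3_toxic_generation.py",
    3: "2_and_3_toxic_generation.py",
}


def build_command(task, dataset, model="t5", model_name_or_path="t5-small"):
    """Return the command for one dataset as a list of arguments.

    Assumption: all datasets of a task take the same flags as the
    single example command shown for that task in this README.
    """
    if dataset not in TASK_DATASETS[task]:
        raise ValueError(f"{dataset!r} is not listed for Task {task}")
    return [
        "python", TASK_SCRIPTS[task],
        "--plm_eval_mode",
        "--model", model,
        "--model_name_or_path", model_name_or_path,
        "--dataset", dataset,
    ]


if __name__ == "__main__":
    # Print (do not run) one command per dataset.
    for task, datasets in TASK_DATASETS.items():
        for dataset in datasets:
            print(shlex.join(build_command(task, dataset)))
```

Printing the commands rather than executing them keeps the sketch safe to run; the printed lines could be passed to `subprocess.run` once they have been checked against the scripts' actual arguments.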

Task 1: Toxicity Classification

# Example: train the prompt-tuning model on the HateXplain dataset with t5-small.
python 1_toxicity_classification.py --plm_eval_mode --model t5 --model_name_or_path t5-small --dataset HateXplain

Task 2: Toxic Span Detection

Train the prompt and perform the generation task:

# Example: TSD dataset with t5-small model.
python 2_and_3_toxic_generation.py --plm_eval_mode --model t5 --model_name_or_path t5-small --dataset TSD

Evaluation:

python 2_calculate_span.py --file_path sfs_out/task23/TSD_t5-small_True.txt
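As background on what a span-level score measures, toxic span detection is commonly evaluated with a per-post F1 over character offsets, as defined for SemEval-2021 Task 5 (Toxic Spans Detection). The sketch below illustrates that metric only. It is an assumption that 2_calculate_span.py computes something comparable, and the function names `span_f1` and `mean_span_f1` are hypothetical.

```python
def span_f1(predicted, gold):
    """F1 between predicted and gold toxic character offsets for one post."""
    predicted, gold = set(predicted), set(gold)
    if not gold and not predicted:
        return 1.0  # nothing to find and nothing predicted
    if not gold or not predicted:
        return 0.0  # one side is empty, so there can be no overlap
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_span_f1(predictions, golds):
    """Average the per-post F1 over all posts."""
    scores = [span_f1(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, predicting offsets 0 to 3 when the gold span covers offsets 2 to 5 gives precision 0.5 and recall 0.5, so `span_f1([0, 1, 2, 3], [2, 3, 4, 5])` is 0.5.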

For the baseline methods, we follow the implementation from toxic-span.

Task 3: Detoxification

Note that Task 2 and Task 3 share the same code as they are both generation tasks.

Train the prompt and perform the generation task:

# Example: Parallel dataset with t5-small model.
python 2_and_3_toxic_generation.py --plm_eval_mode --model t5 --model_name_or_path t5-small --dataset Parallel

Evaluation for toxicity change:

python 3_perspective_evaluation.py --file_path sfs_out/task23/Parallel_t5-small_True.txt --key YOUR_PERSPECTIVE_API_KEY
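For orientation, scoring one sentence with the Perspective API amounts to a single JSON request for the TOXICITY attribute. The sketch below uses only the Python standard library and the public `comments:analyze` endpoint. It is not the code in 3_perspective_evaluation.py or perspective_api.py, and the helper names `build_request`, `parse_toxicity`, and `score_text` are hypothetical.

```python
import json
import urllib.request

ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text, language="en"):
    """Build the JSON body asking for the TOXICITY attribute of one text."""
    return {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {"TOXICITY": {}},
    }


def parse_toxicity(response):
    """Extract the summary TOXICITY score (between 0 and 1) from a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def score_text(text, api_key):
    """Send one POST request to the Perspective API and return the score."""
    request = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=json.dumps(build_request(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return parse_toxicity(json.load(reply))
```

Under these assumptions, the toxicity change for one detoxified sentence would be `score_text(original, key) - score_text(rewritten, key)`. The API enforces request quotas, so a batch run over an output file would likely need a delay between calls.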

For the baseline methods and the other evaluation metrics (e.g., BLEU, SIM, PPL), we follow the implementation of paradetox.

Model Weights

You can download the model weights from this Link and use them to replace the saved_model/ folder.

Cite

Please cite our paper if you use this code in your own work:

@inproceedings{HZSZ224,
author = {Xinlei He and Savvas Zannettou and Yun Shen and Yang Zhang},
title = {{You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content}},
booktitle = {{IEEE Symposium on Security and Privacy (S\&P)}},
publisher = {IEEE},
year = {2024}
}
