ReMoDetect: Reward Models Recognize Aligned LLM's Generations (NeurIPS 2024)

hyunseoklee-ai/ReMoDetect

ReMoDetect

Official PyTorch implementation of "ReMoDetect: Reward Models Recognize Aligned LLM's Generations".

Prepare Data

  • This project uses the cleaned, full-text English HC3 dataset from YuchuanTian/AIGC_text_detector.git.
  • Download the unfilter_full/en_train_cleaned.csv file from that repository and extract it into the ./data directory.
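A quick check that the dataset landed where expected can save a failed run later. The following is a minimal sketch, not part of the repository: the helper name check_dataset is hypothetical, and the default path ./data/en_train_cleaned.csv is an assumption derived from the directory named above.

```shell
# Hypothetical helper (not part of the repository): confirm the HC3 CSV exists
# and is non-empty before running the later scripts.
# The default path is an assumption based on the ./data directory named above.
check_dataset() {
  csv="${1:-./data/en_train_cleaned.csv}"
  if [ -s "$csv" ]; then
    echo "found: $csv"
  else
    echo "missing or empty: $csv" >&2
    return 1
  fi
}
```

Calling check_dataset with no argument tests the assumed default path; pass a different path as the first argument if the file lives elsewhere.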

Training and Evaluation Instructions

  1. Configure Environment Variables

    Set your Groq API key in the data_process.sh script.

    export GROQ_API_KEY="your_groq_api_key_here"
  2. Run Data Processing Script

    Execute the data_process.sh script to process the dataset.

    bash data_process.sh
  3. Train

    Execute the train.sh script to train the model.

    bash train.sh

    Alternatively, download the trained weights from Hugging Face.

  4. Generate Evaluation Data (Optional)

    If you want to generate additional evaluation data, place your Azure, Anthropic, and Groq API keys in the gen_eval_data.sh script.

    export AZURE_API_KEY="your_azure_api_key_here"
    export ANTHROPIC_API_KEY="your_anthropic_api_key_here"
    export GROQ_API_KEY="your_groq_api_key_here"

    Then, run the script:

    bash gen_eval_data.sh
  5. Evaluate

    Finally, run the eval.sh script to evaluate the model.

    bash eval.sh

    The benchmark is based on the Fast-DetectGPT project, and some code files include its license.
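The steps above run in a fixed order, and each one depends on the output of the previous one. A minimal sketch of a wrapper that runs them in sequence and stops at the first failure might look like this. The function run_steps is hypothetical and not part of the repository; it assumes the scripts sit in the current directory and that any API keys have already been set inside them as described.

```shell
# Hypothetical wrapper (not part of the repository): run scripts in order,
# stopping at the first one that is missing or exits with a non-zero status.
run_steps() {
  for script in "$@"; do
    if [ ! -f "$script" ]; then
      echo "missing script: $script" >&2
      return 1
    fi
    echo "running: $script"
    bash "$script" || { echo "failed: $script" >&2; return 1; }
  done
}

# Example call with the script names from the steps above
# (gen_eval_data.sh is optional and omitted here):
# run_steps data_process.sh train.sh eval.sh
```

The example call is left commented out so that sourcing the snippet does not start a training run. Add gen_eval_data.sh before eval.sh if you want to generate additional evaluation data.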

Citation

@inproceedings{lee2024remodetect,
  title={ReMoDetect: Reward Models Recognize Aligned LLM's Generations},
  author={Lee, Hyunseok and Tack, Jihoon and Shin, Jinwoo},
  booktitle={Advances in Neural Information Processing Systems},
  year={2024}
}
