About

This is a fork of the original Style Transfer Through Back-Translation repository. Its sole purpose is to keep track of the changes I have to make in order to use the authors' original code.

Installation

Get the source and data by following the instructions in the original repo (or in this one).

Running

There are a couple of gotchas:

the original code uses an old version of PyTorch (0.3)
it is set up for non-pre-trained models, i.e. you have to train from scratch
it presumes that a CUDA device is available

I was able to work around those by:

setting up a Python virtual environment

$ python3 -m virtualenv backtranslate_venv # create a virtualenv directory called backtranslate_venv in the root of the project
$ source backtranslate_venv/bin/activate # activate the virtualenv
$ pip install torch==0.3.1 # install the PyTorch dependencies in the virtualenv
$ pip install torchvision==0.1.6

modifying the Translate.py scripts to work with pre-trained models
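
Before running any of the scripts, it can help to confirm that the interpreter really is the one from the virtualenv and that the installed torch belongs to the 0.3 line the code expects. A minimal sketch of such a check; the helper names in_virtualenv and is_pinned_release are made up for illustration and are not part of the repository:

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtualenv or venv."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


def is_pinned_release(version, major=0, minor=3):
    """Check that a version string such as '0.3.1' belongs to the 0.3 line."""
    parts = version.split(".")
    if len(parts) < 2:
        return False
    return parts[0] == str(major) and parts[1] == str(minor)


if __name__ == "__main__":
    print("inside virtualenv:", in_virtualenv())
    try:
        import torch  # only importable once the pip install above has run
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("torch", torch.__version__, "ok:", is_pinned_release(torch.__version__))
```

Comparing the version components one by one, rather than testing for a "0.3" prefix, avoids accepting a string such as "0.30.0".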

Additionally:

a .gitignore was added to exclude common Python and PyCharm files

Reproduce

Several steps are needed to reproduce the authors' results:

translate the train, dev and test data from en to fr
- pay attention to the arguments of the various translate scripts: different arguments are required depending on whether a single model or an encoder/decoder pair is used
either train a decoder (i.e. a style generator) or download a pre-trained one from the authors' repo

use the translate.py script to transfer from republican to democratic and vice versa, e.g.:

python translate.py -encoder_model ../models/translation/french_english/french_english.pt -decoder_model ../models/style_generators/democratic_generator.pt -src ../data/political_data/republican_only.test.fr -output trained_models/republican_democratic.txt -replace_unk $true -gpu 0
python translate.py -encoder_model ../models/translation/french_english/french_english.pt -decoder_model ../models/style_generators/republican_generator.pt -src ../data/political_data/democratic_only.test.fr -output trained_models/democratic_republican.txt -replace_unk $true -gpu 0
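
The two calls differ only in which generator is loaded and which test file is read, so they can be generated by a single helper. A sketch, assuming it is run from the style_decoder directory and that the paths match the layout used above; build_translate_cmd is a hypothetical name, and the -replace_unk argument is left to the caller via extra_args because its exact form depends on the shell and on the script version in use:

```python
import subprocess

MODELS = "../models"
DATA = "../data/political_data"


def build_translate_cmd(source_style, target_style, gpu=0, extra_args=()):
    """Build the translate.py argument list for one transfer direction."""
    return [
        "python", "translate.py",
        "-encoder_model", f"{MODELS}/translation/french_english/french_english.pt",
        "-decoder_model", f"{MODELS}/style_generators/{target_style}_generator.pt",
        "-src", f"{DATA}/{source_style}_only.test.fr",
        "-output", f"trained_models/{source_style}_{target_style}.txt",
        "-gpu", str(gpu),
        *extra_args,
    ]


if __name__ == "__main__":
    for src, tgt in [("republican", "democratic"), ("democratic", "republican")]:
        cmd = build_translate_cmd(src, tgt)
        print(" ".join(cmd))
        # Uncomment to actually run the transfer:
        # subprocess.run(cmd, check=True)
```

Passing the command as a list to subprocess.run avoids shell quoting issues with the long model paths.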

use the classifier to evaluate the style transfer:

python cnn_translate.py -gpu 0 -model ../models/classifier/political_classifier/political_classifier.pt -src ../style_decoder/trained_models/republican_democratic.txt -tgt 'democratic' -label0 republican -label1 democratic
python cnn_translate.py -gpu 0 -model ../models/classifier/political_classifier/political_classifier.pt -src ../style_decoder/trained_models/democratic_republican.txt -tgt 'republican' -label0 republican -label1 democratic

Below this line is the original README content.

Style Transfer Through Back-Translation

This repo contains the code and data of the following paper:

Style Transfer Through Back-Translation. Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, Alan W Black. ACL 2018. arXiv

Dependencies

Python 3.6
Pytorch 0.3

Trained Machine Translation Models

Download the english--french and french--english models from the following links:

http://tts.speech.cs.cmu.edu/style_models/english_french.tar
http://tts.speech.cs.cmu.edu/style_models/french_english.tar

Place these models in the models/translation folder.

Trained Classifier Models

Download the trained gender, political slant and sentiment classifiers from the following links:

http://tts.speech.cs.cmu.edu/style_models/gender_classifier.tar
http://tts.speech.cs.cmu.edu/style_models/political_classifier.tar
http://tts.speech.cs.cmu.edu/style_models/sentiment_classifier.tar

Place these models in the models/classifier folder. The three classifiers are trained for the following labels:

Task	Label = 0	Label = 1
Gender	Male	Female
Political	Republican	Democratic
Sentiment	Negative	Positive
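
When a script asks for a numeric label (for example -tgt_label in the training command), the table can be kept as a small lookup so that the number is not typed by hand. A sketch; the LABELS dictionary and the label_id helper are illustrative and not part of the repository:

```python
# Label ids copied from the table above: index 0 is label 0, index 1 is label 1.
LABELS = {
    "gender": ("male", "female"),
    "political": ("republican", "democratic"),
    "sentiment": ("negative", "positive"),
}


def label_id(task, style):
    """Return 0 or 1 for a style name within the given task."""
    names = LABELS[task.lower()]
    return names.index(style.lower())


if __name__ == "__main__":
    print(label_id("political", "democratic"))  # prints 1
```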

Trained Style Models

Download the trained style models from the following links:

http://tts.speech.cs.cmu.edu/style_models/female_generator.tar
http://tts.speech.cs.cmu.edu/style_models/male_generator.tar
http://tts.speech.cs.cmu.edu/style_models/democratic_generator.tar
http://tts.speech.cs.cmu.edu/style_models/republican_generator.tar
http://tts.speech.cs.cmu.edu/style_models/positive_generator.tar
http://tts.speech.cs.cmu.edu/style_models/negative_generator.tar

Place these models in the models/style_generators folder.
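
All eleven archives listed in the three sections above live under the same base URL, so fetching them can be scripted. A minimal sketch using only the standard library; the ARCHIVES layout and the function names are assumptions made for illustration, and the code does not check what directory structure each archive unpacks to:

```python
import os
import tarfile
import urllib.request

BASE_URL = "http://tts.speech.cs.cmu.edu/style_models"

# Destination folder -> archive names, as listed in the sections above.
ARCHIVES = {
    "models/translation": ["english_french", "french_english"],
    "models/classifier": ["gender_classifier", "political_classifier", "sentiment_classifier"],
    "models/style_generators": [
        "female_generator", "male_generator",
        "democratic_generator", "republican_generator",
        "positive_generator", "negative_generator",
    ],
}


def download_plan(archives=ARCHIVES, base_url=BASE_URL):
    """Return (url, destination_folder) pairs without touching the network."""
    return [
        (f"{base_url}/{name}.tar", folder)
        for folder, names in archives.items()
        for name in names
    ]


def fetch_and_extract(url, folder):
    """Download one archive into folder and unpack it there."""
    os.makedirs(folder, exist_ok=True)
    archive_path = os.path.join(folder, os.path.basename(url))
    urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path) as tar:
        tar.extractall(folder)


if __name__ == "__main__":
    for url, folder in download_plan():
        print("would fetch", url, "into", folder)
        # fetch_and_extract(url, folder)  # uncomment to download for real
```

Keeping the plan separate from the download step makes it possible to review the list of URLs before any network traffic happens.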

Quick Start

Refer to the example.sh file to see the commands.

First cd style_decoder and then preprocess your raw data using the following command:

python preprocess.py -train_src TRAIN_SOURCE_FILE -train_tgt TRAIN_TARGET_FILE -valid_src VALID_SOURCE_FILE -valid_tgt VALID_TARGET_FILE -save_data DATA_NAME

Then train your model using the following command:

python train_decoder.py -data DATA_NAME.train.pt -save_model MODEL_DIR/MODEL_NAME -classifier_model CLASSIFIER.pt -encoder_model ENCODER_MODEL -tgt_label {0/1}

Data

Download the data required for the political slant transfer experiment from the following link and place it in the data/ folder.

http://tts.speech.cs.cmu.edu/style_models/political_data.tar
tar -xvf political_data.tar

The train, dev, test and classtrain splits are given as is. If you are using this data then please cite the following papers:

@inproceedings{style_transfer_acl18,
  title={Style Transfer Through Back-Translation},
  author={Prabhumoye, Shrimai and Tsvetkov, Yulia and Salakhutdinov, Ruslan and Black, Alan W},
  year={2018},
  booktitle={Proc. ACL}
  }

@inproceedings{rtgender,
  title={{RtGender}: A Corpus for Studying Differential Responses to Gender},
  author={Voigt, Rob and Jurgens, David and Prabhakaran, Vinodkumar and Jurafsky, Dan and Tsvetkov, Yulia},
  year={2018},
  booktitle={Proc. LREC},
  }

Download the data required for the gender transfer experiment from the following link and place it in the data/ folder.

http://tts.speech.cs.cmu.edu/style_models/gender_data.tar
tar -xvf gender_data.tar

The train, dev, test and classtrain splits are given as is. If you are using this data then please cite the following papers:

@inproceedings{style_transfer_acl18,
  title={Style Transfer Through Back-Translation},
  author={Prabhumoye, Shrimai and Tsvetkov, Yulia and Salakhutdinov, Ruslan and Black, Alan W},
  year={2018},
  booktitle={Proc. ACL}
  }
  
@inproceedings{reddy2016obfuscating,
  title={Obfuscating gender in social media writing},
  author={Reddy, Sravana and Knight, Kevin},
  year={2016},
  booktitle={Proc. of Workshop on Natural Language Processing and Computational Social Science},
  pages={17--26},
  }

You can find the data used in the sentiment modification experiment described in the paper at this link. The train, dev, test and classtrain splits are given as is.

Acknowledgements

The code used to train the NMT systems is from the OpenNMT toolkit. This code base is based on the code of the toolkit.
