Towards Lossless Encoding of Sentences
Requirements:
- Python 3.7
- PyTorch 1.x
- h5py (optional, only needed for dataset generation)
Training, where <path> is the location of the dataset file:
python train.py --dataset_path=<path>
The dataset is not included, but dataset_generator.py can be used to generate an HDF5 dataset file from a text file of tokenized sentences, one sentence per line.
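To illustrate the kind of preprocessing involved, here is a minimal sketch that turns a file of whitespace-tokenized sentences into integer id sequences and writes them to HDF5. It is NOT the logic of dataset_generator.py: the special tokens (<pad>, <unk>), the frequency-sorted vocabulary, the functions build_vocab/encode/write_hdf5, and the HDF5 dataset names "sentences" and "vocab" are all hypothetical choices made for this example.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical special tokens; dataset_generator.py may use different ones.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(lines: List[str], min_count: int = 1) -> Dict[str, int]:
    """Map every whitespace-separated token seen at least min_count times to an id."""
    counts = Counter(tok for line in lines for tok in line.split())
    vocab = {PAD: 0, UNK: 1}
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    for tok, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if c >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(lines: List[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Convert each non-empty line into a list of token ids (unknown tokens -> <unk>)."""
    unk = vocab[UNK]
    return [[vocab.get(tok, unk) for tok in line.split()]
            for line in lines if line.strip()]


def write_hdf5(path: str, encoded: List[List[int]], vocab: Dict[str, int]) -> None:
    """Store variable-length id sequences plus the vocabulary (hypothetical layout)."""
    import h5py  # optional dependency, imported only when writing
    import numpy as np

    with h5py.File(path, "w") as f:
        seq_type = h5py.vlen_dtype(np.dtype("int32"))
        ds = f.create_dataset("sentences", (len(encoded),), dtype=seq_type)
        for i, ids in enumerate(encoded):
            ds[i] = np.asarray(ids, dtype="int32")
        # Words ordered by id, so row k of "vocab" is the token with id k.
        words = sorted(vocab, key=vocab.get)
        f.create_dataset("vocab", data=np.array(words, dtype=object),
                         dtype=h5py.string_dtype())
```

Usage, assuming a file sentences.txt exists: read it with `lines = open("sentences.txt").read().splitlines()`, then call `vocab = build_vocab(lines)` and `write_hdf5("dataset.h5", encode(lines, vocab), vocab)`. Variable-length int32 rows avoid padding every sentence to a fixed length on disk.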
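Pretrained model with embedding size 2048:
import torch
model.load_state_dict(torch.load('rae2048.pt', map_location='cpu'))  # model must be built with embedding size 2048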