
[won't merge - v1 codebase] Bert #1543

Open: wants to merge 32 commits into master

Conversation

Zenglinxiao (Contributor)

Add BERT to OpenNMT-py.

@francoishernandez (Member) left a comment

Quick review of FAQ.md

Outdated review threads on: docs/source/FAQ.md (10), bert_ckp_convert.py (1), onmt/train_single.py (3), onmt/trainer.py (1), onmt/translate/predictor.py (3)
@vince62s (Member)

@Zenglinxiao don't you think it could be possible to embed "bert_build_model" within "build_model"?

@rajarsheem

Do you have any comparison between models with and without BERT on NMT tasks?

@francoishernandez (Member)

@rajarsheem this is not the aim of BERT or of this PR. There are some examples of what it's for in FAQ.md.


class BertLayerNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-12):
        """LayerNorm module in the TF style (epsilon inside the square root)."""

Contributor

Does PyTorch implement it differently?

Zenglinxiao (Contributor, Author)

I adapted BertLayerNorm from huggingface/pytorch-transformers because it notes "epsilon inside the square root", and the PyTorch doc I checked uses eps=1e-5, so I kept the custom module. Thanks to your reminder, I dug into the PyTorch C++ code and found it is actually the same; the separate implementation only existed because of a typo in the PyTorch doc, which has already been fixed (huggingface/transformers#1089).
I'll switch it!
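
A minimal sketch (not code from this PR) showing that PyTorch's nn.LayerNorm already applies the epsilon inside the square root, so with eps=1e-12 it can stand in for BertLayerNorm:

```python
import torch
import torch.nn as nn

hidden_size = 768
eps = 1e-12  # BERT's value, instead of PyTorch's 1e-5 default
ln = nn.LayerNorm(hidden_size, eps=eps)  # weight=1, bias=0 at init

x = torch.randn(2, 4, hidden_size)
mu = x.mean(-1, keepdim=True)
var = x.var(-1, unbiased=False, keepdim=True)
tf_style = (x - mu) / torch.sqrt(var + eps)  # epsilon inside the square root

print(torch.allclose(ln(x), tf_style, atol=1e-6))  # expected: True
```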

@Zenglinxiao (Contributor, Author), Sep 2, 2019

Please check:
- BertLayerNorm is switched to PyTorch's built-in LayerNorm.
- BertAdam is switched to AdamW as well, with an added correct_bias option so the bias compensation can be disabled, as in the original BERT.
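
For reference, a minimal sketch of what an AdamW-style update with a correct_bias flag looks like (an illustration, not the code added in this PR); with correct_bias=False the bias-compensation terms are skipped, as in the original BERT:

```python
import math
import torch

def adamw_update(param, grad, exp_avg, exp_avg_sq, step, lr=2e-5,
                 betas=(0.9, 0.999), eps=1e-6, weight_decay=0.01,
                 correct_bias=True):
    """One in-place AdamW-style step on a single parameter tensor."""
    beta1, beta2 = betas
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)                # first moment
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)   # second moment
    denom = exp_avg_sq.sqrt().add_(eps)

    step_size = lr
    if correct_bias:  # original BERT (BertAdam) omits this compensation
        step_size *= math.sqrt(1 - beta2 ** step) / (1 - beta1 ** step)

    param.addcdiv_(exp_avg, denom, value=-step_size)
    # decoupled weight decay, applied directly to the weights
    if weight_decay > 0.0:
        param.mul_(1 - lr * weight_decay)
```

Note that torch.optim.AdamW always applies the bias correction; the flag here mirrors the option exposed by the AdamW implementation in pytorch-transformers.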

@@ -191,40 +213,62 @@ def build_base_model(model_opt, fields, gpu, checkpoint=None, gpu_id=None):
        pad_idx = tgt_base_field.vocab.stoi[tgt_base_field.pad_token]
        generator = CopyGenerator(model_opt.dec_rnn_size, vocab_size, pad_idx)

    if model_opt.is_bert:
        model = encoder

Why does the encoder become the whole model? If there is no decoder, then how can I use the model for machine translation?

@francoishernandez (Member), Oct 21, 2019

BERT stands for Bidirectional Encoder Representations from Transformers. It is not a machine translation model.
https://arxiv.org/abs/1810.04805

@kagrze, Oct 21, 2019

I know this :) But this is a PR to a machine translation system. If machine translation cannot benefit from this PR, then why bother? If you want to use BERT for basic downstream tasks (e.g. classification), you can use Hugging Face's implementation or the reference implementation.

Member

OpenNMT-py is not only machine translation; there are other tasks such as Speech2Text, Image2Text, Vid2Text, etc.
This PR is meant to add BERT task(s) and components to the framework so that users can experiment. Nothing stops you from trying to build a custom model using BERT pre-trained encoder layers and standard Transformer decoder layers, for instance, but I think it has been tried without great benefits.

Hmm, but AFAIK, all the tasks you mentioned (i.e. Speech2Text, Image2Text, Video2Text) are formalised as transduction problems and therefore require some sort of decoder.

Member

Sure. All those things are not mutually exclusive.
Feel free to share any research on the forum, and eventually open a PR if you have anything interesting regarding the use of BERT in a seq2seq task.
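
To make the idea from a few comments above concrete, here is a rough sketch (not part of this PR) of pairing BERT pre-trained encoder layers with standard Transformer decoder layers, assuming pytorch-transformers for the encoder and torch.nn.TransformerDecoder for the decoder; the BertEncoderDecoder wrapper and its names are hypothetical:

```python
import torch
import torch.nn as nn
from pytorch_transformers import BertModel  # assumes pytorch-transformers is installed

class BertEncoderDecoder(nn.Module):  # hypothetical wrapper, not an OpenNMT-py class
    def __init__(self, tgt_vocab_size, d_model=768, nhead=8, num_layers=6):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        dec_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=num_layers)
        self.tgt_emb = nn.Embedding(tgt_vocab_size, d_model)
        self.generator = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # sequence output of the pre-trained BERT encoder: (batch, src_len, d_model)
        memory = self.encoder(src_ids, attention_mask=src_mask)[0]
        tgt = self.tgt_emb(tgt_ids)  # (batch, tgt_len, d_model)
        # causal mask so each target position only attends to earlier positions
        tgt_len = tgt_ids.size(1)
        causal = torch.triu(torch.full((tgt_len, tgt_len), float("-inf"),
                                       device=tgt_ids.device), diagonal=1)
        # nn.TransformerDecoder expects (seq_len, batch, d_model)
        out = self.decoder(tgt.transpose(0, 1), memory.transpose(0, 1),
                           tgt_mask=causal)
        return self.generator(out.transpose(0, 1))  # (batch, tgt_len, vocab)
```

Whether to freeze the BERT encoder at first and how to schedule the decoder's learning rate are exactly the kind of experiments such a setup would need.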

@Simons2017

I would like to use BERT in a seq2seq task, but I'm confused about how to preprocess the data. Could you give me some advice?

@vince62s changed the title from "Bert" to "[won't merge - v1 codebase] Bert" on Jan 19, 2023
8 participants