Loading text tutorial: removed +1 from VOCAB_SIZE+1 #2261

Merged: 1 commit into tensorflow:master on Sep 11, 2023

vaharoni
Contributor

In the Beginners tutorials > Load and preprocess data > Text, under the "Download more datasets using TensorFlow Datasets (TFDS)" section, we do:

vectorize_layer = TextVectorization(
    max_tokens=VOCAB_SIZE,
    output_mode='int',
    output_sequence_length=MAX_SEQUENCE_LENGTH)
    
# ...

model = create_model(vocab_size=VOCAB_SIZE + 1, num_labels=1)

The vocabulary of the TextVectorization layer already includes both the padding token and the OOV token, so the maximum possible index is 9999 when VOCAB_SIZE is 10000. I therefore believe the + 1 in VOCAB_SIZE + 1 should be removed: the Embedding layer created by create_model should never see a token index greater than 9999.
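
To make the index arithmetic concrete, here is a minimal pure-Python sketch of the capping behavior described above. It is not the Keras implementation: build_vocab and encode are hypothetical helpers, the whitespace tokenization is a simplification, and the toy VOCAB_SIZE of 6 stands in for 10000. The only behavior it assumes is that, in 'int' output mode, index 0 is reserved for padding and index 1 for OOV, and that both count toward max_tokens.

```python
from collections import Counter

PAD_TOKEN = ""        # assumed to occupy index 0 in 'int' output mode
OOV_TOKEN = "[UNK]"   # assumed to occupy index 1 in 'int' output mode


def build_vocab(texts, max_tokens):
    """Hypothetical helper: cap a vocabulary at max_tokens entries in total."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Two slots are reserved, so only max_tokens - 2 real tokens fit.
    most_common = [word for word, _ in counts.most_common(max_tokens - 2)]
    return [PAD_TOKEN, OOV_TOKEN] + most_common


def encode(text, vocab):
    """Hypothetical helper: map words to indices, unknown words to OOV (1)."""
    index = {token: i for i, token in enumerate(vocab)}
    return [index.get(word, 1) for word in text.lower().split()]


VOCAB_SIZE = 6  # toy value; the tutorial uses 10000
texts = ["a b c d e f g h", "a b c d", "a b"]

vocab = build_vocab(texts, VOCAB_SIZE)
ids = encode("a b c d e f g h zzz", vocab)
max_index = max(ids)

print(len(vocab))   # total entries, including padding and OOV
print(max_index)    # largest index an embedding table would be asked for
```

Under these assumptions the vocabulary has exactly VOCAB_SIZE entries and the largest index is VOCAB_SIZE - 1, so an embedding table with VOCAB_SIZE rows covers every index; the extra row from VOCAB_SIZE + 1 would never be looked up.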

vaharoni requested a review from a team as a code owner on August 27, 2023, 11:47
@github-actions

Preview

Preview and run these notebook edits with Google Colab: Rendered notebook diffs available on ReviewNB.com.

Format and style

Use the TensorFlow docs notebook tools to format for consistent source diffs and lint for style:
$ python3 -m pip install -U --user git+https://github.com/tensorflow/docs

$ python3 -m tensorflow_docs.tools.nbfmt notebook.ipynb
$ python3 -m tensorflow_docs.tools.nblint --arg=repo:tensorflow/docs notebook.ipynb
If commits are added to the pull request, synchronize your local branch: git pull origin tutorial_text_fix_3

8bitmp3 added the "review in progress" label (Someone is actively reviewing this PR) on Aug 29, 2023
MarkDaoust added the "ready to pull" label (Start merge process) and removed the "review in progress" label on Sep 8, 2023
copybara-service bot merged commit 5a7a816 into tensorflow:master on Sep 11, 2023
2 of 4 checks passed
4 participants