Loading text tutorial: removed +1 from VOCAB_SIZE+1 #2261

vaharoni · 2023-08-27T11:47:59Z

In Beginners tutorials > Load and preprocess data > Text, under the "Download more datasets using TensorFlow Datasets (TFDS)" section, the notebook contains:

vectorize_layer = TextVectorization(
    max_tokens=VOCAB_SIZE,
    output_mode='int',
    output_sequence_length=MAX_SEQUENCE_LENGTH)
    
# ...

model = create_model(vocab_size=VOCAB_SIZE + 1, num_labels=1)

Because the vocabulary of the TextVectorization layer already includes both the padding token and the OOV token, the largest index it can emit is 9999 when VOCAB_SIZE is 10000. The Embedding layer created by create_model therefore never sees a token index greater than 9999, so I believe the +1 should be removed from VOCAB_SIZE + 1.
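To make the index arithmetic concrete, here is a minimal pure-Python sketch. It is not TensorFlow's implementation: build_capped_vocab and encode are hypothetical helpers that only mimic the int-mode layout described above (index 0 for padding, index 1 for OOV, then the most frequent terms up to the cap), and the tiny VOCAB_SIZE of 6 is chosen for illustration.

```python
from collections import Counter


def build_capped_vocab(texts, max_tokens):
    """Hypothetical stand-in for an int-mode vocabulary capped at max_tokens.

    Assumed layout: index 0 is the padding token, index 1 is the OOV token.
    """
    counts = Counter(tok for text in texts for tok in text.lower().split())
    # Two slots are reserved, so at most max_tokens - 2 real terms are kept.
    terms = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + terms


def encode(text, vocab):
    """Map each whitespace token to its index; unknown tokens map to 1 (OOV)."""
    index = {tok: i for i, tok in enumerate(vocab)}
    return [index.get(tok, 1) for tok in text.lower().split()]


VOCAB_SIZE = 6  # small illustrative cap; the tutorial uses 10000
texts = ["the cat sat on the mat", "the dog sat on the log"]

vocab = build_capped_vocab(texts, VOCAB_SIZE)
ids = encode("the cat sat on the unseen mat", vocab)
max_index = len(vocab) - 1

print(len(vocab), max_index, max(ids))
```

Under these assumptions the vocabulary never grows beyond VOCAB_SIZE entries, so every emitted index lies in the range 0 to VOCAB_SIZE - 1, and an embedding table with VOCAB_SIZE rows is enough. With the real layer, the same property can be checked after adapt by comparing len(vectorize_layer.get_vocabulary()) against VOCAB_SIZE.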

github-actions · 2023-08-27T11:48:23Z

Preview

Preview and run these notebook edits with Google Colab:

site/en/tutorials/load_data/text.ipynb

Rendered notebook diffs available on ReviewNB.com.

Format and style

Use the TensorFlow docs notebook tools to format for consistent source diffs and lint for style:

$ python3 -m pip install -U --user git+https://github.com/tensorflow/docs

$ python3 -m tensorflow_docs.tools.nbfmt notebook.ipynb

$ python3 -m tensorflow_docs.tools.nblint --arg=repo:tensorflow/docs notebook.ipynb

If commits are added to the pull request, synchronize your local branch: git pull origin tutorial_text_fix_3

Loading text tutorial: removed +1 from VOCAB_SIZE+1

58b17ef

vaharoni requested a review from a team as a code owner August 27, 2023 11:47

8bitmp3 assigned markmcd, MarkDaoust and 8bitmp3 Aug 29, 2023

8bitmp3 added the "review in progress" (Someone is actively reviewing this PR) label Aug 29, 2023

MarkDaoust approved these changes Sep 8, 2023

MarkDaoust added the "ready to pull" (Start merge process) label and removed the "review in progress" (Someone is actively reviewing this PR) label Sep 8, 2023

copybara-service bot merged commit 5a7a816 into tensorflow:master Sep 11, 2023
2 of 4 checks passed
