This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

Commit
Added links to files in doc and corrected a few typos (#282)
* better documentation with links

* fixed line permalink
ghego authored and rsepassi committed Sep 7, 2017
1 parent 24071ba commit 406db60
Showing 2 changed files with 7 additions and 8 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -214,8 +214,7 @@ on the task (e.g. fed through a final linear transform to produce logits for a
softmax over classes). All models are imported in
[`models.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/models/models.py),
inherit from `T2TModel` - defined in
[`t2t_model.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/t2t_model.py)
- and are registered with
[`t2t_model.py`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/t2t_model.py) - and are registered with
[`@registry.register_model`](https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/utils/registry.py).

### Hyperparameter Sets
12 changes: 6 additions & 6 deletions docs/new_problem.md
@@ -15,9 +15,9 @@ Let's add a new dataset together and train the transformer model. We'll be learn

For each problem we want to tackle we create a new problem class and register it. Let's call our problem `Word2def`.

Since many text2text problems share similar methods, there's already a class called `Text2TextProblem` that extends the base problem class, `Problem` (both found in `problem.py`).
Since many text2text problems share similar methods, there's already a class called [`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py#L354) that extends the base problem class, `Problem` (both found in [`problem.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py)).

For our problem, we can go ahead and create the file `word2def.py` in the `data_generators` folder and add our new problem, `Word2def`, which extends `Text2TextProblem`. Let's also register it while we're at it so we can specify the problem through flags.
For our problem, we can go ahead and create the file `word2def.py` in the [`data_generators`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/) folder and add our new problem, `Word2def`, which extends [`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/blob/24071ba07d5a14c170044c5e60a24bda8179fb7a/tensor2tensor/data_generators/problem.py#L354). Let's also register it while we're at it so we can specify the problem through flags.

```python
@registry.register_problem
class Word2def(problem.Text2TextProblem):
...
```

We need to implement the following methods from `Text2TextProblem` in our new class:
We need to implement the following methods from [`Text2TextProblem`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py#L354) in our new class:
* is_character_level
* targeted_vocab_size
* generator
@@ -42,7 +42,7 @@ Let's tackle them one by one:

**input_space_id, target_space_id, is_character_level, targeted_vocab_size, use_subword_tokenizer**:

SpaceIDs tell Tensor2Tensor what sort of space the input and target tensors are in. These are things like EN_CHR (English character), EN_TOK (English token), AUDIO_WAV (audio waveform), IMAGE, and DNA (genetic bases). The complete list can be found at `data_generators/problem.py` in the class `SpaceID`.
SpaceIDs tell Tensor2Tensor what sort of space the input and target tensors are in. These are things like EN_CHR (English character), EN_TOK (English token), AUDIO_WAV (audio waveform), IMAGE, and DNA (genetic bases). The complete list can be found at [`data_generators/problem.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/problem.py) in the class `SpaceID`.
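For intuition, `SpaceID` is essentially a collection of named integer constants, one per modality. A self-contained mock (the names follow the list above; the numeric values here are illustrative, not the real ones):

```python
class SpaceID:
    # Hypothetical stand-in for the SpaceID class: each input/target
    # modality is just a named integer constant (values illustrative only).
    GENERIC = 0
    EN_CHR = 1
    EN_TOK = 2
    AUDIO_WAV = 3
    IMAGE = 4
    DNA = 5

# A problem declares which space its tensors live in, e.g. for
# character-level English on both the input and target side:
input_space_id = SpaceID.EN_CHR
target_space_id = SpaceID.EN_CHR
```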

Since we're generating definitions and feeding in words at the character level, we set `is_character_level` to true, and use the same SpaceID, EN_CHR, for both input and target. Additionally, since we aren't using tokens, we don't need to give a `targeted_vocab_size` or define `use_subword_tokenizer`.
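Concretely, those choices come down to overriding a few properties. A self-contained mock of just those overrides (runnable without tensor2tensor; `EN_CHR` here is a stand-in integer, not the real constant):

```python
EN_CHR = 1  # stand-in value; the real constant lives in problem.SpaceID

class Word2def:
    """Mock of the Text2TextProblem overrides discussed above (illustrative)."""

    @property
    def is_character_level(self):
        # Inputs (words) and targets (definitions) are fed character by
        # character, so no subword vocabulary is needed.
        return True

    @property
    def input_space_id(self):
        return EN_CHR

    @property
    def target_space_id(self):
        return EN_CHR
```

Because `is_character_level` is true, `targeted_vocab_size` and `use_subword_tokenizer` can be left out, as noted above.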

@@ -86,7 +86,7 @@ class Word2def(problem.Text2TextProblem):

**generator**:

We're almost done. `generator` generates the training and evaluation data and stores them in files like "word2def_train.lang1" in your DATA_DIR. Thankfully several commonly used methods like `character_generator` and `token_generator` are already written in the file `wmt.py`. We will import `character_generator` and write:
We're almost done. `generator` generates the training and evaluation data and stores them in files like "word2def_train.lang1" in your DATA_DIR. Thankfully several commonly used methods like `character_generator` and `token_generator` are already written in the file [`wmt.py`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/wmt.py). We will import `character_generator` and [`text_encoder`](https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/data_generators/text_encoder.py) to write:
```python
def generator(self, data_dir, tmp_dir, train):
character_vocab = text_encoder.ByteTextEncoder()
```
@@ -151,7 +151,7 @@ _WORD2DEF_TEST_DATASETS = [
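The diff above cuts the generator short, so as a rough idea of its shape, here is a self-contained sketch — a hypothetical stand-in for `character_generator` and `ByteTextEncoder`, not tensor2tensor's actual implementation:

```python
EOS = 1  # stand-in: id 1 reserved for end-of-sequence, 0 for padding

def byte_encode(text):
    # Minimal stand-in for text_encoder.ByteTextEncoder: raw UTF-8 byte
    # values, shifted by 2 so the reserved ids 0 and 1 stay unused.
    return [b + 2 for b in text.encode("utf-8")]

def character_generator(pairs):
    # Yield one example dict per (word, definition) pair, with both
    # sequences encoded at the byte/character level and EOS-terminated.
    for word, definition in pairs:
        yield {
            "inputs": byte_encode(word) + [EOS],
            "targets": byte_encode(definition) + [EOS],
        }

samples = list(character_generator([("cat", "a small domesticated feline")]))
```

In the real code the pairs would come from the training or test dataset files, selected by the `train` flag.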

## Putting it all together

Now our `word2def.py` file looks like: (with the correct imports)
Now our `word2def.py` file looks like:
```python
""" Problem definition for word to dictionary definition.
"""
```
