feature/attention block rewrite #9

Open · wants to merge 17 commits into base: feature/gpt-example

Conversation

@jdblackstar (Owner) commented Apr 8, 2024

🚀 This PR description was created by Ellipsis for commit 059c8a0.

Summary:

This PR refactors the GPT model and attention block, introduces new classes, modifies existing methods, and improves code readability and efficiency in multiple files.

Key points:

  • Refactored GPT model and attention block in examples/gpt.py
  • Replaced MHA class with Head and MultiHeadAttention classes
  • Removed ModelArgs dataclass
  • Added get_vocab_size method to CharTokenizer class in punytorch/helpers.py
  • Modified forward method in GPT class to accept an optional targets parameter
  • Added documentation for generate method in GPT class
  • Refactored and simplified code in punytorch/ops.py and punytorch/tensor.py

Generated with ❤️ by ellipsis.dev

@jdblackstar self-assigned this Apr 8, 2024
@ellipsis-dev (bot) left a comment

❌ Changes requested.

  • Reviewed the entire pull request up to 9005bc2
  • Looked at 295 lines of code in 4 files
  • Took 3 minutes and 13 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 2 additional comments because they didn't meet confidence threshold of 50%.
1. examples/gpt.py:179:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The new Dropout class implemented in punytorch.nn.dropout should be used here instead of nn.Dropout.
  • Reasoning:
    The PR author has implemented a new Dropout class in punytorch.nn.dropout. However, in the Head and MultiHeadAttention classes in examples/gpt.py, the dropout is still being implemented using nn.Dropout, which is a placeholder according to the comment. The new Dropout class should be used instead.
2. examples/gpt.py:234:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The new Dropout class implemented in punytorch.nn.dropout should be used here instead of nn.Dropout.
  • Reasoning:
    The PR author has implemented a new Dropout class in punytorch.nn.dropout. However, in the Head and MultiHeadAttention classes in examples/gpt.py, the dropout is still being implemented using nn.Dropout, which is a placeholder according to the comment. The new Dropout class should be used instead.
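
For reference, a minimal sketch of what a framework-native Dropout could look like, so the nn.Dropout placeholder in Head and MultiHeadAttention can be swapped out. This sketch operates on plain numpy arrays for illustration; punytorch's actual Dropout class (and whether it works on Tensor objects) may differ.

import numpy as np

class Dropout:
    """Illustrative inverted-dropout module, not punytorch's actual class.

    In training mode each element is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p), so the expected activation is
    unchanged; in eval mode the input passes through untouched.
    """

    def __init__(self, p: float = 0.5):
        assert 0.0 <= p < 1.0, "dropout probability must be in [0, 1)"
        self.p = p
        self.training = True

    def __call__(self, x: np.ndarray) -> np.ndarray:
        if not self.training or self.p == 0.0:
            return x
        mask = (np.random.rand(*x.shape) >= self.p).astype(x.dtype)
        return x * mask / (1.0 - self.p)

# e.g. attention_probs = Dropout(p=0.1)(attention_probs)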

Workflow ID: wflow_NpolyLvOsAclhTLf


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.

examples/gpt.py (inline comment, resolved)
@ellipsis-dev (bot) left a comment

👍 Looks good to me!

  • Performed an incremental review on 29cca1f
  • Looked at 241 lines of code in 1 file
  • Took 2 minutes and 59 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 4 additional comments because they didn't meet confidence threshold of 50%.
1. examples/gpt.py:195:
  • Assessed confidence : 80%
  • Grade: 0%
  • Comment:
    The dropout attribute is initialized using nn.Dropout. However, the PR description mentions the introduction of a Dropout class in punytorch/nn/dropout.py. Should this new Dropout class be used here instead?
  • Reasoning:
    The dropout attribute in the Head and MultiHeadAttention classes is initialized using nn.Dropout, but the PR description mentions the introduction of a Dropout class in punytorch/nn/dropout.py. It's unclear whether this new Dropout class should be used instead.
2. examples/gpt.py:250:
  • Assessed confidence : 80%
  • Grade: 20%
  • Comment:
    The dropout attribute is initialized using nn.Dropout. Should the new Dropout class mentioned in the PR description be used here instead?
  • Reasoning:
    The dropout attribute in the MultiHeadAttention class is also initialized using nn.Dropout. The same question applies here as in the Head class.
3. examples/gpt.py:365:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The Block class constructor takes hparams as an argument, but the Block class is initialized with hparams in the GPT class. This seems to be a logical error, as the Block class should be initialized with hparams.
  • Reasoning:
    The Block class constructor takes hparams as an argument, but the Block class is initialized with hparams in the GPT class. This seems to be a logical error, as the Block class should be initialized with hparams.
4. examples/gpt.py:399:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The GPT class constructor takes hparams as an argument, but the GPT class is initialized with hyperparameters in the main function. This seems to be a logical error, as the GPT class should be initialized with hparams.
  • Reasoning:
    The GPT class constructor takes hparams as an argument, but the GPT class is initialized with hyperparameters in the main function. This seems to be a logical error, as the GPT class should be initialized with hparams.

Workflow ID: wflow_8R0Yb4i6Uj4quakC


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

@ellipsis-dev (bot) left a comment

👍 Looks good to me!

  • Performed an incremental review on 08dc1ff
  • Looked at 22 lines of code in 1 file
  • Took 2 minutes and 11 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 1 additional comment because it didn't meet the confidence threshold of 50%.
1. examples/gpt.py:408:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The PR description mentions the addition of a Dropout class and its corresponding test. However, these changes are not visible in the diff. Please ensure that these changes are included in the PR.
  • Reasoning:
    The PR description mentions that a Dropout class has been added in punytorch/nn/dropout.py and a corresponding test in tests/test_dropout.py. However, the diff does not show any changes in these files. It's important to ensure that the Dropout class and its tests are included in the PR.

Workflow ID: wflow_VY3oXBeRvnjl1zQs


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

@ellipsis-dev (bot) left a comment

❌ Changes requested.

  • Performed an incremental review on bbdb4d9
  • Looked at 185 lines of code in 2 files
  • Took 2 minutes and 16 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 2 additional comments because they didn't meet confidence threshold of 50%.
1. examples/gpt.py:450:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    Consider using a method from the punytorch library to sample from the softmax probabilities to keep all operations on the same device and ensure gradients are tracked correctly.
  • Reasoning:
    The generate method in the GPT class in gpt.py uses a numpy function to sample from the softmax probabilities, but it should use a method from the punytorch library to keep all operations on the same device and ensure gradients are tracked correctly.
2. punytorch/helpers.py:64:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The vocab_size attribute is not defined in the __init__ method. This could lead to a runtime error if the get_vocab_size method is called before the vocab_size attribute is set. Consider defining vocab_size in the __init__ method.
  • Reasoning:
    The CharTokenizer class in helpers.py has a get_vocab_size method that returns the vocab_size attribute. However, this attribute is not defined in the __init__ method, which could lead to a runtime error if the method is called before the vocab_size attribute is set.
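
A minimal sketch of the fix suggested in the second comment: define vocab_size in CharTokenizer.__init__ so get_vocab_size can never raise an AttributeError, regardless of call order. The internal attribute names (stoi, itos) and the encode/decode methods are assumptions about the helper's implementation, not taken from the diff.

class CharTokenizer:
    """Illustrative character-level tokenizer; punytorch's actual helper may differ."""

    def __init__(self, text: str):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> index
        self.itos = {i: ch for i, ch in enumerate(chars)}  # index -> char
        # set vocab_size up front so the attribute always exists
        self.vocab_size = len(chars)

    def get_vocab_size(self) -> int:
        return self.vocab_size

    def encode(self, text: str) -> list[int]:
        return [self.stoi[ch] for ch in text]

    def decode(self, tokens: list[int]) -> str:
        return "".join(self.itos[t] for t in tokens)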

Workflow ID: wflow_hjVN4XJ8alINT3mS


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.

examples/gpt.py (inline comment, resolved)
examples/gpt.py (inline comment, outdated, resolved)
examples/gpt.py (inline comment, outdated, resolved)
@ellipsis-dev (bot) left a comment

❌ Changes requested.

  • Performed an incremental review on a50ccd0
  • Looked at 13 lines of code in 1 file
  • Took 1 minute and 14 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 0 additional comments because they didn't meet confidence threshold of 50%.

Workflow ID: wflow_zqwtTip5Hhp7nehj


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.

if isinstance(module, nn.Linear) and module.bias is not None:
    module.bias.data.zero_()

def forward(self, x: Tensor, targets=None) -> Tensor:
@ellipsis-dev (bot) commented:

The method signature of the forward method has been changed to include an optional parameter 'targets'. Please update the docstring to reflect this change.

@jdblackstar (Owner, Author) commented:

@ellipsis-dev can you make this change to the docstring?

@ellipsis-dev (bot) commented:

@jdblackstar, I have addressed your comments in pull request #10
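
For reference, a sketch of the kind of docstring update being requested above. The exact wording landed in pull request #10; the shape descriptions below are assumptions based on the surrounding diff, not the merged text.

class GPT:
    def forward(self, x, targets=None):
        """
        Run the model on a batch of token indices.

        Args:
            x: Tensor of shape (batch, time) holding token indices.
            targets: optional Tensor of the same shape as x. When provided,
                the cross-entropy loss over the flattened logits is computed
                as well; when None, the returned loss is None.

        Returns:
            (logits, loss): logits has shape (batch, time, vocab_size),
            loss is a scalar Tensor or None.
        """
        raise NotImplementedError  # docstring illustration only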

@ellipsis-dev (bot) left a comment

👍 Looks good to me!

  • Performed an incremental review on 9a3db12
  • Looked at 36 lines of code in 1 file
  • Took 5 minutes and 53 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 2 additional comments because they didn't meet confidence threshold of 50%.
1. examples/gpt.py:438:
  • Assessed confidence : 100%
  • Grade: 30%
  • Comment:
    Consider adding a type check for targets to ensure it is a Tensor when it is not None.
if targets is None:
    loss = None
elif not isinstance(targets, Tensor):
    raise TypeError(f"Expected targets to be a Tensor, but got {type(targets).__name__}")
else:
    B, T, C = logits.shape
    logits = logits.view(B * T, C)
    targets = targets.view(B * T)
    loss = CrossEntropyLoss.forward(logits, targets)
  • Reasoning:
    The PR description mentions that the forward method in the GPT class has been modified to accept an optional targets parameter. Looking at the diff, it seems that the forward method now checks if targets is None and if so, it sets loss to None. However, there is no check to ensure that targets is a Tensor when it is not None. This could potentially lead to errors downstream if targets is of an incorrect type. I will suggest adding a type check for targets.
2. examples/gpt.py:457:
  • Assessed confidence : 100%
  • Grade: 30%
  • Comment:
    Consider adding a type check for idx to ensure it is a Tensor.
if not isinstance(idx, Tensor):
    raise TypeError(f"Expected idx to be a Tensor, but got {type(idx).__name__}")
for _ in range(max_new_tokens):
    # crop idx to the last block_size tokens
    idx_cond = idx[:, -hparams.block_size :]
    # get the predictions
    logits, loss = self(idx_cond)
    # focus only on the last time step
    logits = logits[:, -1, :]  # becomes (B, C)
    # apply softmax to get probabilities
    probs = Softmax.forward(logits, dim=-1)  # (B, C)
    # sample from the distribution - don't have access to torch.multinomial, so we're making our own
    idx_next = np.array([np.random.choice(len(probs[b]), 1, p=probs[b]) for b in range(len(probs))])  # (B, 1)
    idx_next = Tensor(idx_next).to(probs.device)  # Convert to Tensor and move to the same device as probs
    # append sampled index to the running sequence
    idx = Tensor.cat((idx, idx_next), dim=1)  # (B, T+1)
return idx
  • Reasoning:
    The PR description mentions that the generate method in the GPT class has been modified. Looking at the diff, it seems that the generate method now includes a detailed docstring. However, there is no check to ensure that idx is a Tensor. This could potentially lead to errors downstream if idx is of an incorrect type. I will suggest adding a type check for idx.

Workflow ID: wflow_mUwyRFWGZ0riBmsT


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

@ellipsis-dev (bot) left a comment

👍 Looks good to me!

  • Performed an incremental review on 9b302d3
  • Looked at 363 lines of code in 2 files
  • Took 3 minutes and 36 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 2 additional comments because they didn't meet confidence threshold of 50%.
1. punytorch/ops.py:16:
  • Assessed confidence : 0%
  • Comment:
    The refactoring of the operations seems to be done correctly. The backward methods now directly use the stored inputs instead of getting them from the context. The forward methods now use the ensure_numpy function to convert the inputs to numpy arrays, which simplifies the code.
  • Reasoning:
    The refactoring of the operations in ops.py and tensor.py seems to be done correctly. The code is cleaner and more efficient. The backward methods in the operations now directly use the stored inputs instead of getting them from the context. The binary operations in the Tensor class have been refactored to use a helper function, which reduces code duplication. The matmul operation has been correctly modified to handle vector inputs by reshaping them. The forward methods in the operations now use the ensure_numpy function to convert the inputs to numpy arrays, which simplifies the code.
2. punytorch/tensor.py:228:
  • Assessed confidence : 0%
  • Comment:
    The binary operations have been refactored to use a helper function, which reduces code duplication. The matmul operation has been correctly modified to handle vector inputs by reshaping them.
  • Reasoning:
    The binary operations in the Tensor class have been refactored to use a helper function, which reduces code duplication. The matmul operation has been correctly modified to handle vector inputs by reshaping them.
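
A minimal sketch of the refactoring pattern these comments describe: operations keep their inputs for backward, forward methods coerce inputs with an ensure_numpy helper, and the Tensor class routes its binary operators through one shared helper. Names and signatures here are illustrative guesses rather than punytorch's actual API.

import numpy as np

def ensure_numpy(x):
    """Coerce a Tensor-like object or a plain scalar to a numpy array (illustrative)."""
    return x.data if hasattr(x, "data") else np.asarray(x)

class Add:
    """Forward stores its inputs, so backward can use them directly (no context object)."""

    def forward(self, a, b):
        self.a, self.b = a, b
        return ensure_numpy(a) + ensure_numpy(b)

    def backward(self, grad):
        # d(a + b)/da = d(a + b)/db = 1, so the upstream gradient passes through
        return grad, grad

class Tensor:
    def __init__(self, data):
        self.data = np.asarray(data)

    def _binary_op(self, other, op):
        """Shared helper so __add__, __mul__, ... don't repeat the same boilerplate."""
        other = other if isinstance(other, Tensor) else Tensor(other)
        return Tensor(op.forward(self, other))

    def __add__(self, other):
        return self._binary_op(other, Add())

# e.g. (Tensor([1.0, 2.0]) + 3.0).data -> array([4., 5.])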

Workflow ID: wflow_t9Sko1KTYFVW6rrS


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

@ellipsis-dev (bot) left a comment

👍 Looks good to me!

  • Performed an incremental review on 059c8a0
  • Looked at 76 lines of code in 1 file
  • Took 5 minutes and 47 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 8 additional comments because they didn't meet confidence threshold of 50%.
1. examples/gpt.py:214 and 215 (same comment on both lines):
  • Assessed confidence : 50%
  • Comment:
    Consider removing the debug log statements in the forward method of the Head class as they seem to be for debugging purposes only.
  • Reasoning:
    The author has replaced the MHA class with Head and MultiHeadAttention classes. The Head class seems to be a single attention head, and the MultiHeadAttention class applies multiple attention heads in parallel and concatenates their outputs. The forward method in the Head class has a debug log statement that logs the shape and type of v and attention_probs. This could be removed in the final version of the code as it seems to be for debugging purposes only.
2. examples/gpt.py:442 through 447 (same comment repeated on each line):
  • Assessed confidence : 70%
  • Grade: 0%
  • Comment:
    Ensure that the reshaping of logits and targets into two-dimensional tensors is done correctly. Incorrect reshaping could potentially introduce a bug.
  • Reasoning:
    The author has refactored the forward method in the GPT class to accept an optional targets parameter. If targets is not None, the method computes and returns the loss. If targets is None, the method only returns the logits. This seems to be a logical change and makes the method more flexible. However, the author has also changed the way the loss is computed. Previously, the loss was computed directly from logits and targets. Now, logits and targets are reshaped into two-dimensional tensors before computing the loss. This change could potentially introduce a bug if the reshaping is not done correctly.
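
A quick numeric check of the flattening these comments are concerned with, using plain numpy in place of punytorch Tensors: reshaping logits from (B, T, C) to (B*T, C) and targets from (B, T) to (B*T,) keeps each row of class scores aligned with its target, because both reshapes traverse the (B, T) grid in the same row-major order.

import numpy as np

B, T, C = 2, 4, 65                       # batch, time, vocab size
logits = np.random.randn(B, T, C)
targets = np.random.randint(0, C, size=(B, T))

flat_logits = logits.reshape(B * T, C)   # (8, 65): one row of scores per position
flat_targets = targets.reshape(B * T)    # (8,): one class index per position

# row i still corresponds to batch i // T, timestep i % T in both arrays
i = 5
assert np.array_equal(flat_logits[i], logits[i // T, i % T])
assert flat_targets[i] == targets[i // T, i % T]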

Workflow ID: wflow_Gw3d3xFfeYspikvu


Not what you expected? You can customize the content of the reviews using rules. Learn more here.
