feature/attention block rewrite #9
base: feature/gpt-example
Conversation
…ates multiple Heads
❌ Changes requested.
- Reviewed the entire pull request up to 9005bc2
- Looked at 295 lines of code in 4 files
- Took 3 minutes and 13 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 2 additional comments because they didn't meet the 50% confidence threshold
1. examples/gpt.py:179
   - Assessed confidence: 100%
   - Grade: 0%
   - Comment: The new `Dropout` class implemented in `punytorch.nn.dropout` should be used here instead of `nn.Dropout` (see the sketch after this review).
   - Reasoning: The PR author has implemented a new `Dropout` class in `punytorch.nn.dropout`. However, in the `Head` and `MultiHeadAttention` classes in `examples/gpt.py`, the dropout is still being implemented using `nn.Dropout`, which is a placeholder according to the comment. The new `Dropout` class should be used instead.
2. examples/gpt.py:234
   - Same comment and reasoning as item 1, repeated for the second usage site.
Workflow ID: wflow_NpolyLvOsAclhTLf
Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.
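A minimal sketch of the substitution the two comments above ask for, assuming the new class is importable as `punytorch.nn.dropout.Dropout` (the path the review names) and accepts the same drop probability as the `nn.Dropout` placeholder. The `Head` fields and the `hparams` attributes shown here are illustrative stand-ins, not the actual code from `examples/gpt.py`:

```python
import punytorch.nn as nn                 # assumed module alias used in examples/gpt.py
from punytorch.nn.dropout import Dropout  # the new class the review names

class Head(nn.Module):
    """Single attention head (fields are illustrative stand-ins)."""

    def __init__(self, head_size: int, hparams):
        super().__init__()
        self.key = nn.Linear(hparams.n_embd, head_size)
        self.query = nn.Linear(hparams.n_embd, head_size)
        self.value = nn.Linear(hparams.n_embd, head_size)
        # before: self.dropout = nn.Dropout(hparams.dropout)  # placeholder
        self.dropout = Dropout(hparams.dropout)  # after: the punytorch implementation
```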
…nce that everywhere
👍 Looks good to me!
- Performed an incremental review on 29cca1f
- Looked at 241 lines of code in 1 file
- Took 2 minutes and 59 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 4 additional comments because they didn't meet the 50% confidence threshold
1. examples/gpt.py:195
   - Assessed confidence: 80%
   - Grade: 0%
   - Comment: The `dropout` attribute is initialized using `nn.Dropout`. However, the PR description mentions the introduction of a `Dropout` class in `punytorch/nn/dropout.py`. Should this new `Dropout` class be used here instead?
   - Reasoning: The `dropout` attribute in the `Head` and `MultiHeadAttention` classes is initialized using `nn.Dropout`, but the PR description mentions the introduction of a `Dropout` class in `punytorch/nn/dropout.py`. It's unclear whether this new `Dropout` class should be used instead.
2. examples/gpt.py:250
   - Assessed confidence: 80%
   - Grade: 20%
   - Comment: The `dropout` attribute is initialized using `nn.Dropout`. Should the new `Dropout` class mentioned in the PR description be used here instead?
   - Reasoning: The `dropout` attribute in the `MultiHeadAttention` class is also initialized using `nn.Dropout`. The same question applies here as in the `Head` class.
3. examples/gpt.py:365
   - Assessed confidence: 100%
   - Grade: 0%
   - Comment: The `Block` class constructor takes `hparams` as an argument, but the `Block` class is initialized with `hparams` in the `GPT` class. This seems to be a logical error, as the `Block` class should be initialized with `hparams`.
   - Reasoning: Same as the comment.
4. examples/gpt.py:399
   - Assessed confidence: 100%
   - Grade: 0%
   - Comment: The `GPT` class constructor takes `hparams` as an argument, but the `GPT` class is initialized with `hyperparameters` in the `main` function. This seems to be a logical error, as the `GPT` class should be initialized with `hparams` (see the naming sketch after this review).
   - Reasoning: Same as the comment.
Workflow ID: wflow_8R0Yb4i6Uj4quakC
Not what you expected? You can customize the content of the reviews using rules. Learn more here.
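On item 4 above, a self-contained sketch of the call pattern being flagged. Everything here except the `GPT`, `hparams`, `hyperparameters`, and `ModelArgs` names is hypothetical; it illustrates that a caller's variable name (`hyperparameters`) not matching the parameter name (`hparams`) is only a bug if the wrong object is passed:

```python
from dataclasses import dataclass

@dataclass
class ModelArgs:
    # illustrative fields; the real dataclass in examples/gpt.py may differ
    n_embd: int = 32
    dropout: float = 0.1

class GPT:
    def __init__(self, hparams: ModelArgs):
        self.hparams = hparams

def main():
    hyperparameters = ModelArgs()
    # binds positionally to the hparams parameter; the differing variable
    # name on the caller's side does not by itself cause an error
    model = GPT(hyperparameters)
    return model

if __name__ == "__main__":
    main()
```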
👍 Looks good to me!
- Performed an incremental review on 08dc1ff
- Looked at 22 lines of code in 1 file
- Took 2 minutes and 11 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 1 additional comment because it didn't meet the 50% confidence threshold
1. examples/gpt.py:408
   - Assessed confidence: 100%
   - Grade: 0%
   - Comment: The PR description mentions the addition of a `Dropout` class and its corresponding test. However, these changes are not visible in the diff. Please ensure that these changes are included in the PR.
   - Reasoning: The PR description mentions that a `Dropout` class has been added in `punytorch/nn/dropout.py` and a corresponding test in `tests/test_dropout.py`. However, the diff does not show any changes in these files. It's important to ensure that the `Dropout` class and its tests are included in the PR (a hypothetical shape for that test is sketched below).
Workflow ID: wflow_VY3oXBeRvnjl1zQs
Not what you expected? You can customize the content of the reviews using rules. Learn more here.
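A hypothetical shape for the missing `tests/test_dropout.py` the comment above refers to. It assumes `Dropout(p)` is callable on a `Tensor` that wraps a numpy array exposed as `.data` and zeroes roughly a fraction `p` of elements; the real punytorch API may differ:

```python
import numpy as np
from punytorch.nn.dropout import Dropout  # class named in the review
from punytorch.tensor import Tensor       # path assumed from punytorch/tensor.py

def test_dropout_zeroes_about_p_fraction():
    p = 0.5
    dropout = Dropout(p)
    x = Tensor(np.ones(10_000))
    out = dropout(x)
    # statistical check with a loose tolerance: about half the
    # elements should have been zeroed in training mode
    zeroed_fraction = float(np.mean(out.data == 0))
    assert abs(zeroed_fraction - p) < 0.05
```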
❌ Changes requested.
- Performed an incremental review on bbdb4d9
- Looked at 185 lines of code in 2 files
- Took 2 minutes and 16 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 2 additional comments because they didn't meet the 50% confidence threshold
1. examples/gpt.py:450
   - Assessed confidence: 100%
   - Grade: 0%
   - Comment: Consider using a method from the punytorch library to sample from the softmax probabilities to keep all operations on the same device and ensure gradients are tracked correctly.
   - Reasoning: The `generate` method in the `GPT` class in `gpt.py` uses a numpy function to sample from the softmax probabilities, but it should use a method from the punytorch library to keep all operations on the same device and ensure gradients are tracked correctly.
2. punytorch/helpers.py:64
   - Assessed confidence: 100%
   - Grade: 0%
   - Comment: The `vocab_size` attribute is not defined in the `__init__` method. This could lead to a runtime error if the `get_vocab_size` method is called before the `vocab_size` attribute is set. Consider defining `vocab_size` in the `__init__` method (see the sketch after this review).
   - Reasoning: The `CharTokenizer` class in `helpers.py` has a `get_vocab_size` method that returns the `vocab_size` attribute. However, this attribute is not defined in the `__init__` method, which could lead to a runtime error if the method is called before the `vocab_size` attribute is set.
Workflow ID: wflow_hjVN4XJ8alINT3mS
Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.
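A sketch of the fix item 2 proposes for `CharTokenizer` in `punytorch/helpers.py`. The constructor body is hypothetical, since the original is not shown in this thread; the point is only that `vocab_size` is assigned in `__init__`, before `get_vocab_size` can ever be called:

```python
class CharTokenizer:
    """Character-level tokenizer; fields other than vocab_size are illustrative."""

    def __init__(self, text: str):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for i, ch in enumerate(chars)}
        self.vocab_size = len(chars)  # defined up front, per the review

    def get_vocab_size(self) -> int:
        return self.vocab_size
```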
❌ Changes requested.
- Performed an incremental review on a50ccd0
- Looked at 13 lines of code in 1 file
- Took 1 minute and 14 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 0 additional comments because they didn't meet the 50% confidence threshold
Workflow ID: wflow_zqwtTip5Hhp7nehj
Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.
```python
if isinstance(module, nn.Linear) and module.bias is not None:
    module.bias.data.zero_()

def forward(self, x: Tensor, targets=None) -> Tensor:
```
The method signature of the forward method has been changed to include an optional parameter 'targets'. Please update the docstring to reflect this change.
@ellipsis-dev can you make this change to the docstring?
@jdblackstar, I have addressed your comments in pull request #10
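For reference, the requested docstring update might look something like the sketch below (illustrative wording only; the actual edit is in the linked pull request). The `(B, T)` shape convention is taken from the loss-reshaping code quoted later in this thread:

```python
from punytorch.tensor import Tensor  # path assumed from punytorch/tensor.py

class GPT:
    def forward(self, x: Tensor, targets=None) -> Tensor:
        """Run the model on a batch of token indices.

        Args:
            x (Tensor): Input token indices of shape (B, T).
            targets (Tensor, optional): Target token indices used to compute
                the loss. When None, the loss computation is skipped.
                Defaults to None.

        Returns:
            The logits, together with the loss (None when targets is None).
        """
        ...
```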
👍 Looks good to me!
- Performed an incremental review on 9a3db12
- Looked at 36 lines of code in 1 file
- Took 5 minutes and 53 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 2 additional comments because they didn't meet the 50% confidence threshold
1. examples/gpt.py:438
   - Assessed confidence: 100%
   - Grade: 30%
   - Comment: Consider adding a type check for `targets` to ensure it is a `Tensor` when it is not `None`.
```python
if targets is None:
    loss = None
elif not isinstance(targets, Tensor):
    raise TypeError(f"Expected targets to be a Tensor, but got {type(targets).__name__}")
else:
    B, T, C = logits.shape
    logits = logits.view(B * T, C)
    targets = targets.view(B * T)
    loss = CrossEntropyLoss.forward(logits, targets)
```
   - Reasoning: The PR description mentions that the `forward` method in the `GPT` class has been modified to accept an optional `targets` parameter. Looking at the diff, it seems that the `forward` method now checks if `targets` is `None` and if so, it sets `loss` to `None`. However, there is no check to ensure that `targets` is a `Tensor` when it is not `None`. This could potentially lead to errors downstream if `targets` is of an incorrect type. I will suggest adding a type check for `targets`.
2. examples/gpt.py:457
   - Assessed confidence: 100%
   - Grade: 30%
   - Comment: Consider adding a type check for `idx` to ensure it is a `Tensor`.
```python
if not isinstance(idx, Tensor):
    raise TypeError(f"Expected idx to be a Tensor, but got {type(idx).__name__}")
for _ in range(max_new_tokens):
    # crop idx to the last block_size tokens
    idx_cond = idx[:, -hparams.block_size :]
    # get the predictions
    logits, loss = self(idx_cond)
    # focus only on the last time step
    logits = logits[:, -1, :]  # becomes (B, C)
    # apply softmax to get probabilities
    probs = Softmax.forward(logits, dim=-1)  # (B, C)
    # sample from the distribution - don't have access to torch.multinomial, so we're making our own
    idx_next = np.array([np.random.choice(len(probs[b]), 1, p=probs[b]) for b in range(len(probs))])  # (B, 1)
    idx_next = Tensor(idx_next).to(probs.device)  # convert to Tensor and move to the same device as probs
    # append sampled index to the running sequence
    idx = Tensor.cat((idx, idx_next), dim=1)  # (B, T+1)
return idx
```
   - Reasoning: The PR description mentions that the `generate` method in the `GPT` class has been modified. Looking at the diff, it seems that the `generate` method now includes a detailed docstring. However, there is no check to ensure that `idx` is a `Tensor`. This could potentially lead to errors downstream if `idx` is of an incorrect type. I will suggest adding a type check for `idx`.
Workflow ID: wflow_mUwyRFWGZ0riBmsT
Not what you expected? You can customize the content of the reviews using rules. Learn more here.
👍 Looks good to me!
- Performed an incremental review on 9b302d3
- Looked at 363 lines of code in 2 files
- Took 3 minutes and 36 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 2 additional comments because they didn't meet the 50% confidence threshold
1. punytorch/ops.py:16
   - Assessed confidence: 0%
   - Comment: The refactoring of the operations seems to be done correctly. The backward methods now directly use the stored inputs instead of getting them from the context. The forward methods now use the `ensure_numpy` function to convert the inputs to numpy arrays, which simplifies the code (a sketch of this pattern follows below).
   - Reasoning: The refactoring of the operations in `ops.py` and `tensor.py` seems to be done correctly. The code is cleaner and more efficient. The backward methods in the operations now directly use the stored inputs instead of getting them from the context. The binary operations in the `Tensor` class have been refactored to use a helper function, which reduces code duplication. The `matmul` operation has been correctly modified to handle vector inputs by reshaping them. The forward methods in the operations now use the `ensure_numpy` function to convert the inputs to numpy arrays, which simplifies the code.
2. punytorch/tensor.py:228
   - Assessed confidence: 0%
   - Comment: The binary operations have been refactored to use a helper function, which reduces code duplication. The `matmul` operation has been correctly modified to handle vector inputs by reshaping them.
   - Reasoning: The binary operations in the `Tensor` class have been refactored to use a helper function, which reduces code duplication. The `matmul` operation has been correctly modified to handle vector inputs by reshaping them.
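A compressed sketch of the pattern item 1 describes, not the actual `punytorch/ops.py` code: `ensure_numpy` coerces Tensor-like inputs to numpy arrays, `forward` stores the coerced inputs on the op, and `backward` reads them back directly instead of going through a context object. The `Add` op and the `.data` attribute are assumptions:

```python
import numpy as np

def ensure_numpy(x):
    """Coerce a Tensor-like object (assumed to carry .data) or a scalar to a numpy array."""
    if isinstance(x, np.ndarray):
        return x
    return np.asarray(getattr(x, "data", x))

class Add:
    """Illustrative binary op following the refactoring the review describes."""

    def forward(self, a, b):
        # convert inputs up front and store them for backward
        self.a, self.b = ensure_numpy(a), ensure_numpy(b)
        return self.a + self.b

    def backward(self, grad):
        # backward uses the stored inputs directly, no context lookup;
        # for addition the gradient passes through to both operands
        return grad, grad
```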
Workflow ID: wflow_t9Sko1KTYFVW6rrS
Not what you expected? You can customize the content of the reviews using rules. Learn more here.
👍 Looks good to me!
- Performed an incremental review on 059c8a0
- Looked at 76 lines of code in 1 file
- Took 5 minutes and 47 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 8 additional comments because they didn't meet the 50% confidence threshold
1. examples/gpt.py:214
   - Assessed confidence: 50%
   - Comment: Consider removing the debug log statements in the `forward` method of the `Head` class as they seem to be for debugging purposes only.
   - Reasoning: The author has replaced the `MHA` class with `Head` and `MultiHeadAttention` classes. The `Head` class seems to be a single attention head, and the `MultiHeadAttention` class applies multiple attention heads in parallel and concatenates their outputs. The `forward` method in the `Head` class has a debug log statement that logs the shape and type of `v` and `attention_probs`. This could be removed in the final version of the code as it seems to be for debugging purposes only.
2. examples/gpt.py:215
   - Same comment and reasoning as item 1, repeated for the adjacent line.
3. examples/gpt.py:442
   - Assessed confidence: 70%
   - Grade: 0%
   - Comment: Ensure that the reshaping of `logits` and `targets` into two-dimensional tensors is done correctly. Incorrect reshaping could potentially introduce a bug (a small shape check follows below).
   - Reasoning: The author has refactored the `forward` method in the `GPT` class to accept an optional `targets` parameter. If `targets` is not `None`, the method computes and returns the loss. If `targets` is `None`, the method only returns the logits. This seems to be a logical change and makes the method more flexible. However, the author has also changed the way the loss is computed. Previously, the loss was computed directly from `logits` and `targets`. Now, `logits` and `targets` are reshaped into two-dimensional tensors before computing the loss. This change could potentially introduce a bug if the reshaping is not done correctly.
4. through 8. examples/gpt.py:443-447
   - Same comment and reasoning as item 3, repeated for each of these lines.
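A tiny shape check of the reshape flagged in items 3 through 8, using plain numpy in place of punytorch tensors: flattening `(B, T, C)` logits to `(B*T, C)` and `(B, T)` targets to `(B*T,)` keeps row `b*T + t` of each aligned with position `(b, t)`, which is exactly what the cross-entropy call relies on:

```python
import numpy as np

B, T, C = 2, 3, 5
logits = np.arange(B * T * C, dtype=float).reshape(B, T, C)
targets = np.arange(B * T).reshape(B, T)

flat_logits = logits.reshape(B * T, C)  # (6, 5)
flat_targets = targets.reshape(B * T)   # (6,)

# position (b, t) maps to row b*T + t in both flattened arrays,
# so each row of logits stays paired with its target
for b in range(B):
    for t in range(T):
        assert (flat_logits[b * T + t] == logits[b, t]).all()
        assert flat_targets[b * T + t] == targets[b, t]
```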
Workflow ID: wflow_Gw3d3xFfeYspikvu
Not what you expected? You can customize the content of the reviews using rules. Learn more here.
Summary:
This PR refactors the GPT model and attention block, introduces new classes, modifies existing methods, and improves code readability and efficiency in multiple files.
Key points:
- examples/gpt.py: replaces the `MHA` class with `Head` and `MultiHeadAttention` classes
- Adds a `ModelArgs` dataclass
- Adds a `get_vocab_size` method to the `CharTokenizer` class in `punytorch/helpers.py`
- Modifies the `forward` method in the `GPT` class to accept an optional `targets` parameter
- Modifies the `generate` method in the `GPT` class
- Refactors `punytorch/ops.py` and `punytorch/tensor.py`
Generated with ❤️ by ellipsis.dev