feature/attention block rewrite #9

Open · wants to merge 17 commits into base: feature/gpt-example

Conversation

@jdblackstar (Owner) commented Apr 8, 2024

🚀 This PR description was created by Ellipsis for commit 059c8a0.

Summary:

This PR refactors the GPT model and attention block, introduces new classes, modifies existing methods, and improves code readability and efficiency in multiple files.

Key points:

  • Refactored GPT model and attention block in examples/gpt.py
  • Replaced MHA class with Head and MultiHeadAttention classes
  • Removed ModelArgs dataclass
  • Added get_vocab_size method to CharTokenizer class in punytorch/helpers.py
  • Modified forward method in GPT class to accept an optional targets parameter
  • Added documentation for generate method in GPT class
  • Refactored and simplified code in punytorch/ops.py and punytorch/tensor.py

Generated with ❤️ by ellipsis.dev

@jdblackstar self-assigned this Apr 8, 2024
@ellipsis-dev (bot) left a comment

❌ Changes requested.

  • Reviewed the entire pull request up to 9005bc2
  • Looked at 295 lines of code in 4 files
  • Took 3 minutes and 13 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 2 additional comments because they didn't meet confidence threshold of 50%.
1. examples/gpt.py:179:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The new Dropout class implemented in punytorch.nn.dropout should be used here instead of nn.Dropout.
  • Reasoning:
    The PR author has implemented a new Dropout class in punytorch.nn.dropout. However, in the Head and MultiHeadAttention classes in examples/gpt.py, the dropout is still being implemented using nn.Dropout, which is a placeholder according to the comment. The new Dropout class should be used instead.
2. examples/gpt.py:234:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The new Dropout class implemented in punytorch.nn.dropout should be used here instead of nn.Dropout.
  • Reasoning:
    The PR author has implemented a new Dropout class in punytorch.nn.dropout. However, in the Head and MultiHeadAttention classes in examples/gpt.py, the dropout is still being implemented using nn.Dropout, which is a placeholder according to the comment. The new Dropout class should be used instead.
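
For reference, a minimal sketch of what a framework-native Dropout could look like, so the nn.Dropout placeholder in Head and MultiHeadAttention can be swapped out. This sketch operates on plain numpy arrays for illustration; punytorch's actual Dropout class (and whether it works on Tensor objects) may differ.

import numpy as np

class Dropout:
    """Illustrative inverted-dropout module, not punytorch's actual class.

    In training mode each element is zeroed with probability p and the
    survivors are scaled by 1 / (1 - p), so the expected activation is
    unchanged; in eval mode the input passes through untouched.
    """

    def __init__(self, p: float = 0.5):
        assert 0.0 <= p < 1.0, "dropout probability must be in [0, 1)"
        self.p = p
        self.training = True

    def __call__(self, x: np.ndarray) -> np.ndarray:
        if not self.training or self.p == 0.0:
            return x
        mask = (np.random.rand(*x.shape) >= self.p).astype(x.dtype)
        return x * mask / (1.0 - self.p)

# e.g. attention_probs = Dropout(p=0.1)(attention_probs)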

Workflow ID: wflow_NpolyLvOsAclhTLf


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.

examples/gpt.py (inline comment, resolved)
@ellipsis-dev (bot) left a comment

👍 Looks good to me!

  • Performed an incremental review on 29cca1f
  • Looked at 241 lines of code in 1 file
  • Took 2 minutes and 59 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 4 additional comments because they didn't meet confidence threshold of 50%.
1. examples/gpt.py:195:
  • Assessed confidence : 80%
  • Grade: 0%
  • Comment:
    The dropout attribute is initialized using nn.Dropout. However, the PR description mentions the introduction of a Dropout class in punytorch/nn/dropout.py. Should this new Dropout class be used here instead?
  • Reasoning:
    The dropout attribute in the Head and MultiHeadAttention classes is initialized using nn.Dropout, but the PR description mentions the introduction of a Dropout class in punytorch/nn/dropout.py. It's unclear whether this new Dropout class should be used instead.
2. examples/gpt.py:250:
  • Assessed confidence : 80%
  • Grade: 20%
  • Comment:
    The dropout attribute is initialized using nn.Dropout. Should the new Dropout class mentioned in the PR description be used here instead?
  • Reasoning:
    The dropout attribute in the MultiHeadAttention class is also initialized using nn.Dropout. The same question applies here as in the Head class.
3. examples/gpt.py:365:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The Block class constructor takes hparams as an argument, but the Block class is initialized with hparams in the GPT class. This seems to be a logical error, as the Block class should be initialized with hparams.
  • Reasoning:
    The Block class constructor takes hparams as an argument, but the Block class is initialized with hparams in the GPT class. This seems to be a logical error, as the Block class should be initialized with hparams.
4. examples/gpt.py:399:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The GPT class constructor takes hparams as an argument, but the GPT class is initialized with hyperparameters in the main function. This seems to be a logical error, as the GPT class should be initialized with hparams.
  • Reasoning:
    The GPT class constructor takes hparams as an argument, but the GPT class is initialized with hyperparameters in the main function. This seems to be a logical error, as the GPT class should be initialized with hparams.

Workflow ID: wflow_8R0Yb4i6Uj4quakC


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

@ellipsis-dev (bot) left a comment

👍 Looks good to me!

  • Performed an incremental review on 08dc1ff
  • Looked at 22 lines of code in 1 file
  • Took 2 minutes and 11 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 1 additional comment because it didn't meet the confidence threshold of 50%.
1. examples/gpt.py:408:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The PR description mentions the addition of a Dropout class and its corresponding test. However, these changes are not visible in the diff. Please ensure that these changes are included in the PR.
  • Reasoning:
    The PR description mentions that a Dropout class has been added in punytorch/nn/dropout.py and a corresponding test in tests/test_dropout.py. However, the diff does not show any changes in these files. It's important to ensure that the Dropout class and its tests are included in the PR.

Workflow ID: wflow_VY3oXBeRvnjl1zQs


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

@ellipsis-dev (bot) left a comment

❌ Changes requested.

  • Performed an incremental review on bbdb4d9
  • Looked at 185 lines of code in 2 files
  • Took 2 minutes and 16 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 2 additional comments because they didn't meet confidence threshold of 50%.
1. examples/gpt.py:450:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    Consider using a method from the punytorch library to sample from the softmax probabilities to keep all operations on the same device and ensure gradients are tracked correctly.
  • Reasoning:
    The generate method in the GPT class in gpt.py uses a numpy function to sample from the softmax probabilities, but it should use a method from the punytorch library to keep all operations on the same device and ensure gradients are tracked correctly.
2. punytorch/helpers.py:64:
  • Assessed confidence : 100%
  • Grade: 0%
  • Comment:
    The vocab_size attribute is not defined in the __init__ method. This could lead to a runtime error if the get_vocab_size method is called before the vocab_size attribute is set. Consider defining vocab_size in the __init__ method.
  • Reasoning:
    The CharTokenizer class in helpers.py has a get_vocab_size method that returns the vocab_size attribute. However, this attribute is not defined in the __init__ method, which could lead to a runtime error if the method is called before the vocab_size attribute is set.
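
A minimal sketch of the fix suggested in the second comment: define vocab_size in CharTokenizer.__init__ so get_vocab_size can never raise an AttributeError, regardless of call order. The internal attribute names (stoi, itos) and the encode/decode methods are assumptions about the helper's implementation, not taken from the diff.

class CharTokenizer:
    """Illustrative character-level tokenizer; punytorch's actual helper may differ."""

    def __init__(self, text: str):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> index
        self.itos = {i: ch for i, ch in enumerate(chars)}  # index -> char
        # set vocab_size up front so the attribute always exists
        self.vocab_size = len(chars)

    def get_vocab_size(self) -> int:
        return self.vocab_size

    def encode(self, text: str) -> list[int]:
        return [self.stoi[ch] for ch in text]

    def decode(self, tokens: list[int]) -> str:
        return "".join(self.itos[t] for t in tokens)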

Workflow ID: wflow_hjVN4XJ8alINT3mS


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.

examples/gpt.py (inline comment, resolved)
examples/gpt.py (inline comment, outdated, resolved)
examples/gpt.py (inline comment, outdated, resolved)
@ellipsis-dev (bot) left a comment

❌ Changes requested.

  • Performed an incremental review on a50ccd0
  • Looked at 13 lines of code in 1 file
  • Took 1 minute and 14 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 0 additional comments because they didn't meet confidence threshold of 50%.

Workflow ID: wflow_zqwtTip5Hhp7nehj


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.

if isinstance(module, nn.Linear) and module.bias is not None:
    module.bias.data.zero_()

def forward(self, x: Tensor, targets=None) -> Tensor:
@ellipsis-dev (bot) commented:

The method signature of the forward method has been changed to include an optional parameter 'targets'. Please update the docstring to reflect this change.

@jdblackstar (Owner, Author) commented:

@ellipsis-dev can you make this change to the docstring?

@ellipsis-dev (bot) commented:

@jdblackstar, I have addressed your comments in pull request #10
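
For reference, a sketch of the kind of docstring update being requested above. The exact wording landed in pull request #10; the shape descriptions below are assumptions based on the surrounding diff, not the merged text.

class GPT:
    def forward(self, x, targets=None):
        """
        Run the model on a batch of token indices.

        Args:
            x: Tensor of shape (batch, time) holding token indices.
            targets: optional Tensor of the same shape as x. When provided,
                the cross-entropy loss over the flattened logits is computed
                as well; when None, the returned loss is None.

        Returns:
            (logits, loss): logits has shape (batch, time, vocab_size),
            loss is a scalar Tensor or None.
        """
        raise NotImplementedError  # docstring illustration only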

@ellipsis-dev (bot) left a comment

👍 Looks good to me!

  • Performed an incremental review on 9a3db12
  • Looked at 36 lines of code in 1 file
  • Took 5 minutes and 53 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 2 additional comments because they didn't meet confidence threshold of 50%.
1. examples/gpt.py:438:
  • Assessed confidence : 100%
  • Grade: 30%
  • Comment:
    Consider adding a type check for targets to ensure it is a Tensor when it is not None.
if targets is None:
    loss = None
elif not isinstance(targets, Tensor):
    raise TypeError(f"Expected targets to be a Tensor, but got {type(targets).__name__}")
else:
    B, T, C = logits.shape
    logits = logits.view(B * T, C)
    targets = targets.view(B * T)
    loss = CrossEntropyLoss.forward(logits, targets)
  • Reasoning:
    The PR description mentions that the forward method in the GPT class has been modified to accept an optional targets parameter. Looking at the diff, it seems that the forward method now checks if targets is None and if so, it sets loss to None. However, there is no check to ensure that targets is a Tensor when it is not None. This could potentially lead to errors downstream if targets is of an incorrect type. I will suggest adding a type check for targets.
2. examples/gpt.py:457:
  • Assessed confidence : 100%
  • Grade: 30%
  • Comment:
    Consider adding a type check for idx to ensure it is a Tensor.
if not isinstance(idx, Tensor):
    raise TypeError(f"Expected idx to be a Tensor, but got {type(idx).__name__}")
for _ in range(max_new_tokens):
    # crop idx to the last block_size tokens
    idx_cond = idx[:, -hparams.block_size :]
    # get the predictions
    logits, loss = self(idx_cond)
    # focus only on the last time step
    logits = logits[:, -1, :]  # becomes (B, C)
    # apply softmax to get probabilities
    probs = Softmax.forward(logits, dim=-1)  # (B, C)
    # sample from the distribution - don't have access to torch.multinomial, so we're making our own
    idx_next = np.array([np.random.choice(len(probs[b]), 1, p=probs[b]) for b in range(len(probs))])  # (B, 1)
    idx_next = Tensor(idx_next).to(probs.device)  # Convert to Tensor and move to the same device as probs
    # append sampled index to the running sequence
    idx = Tensor.cat((idx, idx_next), dim=1)  # (B, T+1)
return idx
  • Reasoning:
    The PR description mentions that the generate method in the GPT class has been modified. Looking at the diff, it seems that the generate method now includes a detailed docstring. However, there is no check to ensure that idx is a Tensor. This could potentially lead to errors downstream if idx is of an incorrect type. I will suggest adding a type check for idx.

Workflow ID: wflow_mUwyRFWGZ0riBmsT


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

@ellipsis-dev (bot) left a comment

👍 Looks good to me!

  • Performed an incremental review on 9b302d3
  • Looked at 363 lines of code in 2 files
  • Took 3 minutes and 36 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 2 additional comments because they didn't meet confidence threshold of 50%.
1. punytorch/ops.py:16:
  • Assessed confidence : 0%
  • Comment:
    The refactoring of the operations seems to be done correctly. The backward methods now directly use the stored inputs instead of getting them from the context. The forward methods now use the ensure_numpy function to convert the inputs to numpy arrays, which simplifies the code.
  • Reasoning:
    The refactoring of the operations in ops.py and tensor.py seems to be done correctly. The code is cleaner and more efficient. The backward methods in the operations now directly use the stored inputs instead of getting them from the context. The binary operations in the Tensor class have been refactored to use a helper function, which reduces code duplication. The matmul operation has been correctly modified to handle vector inputs by reshaping them. The forward methods in the operations now use the ensure_numpy function to convert the inputs to numpy arrays, which simplifies the code.
2. punytorch/tensor.py:228:
  • Assessed confidence : 0%
  • Comment:
    The binary operations have been refactored to use a helper function, which reduces code duplication. The matmul operation has been correctly modified to handle vector inputs by reshaping them.
  • Reasoning:
    The binary operations in the Tensor class have been refactored to use a helper function, which reduces code duplication. The matmul operation has been correctly modified to handle vector inputs by reshaping them.
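
A minimal sketch of the refactoring pattern these comments describe: operations keep their inputs for backward, forward methods coerce inputs with an ensure_numpy helper, and the Tensor class routes its binary operators through one shared helper. Names and signatures here are illustrative guesses rather than punytorch's actual API.

import numpy as np

def ensure_numpy(x):
    """Coerce a Tensor-like object or a plain scalar to a numpy array (illustrative)."""
    return x.data if hasattr(x, "data") else np.asarray(x)

class Add:
    """Forward stores its inputs, so backward can use them directly (no context object)."""

    def forward(self, a, b):
        self.a, self.b = a, b
        return ensure_numpy(a) + ensure_numpy(b)

    def backward(self, grad):
        # d(a + b)/da = d(a + b)/db = 1, so the upstream gradient passes through
        return grad, grad

class Tensor:
    def __init__(self, data):
        self.data = np.asarray(data)

    def _binary_op(self, other, op):
        """Shared helper so __add__, __mul__, ... don't repeat the same boilerplate."""
        other = other if isinstance(other, Tensor) else Tensor(other)
        return Tensor(op.forward(self, other))

    def __add__(self, other):
        return self._binary_op(other, Add())

# e.g. (Tensor([1.0, 2.0]) + 3.0).data -> array([4., 5.])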

Workflow ID: wflow_t9Sko1KTYFVW6rrS


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

@ellipsis-dev (bot) left a comment

👍 Looks good to me!

  • Performed an incremental review on 059c8a0
  • Looked at 76 lines of code in 1 file
  • Took 5 minutes and 47 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 8 additional comments because they didn't meet confidence threshold of 50%.
1. examples/gpt.py:214 and 215 (same comment on both lines):
  • Assessed confidence : 50%
  • Comment:
    Consider removing the debug log statements in the forward method of the Head class as they seem to be for debugging purposes only.
  • Reasoning:
    The author has replaced the MHA class with Head and MultiHeadAttention classes. The Head class seems to be a single attention head, and the MultiHeadAttention class applies multiple attention heads in parallel and concatenates their outputs. The forward method in the Head class has a debug log statement that logs the shape and type of v and attention_probs. This could be removed in the final version of the code as it seems to be for debugging purposes only.
2. examples/gpt.py:442 through 447 (same comment repeated on each line):
  • Assessed confidence : 70%
  • Grade: 0%
  • Comment:
    Ensure that the reshaping of logits and targets into two-dimensional tensors is done correctly. Incorrect reshaping could potentially introduce a bug.
  • Reasoning:
    The author has refactored the forward method in the GPT class to accept an optional targets parameter. If targets is not None, the method computes and returns the loss. If targets is None, the method only returns the logits. This seems to be a logical change and makes the method more flexible. However, the author has also changed the way the loss is computed. Previously, the loss was computed directly from logits and targets. Now, logits and targets are reshaped into two-dimensional tensors before computing the loss. This change could potentially introduce a bug if the reshaping is not done correctly.
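
A quick numeric check of the flattening these comments are concerned with, using plain numpy in place of punytorch Tensors: reshaping logits from (B, T, C) to (B*T, C) and targets from (B, T) to (B*T,) keeps each row of class scores aligned with its target, because both reshapes traverse the (B, T) grid in the same row-major order.

import numpy as np

B, T, C = 2, 4, 65                       # batch, time, vocab size
logits = np.random.randn(B, T, C)
targets = np.random.randint(0, C, size=(B, T))

flat_logits = logits.reshape(B * T, C)   # (8, 65): one row of scores per position
flat_targets = targets.reshape(B * T)    # (8,): one class index per position

# row i still corresponds to batch i // T, timestep i % T in both arrays
i = 5
assert np.array_equal(flat_logits[i], logits[i // T, i % T])
assert flat_targets[i] == targets[i // T, i % T]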

Workflow ID: wflow_Gw3d3xFfeYspikvu


Not what you expected? You can customize the content of the reviews using rules. Learn more here.
