This came up while I was building on issue #2402.
I have already applied the modifications suggested in that issue, but my code runs into a problem when computing the PPO loss. After debugging, I found that the line gain1 = log_weight.exp() * advantage always produces a tensor of all zeros. The cause seems to be that return self.log_prob_composite(sample, include_sum=True) returns very small values (e.g., -300, -200), which makes exp(log_weight) underflow to zero. I can't figure out why self.log_prob_composite computes such values, and I hope someone can help me with this issue.
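For context on why such values zero out the gain: if the loss-side log-probability comes out around -300 while the log-probability stored at collection time is in a normal range, the importance weight underflows in float32 and the PPO objective carries no gradient signal. A quick illustration with plain PyTorch and purely illustrative numbers:

```python
import torch

# If log_weight (new log-prob minus stored log-prob) is this negative,
# exp(log_weight) underflows to exactly zero in float32.
log_weight = torch.tensor([-300.0, -200.0, -50.0])
print(log_weight.exp())  # first two underflow to 0.0; the last is ~1.9e-22

advantage = torch.randn(3)
gain1 = log_weight.exp() * advantage
print(gain1)  # effectively all zeros -> no learning signal
```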
I think I have identified the issue: when a ClipPPOLoss instance runs its forward pass on a batch, the log_prob method of the CompositeDistribution misbehaves. The tensor passed into the dist.log_prob call has the wrong shape, which produces abnormal values for lp. When I changed lp = dist.log_prob(sample.get(name)) to lp = dist.log_prob(sample.get(name).squeeze(-1)), the training loss curve began to respond to changes in the learning rate. Without this change, altering the learning rate has no effect on the loss curve during training, even with the random seed unchanged: ten training runs with different learning rates produce identical loss curves.
Here is the complete log_prob function code after my modifications.
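A minimal sketch of the kind of change being described (not the verbatim patch; the per-key loop is simplified and the key naming is hypothetical, roughly following the shape of CompositeDistribution.log_prob_composite in tensordict 0.5):

```python
def log_prob_composite(self, sample, include_sum=True):
    # Simplified sketch of the per-key log-prob loop in CompositeDistribution.
    slp = 0.0
    out = {}
    for name, dist in self.dists.items():
        # Before: lp = dist.log_prob(sample.get(name))
        # After:  drop the trailing singleton dim so the sample matches the
        #         distribution's batch shape and no silent broadcasting happens.
        lp = dist.log_prob(sample.get(name).squeeze(-1))
        out[f"{name}_log_prob"] = lp  # hypothetical key naming
        if include_sum:
            slp = slp + lp
    if include_sum:
        out["sample_log_prob"] = slp
    sample.update(out)
    return sample
```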
I think what could be happening is that your sample tensors have more dimensions than the distribution parameters, and PyTorch then broadcasts them against each other.
Could you change the action_spec in your code so that your actions no longer carry the dimension you are removing with squeeze(-1)? It would be interesting to see whether that solves the problem. I remember that in the code I provided in #2402 I also had to modify the structure of the action space a bit.
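To make the broadcasting point concrete, here is a self-contained illustration with plain torch.distributions and hypothetical shapes: when the action carries an extra trailing singleton dimension that the distribution's batch shape does not have, log_prob silently broadcasts and returns a tensor of the wrong shape and wrong values, while squeezing that dimension restores the expected per-sample log-probabilities.

```python
import torch
from torch.distributions import Categorical

batch = 4
dist = Categorical(logits=torch.randn(batch, 5))       # batch_shape = [4]

action_ok = torch.randint(5, (batch,))                 # shape [4], matches batch_shape
action_extra = action_ok.unsqueeze(-1)                 # shape [4, 1], extra trailing dim

print(dist.log_prob(action_ok).shape)                  # torch.Size([4])    -> correct
print(dist.log_prob(action_extra).shape)               # torch.Size([4, 4]) -> broadcast, wrong
print(dist.log_prob(action_extra.squeeze(-1)).shape)   # torch.Size([4])    -> fixed
```

Reducing over the extra broadcast dimension downstream is one way the abnormally negative summed log-probabilities described above can arise.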
I am not sure whether these modifications will let the model train and converge properly, since I am a beginner and there is also the warning mentioned in #2458. However, I am confident it behaves better, or at least more normally, than before I modified the log_prob function. Also, my code has indeed been changed according to the suggestions you gave in #2402 to redefine self.action_spec in the _make_spec function.
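For reference, a hedged sketch of what the spec-side fix could look like inside _make_spec, assuming two purely hypothetical discrete heads nested under "action" and the torchrl 0.5 spec classes. The idea is to give each head a scalar shape (torch.Size([])) rather than a trailing (1,), so the sampled actions never acquire the dimension that later had to be removed with squeeze(-1):

```python
import torch
from torchrl.data import CompositeSpec, DiscreteTensorSpec

def _make_spec(self):
    # Hypothetical composite action with two discrete heads nested under "action".
    # shape=torch.Size([]) keeps each head scalar per step; shape=(1,) would add
    # the trailing singleton dimension that breaks log_prob shape matching.
    self.action_spec = CompositeSpec(
        action=CompositeSpec(
            head0=DiscreteTensorSpec(n=5, shape=torch.Size([]), dtype=torch.int64),
            head1=DiscreteTensorSpec(n=3, shape=torch.Size([]), dtype=torch.int64),
        ),
    )
```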
System info
win11 24h2
python 3.10.14
torch 2.4.0+cu118
torchaudio 2.4.0+cu118
torchrl 0.5.0+ca3a595
torchvision 0.19.0+cu118
tensordict 0.5.0+eba0769