Why is rewards[:-1] used instead of rewards[1:]? #1

M-Heidari2000 · 2024-11-21T07:45:53Z

While reading this code, I noticed that line 177 of train.py uses rewards[:-1] instead of rewards[1:] to compute reward_loss. Is that a bug? If not, could you please explain why rewards[:-1] is correct?

Thank you
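For context, the two slices differ only in which stored reward each kept state is paired with. Which one is correct depends on how the buffer indexes rewards. This is a hypothetical sketch, not taken from train.py: the `rewards` list and the two "conventions" are illustrative assumptions. If rewards[t] is the reward for the action taken at step t, the reward that follows state t is rewards[t], which matches rewards[:-1]. If rewards[t] is the reward received on arriving at state t, the reward that follows state t is rewards[t + 1], which matches rewards[1:].

```python
# Hypothetical episode with T + 1 = 4 stored steps. Names and conventions here
# are illustrative assumptions, not the actual layout used in train.py.
rewards = [0.0, 1.0, 2.0, 3.0]  # rewards[t] as stored in the buffer at index t


def pair_with_states(rewards, shift):
    """Pair state index t with the reward target rewards[t + shift].

    shift=0 reproduces rewards[:-1]; shift=1 reproduces rewards[1:].
    Both slices keep T of the T + 1 stored entries.
    """
    n = len(rewards) - 1  # number of (state, target) pairs the loss would use
    return [(t, rewards[t + shift]) for t in range(n)]


# Convention A (assumed): rewards[t] is the reward for the action taken at
# step t, so the reward following state t is rewards[t] -> rewards[:-1].
conv_a = pair_with_states(rewards, shift=0)

# Convention B (assumed): rewards[t] is the reward received on arriving at
# state t, so the reward following state t is rewards[t + 1] -> rewards[1:].
conv_b = pair_with_states(rewards, shift=1)

print(conv_a)  # [(0, 0.0), (1, 1.0), (2, 2.0)]
print(conv_b)  # [(0, 1.0), (1, 2.0), (2, 3.0)]
```

Under this reading, rewards[:-1] is only a bug if the buffer follows something like convention B; checking where rewards are written during data collection would settle it.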
