demo_A2C_PPO #354

guest-oo · 2024-04-25T06:30:46Z

D:\anconda\envs\pytorch\python.exe C:\Users\user\Desktop\ElegantRL-master\examples\demo_A2C_PPO.py
env_args = {'env_name': 'CartPole-v1',
'num_envs': 1,
'max_step': 500,
'state_dim': 4,
'action_dim': 2,
'if_discrete': True}
| Arguments Remove cwd: ./CartPole-v1_DiscreteA2C_0
| Evaluator:
| step: Number of samples, or total training steps, or running times of env.step().
| time: Time spent from the start of training to this moment.
| avgR: Average value of cumulative rewards, which is the sum of rewards in an episode.
| stdR: Standard dev of cumulative rewards, which is the sum of rewards in an episode.
| avgS: Average of steps in an episode.
| objC: Objective of Critic network. Or call it loss function of critic network.
| objA: Objective of Actor network. It is the average Q value of the critic network.
################################################################################
ID Step Time | avgR stdR avgS stdS | expR objC objA etc.

tensor_action = tensor_action.argmax(dim=1)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
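
For anyone hitting the same traceback: the message says `tensor_action` is 1-D at this point (only dims -1 and 0 are valid), so `argmax(dim=1)` has no second dimension to reduce over. Below is a minimal sketch that reproduces the error outside ElegantRL and shows one possible workaround, reducing over the last dimension instead. The helper name `select_discrete_action` and the sample values are hypothetical, and this is not necessarily the change made in the linked fix.

```python
import torch


def select_discrete_action(tensor_action: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: greedy action index along the last dimension.

    dim=-1 is valid for both an unbatched tensor of shape (action_dim,)
    and a batched tensor of shape (num_envs, action_dim).
    """
    return tensor_action.argmax(dim=-1)


# Unbatched action scores of shape (action_dim,); values are made up.
scores_1d = torch.tensor([0.2, 0.8])

try:
    # Same call as in the traceback: a 1-D tensor has no dim 1.
    scores_1d.argmax(dim=1)
except IndexError as err:
    print("reproduced:", err)

# The last-dimension version accepts both shapes.
print(select_discrete_action(scores_1d))

scores_2d = torch.tensor([[0.2, 0.8], [0.9, 0.1]])  # shape (num_envs, action_dim)
print(select_discrete_action(scores_2d))
```

Whether the right fix is to reduce over `dim=-1` or to keep the batch dimension upstream (for example with `unsqueeze(0)` when `num_envs` is 1) depends on what shape the rest of the training loop expects.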


stellawang196 mentioned this issue May 25, 2024

Fix issue 354 #359

Merged
