Reproduced results are inconsistent with the paper #18
Comments
You are using 2 million time steps while the paper uses 100k time steps; you should compare against Table 1.
Hi! Let me see if I can understand the issue here. The results in the paper are averaged over 10 seeds, and as you can see, even with 10 seeds the standard error is quite high. For reference, standard error = standard deviation / sqrt(number of seeds). I understand that you ran the experiments with 3 seeds, and the standard error bands that you get are quite high as well (especially for multi-headed SAC). Could you please try running with more seeds? Increasing the number of seeds should shrink those error bands.
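For concreteness, a minimal Python sketch of that formula; the per-seed success rates below are made-up illustrative numbers, not values from the paper:

import math

def standard_error(values):
    # Standard error of the mean: sample std / sqrt(number of seeds).
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return std / math.sqrt(n)

# Hypothetical per-seed success rates, for illustration only.
print(standard_error([0.55, 0.70, 0.40]))              # 3 seeds -> wide error band
print(standard_error([0.55, 0.70, 0.40, 0.62, 0.48,
                      0.58, 0.66, 0.51, 0.44, 0.60]))  # 10 seeds -> narrower band

Since the standard error scales as 1/sqrt(n), going from 3 to 10 seeds with a similar spread of scores tightens the band by roughly sqrt(10/3) ≈ 1.8x.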
Hello, my metaworld version is af8417bfc82a3e249b4b02156518d775f29eb289. Do different versions of metaworld greatly affect the experimental results?
That is the metaworld version that we tested against. I wanted to check the version because metaworld was under active development at that time.
We are using the af8417bfc version of the environment, which is also what we use when running the MTRL code :)
Description
This is what we reproduced:
This is the result in the paper:
We don't know why our results for Soft Modularization and Multi-headed SAC fall so far below the paper's.
How to reproduce
The following command-line instructions follow the baselines tutorial at https://mtrl.readthedocs.io/en/latest/pages/tutorials/baseline.html.
cd Code/mtrl-main/
conda activate garage
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/yyq/.mujoco/mujoco200/bin
export PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning'
mkdir -p ./trainlogs
mt10_mtsac
# Run seeds 1-3 in the background on GPU 0; identical to launching the three commands separately.
for seed in 1 2 3; do
  CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
    setup=metaworld env=metaworld-mt10 agent=state_sac \
    experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 \
    setup.seed=${seed} replay_buffer.batch_size=1280 \
    agent.multitask.num_envs=10 \
    agent.multitask.should_use_disentangled_alpha=True \
    agent.encoder.type_to_select=identity \
    agent.multitask.should_use_multi_head_policy=False \
    agent.multitask.actor_cfg.should_condition_model_on_task_info=False \
    agent.multitask.actor_cfg.should_condition_encoder_on_task_info=True \
    agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=True \
    +exp_name=mt10_mtsac_2000000 \
    > trainlogs/mt10_mtsac_sd${seed}.log 2>&1 &
done
mt10_mtmhsac
# Run seeds 1-3 in the background on GPU 1.
for seed in 1 2 3; do
  CUDA_VISIBLE_DEVICES=1 nohup python -u main.py \
    setup=metaworld env=metaworld-mt10 agent=state_sac \
    experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 \
    setup.seed=${seed} replay_buffer.batch_size=1280 \
    agent.multitask.num_envs=10 \
    agent.multitask.should_use_disentangled_alpha=True \
    agent.encoder.type_to_select=identity \
    agent.multitask.should_use_multi_head_policy=True \
    agent.multitask.actor_cfg.should_condition_model_on_task_info=False \
    agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False \
    agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False \
    +exp_name=mt10_mtmhsac_2000000 \
    > trainlogs/mt10_mtmhsac_sd${seed}.log 2>&1 &
done
mt10_soft_modularization
# Run seeds 1-3 in the background on GPU 1.
for seed in 1 2 3; do
  CUDA_VISIBLE_DEVICES=1 nohup python -u main.py \
    setup=metaworld env=metaworld-mt10 agent=state_sac \
    experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 \
    setup.seed=${seed} replay_buffer.batch_size=1280 \
    agent.multitask.num_envs=10 \
    agent.multitask.should_use_disentangled_alpha=True \
    agent.multitask.should_use_task_encoder=True \
    agent.encoder.type_to_select=feedforward \
    agent.multitask.actor_cfg.should_condition_model_on_task_info=True \
    agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False \
    agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False \
    agent.multitask.actor_cfg.moe_cfg.should_use=True \
    agent.multitask.actor_cfg.moe_cfg.mode=soft_modularization \
    agent.multitask.should_use_multi_head_policy=False \
    agent.encoder.feedforward.hidden_dim=50 \
    agent.encoder.feedforward.num_layers=2 \
    agent.encoder.feedforward.feature_dim=50 \
    agent.actor.num_layers=4 \
    agent.multitask.task_encoder_cfg.model_cfg.pretrained_embedding_cfg.should_use=False \
    +exp_name=mt10_soft_modularization_2000000 \
    > trainlogs/mt10_soft_modularization_sd${seed}.log 2>&1 &
done
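For readability, the flags that actually distinguish the three configurations above are summarized below as a plain Python dict (values copied from the commands; Soft Modularization additionally sets the encoder/actor sizes, MoE usage, and pretrained-embedding flags shown in its command):

# Flags that differ between the three MT10 baselines above;
# all remaining arguments are shared across the three runs.
VARIANTS = {
    "mt10_mtsac": {
        "agent.encoder.type_to_select": "identity",
        "agent.multitask.should_use_multi_head_policy": False,
        "agent.multitask.actor_cfg.should_condition_model_on_task_info": False,
        "agent.multitask.actor_cfg.should_condition_encoder_on_task_info": True,
        "agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder": True,
    },
    "mt10_mtmhsac": {
        "agent.encoder.type_to_select": "identity",
        "agent.multitask.should_use_multi_head_policy": True,
        "agent.multitask.actor_cfg.should_condition_model_on_task_info": False,
        "agent.multitask.actor_cfg.should_condition_encoder_on_task_info": False,
        "agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder": False,
    },
    "mt10_soft_modularization": {
        "agent.encoder.type_to_select": "feedforward",
        "agent.multitask.should_use_multi_head_policy": False,
        "agent.multitask.actor_cfg.should_condition_model_on_task_info": True,
        "agent.multitask.actor_cfg.should_condition_encoder_on_task_info": False,
        "agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder": False,
        "agent.multitask.actor_cfg.moe_cfg.mode": "soft_modularization",
    },
}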
System information
Thank you very much!