Reproduced results are inconsistent with the paper #18
Comments
You are using 2 million time steps while the paper uses 100k time steps; you should compare against Table 1.
Hi! Let me see if I can understand the issue here. The results in the paper are averaged over 10 seeds, and as you can see, even with 10 seeds the standard error is quite high. For reference, standard error = standard deviation / sqrt(number of seeds). I understand that you ran the experiments with 3 seeds, and the standard error bands that you get are quite high as well (especially for multi-headed SAC). Could you please try running with more seeds? Increasing the number of seeds should shrink those error bands.
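For concreteness, a minimal Python sketch of that formula; the per-seed success rates below are made-up illustrative numbers, not values from the paper:

import math

def standard_error(values):
    # Standard error of the mean: sample std / sqrt(number of seeds).
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return std / math.sqrt(n)

# Hypothetical per-seed success rates, for illustration only.
print(standard_error([0.55, 0.70, 0.40]))              # 3 seeds -> wide error band
print(standard_error([0.55, 0.70, 0.40, 0.62, 0.48,
                      0.58, 0.66, 0.51, 0.44, 0.60]))  # 10 seeds -> narrower band

Since the standard error scales as 1/sqrt(n), going from 3 to 10 seeds with a similar spread of scores tightens the band by roughly sqrt(10/3) ≈ 1.8x.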
Hello, my metaworld version is af8417bfc82a3e249b4b02156518d775f29eb289. Do different versions of metaworld greatly affect the experimental results?
That is the metaworld version that we tested against. I wanted to check the version because metaworld was under active development at that time.
We are using the af8417bfc version of the environment, which is also what we use when running the MTRL code :)
Description
This is what we reproduced:
This is the result in the paper:
We don't know why our results for Soft Modularization and Multi-headed SAC fall so far below the paper's.
How to reproduce
The following command-line instructions follow the baselines tutorial at https://mtrl.readthedocs.io/en/latest/pages/tutorials/baseline.html.
cd Code/mtrl-main/
conda activate garage
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/yyq/.mujoco/mujoco200/bin
export PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning'
mkdir -p ./trainlogs
mt10_mtsac
# Run seeds 1-3 in the background on GPU 0; identical to launching the three commands separately.
for seed in 1 2 3; do
  CUDA_VISIBLE_DEVICES=0 nohup python -u main.py \
    setup=metaworld env=metaworld-mt10 agent=state_sac \
    experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 \
    setup.seed=${seed} replay_buffer.batch_size=1280 \
    agent.multitask.num_envs=10 \
    agent.multitask.should_use_disentangled_alpha=True \
    agent.encoder.type_to_select=identity \
    agent.multitask.should_use_multi_head_policy=False \
    agent.multitask.actor_cfg.should_condition_model_on_task_info=False \
    agent.multitask.actor_cfg.should_condition_encoder_on_task_info=True \
    agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=True \
    +exp_name=mt10_mtsac_2000000 \
    > trainlogs/mt10_mtsac_sd${seed}.log 2>&1 &
done
mt10_mtmhsac
# Run seeds 1-3 in the background on GPU 1.
for seed in 1 2 3; do
  CUDA_VISIBLE_DEVICES=1 nohup python -u main.py \
    setup=metaworld env=metaworld-mt10 agent=state_sac \
    experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 \
    setup.seed=${seed} replay_buffer.batch_size=1280 \
    agent.multitask.num_envs=10 \
    agent.multitask.should_use_disentangled_alpha=True \
    agent.encoder.type_to_select=identity \
    agent.multitask.should_use_multi_head_policy=True \
    agent.multitask.actor_cfg.should_condition_model_on_task_info=False \
    agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False \
    agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False \
    +exp_name=mt10_mtmhsac_2000000 \
    > trainlogs/mt10_mtmhsac_sd${seed}.log 2>&1 &
done
mt10_soft_modularization
# Run seeds 1-3 in the background on GPU 1.
for seed in 1 2 3; do
  CUDA_VISIBLE_DEVICES=1 nohup python -u main.py \
    setup=metaworld env=metaworld-mt10 agent=state_sac \
    experiment.num_eval_episodes=1 experiment.num_train_steps=2000000 \
    setup.seed=${seed} replay_buffer.batch_size=1280 \
    agent.multitask.num_envs=10 \
    agent.multitask.should_use_disentangled_alpha=True \
    agent.multitask.should_use_task_encoder=True \
    agent.encoder.type_to_select=feedforward \
    agent.multitask.actor_cfg.should_condition_model_on_task_info=True \
    agent.multitask.actor_cfg.should_condition_encoder_on_task_info=False \
    agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder=False \
    agent.multitask.actor_cfg.moe_cfg.should_use=True \
    agent.multitask.actor_cfg.moe_cfg.mode=soft_modularization \
    agent.multitask.should_use_multi_head_policy=False \
    agent.encoder.feedforward.hidden_dim=50 \
    agent.encoder.feedforward.num_layers=2 \
    agent.encoder.feedforward.feature_dim=50 \
    agent.actor.num_layers=4 \
    agent.multitask.task_encoder_cfg.model_cfg.pretrained_embedding_cfg.should_use=False \
    +exp_name=mt10_soft_modularization_2000000 \
    > trainlogs/mt10_soft_modularization_sd${seed}.log 2>&1 &
done
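For readability, the flags that actually distinguish the three configurations above are summarized below as a plain Python dict (values copied from the commands; Soft Modularization additionally sets the encoder/actor sizes, MoE usage, and pretrained-embedding flags shown in its command):

# Flags that differ between the three MT10 baselines above;
# all remaining arguments are shared across the three runs.
VARIANTS = {
    "mt10_mtsac": {
        "agent.encoder.type_to_select": "identity",
        "agent.multitask.should_use_multi_head_policy": False,
        "agent.multitask.actor_cfg.should_condition_model_on_task_info": False,
        "agent.multitask.actor_cfg.should_condition_encoder_on_task_info": True,
        "agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder": True,
    },
    "mt10_mtmhsac": {
        "agent.encoder.type_to_select": "identity",
        "agent.multitask.should_use_multi_head_policy": True,
        "agent.multitask.actor_cfg.should_condition_model_on_task_info": False,
        "agent.multitask.actor_cfg.should_condition_encoder_on_task_info": False,
        "agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder": False,
    },
    "mt10_soft_modularization": {
        "agent.encoder.type_to_select": "feedforward",
        "agent.multitask.should_use_multi_head_policy": False,
        "agent.multitask.actor_cfg.should_condition_model_on_task_info": True,
        "agent.multitask.actor_cfg.should_condition_encoder_on_task_info": False,
        "agent.multitask.actor_cfg.should_concatenate_task_info_with_encoder": False,
        "agent.multitask.actor_cfg.moe_cfg.mode": "soft_modularization",
    },
}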
System information
Thank you very much!