Skip to content

Latest commit

 

History

History
194 lines (191 loc) · 12.6 KB

RawNotes.002.md

File metadata and controls

194 lines (191 loc) · 12.6 KB

These are my raw notes from each training run

  • 002.143 - backflip (StyleTransfer002) - Finally completes backflip!!! needed 128m training steps
Reward EpisodeLenght ValueLoss
Reward EpisodeLenght ValueLoss
Trained with 64 agents (so 128m training steps)
Trained in 3 phases. 1 was 32m, then 32m, then 64m
  • test using Unity 2018.3 beta using my default hyperparms
  • 002.142 - gamma: 0.995 (was .99)
  • 002.142 - gamma: 0.98 (was .99) - seams closer to
  • 002.141 - trained almost as good as
  • … by mistake, buffer_size: 4096, batch_size: 1024
  • … optimize by only running action / any ml-agents logic 1 in 5 times
  • Tried OpenAI Baselines PPO2 but did not figure out right hyperparams or see improvement in training time. Found GPU was slower than CPU???
  • Tried hidden_units: 1024 (was 512) - didn’t show any improvement
  • 002.138 - backflip (StyleTransfer002)
  • … curriculum learning 50% at .8, then 0.0
  • 002.137 - backflip (StyleTransfer002) start frame 0, =
  • … hidden_units: 1024 (was 512) * … time_horizon: 100 (was 50)
  • 002.136 - backflip (StyleTransfer002) start frame 0 =
  • … abort if distanceReward < 0.25f (was 0.33)
  • 002.135 - backflip (StyleTransfer002) with random trainer = 32m - so so, better than 133 but not as good as 134; will keep training both
  • 002.134 - backflip (StyleTransfer002) start frame 1 = 32m - is working best so far
  • … abort if distanceReward < 0.334f (was 0.18f)
  • 002.133 - backflip (StyleTransfer002) with random trainer = aborts out of backflip, try more aggressive distanceReward abort
  • 002.132 - runningv2 (StyleTransfer002) with random trainer = trains well after 32m sim steps, train for more
  • … revert early terminate during training after 50 steps
  • … reset if reward < 0.18 or torso, head, bum hit ground
  • 002.130 = backflip (StyleTransfer002) with random trainer = not great, seams really wobbly, not sure the 50 steps thing is working out so well
  • … StyleTransfer002TrainerAgent ignores skipped frames (so backflip goes from 430 to 83 frames)
  • … time_horizon: 50 (was 1000)
  • … turn jointAtLimitPenality into jointsNotAtLimitReward
  • 002.129 = backflip (StyleTransfer002e) normal batch size = hard to tell
  • 002.128 = backflip (StyleTransfer002f) 1/4 batch size = doesn’t do well
  • .. 64 agents
  • 002.127 = backflip (StyleTransfer002e) with random trainer =
  • 002.126 = backflip (StyleTransfer002f) = not sure
  • … early terminate during training after 50 steps
  • 002.125 = backflip (StyleTransfer002e) = not sure
  • … reset if reward < 0.18
  • 002.124 = backflip (StyleTransfer002f) )
  • … trainerAgent does not reset on done = Same, takes easy way out of backflip
  • 002.122 = backflip (StyleTransfer002e) ) = OK, but takes easy way out of backflip so that he stands for longer
  • … rework reward so never negative
  • 002.121 = backflip (StyleTransfer002e) (minor tweak) = is learning to fail as this is higher reward than negatives
  • 002.120 = backflip (StyleTransfer002e) = NOT WORKING
  • … TrainerBrain - Adversarial network to choose where to focus training
  • 002.119 = jazzDance (StyleTransfer002d) =
  • 002.118 = MMAKick (StyleTransfer002b) =
  • 002.117 = backflip (StyleTransfer002a) =
  • … remove clip on reward
  • 002.116 = backflip (StyleTransfer002d) -1 on fail (was -100) = TBD
  • 002.115 = backflip (StyleTransfer002c) = TBD needs more training steps
  • 002.114 = runningv2 (StyleTransfer002b) = Trains well
  • 002.113 = walking (StyleTransfer002) = Trains well
  • … fix AnimationBase and MarathonMan referenced shin in Agent2
  • 002.112 = backflip (StyleTransfer002c) =
  • 002.111 = runningv2 (StyleTransfer002b) =
  • 002.110 = walking (StyleTransfer002) = training better, but strange problem with Agent02
  • … fixed Muscle002 apply force to use max of XY (was average which did not work so well when one was 0)
  • 002.109 = backflip (StyleTransfer002c) = ok but not as good as 106, something is not right
  • 002.108 = runningv2 (StyleTransfer002b) = ok but not as good as 105, something is not right
  • 002.107 = walking (StyleTransfer002) = ok but not as good as 104, something is not right
  • … make all non-arm joints use one joint
  • 002.106 = backflip (StyleTransfer002c) = statistically improves but still not training
  • 002.105 = runningv2 (StyleTransfer002b) = trains well, best ever
  • 002.104 = walking (StyleTransfer002) = trains really well
  • … make feet use one joint (2 axis)
  • 002.103 = walking (StyleTransfer002) =
  • … tweak foot rotations and power
  • 002.102 = walking (StyleTransfer002e) =
  • … foot as two joints in one
  • 002.101 = walking (StyleTransfer002d) =
  • … tweaked humanoid
  • 002.100 = backflip (StyleTransfer002c) = 1m signs of training but a long way to go
  • 002.099 = running (StyleTransfer002b) = 2m trains a little, needs random inference, bad feet choices
  • 002.098 = walking (StyleTransfer002) = 2m trains well after - but feet are still floppy
  • … fix - reduce bounce on spawn
  • 002.097 = running (StyleTransfer002c) always start at beginning =
  • 002.096 = running (StyleTransfer002b) =
  • 002.095 = walking (StyleTransfer002) =
  • … fix - nest children
  • 002.094 = running (StyleTransfer002c) always start at beginning =
  • 002.093 = running (StyleTransfer002b) =
  • 002.092 = walking (StyleTransfer002) =
  • … fix - velocaties / rotationVelocity where bleeding past reset
  • 002.091 = running (StyleTransfer002b) -300 penalty, time_horizon: 64 (was 1000)
  • 002.090 = running (StyleTransfer002b) -300 penalty, time_horizon: 128 (was 1000)
  • 002.089 = running (StyleTransfer002b) -300 penalty =
  • 002.088 = running (StyleTransfer002b) = 1m: seams about the same
  • 002.087 = walking (StyleTransfer002) = 1m: seams about the same
  • … add sensorReward comparing differences in sensor states
  • … now 3 sensors per feet
  • 002.086 = walking (StyleTransfer002d) = 100 early exit, no done reward, train for 5m / 80m steps = doesn’t train well
  • 002.085 = running (StyleTransfer002) = 100 early exit, no done reward, train for 5m / 80m steps = trains ok, feel are still a problem. Not sure any improvement over previous attemtps
  • 002.084 = running (StyleTransfer002c) = -100 early exit, add 10 for complete reward, reduce velocity to 0f
  • 002.083 = running (StyleTransfer002c) = -10 early exit, add 10 for complete reward, reduce velocity to 0f
  • 002.082 = running (StyleTransfer002b) = remove early exit reward, add 10 for complete reward, reduce velocity to 0f
  • 002.081 = running (StyleTransfer002) = remove early exit reward, add 10 for complete reward, reduce velocity to * 0.03
  • 002.080 = running (StyleTransfer002c) = remove early exit reward, add 10 for complete reward
  • 002.079 = walking (StyleTransfer002b) = remove early exit reward
  • 002.078 = running (StyleTransfer002) = remove early exit reward
  • … fix runningv2 animation
  • 002.077 = running (StyleTransfer002b) = tweak rewards, remove early exit reward
  • 002.076 = walking (StyleTransfer002) = tweak rewards, remove early exit reward
  • 002.075 = running (StyleTransfer002b) = revert
  • 002.074 = walking (StyleTransfer002) = revert
  • 002.073 = walking (StyleTransfer002c) = new normalize, no clamp
  • 002.072 = walking (StyleTransfer002b) = new normalize, clamp after square and scale
  • 002.071 = walking (StyleTransfer002) = new normalize, clamp between square and scale
  • 002.070 = walking (StyleTransfer002d) = new normalize, clamp before square and scale
  • 002.069 = walking (StyleTransfer002c)
  • 002.068 = backflip (StyleTransfer002b)
  • 002.067 = runningv2 (StyleTransfer002)
  • … try removing reward for early termination or reaching end
  • 002.066 = walking (StyleTransfer002c)
  • 002.065 = backflip (StyleTransfer002b)
  • 002.064 = runningv2 (StyleTransfer002)
  • .. fix spawn bug
  • 002.063 = runningv2 (StyleTransfer002) … non square values … buffer_size: 20480
  • 002.062 = walking (StyleTransfer002b) … non square values … buffer_size: 10240
  • 002.061 = walking (StyleTransfer002b) … non square values
  • 002.060 = runningv2 (StyleTransfer002) … non square values - ABORTED
  • … buffer_size: 4096 was 10240
  • 002.059 = walking (StyleTransfer002d) … buffer_size: 4096 was 10240
  • 002.058 = walking (StyleTransfer002c) … square values (not average)
  • 002.057 = backflip (StyleTransfer002b) … square values (not average)
  • 002.056 = runningv2 (StyleTransfer002) … square values (not average)
  • … tweak observations
  • 002.055 = runningv2 (StyleTransfer002) … square values (not average)
  • 002.054 = walking (StyleTransfer002d) … square values (not average)
  • 002.053 = walking (StyleTransfer002c) … non square values (as average)
  • 002.052 = walking (StyleTransfer002b) … remove average
  • 002.051 = walking (StyleTransfer002)
  • .. tweak rewards
  • 002.050 = runningv2 (StyleTransfer002b)
  • 002.049 = walking (StyleTransfer002)
  • … fix obs
  • 002.048 = runningv2 (StyleTransfer002b)
  • 002.047 = walking (StyleTransfer002)
  • … redo differences
  • 002.046 = walking (StyleTransfer002c)
  • 002.045 = walking (StyleTransfer002c)
  • … terminate on end function
  • 002.044 = runningv2 (StyleTransfer002b)
  • 002.043 = walking (StyleTransfer002)
  • … alt ObsAngleDeltaFromAnimationRotation + add angleDist for end effector ObsAngleDeltaFromAnimationRotation = Mathf.Abs(_animationRotation.x - rotation.x) + Mathf.Abs(_animationRotation.y - rotation.y) + Mathf.Abs(_animationRotation.z - rotation.z) + Mathf.Abs(_animationRotation.w - rotation.w);
  • Note: feet are main problem. Could try recording / comparing sensors.
  • Note: paper is not so clear on how they calculate angle difference - could try alt formula
  • 002.042 = walking, power * 1 (StyleTransfer002c) Try magnitude (was SqrMagnitude) for differences + clamp at -3 = Not as robust as 002.035, slower to learn, can not keep up with example mode.
  • 002.041 = walking, power * 1 (StyleTransfer002b) Try magnitude (was SqrMagnitude) for differences + clamp at -2 = Not as robust as 002.035,
  • 002.040 = walking, power * 1 (StyleTransfer002b) Try magnitude (was SqrMagnitude) for differences
  • 002.039 = JazzDancing * 1 (StyleTransfer002e) = Training SOSO
  • 002.038 = Greeting * 1 (StyleTransfer002d) = Training OK
  • 002.037 = walking, power * 3 (StyleTransfer002c) = no major benefit over 035
  • 002.036 = walking, power * 2 (StyleTransfer002b) = no major benefit over 035
  • 002.035 = walking, power * 1 (StyleTransfer002) = Training OK, try 5m
    • 2m trained well
    • 5m looked like training barfed at 4.8m steps - final model does not work
    • … overall does not look like 5m is much better than 2m
  • Default now uses: sensors + clamp -1/1 + termination signal at <-1 + buffer_size: 10240 (was 20480)
  • 002.034 = running, power * 2 (StyleTransfer002b) + sensors + clamp -1/1 + termination signal at <-1 + buffer_size: 10240 (was 20480) =
  • 002.033 = running, power * 3 (StyleTransfer002d) + sensors + clamp -1/1 + termination signal at <-1 + buffer_size: 10240 (was 20480) = BETTER THAN 032
  • 002.032 = running, power * 3 (StyleTransfer002) + sensors + clamp -1/1 + termination signal at <-1 = IMPROVED by 2x
  • 002.031 = running, power * 3 (StyleTransfer002d) + sensors + clamp -1/1 + termination signal at <0 & <-1 - FAILED?
  • 002.030 = running, power * 3 (StyleTransfer002c) + sensors + clamp -1/1 = NO CHANGE
  • … Mathf.Clamp(poseReward, -1f, 1f);
  • 002.029 = running, power * 3 (StyleTransfer002b) + sensors = NO CHANGE
  • … remove updated reward & termination
  • 002.028 = running, power * 3 (StyleTransfer002e) + sensors + updated reward & termination - FAILED to learn
  • … shouldTerminate = terminateSignals >= 1
  • 002.027 = running, power * 3 (StyleTransfer002d) + sensors + updated reward - FAILED to learn
  • … remove mode observations (not sure these are help as they are)
  • … created foot / toe sensors
  • 002.026 = running, power * 3 (StyleTransfer002d) + some model observations + updated reward - ABORTED
  • … Updated reward = removed the reward clipping per observation (was 0 to 1)
  • 002.025 = running, power * 3 (StyleTransfer002d) + some model observations - ABORTED
  • 002.024 = backflip, power * 3 (StyleTransfer002c) - TO CHECK
  • 002.023 = mmaKick, power * 3 (StyleTransfer002b) - TO CHECK
  • 002.022 = running, power * 3 (StyleTransfer002) - did best, testing over 5m - kept improving up to 4.6m then lost performance; used 4.5m steps as test - TO CHECK
  • 002.021 = running, power * 2 (StyleTransfer002c) - did better, try x3
  • 002.020 = running, power * 1.3 (StyleTransfer002b) - did a little better
  • 002.019 = running, power * 1 (StyleTransfer002)
  • 002.018 = MMA Kick, power * 1
  • 002.016 = running, power * 1