MinPPO

From Humanoid Robots Wiki
Revision as of 22:20, 20 August 2024 by Ben (talk | contribs)

These are notes for the MinPPO project (https://github.com/kscalelabs/minppo).

Testing

  • Hidden layer size of 256 shows progress (loss is based on state.q[2]).
  • Setting std to zero makes the rewards NaN. Why? I wonder whether there NEEDS to be randomization in the environment.
  • Is the control cost what's producing the NaNs? Interesting.
  • It is unrelated to environment randomization; I think it is gradient-related.
  • The first things to become NaN seem to be the actor loss and the scores; after that, everything becomes NaN.
  • Fixed the entropy epsilon. Hope this works now.