MinPPO

549 bytes added, 22:20, 20 August 2024

These are notes for the MinPPO project [https://github.com/kscalelabs/minppo here].

== Testing ==

* A hidden layer size of 256 shows progress (loss is based on state.q[2]).
* Setting std to zero makes the rewards NaN. Why? I wonder if there NEEDS to be randomization in the environment.
* The ctrl cost seems to be what is giving NaNs. Interesting.
* It is unrelated to randomization of the environment. I think it is gradient related.
* The first things to become NaN seem to be the actor loss and the scores. After that, everything becomes NaN.
* Fixed the entropy epsilon. Hope this works now.
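
One way a zero std can produce NaNs in the actor loss is through the Gaussian log-probability and entropy terms, which both take log(std). The JAX sketch below illustrates that mechanism and an epsilon floor on the std. It is not the MinPPO code: the function names, the <code>STD_EPS</code> value, and the <code>nonfinite_keys</code> helper are all hypothetical, and the actual cause of the NaNs in the notes above may be different.

```python
import jax.numpy as jnp

# Hypothetical floor on the policy std; the value is an assumption, not taken from MinPPO.
STD_EPS = 1e-6


def gaussian_log_prob(action, mean, std, eps=STD_EPS):
    """Log-density of a diagonal Gaussian policy, summed over action dimensions."""
    safe_std = std + eps  # never exactly zero when eps > 0 and std >= 0
    z = (action - mean) / safe_std
    return jnp.sum(-0.5 * z**2 - jnp.log(safe_std) - 0.5 * jnp.log(2.0 * jnp.pi))


def gaussian_entropy(std, eps=STD_EPS):
    """Entropy of a diagonal Gaussian policy, summed over action dimensions."""
    return jnp.sum(0.5 * jnp.log(2.0 * jnp.pi * jnp.e) + jnp.log(std + eps))


def nonfinite_keys(metrics):
    """Return the names of metrics that contain a NaN or an infinity."""
    return [
        name
        for name, value in metrics.items()
        if not bool(jnp.all(jnp.isfinite(value)))
    ]


action = jnp.array([0.1, -0.2])
mean = jnp.array([0.0, 0.0])
zero_std = jnp.zeros(2)

metrics = {
    # eps=0.0 reproduces the unprotected case: -inf + inf gives NaN.
    "log_prob_no_eps": gaussian_log_prob(action, mean, zero_std, eps=0.0),
    # log(0) is -inf, so the entropy is non-finite as well.
    "entropy_no_eps": gaussian_entropy(zero_std, eps=0.0),
    # With the floor in place both terms stay finite.
    "log_prob_eps": gaussian_log_prob(action, mean, zero_std),
    "entropy_eps": gaussian_entropy(zero_std),
}
print(nonfinite_keys(metrics))
```

Running a check like <code>nonfinite_keys</code> on the per-step loss terms is one way to see which quantity goes non-finite first. Note that a tiny epsilon only keeps the values finite; the log-probability and its gradients can still be very large when the std is close to zero.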

Ben
