Humanoid Gym

Humanoid-Gym is a reinforcement learning (RL) framework built on NVIDIA Isaac Gym, designed for training locomotion skills in humanoid robots. It emphasizes zero-shot transfer: skills learned in simulation are applied directly to real-world environments without additional adjustments.

GitHub

Demo

This codebase has been verified on RobotEra's XBot-S (a 1.2-meter-tall humanoid robot) and XBot-L (a 1.65-meter-tall humanoid robot) in a real-world environment with zero-shot sim-to-real transfer.

Features

1. Humanoid Robot Training

This repository offers guidance and scripts for training humanoid robots. Humanoid-Gym features specialized rewards for humanoid robots, which reduce the difficulty of sim-to-real transfer. RobotEra's XBot-L is used as the primary example, but the framework can also be used for other robots with minimal adjustments. The resources cover setup, configuration, and execution, with the aim of fully preparing the robot for real-world locomotion.

Comprehensive Training Guidelines: Thorough walkthroughs for each stage of the training process.
Step-by-Step Configuration Instructions: Clear and succinct guidance ensuring an efficient setup process.
Execution Scripts for Easy Deployment: Pre-prepared scripts to streamline the training workflow.

2. Sim2Sim Support

Humanoid-Gym includes a sim2sim pipeline, allowing the transfer of trained policies to highly accurate and carefully designed simulated environments. Once the robot is acquired, the RL-trained policies can be confidently deployed in real-world settings.

Simulator settings, particularly for MuJoCo, are tuned to closely mimic real-world scenarios. This calibration keeps performance in simulation and in the real world closely aligned, which makes the simulation results more trustworthy and more applicable to real-world deployment.

3. Denoising World Model Learning (Coming Soon!)

Denoising World Model Learning (DWL) presents an advanced sim-to-real framework integrating state estimation and system identification. This dual-method approach ensures the robot's learning and adaptation are both practical and effective in real-world contexts.

Enhanced Sim-to-real Adaptability: Techniques to optimize the robot's transition from simulated to real environments.
Improved State Estimation Capabilities: Advanced tools for precise and reliable state analysis.

4. Dexterous Hand Manipulation (Coming Soon!)

Installation

Create a new Python virtual environment with Python 3.8:

conda create -n myenv python=3.8

For best performance, NVIDIA driver version 525 is recommended:

sudo apt install nvidia-driver-525

The minimum supported driver version is 515. If version 525 cannot be installed, make sure the system has at least version 515 to maintain basic functionality.

Install PyTorch 1.13 with CUDA 11.7:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

Install NumPy 1.23:

conda install numpy=1.23

Install Isaac Gym. Download Isaac Gym Preview 4 from [1], then install it:

cd isaacgym/python && pip install -e .

Run an example to check the installation:

cd examples && python 1080_balls_of_solitude.py

Consult isaacgym/docs/index.html for troubleshooting.

Install humanoid-gym. Clone this repository, then run:

cd humanoid-gym && pip install -e .
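The driver requirement above can be checked before starting a long training run. The following is a minimal sketch and not part of Humanoid-Gym: the helper names parse_major and driver_status are hypothetical, the version string is assumed to look like 525.105.17, and treating every version at or above 525 as "recommended" is an assumption of this sketch.

```python
MIN_DRIVER = 515          # minimum supported driver version, from the text above
RECOMMENDED_DRIVER = 525  # recommended driver version, from the text above


def parse_major(version: str) -> int:
    """Return the major component of a driver version such as '525.105.17'."""
    return int(version.strip().split(".")[0])


def driver_status(version: str) -> str:
    """Classify a driver version as 'recommended', 'supported', or 'unsupported'."""
    major = parse_major(version)
    if major >= RECOMMENDED_DRIVER:
        return "recommended"  # assumption: newer drivers are at least as good as 525
    if major >= MIN_DRIVER:
        return "supported"
    return "unsupported"


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    for v in ("525.105.17", "515.65.01", "470.182.03"):
        print(v, "->", driver_status(v))
```

The installed version string can be obtained with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed to driver_status.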

Usage Guide

Examples

Launching PPO Policy Training for 'v1' Across 4096 Environments

python scripts/train.py --task=humanoid_ppo --run_name v1 --headless --num_envs 4096

Evaluating the Trained PPO Policy 'v1'

python scripts/play.py --task=humanoid_ppo --run_name v1

Running Sim-to-Sim Transfer with an Exported Policy

python scripts/sim2sim.py --load_model /path/to/logs/XBot_ppo/exported/policies/policy_1.pt

Running Our Trained Policy

python scripts/sim2sim.py --load_model /path/to/logs/XBot_ppo/exported/policies/policy_example.pt

1. Default Tasks

humanoid_ppo

 * Purpose: Baseline, PPO policy, Multi-frame low-level control
 * Observation Space: Variable dimensions that depend on the number of frames
 * Privileged Information: Dimensions
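Multi-frame control means the policy sees several consecutive observation frames at once, so the observation size grows with the number of frames. The following is a minimal sketch of such frame stacking in plain Python. The class name FrameStack and the sizes used in the demo are hypothetical and are not taken from the Humanoid-Gym code.

```python
from collections import deque
from typing import Deque, List


class FrameStack:
    """Keep the last `num_frames` observations and expose them as one flat vector."""

    def __init__(self, frame_dim: int, num_frames: int) -> None:
        self.frame_dim = frame_dim
        self.num_frames = num_frames
        # Start with zero-filled frames so the stacked size is constant from step 0.
        self.frames: Deque[List[float]] = deque(
            ([0.0] * frame_dim for _ in range(num_frames)), maxlen=num_frames
        )

    def push(self, frame: List[float]) -> List[float]:
        """Add the newest frame and return the stacked observation."""
        if len(frame) != self.frame_dim:
            raise ValueError(f"expected {self.frame_dim} values, got {len(frame)}")
        self.frames.append(list(frame))  # the oldest frame is dropped automatically
        # Concatenate oldest-to-newest into a single observation vector.
        return [x for f in self.frames for x in f]


if __name__ == "__main__":
    stack = FrameStack(frame_dim=3, num_frames=4)  # hypothetical sizes
    obs = stack.push([1.0, 2.0, 3.0])
    print(len(obs))  # frame_dim * num_frames
```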

humanoid_dwl (coming soon)

2. PPO Policy

Training Command:

python humanoid/scripts/train.py --task=humanoid_ppo --load_run log_file_path --run_name run_name

Running a Trained Policy:

python humanoid/scripts/play.py --task=humanoid_ppo --load_run log_file_path --run_name run_name

By default, the latest model of the last run in the experiment folder is loaded. To select a different run or model, set load_run and checkpoint in the training config.

3. Sim-to-sim

Before initiating the sim-to-sim process, ensure that play.py is run to export a JIT policy.

MuJoCo-based Sim2Sim Deployment:

python scripts/sim2sim.py --load_model /path/to/export/model.pt

4. Parameters

CPU and GPU Usage: To run simulations on the CPU, set both --sim_device=cpu and --rl_device=cpu. For GPU operations, specify --sim_device=cuda:{0,1,2...} and --rl_device={0,1,2...} accordingly. Note that CUDA_VISIBLE_DEVICES is not applicable, and it's essential to match the --sim_device and --rl_device settings.
Headless Operation: Include --headless for operations without rendering.
Rendering Control: Press 'v' to toggle rendering during training.
Policy Location: Trained policies are saved in humanoid/logs/<experiment_name>/<date_time>_<run_name>/model_<iteration>.pt.
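Given the directory layout described under Policy Location, the newest checkpoint of an experiment can be located with a few lines of Python. This is a minimal sketch under stated assumptions: the helper latest_checkpoint is hypothetical, run folder names are assumed to sort chronologically by name, and folders that contain no model_<iteration>.pt file are skipped.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

MODEL_RE = re.compile(r"^model_(\d+)\.pt$")


def latest_checkpoint(experiment_dir: Path) -> Optional[Path]:
    """Return the highest-iteration model file of the last run, or None."""
    # Assumption: <date_time>_<run_name> folder names sort chronologically.
    runs = sorted(p for p in experiment_dir.iterdir() if p.is_dir())
    # Walk runs from newest to oldest until one contains a model file.
    for run in reversed(runs):
        models = []
        for f in run.iterdir():
            m = MODEL_RE.match(f.name)
            if m:
                models.append((int(m.group(1)), f))
        if models:
            # Compare iterations numerically so model_1000 beats model_500.
            return max(models, key=lambda item: item[0])[1]
    return None


if __name__ == "__main__":
    # Build a throwaway layout with hypothetical run names to show the lookup.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for run, iters in (("2024-01-01_v1", [500]), ("2024-01-02_v1", [100, 1000])):
            (root / run).mkdir()
            for it in iters:
                (root / run / f"model_{it}.pt").touch()
        print(latest_checkpoint(root))
```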

5. Command-Line Arguments

For RL training, refer to humanoid/utils/helpers.py#L161. For the sim-to-sim process, refer to humanoid/scripts/sim2sim.py#L169.

Code Structure

Every environment is defined by an env file (legged_robot.py) and a configuration file (legged_robot_config.py). The configuration file contains two classes: LeggedRobotCfg (all environment parameters) and LeggedRobotCfgPPO (all training parameters). Both the env and config classes use inheritance. Each non-zero reward scale specified in cfg adds the reward function of the corresponding name to the total reward.
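The link between reward scales and reward functions can be illustrated with a small stand-alone example. This is a sketch, not the Humanoid-Gym implementation: the class ToyEnv, its state values, and the _reward_<name> method naming are assumptions made for illustration.

```python
from typing import Callable, Dict


class ToyEnv:
    """Stand-in environment; the `_reward_<name>` naming is an assumption of this sketch."""

    def __init__(self, reward_scales: Dict[str, float]) -> None:
        # Keep only non-zero scales: a scale of zero removes the reward term.
        self.reward_scales = {k: v for k, v in reward_scales.items() if v != 0.0}
        # Bind each remaining scale to the method of the corresponding name.
        self.reward_fns: Dict[str, Callable[[], float]] = {
            name: getattr(self, f"_reward_{name}") for name in self.reward_scales
        }
        self.base_height = 0.85  # hypothetical robot state
        self.torque = 2.0        # hypothetical robot state

    def compute_reward(self) -> float:
        """Sum every active reward term, weighted by its scale."""
        return sum(self.reward_scales[n] * fn() for n, fn in self.reward_fns.items())

    def _reward_base_height(self) -> float:
        return -abs(self.base_height - 0.9)

    def _reward_torques(self) -> float:
        return self.torque ** 2


if __name__ == "__main__":
    env = ToyEnv({"base_height": 1.0, "torques": -0.1})
    print(env.compute_reward())
```

A scale that names a method which does not exist raises AttributeError at construction time, which surfaces typos in the configuration early.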

Tasks must be registered with task_registry.register(name, EnvClass, EnvConfig, TrainConfig). Registration may occur within envs/__init__.py, or outside of this repository.
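The registration pattern can be sketched without Isaac Gym. The ToyTaskRegistry below is a hypothetical stand-in that mirrors only the register(name, EnvClass, EnvConfig, TrainConfig) signature given above. The make_env method and the duplicate-name check are assumptions of this sketch, not documented behavior of the real task_registry.

```python
from typing import Any, Dict, Tuple, Type


class ToyTaskRegistry:
    """Minimal stand-in mirroring register(name, EnvClass, EnvConfig, TrainConfig)."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Tuple[Type, Any, Any]] = {}

    def register(self, name: str, env_class: Type, env_cfg: Any, train_cfg: Any) -> None:
        # Rejecting duplicates is a design choice of this sketch.
        if name in self._tasks:
            raise ValueError(f"task '{name}' is already registered")
        self._tasks[name] = (env_class, env_cfg, train_cfg)

    def make_env(self, name: str) -> Any:
        """Instantiate the environment class with its registered configuration."""
        env_class, env_cfg, _ = self._tasks[name]
        return env_class(env_cfg)


class MyRobotCfg:        # hypothetical environment configuration
    num_envs = 4096


class MyRobotCfgPPO:     # hypothetical training configuration
    max_iterations = 3000


class MyRobotEnv:        # hypothetical environment class
    def __init__(self, cfg: Any) -> None:
        self.cfg = cfg


if __name__ == "__main__":
    registry = ToyTaskRegistry()
    registry.register("my_robot_ppo", MyRobotEnv, MyRobotCfg(), MyRobotCfgPPO())
    env = registry.make_env("my_robot_ppo")
    print(type(env).__name__, env.cfg.num_envs)
```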

Add a new environment

The base environment legged_robot constructs a rough-terrain locomotion task. The corresponding configuration specifies neither a robot asset (URDF/MJCF) nor reward scales.

If adding a new environment, create a new folder in the envs/ directory with a configuration file named <your_env>_config.py. The new configuration should inherit from existing environment configurations.
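Configuration inheritance of this kind is commonly written with nested classes, where the new configuration overrides only the groups it changes. The sketch below shows the pattern in isolation. The class names BaseRobotCfg and MyRobotCfg, the asset path, and the individual parameter names and values are hypothetical placeholders rather than values from the repository.

```python
class BaseRobotCfg:
    """Hypothetical base configuration, standing in for an existing one."""

    class env:
        num_envs = 4096
        episode_length_s = 20

    class asset:
        file = ""  # the base configuration names no robot asset

    class rewards:
        class scales:
            pass  # and defines no reward scales


class MyRobotCfg(BaseRobotCfg):
    """New configuration: inherit everything, override only what differs."""

    class asset(BaseRobotCfg.asset):
        file = "resources/robots/my_robot/urdf/my_robot.urdf"  # hypothetical path

    class rewards(BaseRobotCfg.rewards):
        class scales(BaseRobotCfg.rewards.scales):
            tracking_lin_vel = 1.0  # hypothetical reward scales
            torques = -1e-5


if __name__ == "__main__":
    # Untouched groups are inherited from the base configuration.
    print(MyRobotCfg.env.num_envs)
    # Overridden groups carry the new values.
    print(MyRobotCfg.asset.file)
```

Because each nested class also inherits from its counterpart in the base configuration, any parameter that is not overridden keeps its base value.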

If proposing a new robot

Insert the corresponding assets in the resources/ folder.
In the cfg file, set the path to the asset, define body names, default_joint_positions, and PD gains. Specify the desired train_cfg and the environment's name (python class).
In the train_cfg, set the experiment_name and run_name.
If needed, create your environment in <your_env>.py. Inherit from existing environments, override desired functions and/or add your reward functions.
Register your environment in humanoid/envs/__init__.py.
Modify or tune other parameters in your cfg or cfg_train as required. To remove a reward, set its scale to zero. Avoid modifying the parameters of other environments!

If a new robot/environment needs to perform sim2sim, modifications to humanoid/scripts/sim2sim.py may be required:

Check the joint mapping of the robot between MJCF and URDF.
Change the initial joint position of the robot according to the trained policy.
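The joint-mapping check matters because the MJCF and URDF models may list the same joints in a different order, in which case a policy output indexed for one model drives the wrong joints in the other. The following is a minimal sketch of a name-based reordering. The helper names and the joint names are hypothetical and are not taken from sim2sim.py.

```python
from typing import Dict, List, Sequence


def build_index_map(src_names: Sequence[str], dst_names: Sequence[str]) -> List[int]:
    """For each joint in destination order, return its index in source order."""
    src_index: Dict[str, int] = {n: i for i, n in enumerate(src_names)}
    missing = [n for n in dst_names if n not in src_index]
    if missing or len(src_names) != len(dst_names):
        raise ValueError(f"joint sets differ; missing in source: {missing}")
    return [src_index[n] for n in dst_names]


def reorder(values: Sequence[float], index_map: Sequence[int]) -> List[float]:
    """Rearrange a per-joint vector from source order into destination order."""
    return [values[i] for i in index_map]


if __name__ == "__main__":
    # Hypothetical joint orders for the two model files.
    urdf_order = ["left_hip", "left_knee", "right_hip", "right_knee"]
    mjcf_order = ["right_hip", "right_knee", "left_hip", "left_knee"]
    urdf_to_mjcf = build_index_map(urdf_order, mjcf_order)
    action_urdf = [0.1, 0.2, 0.3, 0.4]  # policy output in URDF order
    print(reorder(action_urdf, urdf_to_mjcf))
```

The same index map can be applied to the initial joint positions so that both simulators start from the pose the policy was trained with.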

Troubleshooting

ImportError: libpython3.8.so.1.0: cannot open shared object file: No such file or directory

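Add the environment's lib directory to the library path (use $HOME rather than ~, which is not expanded inside double quotes):

export LD_LIBRARY_PATH="$HOME/miniconda3/envs/your_env/lib:$LD_LIBRARY_PATH"

or install the library system-wide:

sudo apt install libpython3.8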

AttributeError: module 'distutils' has no attribute 'version'

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

ImportError: /home/roboterax/anaconda3/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.20` not found (required by /home/roboterax/carbgym/python/isaacgym/_bindings/linux64/gym_36.so)

mkdir ${YOUR_CONDA_ENV}/lib/_unused
mv ${YOUR_CONDA_ENV}/lib/libstdc++* ${YOUR_CONDA_ENV}/lib/_unused

Citation

Please cite the following if you use this code or parts of it:

@article{gu2024humanoid,
  title={Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer},
  author={Gu, Xinyang and Wang, Yen-Jen and Chen, Jianyu},
  journal={arXiv preprint arXiv:2404.05695},
  year={2024}
}

Acknowledgment

The implementation of Humanoid-Gym relies on resources from legged_gym and rsl_rl projects, created by the Robotic Systems Lab. The LeggedRobot implementation from their research is specifically utilized to enhance this codebase.

Humanoid Robots Wiki ^β
