Humanoid-Gym is an advanced reinforcement learning (RL) framework built on Nvidia Isaac Gym, designed for training locomotion skills in humanoid robots. Notably, it emphasizes zero-shot transfer, enabling skills learned in simulation to be directly applied to real-world environments without additional adjustments.
[https://github.com/roboterax/humanoid-gym GitHub]
== Demo ==
This codebase has been verified on RobotEra's XBot-S (a 1.2-meter-tall humanoid robot) and XBot-L (a 1.65-meter-tall humanoid robot) in a real-world environment with zero-shot sim-to-real transfer.
== Features ==
=== 1. Humanoid Robot Training ===
This repository offers comprehensive guidance and scripts for training humanoid robots. Humanoid-Gym features specialized rewards for humanoid robots, reducing the difficulty of sim-to-real transfer. RobotEra's XBot-L serves as the primary example, but the framework can be applied to other robots with minimal adjustments. Resources cover setup, configuration, and execution, with the goal of fully preparing the robot for real-world locomotion through in-depth training and optimization.
* Comprehensive Training Guidelines: Thorough walkthroughs for each stage of the training process.
* Step-by-Step Configuration Instructions: Clear and succinct guidance ensuring an efficient setup process.
* Execution Scripts for Easy Deployment: Pre-prepared scripts to streamline the training workflow.
=== 2. Sim2Sim Support ===
Humanoid-Gym includes a sim2sim pipeline for transferring trained policies to highly accurate, carefully designed simulated environments. Once the physical robot is available, the RL-trained policies can then be deployed in real-world settings with greater confidence.
Simulator settings, particularly with Mujoco, are finely tuned to closely mimic real-world scenarios. This careful calibration ensures that performances in both simulated and real-world environments are closely aligned, making simulations more trustworthy and enhancing their applicability to real-world scenarios.
=== 3. Denoising World Model Learning (Coming Soon!) ===
Denoising World Model Learning (DWL) presents an advanced sim-to-real framework integrating state estimation and system identification. This dual-method approach ensures the robot's learning and adaptation are both practical and effective in real-world contexts.
* Enhanced Sim-to-real Adaptability: Techniques to optimize the robot's transition from simulated to real environments.
* Improved State Estimation Capabilities: Advanced tools for precise and reliable state analysis.
=== 4. Dexterous Hand Manipulation (Coming Soon!) ===
== Installation ==
Generate a new Python virtual environment with Python 3.8:
<code>
conda create -n myenv python=3.8
</code>
For best performance, it is recommended to use NVIDIA driver version 525:
<code>
sudo apt install nvidia-driver-525
</code>
The minimum supported driver version is 515. If version 525 cannot be installed, ensure that the system has at least version 515 to maintain basic functionality.
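Before proceeding, the installed driver's major version can be compared against the minimum. The helper below is an illustrative sketch; the <code>nvidia-smi</code> query shown in the comment is a standard way to obtain the version string:

```python
def driver_meets_minimum(version: str, minimum: int = 515) -> bool:
    """Return True when the driver's major version (e.g. '525.147.05') is at least `minimum`."""
    major = int(version.split(".")[0])
    return major >= minimum

# The version string can be obtained from the shell with:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
print(driver_meets_minimum("525.147.05"))  # True
print(driver_meets_minimum("510.108.03"))  # False
```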
Install PyTorch 1.13 with Cuda-11.7:
<code>
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
</code>
Install numpy-1.23:
<code>
conda install numpy=1.23
</code>
Install Isaac Gym:
Download and install Isaac Gym Preview 4 from [https://developer.nvidia.com/isaac-gym the NVIDIA developer site].
<code>
cd isaacgym/python && pip install -e .
</code>
Run an example:
<code>
cd examples && python 1080_balls_of_solitude.py
</code>
Consult isaacgym/docs/index.html for troubleshooting.
Install humanoid-gym:
Clone this repository, then install it in editable mode:
<code>
git clone https://github.com/roboterax/humanoid-gym
cd humanoid-gym && pip install -e .
</code>
== Usage Guide ==
=== Examples ===
# Launching PPO Policy Training for 'v1' Across 4096 Environments
<code>
python scripts/train.py --task=humanoid_ppo --run_name v1 --headless --num_envs 4096
</code>
# Evaluating the Trained PPO Policy 'v1'
<code>
python scripts/play.py --task=humanoid_ppo --run_name v1
</code>
# Transferring the Trained Policy from Simulation to Simulation
<code>
python scripts/sim2sim.py --load_model /path/to/logs/XBot_ppo/exported/policies/policy_1.pt
</code>
# Run our trained policy
<code>
python scripts/sim2sim.py --load_model /path/to/logs/XBot_ppo/exported/policies/policy_example.pt
</code>
=== 1. Default Tasks ===
* humanoid_ppo
** Purpose: baseline PPO policy with multi-frame low-level control
** Observation Space: variable dimensions (the per-frame observation size multiplied by the number of stacked frames)
** Privileged Information: a fixed-dimension vector of additional simulation-only state
* humanoid_dwl (coming soon)
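Because the policy consumes a history of stacked observation frames, the observation dimension scales with the number of frames. A minimal frame-stacking sketch; the per-frame size and frame count below are illustrative placeholders, not the repository's actual dimensions:

```python
from collections import deque

class FrameStack:
    """Keep the last `num_frames` observation vectors and expose them as one flat vector."""
    def __init__(self, num_frames: int, obs_size: int):
        self.num_frames = num_frames
        self.obs_size = obs_size
        # Start with zero-filled history so the stacked vector always has a fixed length.
        self.frames = deque([[0.0] * obs_size for _ in range(num_frames)], maxlen=num_frames)

    def push(self, obs):
        assert len(obs) == self.obs_size
        self.frames.append(list(obs))  # oldest frame is dropped automatically

    def stacked(self):
        # Oldest frame first; total length = obs_size * num_frames.
        return [x for frame in self.frames for x in frame]

stack = FrameStack(num_frames=3, obs_size=4)
stack.push([1.0] * 4)
print(len(stack.stacked()))  # 12
```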
=== 2. PPO Policy ===
* Training Command:
<code>
python humanoid/scripts/train.py --task=humanoid_ppo --load_run log_file_path --name run_name
</code>
* Running a Trained Policy:
<code>
python humanoid/scripts/play.py --task=humanoid_ppo --load_run log_file_path --name run_name
</code>
By default, the latest model of the last run from the experiment folder is loaded. Other run iterations/models can be selected by adjusting load_run and checkpoint in the training config.
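The run/checkpoint selection described above can be sketched as follows. The class and field names are illustrative, modeled on legged_gym-style runner configs rather than copied from this repository:

```python
class RunnerCfg:
    # -1 selects the most recent run / checkpoint; a concrete value pins a specific one.
    load_run = -1        # e.g. a run-folder name string to load a specific run
    checkpoint = -1      # e.g. 3000 to load model_3000.pt

def resolve_checkpoint(available, checkpoint=-1):
    """Pick a model iteration from those available; -1 means the latest."""
    return max(available) if checkpoint == -1 else checkpoint

print(resolve_checkpoint([1000, 2000, 3000]))        # 3000
print(resolve_checkpoint([1000, 2000, 3000], 2000))  # 2000
```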
=== 3. Sim-to-sim ===
Before initiating the sim-to-sim process, ensure that play.py is run to export a JIT policy.
Mujoco-based Sim2Sim Deployment:
<code>
python scripts/sim2sim.py --load_model /path/to/export/model.pt
</code>
=== 4. Parameters ===
* CPU and GPU Usage: To run simulations on the CPU, set both --sim_device=cpu and --rl_device=cpu. For GPU operation, specify --sim_device=cuda:{0,1,2...} and --rl_device={0,1,2...} accordingly. Note that CUDA_VISIBLE_DEVICES is not applicable, and the --sim_device and --rl_device settings must refer to the same device.
* Headless Operation: Include --headless for operations without rendering.
* Rendering Control: Press 'v' to toggle rendering during training.
* Policy Location: Trained policies are saved in humanoid/logs/<experiment_name>/<date_time>_<run_name>/model_<iteration>.pt.
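The requirement that --sim_device and --rl_device match can be expressed as a small validation helper. This is purely illustrative, not part of the repository's argument parsing, and the accepted --rl_device spellings are an assumption:

```python
def devices_consistent(sim_device: str, rl_device: str) -> bool:
    """Both devices must be the CPU, or the same CUDA ordinal."""
    if sim_device == "cpu":
        return rl_device == "cpu"
    if sim_device.startswith("cuda:"):
        # Accept either the full "cuda:N" form or the bare ordinal "N".
        return rl_device in (sim_device, sim_device.split(":")[1])
    return False

print(devices_consistent("cpu", "cpu"))        # True
print(devices_consistent("cuda:0", "cuda:0"))  # True
print(devices_consistent("cuda:0", "cpu"))     # False
```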
=== 5. Command-Line Arguments ===
For RL training, refer to humanoid/utils/helpers.py#L161. For the sim-to-sim process, refer to humanoid/scripts/sim2sim.py#L169.
== Code Structure ==
Every environment is built from an env file (legged_robot.py) and a configuration file (legged_robot_config.py). The latter houses two classes: LeggedRobotCfg (encompassing all environmental parameters) and LeggedRobotCfgPPO (denoting all training parameters). Both env and config classes use inheritance. Each non-zero reward scale specified in the cfg adds a correspondingly named reward function's output, weighted by that scale, to the total reward.
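The reward-scale mechanism can be sketched as follows: each non-zero scale in the config selects a method named _reward_&lt;name&gt;, whose output is weighted by the scale and summed. This is a simplified illustration of the pattern, not the repository's exact code; the class names and reward values are placeholders:

```python
class RewardScalesCfg:
    tracking = 1.0
    torques = -0.0001
    collision = 0.0   # zero scale: the corresponding reward term is skipped entirely

class Env:
    def __init__(self, scales_cfg):
        # Keep only non-zero scales and bind each to its `_reward_<name>` method.
        self.reward_functions = {
            name: (scale, getattr(self, f"_reward_{name}"))
            for name, scale in vars(scales_cfg).items()
            if not name.startswith("_") and scale != 0.0
        }

    def _reward_tracking(self):
        return 0.8     # placeholder value standing in for a per-step reward term

    def _reward_torques(self):
        return 100.0   # placeholder value (e.g. a squared-torque penalty term)

    def _reward_collision(self):
        return 1.0     # never called, because its scale is zero

    def compute_reward(self):
        return sum(scale * fn() for scale, fn in self.reward_functions.values())

env = Env(RewardScalesCfg)
print(round(env.compute_reward(), 4))  # 1.0*0.8 + (-0.0001)*100.0 = 0.79
```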
Tasks must be registered with task_registry.register(name, EnvClass, EnvConfig, TrainConfig). Registration may occur within envs/__init__.py, or outside of this repository.
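A minimal registry following that signature might look like this (illustrative only; the real task_registry also constructs environments and training runners):

```python
class TaskRegistry:
    def __init__(self):
        self._tasks = {}

    def register(self, name, env_class, env_cfg, train_cfg):
        """Associate a task name with its environment class and config classes."""
        self._tasks[name] = (env_class, env_cfg, train_cfg)

    def get(self, name):
        return self._tasks[name]

task_registry = TaskRegistry()

# Placeholder classes standing in for the real env and config classes.
class HumanoidEnv: ...
class HumanoidCfg: ...
class HumanoidCfgPPO: ...

task_registry.register("humanoid_ppo", HumanoidEnv, HumanoidCfg, HumanoidCfgPPO)
print("humanoid_ppo" in task_registry._tasks)  # True
```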