= Robotic Control via Embodied Chain-of-Thought Reasoning =
Embodied Chain-of-Thought Reasoning (ECoT) is a novel approach for training robotic policies. It trains a vision-language-action model to generate reasoning steps in response to instructions and images before choosing a robot action, enabling better performance, interpretability, and generalization. The codebase is built on top of OpenVLA; refer to it for detailed documentation of the code and dependencies.
== Quickstart ==
We provide a Colab notebook containing code for loading our ECoT policy and using it to generate reasoning and actions in response to an observation. Loading the model for inference is easy:
<code>
from transformers import AutoModelForVision2Seq, AutoProcessor
</code>
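For completeness, here is a minimal end-to-end sketch of running the policy. The checkpoint id, the prompt template, and the generation budget below are assumptions based on the OpenVLA-style interface; the Colab notebook shows the exact invocation and how the reasoning and action tokens are decoded.

<code>
# Minimal inference sketch; see the Colab notebook for the exact prompt
# format and for decoding reasoning vs. action tokens.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# ASSUMPTION: HuggingFace checkpoint id of the Bridge ECoT policy.
checkpoint = "Embodied-CoT/ecot-openvla-7b-bridge"

processor = AutoProcessor.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda")

image = Image.open("observation.png")  # current camera frame (placeholder path)
# ASSUMPTION: OpenVLA-style prompt; the ECoT prompt additionally elicits reasoning.
instruction = "place the watermelon on the towel"
prompt = f"In: What action should the robot take to {instruction}?\nOut:"

inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)
# Reasoning chains are long, so leave a generous token budget.
generated_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(generated_ids)[0])  # reasoning steps, then the action
</code>

Because the policy emits its full chain of thought before the action tokens, the decoded string doubles as an interpretability trace for each step.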
== Pretrained models ==
We release two ECoT models trained as part of our work, along with the dataset of reasonings, on our HuggingFace page:
* '''ecot-openvla-7b-bridge''': The main model that we used for most of our experiments. It was trained on the Bridge dataset annotated with reasonings for 80k steps.
* '''ecot-openvla-7b-oxe''': A policy initially trained on Open-X-Embodiment action data, then fine-tuned on a mixture of OXE action-only data and our Bridge reasonings for another 20k steps.
* '''embodied_features_bridge''': A dataset of the embodied features and reasonings collected for the Bridge demonstrations (see the download sketch below).
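As a sketch, the reasoning annotations can be fetched with <code>huggingface_hub</code>. The org/repo id and the JSON filename below are assumptions; browse the dataset page for the actual file layout.

<code>
# Sketch: fetch the reasoning annotations for the Bridge demonstrations.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="Embodied-CoT/embodied_features_bridge",  # ASSUMPTION: org/repo id
    filename="embodied_features_bridge.json",         # HYPOTHETICAL filename
    repo_type="dataset",
)
with open(path) as f:
    reasonings = json.load(f)  # per-episode embodied features and reasonings
</code>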
=== Explicit Notes on Model Licensing & Commercial Use ===
While all code in this repository is released under the MIT License, our pretrained models may inherit restrictions from the underlying base models we use. Specifically, both models above are derived from Llama-2, and as such are subject to the Llama Community License.
== Installation ==
See the original OpenVLA repository for detailed installation instructions.
== Repository Structure ==
* '''experiments/''': Code for evaluating the policies on a WidowX robot.
* '''vla-scripts/''': Core scripts for training, fine-tuning, and deploying VLAs.
* '''LICENSE''': All code is made available under the MIT License; happy hacking!
* '''Makefile''': Top-level Makefile (by default, supports lint checking and auto-fixing); extend as needed.
* '''pyproject.toml''': Full project configuration details (including dependencies), as well as tool configurations.
== Citation ==
If you find our code or models useful in your work, please cite our paper:
<code>
@article{Zawalski24-ecot,
    title={Robotic Control via Embodied Chain-of-Thought Reasoning},
    author={Michał Zawalski and William Chen and Karl Pertsch and Oier Mees and Chelsea Finn and Sergey Levine},
    journal={arXiv preprint arXiv:2407.08693},
    year={2024}
}
</code>