= Robotic Control via Embodied Chain-of-Thought Reasoning =

Embodied Chain-of-Thought Reasoning (ECoT) is a novel approach for training robotic policies. It trains a vision-language-action (VLA) model to generate reasoning steps in response to instructions and images before choosing a robot action, enabling better performance, interpretability, and generalization.

The codebase is built on top of OpenVLA. Refer to it for detailed documentation of the code and dependencies.
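
For a concrete sense of what the policy produces, the sketch below shows an abridged, illustrative reasoning chain of the kind ECoT generates before committing to an action. The tag names (TASK, PLAN, SUBTASK, MOVE, GRIPPER POSITION, ACTION) follow the examples in the paper and the "ASSISTANT: TASK:" prompt used in the Quickstart below; the wording and layout here are invented for illustration and are not real model output.

<code>
# Abridged, illustrative ECoT generation (not real model output): the policy
# first writes out its reasoning as plain text, then commits to an action.
example_generation = (
    "TASK: put the carrot on the plate. "
    "PLAN: locate the carrot, grasp it, move it over the plate, release. "
    "SUBTASK: move the gripper to the carrot. "
    "MOVE: move the arm right and down. "
    "GRIPPER POSITION: [115, 98] "
    "ACTION:"  # followed by action tokens, which predict_action decodes into a robot command
)
</code>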

== Quickstart ==

A Colab notebook is provided containing code for loading the ECoT policy and using it to generate reasoning and actions in response to an observation. Loading the model for inference is easy:

<code>
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

# Load the ECoT policy and its processor from HuggingFace.
device = "cuda"
path_to_hf = "Embodied-CoT/ecot-openvla-7b-bridge"
processor = AutoProcessor.from_pretrained(path_to_hf, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(path_to_hf, torch_dtype=torch.bfloat16).to(device)

# Build the prompt from the current image observation and language instruction.
image = <ROBOT IMAGE OBSERVATION HERE>
instruction = <YOUR INSTRUCTION HERE>
prompt = "A chat between a curious user and an artificial intelligence assistant. " + \
    "The assistant gives helpful, detailed, and polite answers to the user's questions. " + \
    f"USER: What action should the robot take to {instruction.lower()}? ASSISTANT: TASK:"

# predict_action generates the reasoning chain and decodes the final robot action.
inputs = processor(prompt, image).to(device, dtype=torch.bfloat16)
action, generated_ids = vla.predict_action(**inputs, unnorm_key="bridge_orig", max_new_tokens=1024)
generated_text = processor.batch_decode(generated_ids)[0]
</code>

The standard model in torch.bfloat16 requires 16 GB of GPU memory, but using bitsandbytes and 4-bit quantization lowers memory usage to around 5 GB. See the Colab for more details.
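
As a sketch of that low-memory path, the snippet below loads the same checkpoint with 4-bit quantization through the standard transformers/bitsandbytes integration (BitsAndBytesConfig). This is an assumption about the setup rather than the exact configuration from the Colab, so check the notebook for the settings actually used there.

<code>
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

path_to_hf = "Embodied-CoT/ecot-openvla-7b-bridge"

# Store weights in 4-bit via bitsandbytes; computation still runs in bfloat16.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

processor = AutoProcessor.from_pretrained(path_to_hf, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    path_to_hf,
    torch_dtype=torch.bfloat16,
    quantization_config=quant_config,
    device_map="auto",  # quantized weights are dispatched to the GPU automatically
)
</code>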

== Training and Evaluation ==

To train the models from scratch, use the following command:

<code>
torchrun --standalone --nnodes 1 --nproc-per-node 8 vla-scripts/train.py \
  --vla.type "prism-dinosiglip-224px+mx-bridge" \
  --data_root_dir <path to training data root> \
  --run_root_dir <path to checkpoint saving directory> \
  --wandb_project <wandb project name> \
  --wandb_entity <wandb user name>
</code>

To evaluate the model on the WidowX robot:

<code>
python3 experiments/bridge/eval_model_in_bridge_env.py \
  --model.type prism-dinosiglip-224px+7b \
  --pretrained_checkpoint <path to checkpoint> \
  --host_ip <robot interface IP> \
  --port <robot interface port>
</code>

== Pretrained models ==

Two ECoT models and a dataset of reasonings are available on the HuggingFace page:

* '''ecot-openvla-7b-bridge''': The main model used for most of the experiments. It was trained for 80k steps on the Bridge dataset annotated with reasonings.
* '''ecot-openvla-7b-oxe''': A policy initially trained on the Open-X-Embodiment dataset actions, then fine-tuned for another 20k steps on a mixture of OXE action-only data and the Bridge reasonings.
* '''embodied_features_bridge''': A dataset of the embodied features and reasonings collected for Bridge demonstrations.

=== Explicit Notes on Model Licensing & Commercial Use ===

While all code in this repository is released under an MIT License, the pretrained models may inherit restrictions from the underlying base models. Specifically, both models are derived from Llama-2, and are subject to the Llama Community License.

== Installation ==

Refer to the original OpenVLA repository for detailed installation instructions.

== Repository Structure ==

High-level overview of repository/project file-tree:

* '''prismatic''': Package source; provides core utilities for model loading, training, data preprocessing, etc.
* '''experiments''': Code for evaluating the policies on a WidowX robot.
* '''vla-scripts/''': Core scripts for training, fine-tuning, and deploying VLAs.
* '''LICENSE''': All code is made available under the MIT License.
* '''Makefile''': Top-level Makefile (by default, supports lint checking and auto-fixing); extend as needed.
* '''pyproject.toml''': Full project configuration details (including dependencies), as well as tool configurations.
* '''README.md''': Top-level project README.

== Citation ==

If the code or models are useful in your work, please cite the paper:

<code>
@article{Zawalski24-ecot,
    title={Robotic Control via Embodied Chain-of-Thought Reasoning},
    author={Michał Zawalski and William Chen and Karl Pertsch and Oier Mees and Chelsea Finn and Sergey Levine},
    journal={arXiv preprint arXiv:2407.08693},
    year={2024}
}
</code>