Prismatic VLM REPL
The K-Scale OpenVLA adaptation by User:Paweł is at https://github.com/kscalelabs/openvla
REPL Script Guide
Here are some suggestions for running the generate.py REPL script from the repo (you can find it in the scripts folder) if you would like to get started with OpenVLA.
Prerequisites
Before running the script, ensure you have the following:
- Python 3.8 or higher installed
- NVIDIA GPU with CUDA support (optional but recommended for faster processing)
- Hugging Face account and token for accessing Meta Llama
Setting Up the Environment
In addition to installing requirements-min.txt from the repo, you will probably also need to install rich, tensorflow_graphics, tensorflow-datasets, and dlimp.
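For example, assuming a standard pip-based environment, the extra packages can be added like this (note that dlimp may need to be installed from its source repository if it is not available on PyPI):

pip install -r requirements-min.txt
pip install rich tensorflow_graphics tensorflow-datasets dlimp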
Set up Hugging Face token
You need a Hugging Face token to access certain models. Create a file named .hf_token in the root directory of your project and add your Hugging Face token to it:
echo "your_hugging_face_token" > .hf_token
Sample Images for generate.py REPL
You can get these by capturing frames or screenshotting rollout videos from
https://openvla.github.io/
Make sure the images have an end effector in them.

[[File:Coke can2.png|800px|Can pickup task]]
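For example, assuming you have saved one of the rollout videos locally as rollout.mp4 (an illustrative filename), ffmpeg can extract one frame per second:

ffmpeg -i rollout.mp4 -vf fps=1 frame_%03d.png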
Starting REPL mode
Once setup is complete, run generate.py. The script starts by initializing the generation playground with the Prismatic model prism-dinosiglip+7b.
The model prism-dinosiglip+7b is downloaded from the Hugging Face Hub.
The model configuration is resolved, and the model is then loaded with the following components:
Vision Backbone: dinosiglip-vit-so-384px
Language Model (LLM) Backbone: llama2-7b-pure (this is also where the hf token comes into play)
Architecture Specifier: no-align+fused-gelu-mlp
Checkpoint Path: The model checkpoint is loaded from a specific path in the cache.
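In code, this loading step corresponds roughly to the Prismatic load API. The following is a minimal sketch of what generate.py does internally, assuming the prismatic package from the repo is importable; exact details may differ:

import torch
from prismatic import load

# Read the Hugging Face token created earlier (required for the Llama 2 backbone)
hf_token = open(".hf_token").read().strip()

# Download prism-dinosiglip+7b from the Hub (or reuse the cached checkpoint)
vlm = load("prism-dinosiglip+7b", hf_token=hf_token)

# Move the model to the GPU; bfloat16 keeps the 7B model within typical VRAM budgets
vlm.to(torch.device("cuda"), dtype=torch.bfloat16)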
You should see the loading progress for each of these components in your terminal.
After loading the model, the script enters REPL mode, allowing you to interact with the model. The REPL provides a default generation setup and waits for user input.
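For reference, a single REPL turn maps roughly onto the following Prismatic generation calls. This is a sketch only (the image filename and instruction are illustrative, and generate.py's sampling defaults may differ):

from PIL import Image

# Load one of the sample images captured earlier (illustrative filename)
image = Image.open("frame_001.png").convert("RGB")

# Build a chat-style prompt for the LLM backbone
prompt_builder = vlm.get_prompt_builder()
prompt_builder.add_turn(role="human", message="What should the robot do to pick up the coke can?")
prompt_text = prompt_builder.get_prompt()

# Generate a response conditioned on the image and the prompt
generated_text = vlm.generate(
    image,
    prompt_text,
    do_sample=True,
    temperature=0.4,
    max_new_tokens=512,
)
print(generated_text)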
Work in progress: need to add screenshots and next steps.