
Humanoid Robots Wiki β


Prismatic VLM REPL

1,780 bytes added, 00:51, 21 June 2024
[https://github.com/TRI-ML/prismatic-vlms Prismatic VLM] is the project upon which OpenVLA is based. The generate.py REPL script is also available in the OpenVLA repo, but it essentially uses Prismatic models. Note that the Prismatic models generate natural language, whereas OpenVLA models were trained to generate robot actions (see https://github.com/openvla/openvla/issues/5).

Of note, the K-Scale OpenVLA adaptation by [[User:Paweł]] is at https://github.com/kscalelabs/openvla

== Prismatic REPL Script Guide ==
Here are some suggestions for running the generate.py REPL script from the repo (you can find it in the '''scripts''' folder).
== Prerequisites ==
Make sure the images you load show the robot's end effector.
[[File:Coke can2.png|400px|Can pickup task]]
 
== Starting REPL mode ==
 
Then, run generate.py. The script starts by initializing the generation playground with the Prismatic model prism-dinosiglip+7b.
 
The model prism-dinosiglip+7b is downloaded from the Hugging Face Hub.
 
The model configuration is found and then the model is loaded with the following components:
 
Vision Backbone: dinosiglip-vit-so-384px
 
Language Model (LLM) Backbone: llama2-7b-pure (this is also where the Hugging Face (hf) token comes into play)
 
Architecture Specifier: no-align+fused-gelu-mlp
 
Checkpoint Path: The model checkpoint is loaded from a specific path in the cache.
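
Since the LLM backbone needs a Hugging Face token, it can help to check that the token is readable before starting the script. The following is a minimal sketch, not code from generate.py: the helper name <code>resolve_hf_token</code>, the <code>.hf_token</code> file name and the <code>HF_TOKEN</code> environment variable are all assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Union


def resolve_hf_token(source: Union[str, Path]) -> str:
    """Hypothetical helper: read a token from a file (Path) or an environment variable (str)."""
    if isinstance(source, Path):
        # A Path is treated as a text file holding only the token.
        token = source.read_text().strip()
    else:
        # A plain string is treated as the name of an environment variable.
        token = os.environ.get(source, "").strip()
    if not token:
        raise ValueError(f"no Hugging Face token found via {source!r}")
    return token


if __name__ == "__main__":
    # Assumed locations; adjust to wherever your token actually lives.
    token_file = Path(".hf_token")
    source = token_file if token_file.exists() else "HF_TOKEN"
    try:
        print("token found, length", len(resolve_hf_token(source)))
    except ValueError as err:
        print(err)
```

Failing early with a clear message is usually easier to debug than an authentication error partway through the model download.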
 
You should see this in your terminal:
 
[[File:Openvla1.png|800px|prismatic models]]
 
 
 
''After loading the model, the script enters a REPL mode, allowing the user to interact with the model. The REPL mode provides a default generation setup and waits for user inputs.''
 
In short, the generate.py script runs a REPL for interactively testing generation with the Prismatic model prism-dinosiglip+7b. At the REPL prompt, type (i) to load a new local image by specifying its path, (p) to update the prompt template for generating outputs, or (q) to quit the REPL; alternatively, enter a prompt directly to generate a response based on the loaded image and the specified prompt.

[[File:Prismatic chat1.png|800px|prismatic chat]]
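
To make the command flow concrete, here is a rough sketch of how such a loop could dispatch the (i), (p) and (q) commands and plain prompts. It is an illustration only and not the actual generate.py code: <code>run_repl</code>, <code>fake_generate</code>, the default template and the convention that the image path or template is given on the following input line are all assumptions, and the model call is replaced by a stub that echoes its inputs.

```python
from typing import Callable, Iterable, List, Optional


def run_repl(
    inputs: Iterable[str],
    generate: Callable[[Optional[str], str, str], str],
) -> List[str]:
    """Run a tiny command loop over `inputs` and return the lines it would print."""
    image_path: Optional[str] = None  # set by the (i) command
    template = "{prompt}"             # hypothetical default; replaced by the (p) command
    outputs: List[str] = []

    lines = iter(inputs)
    for line in lines:
        command = line.strip()
        if command == "q":
            outputs.append("bye")
            break
        elif command == "i":
            # Assumption: the next input line is the path of a local image.
            image_path = next(lines, "").strip() or None
            outputs.append(f"image set to {image_path}")
        elif command == "p":
            # Assumption: the next input line is the new prompt template.
            template = next(lines, "").strip() or template
            outputs.append(f"template set to {template}")
        elif command:
            # Anything else is treated as a prompt for the model.
            outputs.append(generate(image_path, template, command))
    return outputs


def fake_generate(image_path: Optional[str], template: str, prompt: str) -> str:
    """Stand-in for the model call; it only echoes what it was given."""
    return f"[{image_path}] {template.format(prompt=prompt)}"


if __name__ == "__main__":
    # Scripted session; the image name is a placeholder.
    session = ["i", "coke_can.png", "p", "Q: {prompt} A:", "What is the robot holding?", "q"]
    for printed in run_repl(session, fake_generate):
        print(printed)
```

Passing the inputs as an iterable rather than calling <code>input()</code> directly keeps the loop easy to test with a scripted session; an interactive version could feed it lines read from the terminal instead.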