Humanoid Robots Wiki β

Prismatic VLM REPL

778 bytes added, 20 June
Make sure the images have an end effector in them.
 
== Starting REPL mode ==
 
Then, run generate.py. The script starts by initializing the generation playground with the Prismatic model prism-dinosiglip+7b.
 
The model prism-dinosiglip+7b is downloaded from the Hugging Face Hub.
 
The script then locates the model configuration and loads the model with the following components:
 
Vision Backbone: dinosiglip-vit-so-384px
 
Language Model (LLM) Backbone: llama2-7b-pure (the Llama 2 weights are gated on Hugging Face, so this is the step that needs your HF token)
 
Architecture Specifier: no-align+fused-gelu-mlp
 
Checkpoint Path: the checkpoint is loaded from the path in the local cache where the Hub download was stored.
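The loading step described above can be sketched in Python as follows. This is a minimal sketch, not the contents of generate.py: it assumes the prismatic package exposes a load function that accepts a model ID and an hf_token keyword, and the token file name .hf_token and both helper names (read_hf_token, load_vlm) are hypothetical.

```python
from pathlib import Path

# Model ID taken from this page.
MODEL_ID = "prism-dinosiglip+7b"


def read_hf_token(path):
    """Read a Hugging Face access token from a text file and strip whitespace."""
    return Path(path).read_text().strip()


def load_vlm(hf_token, model_id=MODEL_ID):
    """Download (if needed) and load the model.

    Assumption: the prismatic package is installed and provides
    load(model_id, hf_token=...). The import is kept inside the function so
    the rest of this sketch works without the package.
    """
    from prismatic import load

    return load(model_id, hf_token=hf_token)
```

With the token saved in a file, usage would look like load_vlm(read_hf_token(".hf_token")). The first call is expected to trigger the download from the Hugging Face Hub, and later calls should reuse the cached checkpoint.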
 
''After loading the model, the script enters a REPL mode, allowing the user to interact with the model. The REPL mode provides a default generation setup and waits for user inputs.''
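To illustrate what a REPL loop of this kind does, here is a minimal, self-contained sketch. It is not the loop in generate.py: run_repl, echo_model and the quit words are hypothetical, and a stub function stands in for the loaded model so the example runs without any weights.

```python
from typing import Callable, Iterable, Iterator


def run_repl(
    generate: Callable[[str], str],
    inputs: Iterable[str],
    quit_words=("q", "quit", "exit"),
) -> Iterator[str]:
    """Yield one reply per non-empty input line until a quit word is seen."""
    for line in inputs:
        prompt = line.strip()
        if not prompt:
            continue  # ignore blank lines
        if prompt.lower() in quit_words:
            break  # leave the loop on a quit word
        yield generate(prompt)


def echo_model(prompt: str) -> str:
    """Stub standing in for the real model's text generation."""
    return f"[stub reply to: {prompt}]"
```

Because run_repl is a generator, iterating over run_repl(echo_model, sys.stdin) and printing each reply gives an interactive session, and echo_model can be swapped for a function that calls the real model.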
''Work in progress: need to add screenshots and next steps.''