Stereo Vision


This is a guide for setting up and experimenting with stereo cameras in your projects.

This guide is incomplete and a work in progress; you can help by expanding it!

Choosing the Right Stereo Camera

Selecting an appropriate stereo camera is a fundamental first step. Key considerations include resolution, compatibility with your compute platform, and features such as on-board Image Signal Processing (ISP) support.

For example, the Arducam Pivariety 18MP AR1820HS camera module offers high resolution and is compatible with Raspberry Pi models. Its ISP provides auto exposure, auto white balance, and lens shading correction, which are crucial for capturing high-quality images under varying lighting conditions.
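
If the module enumerates as a standard UVC/V4L2 device, a quick way to sanity-check these controls is through OpenCV's capture properties. This is a minimal sketch assuming device index 0; which properties a driver exposes varies, and a False return simply means the control was not accepted.

    import cv2

    cap = cv2.VideoCapture(0)  # assumes the camera enumerates as device 0
    if not cap.isOpened():
        raise RuntimeError("camera not found")

    # Try to enable the ISP-style controls; on V4L2 backends the value 3
    # commonly selects auto exposure, but this varies by driver.
    ok_ae = cap.set(cv2.CAP_PROP_AUTO_EXPOSURE, 3)
    ok_awb = cap.set(cv2.CAP_PROP_AUTO_WB, 1)
    print(f"auto exposure accepted: {ok_ae}, auto white balance accepted: {ok_awb}")

    ret, frame = cap.read()
    if ret:
        print(f"captured frame with shape {frame.shape}")
    cap.release()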

Implementation and Testing

Setup and testing procedures vary with the project's needs. For example, streaming from a USB stereo camera to a VR headset like the Quest Pro involves addressing challenges such as latency and handling hand-tracking data.
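
Many USB stereo cameras deliver both views packed into a single side-by-side frame, so a useful first test is grabbing frames, splitting them into left and right halves, and timing each grab to get a rough feel for capture latency. A minimal sketch, assuming a side-by-side camera at device index 0:

    import time
    import cv2

    cap = cv2.VideoCapture(0)  # assumed device index for the stereo camera

    while True:
        t0 = time.perf_counter()
        ret, frame = cap.read()
        if not ret:
            break
        # Assumes the camera outputs the two views side by side in one frame.
        h, w = frame.shape[:2]
        left, right = frame[:, : w // 2], frame[:, w // 2 :]
        grab_ms = (time.perf_counter() - t0) * 1000
        print(f"grab took {grab_ms:.1f} ms")
        cv2.imshow("left", left)
        cv2.imshow("right", right)
        if cv2.waitKey(1) == 27:  # Esc to quit
            break

    cap.release()
    cv2.destroyAllWindows()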

Resources like the TeleVision GitHub repository can be invaluable for streaming camera feeds efficiently, which is crucial for real-time applications such as virtual reality and remote teleoperation.
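
TeleVision handles the transport layer for you; to see where the latency comes from, the sketch below (deliberately simple, and not TeleVision's approach) sends length-prefixed JPEG frames over a plain TCP socket. The host, port, and device index are placeholders, and the JPEG quality setting is the main knob trading image fidelity against per-frame transmission time.

    import socket
    import struct
    import cv2

    def stream_frames(host: str, port: int, device: int = 0) -> None:
        # JPEG-compress each frame and prefix it with its byte length so the
        # receiver knows how many bytes make up one frame.
        cap = cv2.VideoCapture(device)
        with socket.create_connection((host, port)) as sock:
            while True:
                ret, frame = cap.read()
                if not ret:
                    break
                ok, jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
                if not ok:
                    continue
                payload = jpeg.tobytes()
                sock.sendall(struct.pack("!I", len(payload)) + payload)
        cap.release()

    # stream_frames("192.168.1.50", 5000)  # hypothetical headset-side receiver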

Application Scenarios

Stereo cameras are versatile and can be adapted for numerous applications. For instance, one setup might use long cables for room-scale monitoring, another might provide 360-degree local vision, and a third might be dedicated to a specific stereo vision task.

These configurations cater to the unique requirements of each application, whether it’s monitoring large spaces or creating immersive user experiences.

Computational Considerations

When deploying stereo cameras, the computational load deserves attention. Fully processing both raw images of a stereo pair can be largely redundant, since the two views overlap heavily and differ mainly by a small parallax shift.

Techniques like encoding with CLIP-like models can reduce the need to fully process both images, as these models can infer approximate depth from high-level semantic content, conserving computational resources.
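
As a concrete illustration of the idea, the sketch below encodes only the left view with a pretrained CLIP vision encoder from Hugging Face transformers and passes the single embedding downstream. Whether one semantic embedding is sufficient for a given task is an assumption to validate, not a guarantee.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    left = Image.open("left.png").convert("RGB")  # placeholder path for the left view

    # Encode only one view; the right image is skipped on the assumption that
    # its high-level content is nearly identical to the left view's.
    with torch.no_grad():
        inputs = processor(images=left, return_tensors="pt")
        embedding = model.get_image_features(**inputs)

    print(embedding.shape)  # (1, 512) for the ViT-B/32 checkpoint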

Exploring Depth Sensing Techniques

Depth sensing in stereo cameras can be achieved through various technologies. While some utilize stereo disparity, others may incorporate structured light sensors for depth detection.
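
For the stereo-disparity route, OpenCV's semi-global block matcher is a common starting point. A minimal sketch on an already-rectified image pair follows; the file names and matcher parameters are placeholders to tune for your camera.

    import cv2

    # Assumes the two images are already rectified (epipolar lines horizontal).
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    stereo = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,   # must be divisible by 16
        blockSize=5,
        P1=8 * 5 * 5,         # smoothness penalties from the usual heuristic
        P2=32 * 5 * 5,
    )

    # compute() returns fixed-point disparities scaled by 16.
    disparity = stereo.compute(left, right).astype("float32") / 16.0
    vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype("uint8")
    cv2.imwrite("disparity.png", vis)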

Understanding the underlying technology is essential for optimizing the setup and ensuring efficient processing. RealSense D400-series cameras, for example, pair an infrared pattern projector with stereo disparity (active stereo) and compute depth on an on-board ASIC, providing robust depth information without significant computational demands on the host.
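
Because the depth computation happens on the camera itself, host-side code only needs to pull frames. A minimal sketch with the pyrealsense2 SDK, using illustrative stream settings:

    import pyrealsense2 as rs

    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    pipeline.start(config)

    try:
        frames = pipeline.wait_for_frames()
        depth = frames.get_depth_frame()
        # Depth at the image center, in meters (the SDK applies the depth scale).
        print(depth.get_distance(320, 240))
    finally:
        pipeline.stop()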

Community Insights on Stereo Cameras

The community has shared varied experiences and recommendations on stereo cameras, emphasizing the practical strengths of different models. Intel RealSense cameras are notably popular for their robust software and ROS integration.
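
For ROS 1 users, the realsense2_camera wrapper publishes depth as a standard sensor_msgs/Image topic, so consuming it takes only a few lines of rospy. The topic name below is a common default for the wrapper but depends on your launch configuration, so treat it as an assumption:

    import rospy
    from sensor_msgs.msg import Image
    from cv_bridge import CvBridge

    bridge = CvBridge()

    def on_depth(msg: Image) -> None:
        # 16-bit depth image, in millimeters for the z16 stream format.
        depth = bridge.imgmsg_to_cv2(msg, desired_encoding="passthrough")
        h, w = depth.shape[:2]
        rospy.loginfo("center depth: %d mm", depth[h // 2, w // 2])

    rospy.init_node("depth_listener")
    rospy.Subscriber("/camera/depth/image_rect_raw", Image, on_depth)
    rospy.spin()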

Despite some criticisms of the RealSense cameras' small baseline (a shorter baseline yields less disparity at range, so depth accuracy degrades faster with distance), alternatives like the Oak-D camera and Arducam's stereo cameras have been suggested for different needs. The Oak-D is praised for its edge computing capabilities and high-quality image processing, while Arducam offers affordable options for Raspberry Pi and NVIDIA platforms.
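
Because the Oak-D computes disparity on its on-board chip, host code mainly constructs a processing pipeline and reads results. A minimal sketch with the depthai library, following the patterns in the depthai-python examples (details may differ between library versions):

    import depthai as dai

    pipeline = dai.Pipeline()

    # Two monochrome cameras feed the on-device stereo depth node.
    mono_left = pipeline.create(dai.node.MonoCamera)
    mono_right = pipeline.create(dai.node.MonoCamera)
    stereo = pipeline.create(dai.node.StereoDepth)
    xout = pipeline.create(dai.node.XLinkOut)

    mono_left.setBoardSocket(dai.CameraBoardSocket.LEFT)
    mono_right.setBoardSocket(dai.CameraBoardSocket.RIGHT)
    mono_left.out.link(stereo.left)
    mono_right.out.link(stereo.right)
    xout.setStreamName("depth")
    stereo.depth.link(xout.input)

    with dai.Device(pipeline) as device:
        queue = device.getOutputQueue(name="depth", maxSize=4, blocking=False)
        depth_frame = queue.get().getFrame()  # uint16 depth map in millimeters
        print(depth_frame.shape)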

Additionally, advanced users have discussed using the ZED2 camera for its superior baseline and resolution, and have compared various models for specific needs, such as indoor testing and 3D benchmarking with Kinect and Orbbec Astra cameras. The community also highlights the importance of considering off-the-shelf depth cameras that integrate stereo computation internally to save on development time and effort.
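
The baseline discussion can be made concrete with the standard rectified-stereo relation Z = f * B / d, where Z is depth, f is focal length in pixels, B is the baseline, and d is the disparity. A one-pixel disparity error therefore corresponds to a depth error that grows roughly quadratically with distance, which is why a small baseline limits long-range accuracy. The numbers below are purely illustrative:

    # Depth from disparity for a rectified pair: Z = f * B / d.
    f_px = 700.0       # focal length in pixels (illustrative value)
    baseline_m = 0.12  # a ZED2-class baseline is on the order of 0.12 m

    def depth_m(disparity_px: float) -> float:
        return f_px * baseline_m / disparity_px

    for d in (84.0, 8.4):
        z = depth_m(d)
        # Error caused by a 1-pixel disparity mistake at this range.
        err = depth_m(d - 1.0) - z
        print(f"disparity {d:5.1f} px -> depth {z:5.1f} m, +1 px error ~ {err:.2f} m")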

Users have shared their experiences with various depth-sensing technologies across different robotic platforms. For example, the Unitree platforms and MIT's Mini Cheetah have incorporated RealSense cameras for mapping and environmental interaction. The broader field of view afforded by multiple cameras, as on Spot, which employs five of them, is advantageous for comprehensive environmental awareness; for research applications where movement is predominantly forward, however, a single camera may suffice.

The discussion also highlights a shift toward more sophisticated sensing such as solid-state lidars, though cost and weight remain significant considerations. For instance, the CERBERUS team from the DARPA SubT Challenge noted that while lidars provide more efficient depth data than multiple depth cameras, their higher cost and weight can be limiting factors depending on the robotic platform.

This section reflects ongoing discussions and is open for further contributions and updates from the community.
