Researchers from the University of California and NVIDIA (NVDA, Financial) have introduced NaVILA, a new vision-language-action model that offers a fresh approach to robot navigation. Unlike traditional methods that rely on pre-built maps and complex sensor suites, NaVILA lets robots navigate autonomously from natural language instructions, real-time camera images, and lidar data. The model extends instruction-following navigation to legged robots, improving their ability to handle complex environments.
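To make that division of labor concrete, here is a minimal Python sketch of how such a two-level pipeline could be wired: a vision-language model proposes the next mid-level command from the instruction and recent camera frames, and a separate low-level locomotion policy executes it. Every name here (`vlm.generate`, `policy.execute`, `robot.camera_frame`) is a hypothetical placeholder, not NaVILA's actual API.

```python
# Illustrative two-level navigation loop (hypothetical interfaces).
from dataclasses import dataclass

@dataclass
class Command:
    action: str    # e.g. "move_forward", "turn_left", or "stop"
    value: float   # distance in meters or angle in degrees

def parse_command(text: str) -> Command:
    """Parse a reply like 'turn_left 30' into a Command (illustrative format)."""
    action, _, value = text.strip().partition(" ")
    return Command(action=action, value=float(value or 0))

def plan_step(vlm, instruction: str, frames: list) -> Command:
    """Ask the vision-language model for the next mid-level command."""
    prompt = f"Instruction: {instruction}\nWhat should the robot do next?"
    reply = vlm.generate(prompt, images=frames)  # assumed interface
    return parse_command(reply)

def navigate(vlm, policy, robot, instruction: str, max_steps: int = 100):
    frames = []
    for _ in range(max_steps):
        frames.append(robot.camera_frame())
        cmd = plan_step(vlm, instruction, frames[-8:])  # short visual history
        if cmd.action == "stop":
            break
        # The locomotion policy handles balance and terrain at high frequency
        # while carrying out the slower, language-level command.
        policy.execute(robot, cmd)
```

The point of the split is that the language model only needs to reason at the level of "turn left, walk forward", while keeping a legged robot upright on rough ground is delegated to a fast low-level controller.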
In tests with the Unitree Go2 robot dog and G1 humanoid robot, NaVILA achieved an 88% success rate in real-world settings. The model is designed for both accuracy and efficiency: it reduces training cost and memory requirements, and it handles high-resolution visual inputs by compressing them into a compact set of tokens before processing. In video benchmarks, NaVILA outperformed models such as GPT-4o Mini.
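As an illustration of the kind of visual-token compression described above, the sketch below spatially average-pools the patch-token grid produced by a vision encoder, so each frame contributes far fewer tokens to the language model. This shows the general technique only; the grid size, pooling factor, and function name are assumptions, not NaVILA's actual implementation.

```python
# Hedged sketch of visual token compression via spatial average pooling.
import torch
import torch.nn.functional as F

def compress_frame_tokens(tokens: torch.Tensor, grid: int = 24, pool: int = 2):
    """tokens: (num_patches, dim) from a vision encoder, laid out on a
    grid x grid patch grid. Returns (num_patches / pool**2, dim)."""
    dim = tokens.shape[-1]
    x = tokens.reshape(1, grid, grid, dim).permute(0, 3, 1, 2)  # (1, dim, H, W)
    x = F.avg_pool2d(x, kernel_size=pool)                       # (1, dim, H/p, W/p)
    return x.permute(0, 2, 3, 1).reshape(-1, dim)

# Example: 576 patch tokens per frame shrink to 144 after 2x2 pooling,
# so an 8-frame clip costs 1,152 tokens instead of 4,608.
frame = torch.randn(576, 1024)
print(compress_frame_tokens(frame).shape)  # torch.Size([144, 1024])
```

Cutting the per-frame token count this way is one common route to lower memory use and cheaper training when feeding video into a language model.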