Figure’s Humanoid Robot Just Got Smarter and Faster
Figure, a company developing humanoid robots for commercial use, has introduced a real-world logistics application that puts its latest technology to the test. The robots, powered by Figure's Vision-Language-Action (VLA) model, Helix, are now being trained to handle one of the most complex tasks in logistics: sorting and manipulating packages on moving conveyor belts. This new use case pushes the boundaries of robotic precision, adaptability, and speed, especially in unpredictable warehouse environments.
The foundation of this breakthrough lies in Helix System 1, Figure’s low-level visuo-motor control policy. The newest updates include several major improvements designed to enhance performance. The robot now understands depth through implicit stereo vision, allowing for more accurate 3D movement. It also leverages multi-scale visual representation, capturing both fine details and larger context within the scene. Learned visual proprioception means each robot can calibrate itself without manual tuning, ensuring smooth deployment across different machines. One of the standout features is a “sport mode” that enables faster-than-human execution using a clever test-time technique without needing retraining.
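The article does not spell out how "sport mode" works internally. One plausible, purely illustrative sketch is temporal resampling at inference time: the policy's predicted action chunk is re-interpolated onto a shorter time grid so the same trajectory plays back faster, with no change to the trained model. The function name and array shapes below are assumptions, not Figure's implementation:

```python
import numpy as np

def accelerate_chunk(actions, dt, speedup=1.5):
    """Illustrative "sport mode": resample a predicted action chunk so it
    plays back `speedup`x faster without retraining the policy.

    actions: (T, D) array of joint targets emitted every `dt` seconds.
    Returns a shorter (T', D) chunk tracing the same trajectory.
    """
    T, D = actions.shape
    duration = (T - 1) * dt
    # Number of steps needed to cover the same path at the higher speed.
    new_T = max(2, int(np.ceil(duration / (speedup * dt))) + 1)
    old_t = np.linspace(0.0, duration, T)
    new_t = np.linspace(0.0, duration, new_T)
    # Linearly interpolate each joint dimension onto the faster time grid.
    return np.stack(
        [np.interp(new_t, old_t, actions[:, d]) for d in range(D)], axis=1
    )
```

A 10-step chunk resampled at 1.5x playback shrinks to 7 steps while keeping the same start and end poses, which matches the article's observation that pushing the speedup too far makes motions aggressive: fewer intermediate waypoints means coarser tracking of the demonstrated path.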
To train this behavior, Figure curated just 8 hours of high-quality demonstration data. By excluding failed or slow attempts and including moments of corrective action, the company ensured the robot learned not just what to do, but how to recover when things go wrong. The result is a policy that is both fast and flexible.
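The curation criteria described above (drop failures, drop slow attempts, keep demonstrations that contain corrective recoveries) could be sketched as a simple filter. The episode fields and threshold below are hypothetical, not Figure's actual pipeline:

```python
# Hypothetical demonstration-filtering pass; field names are illustrative.
def curate(episodes, max_duration_s=30.0):
    """Keep successful demos that are fast, plus successful demos that
    contain corrective recoveries, so the policy also learns to fix
    mistakes rather than only seeing clean executions."""
    kept = []
    for ep in episodes:
        if not ep["success"]:
            continue  # exclude failed attempts entirely
        fast = ep["duration_s"] <= max_duration_s
        if fast or ep.get("has_recovery", False):
            kept.append(ep)
    return kept
```

The design choice worth noting is the recovery exception: a slow episode is normally dropped, but one that demonstrates error correction is retained, since that is exactly the behavior a fast-but-clean dataset would otherwise never contain.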
Sorting packages may sound simple, but the challenge grows when dealing with items that vary in size, shape, weight, and rigidity—from firm boxes to soft, deformable bags. These need to be moved between belts and reoriented so that labels face the correct direction for scanners. The robots must also adapt in real time to the constant flow of packages and unexpected situations. Helix’s multi-scale stereo vision significantly boosts performance, with some scenarios seeing a 60% improvement in throughput over models without stereo input.
Moreover, Figure’s self-calibrating system enables the same policy to work across multiple robots, even with small differences in hardware and sensors. This advancement drastically reduces the time and effort needed for individual calibration and helps scale deployment more efficiently.
Another key finding is that data quality matters more than quantity: models trained on curated demonstrations achieved up to 40% better throughput even with 33% less data. The "sport mode" proved highly effective as well, with robots reaching up to a 50% speed boost while maintaining precision; beyond that point, performance begins to drop as motions become too aggressive.
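Putting the two reported numbers together gives a rough sense of how much each hour of demonstration data is worth after curation (a back-of-envelope calculation, not a figure from the article):

```python
# Back-of-envelope combination of the article's two reported numbers.
curated_throughput = 1.40  # up to 40% better throughput
curated_data = 0.67        # 33% less data -> 67% of the baseline dataset

# Throughput achieved per unit of demonstration data, relative to baseline.
efficiency_gain = curated_throughput / curated_data
print(round(efficiency_gain, 2))  # prints 2.09
```

In other words, under these headline numbers a curated demonstration hour yields roughly twice the throughput of an uncurated one.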
With this logistics use case, Figure demonstrates how far end-to-end visuo-motor systems have come. Their combination of stereo vision, learned calibration, and efficient training techniques creates a solid foundation for future real-world deployment across industries.