Humanoids driving go-karts
TNG's innovation team programmed a Unitree G1 humanoid robot to autonomously drive go-karts using creative hardware solutions and classical computer vision, while exploring reinforcement learning and foundation models for generalized robot intelligence.
Presented at Munich Datageeks - December Edition 2025
Abstract
The talk presents an innovative project where TNG Technology Consulting's innovation hacking team successfully programmed a Unitree G1 humanoid robot to drive a go-kart. The presentation covers the complete journey from hardware integration to autonomous control, demonstrating how modern humanoid robots have become accessible to developers with Python programming skills. The team addressed multiple technical challenges including stability issues, motor control, steering mechanisms, and reverse functionality through creative solutions like pedal extensions, chain-based steering attachments, and custom FTDI controllers. Beyond manual control, the talk explores autonomous navigation using classical computer vision algorithms for line-following, as the robot's onboard GPU couldn't handle large vision language models with acceptable latency. The presentation also delves into reinforcement learning approaches for teaching robots generalized behaviors in simulation environments, and introduces vision language action models as the future of robot foundation models. The team's work extends beyond go-kart driving to educational applications at the Deutsches Museum, where children can interact with and learn from these humanoid robots.
About the Speakers
Thomas Endres has been an innovation hacker at TNG Technology Consulting for the past 12 years. His role involves experimenting with cutting-edge technologies that are not yet in production but may become standard within the next five years. His team's work ranges from early LLM fine-tuning, with GPT-2 on social media comments, to current robotics projects. He is joined by Daniel, a consultant and research scientist at TNG who works on making robots more intelligent for manufacturing processes, and by David, also part of the innovation hacking team at TNG. Together, they explore emerging technologies and demonstrate their practical applications through hands-on projects like the humanoid go-kart driver.
Transcript Summary
Historical Context of Robotics
The presentation begins with a historical overview of humanoid robotics, tracing the idea of human-like mechanical beings back to ancient Greece. Some of the earliest real robot-like mechanisms can be seen in the Deutsches Museum in Munich, including a praying monk with movable arms and feet driven entirely by mechanical systems. The evolution continued with the NAO robot, a smaller humanoid that could walk and move its arms and head, though its design made it prone to stability problems. Honda's ASIMO robot, introduced in 2000, famously failed at tasks like stair climbing. Boston Dynamics' Atlas robot marked a major advance: its hydraulic motors gave it exceptional stability, allowing it to withstand impacts while standing on one leg.
Introduction to the Unitree G1 Robot
The Unitree G1 represents a new generation of accessible humanoid robots, costing between 20,000 and 50,000 euros, comparable to a small car rather than a house. The robot provides Python, C++, and ROS SDKs, making it accessible to anyone with programming skills. The G1 features sophisticated motor control across multiple degrees of freedom, including feet for acceleration and braking, stable legs that can shift body weight and raise individual limbs, a compact hip requiring adaptation for adult-sized equipment, and arms capable of lifting significant weight (tested with 12 bottles of water). The robot contains a relatively small battery in its chest that powers all operations and includes an onboard NVIDIA Jetson GPU for neural network inference.
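The SDK calls themselves were not shown in the talk; as a purely illustrative sketch of what joint-level control through a Python SDK can look like, the following uses an invented G1Client wrapper, so the class, method names, and joint name are hypothetical rather than the real unitree_sdk2 API:

```python
import time

class G1Client:
    """Hypothetical stand-in for the vendor SDK; every name here is invented
    for illustration and is not the real unitree_sdk2 API."""
    def __init__(self):
        self._joints = {"right_ankle_pitch": 0.0}

    def set_joint_target(self, joint: str, angle_rad: float) -> None:
        self._joints[joint] = angle_rad  # a real client would publish a motor command

    def read_joint_angle(self, joint: str) -> float:
        return self._joints[joint]      # a real client would read motor state feedback

robot = G1Client()

# Ease the right ankle toward a pedal-pressing angle in small increments,
# the way a foot on the accelerator would be actuated.
target, step = 0.35, 0.02
while (angle := robot.read_joint_angle("right_ankle_pitch")) < target:
    robot.set_joint_target("right_ankle_pitch", min(angle + step, target))
    time.sleep(0.02)  # ~50 Hz command loop
```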
Hardware Integration for Go-Kart Driving
The team employed several creative solutions to adapt the G1 for go-kart operation. Since the robot's hip is smaller than an adult human's, they used children's car seats and pedal extensions from a go-kart shop. For steering, they developed a chain-based system that attached to the robot's wrists rather than relying on the fragile finger mechanisms, allowing safe arm movement to control the steering wheel without risk of breaking the delicate metal plates in the hands. The team faced maintenance challenges when motors operated near their breaking point, requiring replacement of a wrist component that involved cutting wires, soldering, and recalibration.
Custom Reverse Control System
Traditional go-karts require pressing a physical reverse button, impossible when the robot's hands are chained to the steering wheel. The team built a custom solution using an FTDI controller, described as an advanced version of Arduino, which allowed programmatic control of pins connected to a relay. This relay integrated with the go-kart's circuit to electronically switch between forward and reverse modes, controlled directly from the laptop managing the robot. This was the only modification made to the go-kart itself.
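The exact wiring and code were not shown; one plausible way to toggle such a relay from Python is the pyftdi library, which can drive GPIO pins on an FTDI chip. The device URL, pin assignment, and active-high polarity below are assumptions:

```python
from pyftdi.gpio import GpioAsyncController

REVERSE_PIN = 0x01  # assumption: relay input wired to pin AD0, active high

gpio = GpioAsyncController()
# Device URL is an assumption (an FT232H module on the laptop's USB bus);
# the direction mask configures AD0 as an output.
gpio.configure('ftdi://ftdi:232h/1', direction=REVERSE_PIN)

def set_reverse(enabled: bool) -> None:
    """Energize or release the relay that flips the kart's drive direction."""
    gpio.write(REVERSE_PIN if enabled else 0x00)

set_reverse(True)   # switch the go-kart into reverse
set_reverse(False)  # back to forward drive
gpio.close()
```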
Autonomous Navigation Challenges
The robot's head contains an RGB camera and LiDAR for depth perception. The team initially attempted to use vision language models for autonomous decision-making, such as following a blue line on the floor. However, small models that could run on the robot's GPU were insufficiently accurate, larger models couldn't run locally, and cloud-based solutions on GPU clusters suffered from excessive latency. Instead, they implemented a classical computer vision pipeline with four steps: filter the image down to blue pixels to detect the line, calculate the line's angle, determine the go-kart's own orientation using depth perception or QR codes, and feed the difference between the line's angle and the kart's heading into a P controller that regulates steering (see the sketch below).
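A minimal sketch of that pipeline with OpenCV, assuming the kart's heading is supplied by the depth or QR-code step; the HSV bounds and gain are placeholders rather than the team's tuned values:

```python
import cv2
import numpy as np

KP = 0.8  # proportional gain; the real value would be tuned on the track

def steering_command(frame_bgr: np.ndarray, kart_heading_deg: float) -> float:
    """Estimate the blue line's angle and return a P-controller steering output."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Step 1: keep only blue pixels (HSV bounds are assumptions, lighting-dependent).
    mask = cv2.inRange(hsv, (100, 120, 60), (130, 255, 255))
    points = cv2.findNonZero(mask)
    if points is None:
        return 0.0  # no line visible: hold the wheel straight
    # Step 2: fit a line through the blue pixels and take its direction angle.
    vx, vy, _, _ = cv2.fitLine(points, cv2.DIST_L2, 0, 0.01, 0.01).flatten()
    line_angle_deg = float(np.degrees(np.arctan2(vy, vx)))
    # Steps 3-4: compare against the kart's heading and steer proportionally.
    error = line_angle_deg - kart_heading_deg
    return KP * error  # sign convention for left/right is an assumption
```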
Reinforcement Learning in Simulation
To teach the robot new behaviors like walking without risking the expensive hardware, the team utilized simulation environments. Using the Isaac Gym platform with predefined G1 assets, they trained hundreds of virtual robots simultaneously through reinforcement learning. The robots received rewards for desired behaviors (moving forward, staying upright) and punishments for failures. Initial attempts resulted in falling and ineffective jumping motions, but after approximately two hours of training on an RTX 4090 GPU, the robots began developing effective walking gaits. After several more hours of training, the behavior stabilized sufficiently for transfer to the real robot via the onboard NVIDIA Jetson.
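The talk did not detail the reward terms; a typical shaping for such a walking task might look like the following framework-agnostic PyTorch function, where the terms and weights are assumptions rather than the team's actual setup:

```python
import torch

def walking_reward(base_lin_vel: torch.Tensor,
                   projected_gravity: torch.Tensor,
                   reset_buf: torch.Tensor) -> torch.Tensor:
    """Illustrative reward, batched over all parallel simulated robots."""
    forward = base_lin_vel[:, 0].clamp(min=0.0)           # reward forward x-velocity
    upright = 1.0 - projected_gravity[:, :2].norm(dim=1)  # ~1 when the torso is vertical
    fall_penalty = reset_buf.float() * -10.0              # large penalty when a robot falls
    return 1.5 * forward + 0.5 * upright + fall_penalty   # weights are pure assumptions
```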
Data Capture and Teleoperation
For capturing training data, the team employed two methods. The simpler approach involved physically manipulating the robot to demonstrate desired movements, which the robot would mimic. The more sophisticated method used an Apple Vision Pro headset (adding 4,000 euros to the project cost) that allowed operators to see through the robot's Intel RealSense camera while mapping human limb positions to robot motor angles, enabling precise teleoperation and data collection.
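Such retargeting can be as simple as clamping tracked human joint angles to the robot's limits and smoothing out headset jitter; the joint names, limits, and smoothing factor in this sketch are invented for illustration:

```python
import numpy as np

# Placeholder per-joint limits in radians, not the G1's real specifications.
JOINT_LIMITS = {"shoulder_pitch": (-2.0, 2.0), "elbow": (0.0, 2.4), "wrist_roll": (-1.5, 1.5)}
ALPHA = 0.2  # low-pass factor to damp tracking jitter before it reaches the motors

def retarget(human_angles: dict, previous_targets: dict) -> dict:
    """Map tracked human joint angles onto robot motor targets."""
    targets = {}
    for name, angle in human_angles.items():
        lo, hi = JOINT_LIMITS[name]
        clamped = float(np.clip(angle, lo, hi))               # respect motor limits
        prev = previous_targets.get(name, clamped)
        targets[name] = (1 - ALPHA) * prev + ALPHA * clamped  # exponential smoothing
    return targets

# Example: one tracking frame from the headset mapped to motor targets.
targets = retarget({"shoulder_pitch": 0.4, "elbow": 1.1, "wrist_roll": -0.3}, {})
```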
Vision Language Action Models
The presentation introduced vision language action models as the future of generalized robot intelligence, exemplified by NVIDIA's GR00T N1.6, released on the day of the talk. These foundation models accept three inputs: motor states, camera footage for environmental awareness, and natural language prompts. Unlike traditional LLMs, they are kept small (around 3 billion parameters) so they can run in real time with multiple inferences per second. The architecture consists of two stages: a vision language model that handles high-level scene and instruction understanding, and a second stage that translates this understanding into concrete motor movements, enabling robots to perform complex tasks like shelf stocking from simple verbal instructions.
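None of GR00T's internals were presented; purely to make the two-stage idea concrete, here is a toy PyTorch model in which all dimensions, the vocabulary size, and the 29-motor output are invented:

```python
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    """Toy two-stage vision-language-action model (dimensions are illustrative)."""
    def __init__(self, d_model: int = 256, n_motors: int = 29):
        super().__init__()
        # Stage 1 stand-ins: encoders for camera image, prompt, and motor state.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 16, d_model))
        self.language = nn.Embedding(10_000, d_model)  # stand-in for a real LM
        self.state = nn.Linear(n_motors, d_model)      # current motor positions
        # Stage 2: action head turning the fused context into motor targets.
        self.action_head = nn.Sequential(
            nn.Linear(3 * d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, n_motors))

    def forward(self, image, prompt_tokens, motor_state):
        ctx = torch.cat([self.vision(image),
                         self.language(prompt_tokens).mean(dim=1),
                         self.state(motor_state)], dim=-1)
        return self.action_head(ctx)  # next motor targets, queried several times a second

# One inference step on dummy inputs (batch of 1).
model = ToyVLA()
action = model(torch.randn(1, 3, 224, 224),
               torch.randint(0, 10_000, (1, 8)),
               torch.zeros(1, 29))
```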
Educational Applications and Future Directions
The team deployed the G1 at the Deutsches Museum as an educational platform, relying on the robot's stability for safe interaction with children. Applications include giving go-kart rides outside the museum and conducting interactive tours inside using a custom trailer attached to the go-kart. The robot's integrated loudspeaker and microphone enable natural conversations when combined with speech-to-text and text-to-speech systems (sketched below). Future plans include continued work with robot foundation models, expanding to industrial applications with a UR5 robot from Universal Robots for chess playing, and exploring leader-follower cobot systems for tasks like laundry folding.
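As an illustration of such a conversation loop, the off-the-shelf speech_recognition and pyttsx3 Python libraries can chain speech-to-text and text-to-speech; the stack actually running on the robot was not specified, and the answer function here is a placeholder for an LLM call:

```python
import speech_recognition as sr
import pyttsx3

recognizer = sr.Recognizer()
tts = pyttsx3.init()

def answer(question: str) -> str:
    # Placeholder: the real system would generate the reply with an LLM.
    return f"You asked about: {question}"

with sr.Microphone() as source:                 # the robot's microphone
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)

try:
    heard = recognizer.recognize_google(audio)  # speech-to-text
    tts.say(answer(heard))                      # reply over the loudspeaker
except sr.UnknownValueError:
    tts.say("Sorry, I didn't catch that.")
tts.runAndWait()
```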