Starship is building a fleet of robots to deliver packages locally on demand. To do this successfully, the robots must be safe, polite and fast. But how do you get there with limited computational resources and without expensive sensors like LIDAR? This is the engineering reality you have to deal with, unless you live in a universe where customers gladly pay 100 for a delivery.
Initially, the robots began sensing the world with the help of radar, multiple cameras, and ultrasonic sensors.
The challenge, however, is that most of this knowledge is low-level and non-semantic. For example, a robot can sense that an object is ten meters away, but without knowing what kind of object it is, it is hard to make a safe driving decision.
Machine learning through neural networks is surprisingly useful in converting this raw, low-level information into higher-level knowledge.
Starship robots mostly drive on sidewalks, and cross roads when they need to. This creates a different set of challenges than for self-driving cars. Traffic on roads is more structured and predictable: cars move along lanes and do not change direction very often. People, in contrast, frequently stop abruptly, meander, may be accompanied by a dog on a leash, and do not signal their intentions with turn signal lights.
To understand the surroundings in real time, a central component of the robot is its object detection module – a program that takes in images and returns a list of object bounding boxes.
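In code, the module's interface can be sketched roughly like this. This is a minimal illustration only: the `Detection` fields and the `detect_objects` stub are hypothetical names for the sake of the example, not Starship's actual API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """One detected object: a box in pixel coordinates, a class, a confidence."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str
    confidence: float


def detect_objects(image) -> List[Detection]:
    """Image in, list of bounding boxes out.

    A real implementation would run a neural network here; this stub just
    returns a fixed example detection.
    """
    return [Detection(120.0, 80.0, 340.0, 260.0, "car", 0.93)]


boxes = detect_objects(image=None)
print(boxes[0].label, boxes[0].confidence)  # car 0.93
```

Everything downstream – obstacle avoidance, road-crossing logic – consumes lists of this shape.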
It’s all very well, but how do you write such a program?
An image is a large three-dimensional array consisting of a myriad of numbers representing pixel intensities. These values change significantly when the image is taken at night instead of during the day; when the color, scale or position of the object changes; or when the object itself is truncated or occluded.
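To make the scale concrete, here is a sketch using NumPy. The image here is random noise, purely to show the shapes and the number of values involved; the camera resolution is an assumption for illustration.

```python
import numpy as np

# A 480x640 RGB image: a three-dimensional array of pixel intensities (0-255).
image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
print(image.shape)  # (480, 640, 3)
print(image.size)   # 921600 numbers in a single frame

# A global brightness change (e.g. night instead of day) shifts every one of
# those values, even though the scene content is exactly the same.
darker = (image * 0.3).astype(np.uint8)
```

No hand-written rule over nearly a million raw numbers survives such variation, which is what motivates learning instead of programming.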
For some complex problems, teaching is more natural than programming.
In the robot software we have a set of trainable units, mostly neural networks, where the code is written by the model itself. The program is represented by a set of weights.
At first, these numbers are initialized randomly, and the output of the program is random as well. Engineers present the model with examples of what they want it to predict and ask the network to do better the next time it sees a similar input. By iteratively changing the weights, the optimization algorithm searches for programs that predict bounding boxes more and more accurately.
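The loop described above can be sketched with a toy example. Gradient descent on a tiny linear problem stands in for the real bounding-box training; every name and number below is illustrative, not Starship's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the real task: a tiny linear model instead of a detector.
X = rng.normal(size=(100, 3))        # inputs (think: image features)
true_w = np.array([1.5, -2.0, 0.5])  # the "program" we want to find
y = X @ true_w                       # desired outputs (think: box coordinates)

w = rng.normal(size=3)               # the weights start out random...
lr = 0.1
for _ in range(200):                 # ...and are adjusted repeatedly
    pred = X @ w                              # run the current program
    grad = 2 * X.T @ (pred - y) / len(X)      # direction that reduces the error
    w -= lr * grad                            # one optimization step

print(np.allclose(w, true_w, atol=1e-3))  # True
```

The real system replaces the linear map with a deep network and the squared error with a detection loss, but the search-by-repeated-adjustment principle is the same.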
However, the examples used to train the model need to be thought through carefully.
- Should the model be punished or rewarded for detecting a car in a window reflection?
- What should it do when it detects a picture of a person on a poster?
- Should a car trailer full of cars be annotated as one entity, or should each car be annotated separately?
These are all examples that actually came up while building the object detection module for our robots.
When teaching a machine, big data alone is not enough. The data collected must be rich and varied. For example, if we only use uniformly sampled images and annotate them, we will see many pedestrians and cars, yet the model will lack the examples of motorcyclists or skaters it needs to reliably detect these categories.
The team has to specifically mine hard examples and rare cases, otherwise the model stops progressing. Starship operates in several countries, which enriches the data with examples from different weather conditions. Many people were surprised when Starship delivery robots kept operating during the snowstorm 'Emma' in the UK, while airports and schools were closed.
At the same time, annotating data takes time and resources. Ideally, it would be best to train and improve the models with less data. This is where architecture engineering comes into play: we encode prior knowledge about the world into the architecture and the optimization process, shrinking the search space towards programs that are plausible in the real world.
In some computer vision applications, such as pixel-wise segmentation, it is useful for the model to know whether the robot is on a sidewalk or at a road crossing. To provide a hint, we encode global image-level cues into the neural network architecture; the model then learns on its own whether and how to use them.
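One simple way to wire a global cue into an architecture, sketched here as an assumption rather than Starship's actual design, is to pool the feature map into an image-level descriptor and append it to every spatial position:

```python
import numpy as np


def add_global_context(feature_map):
    """Append a globally pooled descriptor to every spatial position.

    feature_map: (H, W, C) activations from a convolutional backbone.
    Returns (H, W, 2C): local features plus an image-level cue at each pixel,
    so a per-pixel classifier can condition on whole-scene context
    (e.g. "this image looks like a road crossing").
    """
    h, w, c = feature_map.shape
    global_desc = feature_map.mean(axis=(0, 1))      # (C,) image-level summary
    tiled = np.broadcast_to(global_desc, (h, w, c))  # copy it to every pixel
    return np.concatenate([feature_map, tiled], axis=-1)


fmap = np.random.rand(16, 16, 8)
out = add_global_context(fmap)
print(out.shape)  # (16, 16, 16)
```

The network is free to ignore the appended channels; whether they help is decided by the optimization, not by an engineer.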
After data and architecture engineering, the model may work better. However, deep learning models require significant amounts of computing power, and this is a big challenge for the team, because we cannot take advantage of the most powerful graphics cards in battery-powered, low-cost delivery robots.
Starship wants deliveries to be low cost, which means our hardware must be cheap. It is the same reason Starship does not use LIDARs (a detection system similar in principle to radar, but using light from lasers): they would make understanding the world much easier, but we do not want our customers to pay more than necessary for a delivery.
State-of-the-art object detection systems published in academic papers run at around 5 frames per second [MaskRCNN], and even papers on real-time object detection do not report rates significantly above 100 FPS [Light-Head R-CNN, tiny-YOLO, tiny-DSOD]. What is more, those numbers are reported for a single image, whereas we need a 360-degree understanding of the surroundings (roughly the equivalent of processing 5 single images).
To put this into perspective, Starship's models run at over 2000 FPS when measured on a consumer-grade GPU, processing a full 360-degree panorama image in a single forward pass. That is equivalent to 10,000 FPS when processing 5 single images with batch size 1.
Neural networks are better than humans at many visual problems, yet they may still contain bugs. For example, a bounding box may be too wide, a confidence too low, or an object may be hallucinated in a place that is actually empty.
Fixing these bugs is challenging.
Neural networks are often considered black boxes that are difficult to analyze and understand. Yet to improve the model, engineers need to understand the failure cases and dive deep into the specifics of what the model has learned.
The model is represented by a set of weights, and it is possible to visualize what each specific neuron is trying to detect. For example, the first layers of Starship's network activate on standard patterns such as horizontal and vertical edges. The next blocks of layers detect more complex textures, while higher layers detect car parts and whole objects.
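A common way to inspect what a first-layer filter responds to is to rescale its weights into a viewable image. The sketch below uses a hand-made vertical-edge filter as a stand-in for learned weights; it is an illustration of the technique, not Starship's tooling.

```python
import numpy as np


def filter_to_image(kernel):
    """Rescale one conv filter's weights into 0-255 so it can be viewed.

    kernel: (kh, kw, 3) weights of a single first-layer filter. First-layer
    filters rendered this way typically look like edge or color-blob detectors.
    """
    lo, hi = kernel.min(), kernel.max()
    scaled = (kernel - lo) / (hi - lo + 1e-8)     # map weights into [0, 1]
    return np.round(scaled * 255).astype(np.uint8)


# A hypothetical vertical-edge filter: negative weights left, positive right.
edge = np.zeros((3, 3, 3))
edge[:, 0, :] = -1.0
edge[:, 2, :] = 1.0
img = filter_to_image(edge)
print(img[0, 0, 0], img[0, 2, 0])  # 0 255
```

Viewed as an image, such a filter shows a dark-to-bright transition – exactly the "vertical edge" pattern the text describes for early layers.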
Technical debt takes on another meaning with machine learning models. Engineers constantly improve the architecture, the optimization process and the datasets. As a result, the model becomes more accurate. Yet changing the detection model for the better does not necessarily guarantee success in the robot's overall behavior.
There are dozens of components that consume the output of the object detection model, each requiring a different precision and recall operating point, which is tuned based on the existing model. However, a new model may behave differently in subtle ways. For example, its output probability distribution may be skewed towards larger values. Even though the average performance is better, it may be worse for a specific group, such as large cars. To avoid such pitfalls, the team calibrates the probabilities and verifies against regressions on stratified datasets.
Monitoring trainable software components poses a different challenge than monitoring standard software. There is little concern about inference time or memory usage, as these are mostly constant.
Instead, dataset shift becomes the primary concern – the data distribution the model was trained on differs from the distribution it is currently deployed in.
For example, electric scooters may suddenly start riding on sidewalks. If the model does not take this category into account, it will struggle to classify them correctly. The information derived from the object detection module will then disagree with other sensory information, requiring help from human operators and thus slowing down deliveries.
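A minimal way to watch for this kind of shift, sketched here as an assumption rather than Starship's actual monitoring, is to compare the class distribution of live detections against the training distribution:

```python
from collections import Counter


def class_distribution(detections):
    """Normalized frequency of each detected class label."""
    counts = Counter(detections)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def shift_score(train_dets, live_dets):
    """Total variation distance between two class distributions.

    0.0 means identical distributions; values near 1.0 mean the deployed
    world looks nothing like the training data.
    """
    p, q = class_distribution(train_dets), class_distribution(live_dets)
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)


# Hypothetical labels: scooters appear in deployment but not in training.
train = ["car"] * 80 + ["pedestrian"] * 20
live = ["car"] * 50 + ["pedestrian"] * 20 + ["scooter"] * 30
print(round(shift_score(train, live), 3))  # 0.3
```

An alert on this score rising would flag the scooter scenario above before operators are overwhelmed.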
Neural networks give Starship robots the ability to stay safe at road crossings by avoiding obstacles like cars, and on sidewalks by understanding all the different directions that people and other obstacles may take.
Starship robots achieve this using cheap hardware, which poses plenty of engineering challenges but makes robot delivery a compelling reality today. Starship's robots are making real deliveries seven days a week in multiple cities around the world, and we are delighted to see how our technology keeps bringing new benefits to people's lives.