Few-shot object detection has drawn increasing attention in the field of robotic exploration, where robots are required to find unseen objects given only a few examples provided online. Although recent efforts have been made to enable online processing, the slow inference speed of low-powered robots fails to meet the demands of real-time detection, making these methods impractical for autonomous exploration.
Existing methods still face performance and efficiency challenges, mainly due to unreliable features and exhaustive class loops.
In this work, we propose a new paradigm, AirShot, and discover that fully exploiting the valuable correlation map yields a more robust and faster few-shot object detection system, making it more applicable to the robotics community. The core module, the Top Prediction Filter (TPF), operates on multi-scale correlation maps in both the training and inference stages. During training, TPF supervises the generation of a more representative correlation map; during inference, it reduces looping iterations by selecting only the top-ranked classes, cutting computational cost while improving performance. Surprisingly, this dual functionality proves generally effective and efficient across various off-the-shelf models.
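The inference-time role of TPF can be illustrated with a minimal sketch: score each class's query-support correlation map, then run the expensive per-class detection loop only for the top-ranked classes. This is an assumption-laden illustration, not the paper's implementation — `tpf_score` stands in for the learned filter (here, simple global max pooling), and `filter_and_detect` and `detect_for_class` are hypothetical names.

```python
import numpy as np

def tpf_score(corr_map: np.ndarray) -> float:
    """Reduce an (H, W) correlation map to a scalar class-existence score.
    The real TPF is a learned module; global max pooling stands in here."""
    return float(corr_map.max())

def filter_and_detect(corr_maps: dict, detect_for_class, top_k: int = 5) -> dict:
    """corr_maps: {class_name: (H, W) correlation map from query/support features}.
    Runs the expensive per-class detector only for the top_k ranked classes,
    instead of looping exhaustively over every class."""
    scores = {cls: tpf_score(m) for cls, m in corr_maps.items()}
    top_classes = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return {cls: detect_for_class(cls) for cls in top_classes}
```

Under this sketch, the detector's cost scales with `top_k` rather than with the total number of novel classes, which is the source of the reported speedup.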
Extensive experiments on the COCO2017, VOC2014, and SubT datasets demonstrate that TPF significantly boosts the efficacy and efficiency of most off-the-shelf models, achieving up to 36.4% precision improvement along with 56.3% faster inference. Pre-trained models, code, and data are also released.
The pipeline of the autonomous exploration task and the framework of AirShot. During exploration, a few prior raw images that potentially contain novel objects (e.g., a helmet) are first sent to a human user. Provided with the online-annotated few-shot data, the robot explorer is able to detect those objects while observing its surrounding environment.
AirShot first requires offline data for base training, during which the Top Prediction Filter (TPF) learns to infer class existence directly from the correlation map. During exploration, inference is applied only to the top predictions filtered by TPF (out of the original 20 classes), enabling a lightweight inference strategy with little precision drop.
 
The provided support images and detection results from the real-world tests. AirShot is robust to distinct object scales and varying illumination conditions.
 
 
@inproceedings{wang2024airshot,
title = {{AirShot}: Efficient Few-Shot Detection for Autonomous Exploration},
author = {Wang, Zihan and Li, Bowen and Wang, Chen and Scherer, Sebastian},
booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year = {2024},
url = {https://arxiv.org/pdf/2404.05069.pdf}
}