Few-shot object detection has drawn increasing attention in the field of robotic exploration, where robots are required to find unseen objects given only a few examples provided online. Although recent efforts have been made to enable online processing, the slow inference speed of low-powered robots fails to meet the demands of real-time detection, making these methods impractical for autonomous exploration.
Existing methods still face performance and efficiency challenges, mainly due to unreliable features and exhaustive class loops.
In this work, we propose a new paradigm, AirShot, and discover that fully exploiting the valuable correlation map yields a more robust and faster few-shot object detection system, making it more applicable to the robotics community. The core module, the Top Prediction Filter (TPF), operates on multi-scale correlation maps in both the training and inference stages. During training, TPF supervises the generation of a more representative correlation map; during inference, it reduces looping iterations by selecting only the top-ranked classes, cutting computational cost while improving performance. Surprisingly, this dual functionality proves generally effective and efficient across various off-the-shelf models.
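The inference-time role of TPF can be illustrated with a minimal sketch: score each class's query-support correlation map, then run the expensive per-class detection loop only for the top-ranked classes. This is an assumption-laden illustration, not the paper's implementation — `tpf_score` stands in for the learned filter (here, simple global max pooling), and `filter_and_detect` and `detect_for_class` are hypothetical names.

```python
import numpy as np

def tpf_score(corr_map: np.ndarray) -> float:
    """Reduce an (H, W) correlation map to a scalar class-existence score.
    The real TPF is a learned module; global max pooling stands in here."""
    return float(corr_map.max())

def filter_and_detect(corr_maps: dict, detect_for_class, top_k: int = 5) -> dict:
    """corr_maps: {class_name: (H, W) correlation map from query/support features}.
    Runs the expensive per-class detector only for the top_k ranked classes,
    instead of looping exhaustively over every class."""
    scores = {cls: tpf_score(m) for cls, m in corr_maps.items()}
    top_classes = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return {cls: detect_for_class(cls) for cls in top_classes}
```

Under this sketch, the detector's cost scales with `top_k` rather than with the total number of novel classes, which is the source of the reported speedup.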
Extensive experiments on the COCO2017, VOC2014, and SubT datasets demonstrate that TPF significantly boosts the efficacy and efficiency of most off-the-shelf models, achieving up to 36.4% precision improvement along with 56.3% faster inference. Pre-trained models, code, and data are also released.
The pipeline of the autonomous exploration task and the framework of AirShot. During exploration, a few prior raw images that potentially contain novel objects (e.g., a helmet) are first sent to a human user. Provided with the online-annotated few-shot data, the robot explorer is able to detect those objects while observing its surrounding environment.
AirShot first requires offline data for base training, during which the Top Prediction Filter (TPF) learns to infer class existence directly from the correlation map. During exploration, inference is applied only to the top predictions filtered by TPF (out of the original 20 classes), enabling a lightweight inference strategy with little precision drop.
 
The provided support images and detection results from the real-world tests. AirShot is robust to distinct object scales and varying illumination conditions.
 
 
@inproceedings{wang2024airshot,
title = {{AirShot}: Efficient Few-Shot Detection for Autonomous Exploration},
author = {Wang, Zihan and Li, Bowen and Wang, Chen and Scherer, Sebastian},
booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year = {2024},
url = {https://arxiv.org/pdf/2404.05069.pdf}
}