Figure: (a) NBV methods with a depth sensor (e.g., SCONE by Guédon et al.); (b) our approach, MACARONS, with an RGB sensor.
We introduce a method that simultaneously learns to explore new large environments and to reconstruct them in 3D from color images alone, in a self-supervised fashion. This is closely related to the Next Best View (NBV) problem, where one must identify where to move the camera next to improve the coverage of an unknown scene. However, most current NBV methods rely on depth sensors, require 3D supervision, and/or do not scale to large scenes.
In this paper, we propose the first deep-learning-based NBV approach for dense reconstruction of large 3D scenes from RGB images. We call this approach MACARONS, for Mapping And Coverage Anticipation with RGB Online Self-Supervision. Moreover, we provide a dedicated training procedure for online learning of scene mapping and automated exploration, driven by coverage optimization, in any kind of environment and with no explicit 3D supervision. Consequently, our approach is also the first NBV method that learns in real time to reconstruct and explore arbitrarily large scenes in a self-supervised fashion.
Indeed, MACARONS simultaneously learns to predict a "volume occupancy field" from color images and, from this field, to predict the NBV. We show experimentally that this greatly improves results for NBV exploration of 3D scenes. In particular, we demonstrate this on a recent dataset made of various 3D scenes and show that our method performs even better than recent methods requiring a depth sensor, which is not a realistic assumption for outdoor scenes captured with a flying drone. This makes our approach suitable for real-life applications on small drones equipped with a simple color camera. More fundamentally, it shows that an autonomous system can learn to explore and reconstruct environments without any a priori 3D information.
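As a concrete illustration, here is a minimal sketch (not the authors' code) of the underlying idea: candidate camera poses are scored by how many likely-occupied, not-yet-covered points of the scene they would bring into view, according to the learned occupancy field. Every name below (`in_frustum`, `coverage_gain`, `select_next_best_view`) is hypothetical, and occlusions are ignored for brevity.

```python
import torch

def in_frustum(position, direction, points, half_angle=0.8, max_depth=5.0):
    """Crude visibility test: keep points inside a cone of the given
    half-angle around the (unit) viewing direction, up to max_depth.
    (The actual method also reasons about occlusions.)"""
    offsets = points - position                        # (N, 3)
    depth = offsets @ direction                        # (N,) signed depth
    lateral = (offsets - depth[:, None] * direction).norm(dim=1)
    radius = depth * torch.tan(torch.tensor(half_angle))
    return (depth > 0) & (depth < max_depth) & (lateral < radius)

def coverage_gain(occupancy_field, position, direction, points, seen_mask):
    """Toy proxy for coverage gain: expected number of not-yet-covered,
    likely-occupied sample points visible from the candidate pose."""
    occ = occupancy_field(points)                      # (N,) occupancy in [0, 1]
    vis = in_frustum(position, direction, points)
    return (occ * vis.float() * (~seen_mask).float()).sum()

def select_next_best_view(occupancy_field, candidate_poses, points, seen_mask):
    """Pick the (position, direction) pose with the highest predicted gain."""
    scores = [coverage_gain(occupancy_field, pos, d, points, seen_mask)
              for pos, d in candidate_poses]
    return candidate_poses[int(torch.stack(scores).argmax())]
```

In MACARONS, this gain is predicted by a neural module rather than computed explicitly, which lets the system anticipate surfaces it has not observed yet.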
This video illustrates how MACARONS efficiently explores and reconstructs three large 3D scenes. In particular, the video shows several key elements of our approach:
MACARONS simultaneously reconstructs the scene and selects the next best camera pose by running three neural modules: a depth module, which predicts depth maps from the color images in a self-supervised fashion; a volume occupancy module, which turns the back-projected depth maps into a volume occupancy field; and a surface coverage gain module, which uses this field to evaluate candidate camera poses. A simplified sketch of this data flow is given below.
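The following is a hedged, heavily simplified sketch of how these three modules could fit together. The classes are hypothetical stand-ins with toy architectures; the actual modules take richer inputs (e.g., the occupancy module is conditioned on the partial point cloud built from the depth maps).

```python
import torch
import torch.nn as nn

class DepthModule(nn.Module):
    """Hypothetical stand-in: RGB frames -> depth maps, trainable with a
    self-supervised photometric loss between neighboring frames."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 1, kernel_size=3, padding=1)
    def forward(self, images):                   # (B, 3, H, W)
        return torch.relu(self.net(images))      # (B, 1, H, W) depth

class OccupancyModule(nn.Module):
    """Hypothetical stand-in: 3D query points -> occupancy probability."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, query_points):             # (N, 3)
        return torch.sigmoid(self.net(query_points)).squeeze(-1)  # (N,)

class CoverageGainModule(nn.Module):
    """Hypothetical stand-in: a candidate camera pose plus a summary of
    the occupancy field -> predicted surface coverage gain."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(6 + 64, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, pose, occupancy_summary):  # (6,), (64,)
        return self.net(torch.cat([pose, occupancy_summary]))
```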
During online exploration, we perform a training iteration at each time step t, which consists of three steps: decision making (the camera scores candidate poses and moves to the next best view), memory building (the newly captured frames and their predicted depth maps are added to the memory), and memory replay (the neural modules are trained on data sampled from the memory). A schematic version of this loop is sketched below.
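Here is a schematic version of one such iteration, assuming hypothetical `world`, `model`, and `memory` interfaces (none of these names come from the released code):

```python
def training_iteration(world, model, memory, candidate_poses):
    """One online self-supervised iteration, as a sketch.

    Step 1, decision making: score candidate poses and move to the NBV.
    Step 2, memory building: capture new RGB frames, predict their depth,
            and store the results in the memory (no ground-truth 3D used).
    Step 3, memory replay: train the modules on samples from the memory.
    """
    # 1. Decision making
    next_pose = max(candidate_poses, key=model.predict_coverage_gain)
    world.move_camera(next_pose)

    # 2. Memory building
    images = world.capture_images()
    depths = model.predict_depth(images)
    memory.add(images=images, depths=depths, pose=next_pose)

    # 3. Memory replay
    batch = memory.sample_batch()
    model.train_step(batch)
```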
@inproceedings{guedon2023macarons,
  title={MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision},
  author={Gu{\'e}don, Antoine and Monnier, Tom and Monasse, Pascal and Lepetit, Vincent},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={940--951},
  year={2023}
}

This work was granted access to the HPC resources of IDRIS under the allocation 2022-AD011013387 made by GENCI. We thank Elliot Vincent for inspiring discussions and valuable feedback on the manuscript.
This webpage template is inspired by the (awesome) work UNICORN (ECCV 2022) by Monnier et al.
© You are welcome to copy the code; please attribute the source with a link back to this page and to the UNICORN webpage by Tom Monnier, and remove the analytics.