BUOL: A Bottom-Up Framework with Occupancy-aware Lifting for Panoptic 3D Scene Reconstruction From A Single Image
This is an official release of the paper BUOL: A Bottom-Up Framework with Occupancy-aware Lifting for Panoptic 3D Scene Reconstruction From A Single Image
Tao Chu, Pan Zhang, Qiong Liu, Jiaqi Wang
The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR 2023)
The results of BUOL on each dataset are shown below; PRQ_th and PRQ_st denote PRQ over thing and stuff classes, respectively. We have released the pretrained models.
| Dataset | PRQ | RSQ | RRQ | PRQ_th | PRQ_st | Download |
|---|---|---|---|---|---|---|
| 3D-FRONT | 54.05 | 63.72 | 83.14 | 49.77 | 73.34 | front3d.pth |
| Matterport3D | 14.54 | 45.91 | 31.08 | 11.02 | 25.09 | matterport3d.pth |
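For reference, the panoptic reconstruction metrics decompose per class as PRQ = RSQ × RRQ, analogous to PQ = SQ × RQ in 2D panoptic segmentation (the table reports class averages, so the product does not hold exactly on the averaged numbers). A minimal sketch of the per-class computation, not the official evaluation code, with made-up match data:

```python
def panoptic_reconstruction_quality(matched_ious, num_fp, num_fn):
    """Per-class PRQ/RSQ/RRQ from matched instance pairs (a sketch).

    matched_ious: IoU of each true-positive prediction/ground-truth match.
    num_fp, num_fn: unmatched predictions / unmatched ground truths.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0, 0.0, 0.0
    rsq = sum(matched_ious) / tp if tp else 0.0  # reconstruction quality: mean IoU over TPs
    rrq = tp / denom                             # recognition quality
    return rsq * rrq, rsq, rrq                   # PRQ = RSQ * RRQ


# Hypothetical class with two matched instances, one FP and one FN:
prq, rsq, rrq = panoptic_reconstruction_quality([0.8, 0.6], num_fp=1, num_fn=1)
```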
Create the conda environment:
```
conda create -n buol -y
conda activate buol
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge -y
```

Install MinkowskiEngine:
```
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas --force_cuda
```

Install PyMarchingCubes:
```
git clone https://github.com/xheon/PyMarchingCubes.git
cd PyMarchingCubes
git clone https://gitlab.com/libeigen/eigen.git
python setup.py install
```

Install the other dependencies:
```
pip install yacs fvcore
pip install opencv-python
conda install -c conda-forge openexr-python -y
pip install pyexr
pip install matplotlib
pip install plyfile
pip install loguru
pip install scipy
```

Download front3d.pth, put it at models/front3d.pth, and run:
```
python demo.py
```

Download the datasets and put them in datasets/<dataset_name> with the directory structure shown below. Then set GPUS (e.g. GPUS: (0, 1, 2, 3)) and MODEL.EVAL: False in the config file, and train with multiple GPUs:
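The exact config schema is defined by the repository's YAML files; the fields named above might look like the following (the nesting is an assumption, check configs/front.yaml for the real layout):

```yaml
GPUS: (0, 1, 2, 3)             # use (0,) for single-GPU testing
MODEL:
  EVAL: False                  # set True for testing
  WEIGHTS: models/front3d.pth  # set to a model path for testing
```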
```
python -m torch.distributed.launch --nproc_per_node=4 main.py --cfg configs/front.yaml
```

Download a released model or train one yourself, then set MODEL.WEIGHTS to the model path. Set GPUS: (0,) and MODEL.EVAL: True in the config file, and test with a single GPU:
```
python main.py --cfg configs/front.yaml
```

3D-FRONT is a synthetic indoor dataset. We process it in the same way as Dahnert et al. (Panoptic 3D Scene Reconstruction from a Single RGB Image); the processed data can be downloaded or generated from there.
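Each frame's 2D segmentation packs pre-mapped semantics and instance ids into the last axis. A hedged numpy sketch of how such a map could be consumed; in practice you would np.load the .npz file, but its internal key names are not documented here, so this builds a synthetic array instead (the HxWx2 axis order is also an assumption):

```python
import numpy as np

# Toy stand-in for segmap_<frame_id>.mapped.npz
# (channel 0: pre-mapped semantics, channel 1: instances).
segmap = np.zeros((240, 320, 2), dtype=np.int64)
segmap[50:100, 60:120, 0] = 7  # semantic class 7 for one object (hypothetical id)
segmap[50:100, 60:120, 1] = 1  # instance id 1 on the same pixels

semantics, instances = segmap[..., 0], segmap[..., 1]

# Per-instance semantic label: majority class over that instance's pixels.
instance_labels = {}
for inst_id in np.unique(instances[instances > 0]):
    mask = instances == inst_id
    instance_labels[int(inst_id)] = int(np.bincount(semantics[mask]).argmax())
```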
```
front3d/
└── <scene_id>/
    ├── rgb_<frame_id>.png                  # Color image: 320x240x3
    ├── depth_<frame_id>.exr                # Depth image: 320x240x1
    ├── segmap_<frame_id>.mapped.npz        # 2D segmentation: 320x240x2, with 0: pre-mapped semantics, 1: instances
    ├── geometry_<frame_id>.npz             # 3D geometry: 256x256x256x1, truncated (unsigned) distance field at 3cm voxel resolution and 12-voxel truncation
    ├── segmentation_<frame_id>.mapped.npz  # 3D segmentation: 256x256x256x2, with 0: pre-mapped semantics & instances
    └── weighting_<frame_id>.mapped.npz     # 3D weighting mask: 256x256x256x1
```

Matterport3D is a real-world indoor dataset. We follow Dahnert et al. to preprocess it. In addition, we generate the depth maps and room masks by rendering the 3D scenes instead of using the original versions.
```
matterport/
└── <scene_id>/
    ├── <image_id>_i<frame_id>.png                    # Color image: 320x240x3
    ├── <image_id>_segmap<frame_id>.mapped.npz        # 2D segmentation: 320x240x2, with 0: pre-mapped semantics, 1: instances
    ├── <image_id>_intrinsics_<camera_id>.png         # Intrinsics matrix: 4x4
    ├── <image_id>_geometry<frame_id>.npz             # 3D geometry: 256x256x256x1, truncated (unsigned) distance field at 3cm voxel resolution and 12-voxel truncation
    ├── <image_id>_segmentation<frame_id>.mapped.npz  # 3D segmentation: 256x256x256x2, with 0: pre-mapped semantics & instances
    └── <image_id>_weighting<frame_id>.npz            # 3D weighting mask: 256x256x256x1
matterport_depth_gen/
└── <scene_id>/
    └── <position_id>_d<frame_id>.png                 # Depth image: 320x240x1
matterport_room_mask/
└── <scene_id>/
    └── <position_id>_rm<frame_id>.png                # Room mask: 320x240x1
```

```
@inproceedings{chu2023buol,
  title={BUOL: A Bottom-Up Framework With Occupancy-Aware Lifting for Panoptic 3D Scene Reconstruction From a Single Image},
  author={Chu, Tao and Zhang, Pan and Liu, Qiong and Wang, Jiaqi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4937--4946},
  year={2023}
}
```

This project is licensed under the MIT License; see the LICENSE file for details.
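As background, the per-image 4x4 intrinsics together with a depth map are what let 2D features be lifted to camera-space 3D points, the geometric step underlying occupancy-aware lifting. A minimal numpy sketch, not the repository's implementation (the intrinsic values are made up):

```python
import numpy as np

# Hypothetical 4x4 intrinsics: fx, fy, cx, cy sit in the top-left 3x3 block.
K = np.array([[277.0,   0.0, 160.0, 0.0],
              [  0.0, 277.0, 120.0, 0.0],
              [  0.0,   0.0,   1.0, 0.0],
              [  0.0,   0.0,   0.0, 1.0]])

depth = np.full((240, 320), 2.0)  # toy depth map: 2 m everywhere

# Back-project every pixel to camera space:
#   X = (u - cx) * z / fx,  Y = (v - cy) * z / fy,  Z = z
v, u = np.indices(depth.shape)
fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
points = np.stack([(u - cx) * depth / fx,
                   (v - cy) * depth / fy,
                   depth], axis=-1)  # shape (240, 320, 3)
```

The resulting points would then be voxelized into the 256^3 grid that the 3D geometry and segmentation targets use.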

