This is the official repository for our CVPR 2024 paper RoDLA: Benchmarking the Robustness of Document Layout Analysis Models. For more results and benchmarking details, please visit our project homepage.
We introduce RoDLA, a benchmark for evaluating the robustness of Document Layout Analysis (DLA) models. RoDLA is a large-scale benchmark containing 450,000+ documents with diverse layouts and contents, together with a set of evaluation metrics that facilitate the comparison of different DLA models. We hope RoDLA can serve as a standard benchmark for the robustness evaluation of DLA models.
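To give a rough intuition for robustness scoring, the sketch below averages the relative mAP drop of a model across perturbed copies of a dataset. This is a simplified illustration only, not the exact metric definitions used in the paper; the function name and inputs are hypothetical.

```python
def mean_performance_drop(clean_map, perturbed_maps):
    """Average relative mAP drop across perturbed dataset copies.

    Illustrative only -- not the paper's exact robustness metrics.
    clean_map: mAP on the clean dataset (e.g. 0.90).
    perturbed_maps: mAP values on each perturbed copy.
    """
    drops = [(clean_map - m) / clean_map for m in perturbed_maps]
    return sum(drops) / len(drops)

# Relative drops of 10%, 20%, 30% average to ~0.2
print(mean_performance_drop(0.90, [0.81, 0.72, 0.63]))
```

A lower average drop indicates a model whose detection quality degrades less under document perturbations.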
- Perturbation Benchmark Dataset
- PubLayNet-P
- DocLayNet-P
- M6Doc-P
- Perturbation Generation and Evaluation Code
- RoDLA Model Checkpoints
- RoDLA Model Training Code
- RoDLA Model Evaluation Code
1. Clone the repository

```shell
git clone https://github.com/yufanchen96/RoDLA.git
cd RoDLA
```

2. Create a conda virtual environment

```shell
# create virtual environment
conda create -n RoDLA python=3.7 -y
conda activate RoDLA
```

3. Install benchmark dependencies
- Install Basic Dependencies
```shell
pip install torch==1.10.2+cu113 torchvision==0.11.3+cu113 torchaudio==0.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
python -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu113/torch1.10/index.html
pip install -U openmim
mim install mmcv-full==1.5.0
pip install timm==0.6.11 mmdet==2.28.1
pip install Pillow==9.5.0
pip install opencv-python termcolor yacs pyyaml scipy
```

- Install ocrodeg Dependencies
```shell
git clone https://github.com/NVlabs/ocrodeg.git
cd ./ocrodeg
pip install -e .
```

- Compile CUDA operators
```shell
cd ./model/ops_dcnv3
sh ./make.sh
python test.py
```

- You can also install the operator using `.whl` files
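Once the dependencies are installed, it can help to confirm that the pinned packages resolved to the expected versions before running the benchmark. A minimal check, assuming the package names and versions from the install commands above:

```python
from importlib.metadata import version, PackageNotFoundError

def check_versions(expected):
    """Report whether installed package versions match the pins.

    expected: mapping of package name -> pinned version string.
    Returns "ok", the mismatching version, or "missing" per package.
    """
    report = {}
    for pkg, want in expected.items():
        try:
            got = version(pkg)
            # Compare ignoring local build tags such as "+cu113"
            report[pkg] = "ok" if got.startswith(want.split("+")[0]) else got
        except PackageNotFoundError:
            report[pkg] = "missing"
    return report

# Pins taken from the install commands in this README
print(check_versions({
    "torch": "1.10.2+cu113",
    "mmcv-full": "1.5.0",
    "mmdet": "2.28.1",
    "timm": "0.6.11",
}))
```

Any entry reported as "missing" or with an unexpected version is worth reinstalling before compiling the CUDA operators.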
Download the RoDLA dataset from Google Drive to the desired root directory.
Alternatively, prepare the dataset yourself as follows:
```shell
cd ./perturbation
python apply_perturbation.py \
    --dataset_dir ./publaynet/val \
    --json_dir ./publaynet/val.json \
    --dataset_name PubLayNet-P \
    --output_dir ./PubLayNet-P \
    --pert_method all \
    --background_folder ./background \
    --metric all
```

After dataset preparation, the perturbed dataset structure would be:
```
.desired_root
└── PubLayNet-P
    ├── Background
    │   ├── Background_1
    │   │   ├── psnr.json
    │   │   ├── ms_ssim.json
    │   │   ├── cw_ssim.json
    │   │   ├── val.json
    │   │   ├── val
    │   │   │   ├── PMC538274_00004.jpg
    │   │   │   ...
    │   ├── Background_2
    │   ...
    ├── Rotation
    ...
```

To evaluate the model on a perturbed subset, run:

```shell
cd ./model
python -u test.py configs/publaynet/rodla_internimage_xl_publaynet.py \
    checkpoint_dir/rodla_internimage_xl_publaynet.pth \
    --work-dir result/rodla_internimage_publaynet/Speckle_1 \
    --eval bbox \
    --cfg-options data.test.ann_file='PubLayNet-P/Speckle/Speckle_1/val.json' \
    data.test.img_prefix='PubLayNet-P/Speckle/Speckle_1/val/'
```

- Modify the configuration file under `configs/_base_/datasets` to specify the dataset path
- Run the following command to train the model with 4 GPUs
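The evaluation command above overrides the test annotation file and image prefix via `--cfg-options` for one perturbation at one severity level. When sweeping all perturbations, it can be convenient to generate those overrides programmatically; a small sketch, where the perturbation names and severity numbering are inferred from the example paths and should be treated as assumptions:

```python
def cfg_options(root, pert, level):
    """Build the --cfg-options overrides for one perturbed subset.

    root: dataset root, e.g. "PubLayNet-P" (from the example above).
    pert: perturbation name, e.g. "Speckle" (assumed naming).
    level: severity level, e.g. 1 (assumed numbering).
    """
    sub = f"{root}/{pert}/{pert}_{level}"
    return [
        f"data.test.ann_file='{sub}/val.json'",
        f"data.test.img_prefix='{sub}/val/'",
    ]

print(cfg_options("PubLayNet-P", "Speckle", 1))
```

The two strings it prints match the `--cfg-options` arguments in the evaluation command above, so the helper can be looped over perturbations and levels to drive repeated `test.py` runs.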
```shell
sh dist_train.sh configs/publaynet/rodla_internimage_xl_2x_publaynet.py 4
```

If you find this code useful for your research, please consider citing:
```bibtex
@inproceedings{chen2024rodla,
  title={RoDLA: Benchmarking the Robustness of Document Layout Analysis Models},
  author={Yufan Chen and Jiaming Zhang and Kunyu Peng and Junwei Zheng and Ruiping Liu and Philip Torr and Rainer Stiefelhagen},
  booktitle={CVPR},
  year={2024}
}
```