This document describes the PP-StructureV3 document parsing pipeline, which extracts structured information from complex document images and outputs results in machine-readable formats (Markdown, JSON, HTML). PP-StructureV3 builds upon the general layout analysis v1 pipeline with enhanced capabilities for layout region detection, table recognition, formula recognition, chart understanding, and multi-column reading order recovery.
Related Pipelines:
PP-StructureV3 is a modular pipeline that orchestrates multiple specialized modules to parse complex document layouts. The pipeline processes documents through layout analysis, region-specific recognition, and structured output generation.
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1-20 docs/version3.x/pipeline_usage/PP-StructureV3.en.md1-20
PP-StructureV3 integrates the following components, each supporting independent training and inference:
| Module | Purpose | Status | Models Supported |
|---|---|---|---|
| Layout Detection | Identifies document regions by type | Required | PP-DocLayout_plus-L, PP-DocBlockLayout, PP-DocLayout-L/M/S, PicoDet variants, RT-DETR variants |
| General OCR | Extracts text from text regions | Required | PP-OCRv5, PP-OCRv4, PP-OCRv3 series |
| Document Preprocessing | Corrects orientation and distortion | Optional | PP-LCNet_x1_0_doc_ori, UVDoc |
| Table Recognition | Parses table structure and content | Optional | SLANeXt_wired/wireless, SLANet_plus, SLANet |
| Seal Recognition | Recognizes curved seal text | Optional | PP-OCRv4_server_seal_det, seal text detectors |
| Formula Recognition | Converts formulas to LaTeX | Optional | UniMERNet, PP-FormulaNet_L/base |
| Chart Parsing | Extracts data from charts | Optional | PP-Chart2Table |
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md11-19 docs/version3.x/pipeline_usage/PP-StructureV3.en.md11-19
The layout detection module supports multiple model variants with different category sets:
Category Mapping:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md79-182 docs/version3.x/pipeline_usage/PP-StructureV3.en.md80-178
The following diagram illustrates the complete processing flow with specific module names from the codebase:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1-27 docs/version3.x/pipeline_usage/PP-StructureV3.en.md1-27
PP-StructureV3 includes specialized logic for recovering correct reading order in multi-column documents:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md110-137
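To make the idea of multi-column reading order recovery concrete, the following is a minimal sketch of one common heuristic: assign each layout block to a column by its horizontal center, then read columns left to right and blocks top to bottom within each column. It is not the algorithm PP-StructureV3 uses internally. The `Block` class, the `reading_order` function, and the fixed `n_columns` parameter are hypothetical names introduced for illustration, and the sketch does not handle blocks that span several columns (such as full-width titles).

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A layout region with a label and a bounding box (hypothetical type)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def reading_order(blocks, page_width, n_columns=2):
    """Sort blocks column by column, then top to bottom inside each column.

    Assumes equal-width columns; this is an illustrative heuristic only.
    """
    col_width = page_width / n_columns

    def sort_key(block):
        center_x = (block.x0 + block.x1) / 2
        # Clamp so a block touching the right page edge stays in the last column.
        column = min(int(center_x // col_width), n_columns - 1)
        return (column, block.y0, block.x0)

    return sorted(blocks, key=sort_key)


page = [
    Block("right_top", 320, 50, 580, 100),
    Block("left_bottom", 20, 200, 280, 300),
    Block("left_top", 20, 50, 280, 150),
    Block("right_bottom", 320, 120, 580, 300),
]
ordered_labels = [b.label for b in reading_order(page, page_width=600)]
```

A plain top-to-bottom sort would interleave the two columns here; grouping by column first keeps each column's text contiguous.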
The table recognition sub-pipeline in PP-StructureV3 supports multiple processing strategies:
Table Recognition Models:
| Component | Model | Purpose | Accuracy |
|---|---|---|---|
| Structure Recognition | SLANeXt_wired | Detects table structure for wired tables | 69.65% |
| Structure Recognition | SLANeXt_wireless | Detects table structure for wireless tables | 69.65% |
| Table Classification | PP-LCNet_x1_0_table_cls | Classifies wired vs wireless tables | 94.2% Top-1 |
| Cell Detection | RT-DETR-L_wired_table_cell_det | Detects individual cells in wired tables | 82.7% mAP |
| Cell Detection | RT-DETR-L_wireless_table_cell_det | Detects individual cells in wireless tables | 82.7% mAP |
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md322-392 docs/version3.x/pipeline_usage/table_recognition_v2.md1-76
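The model table above suggests a routing step: a table image is first classified as wired or wireless, and the result decides which structure and cell detection models handle it. The sketch below illustrates that routing with plain dictionaries. The model names are taken from the table; the `select_table_models` function and the `"wired"`/`"wireless"` label strings are assumptions made for illustration and are not part of the PaddleOCR API.

```python
# Model names copied from the table above.
TABLE_CLASSIFIER = "PP-LCNet_x1_0_table_cls"
STRUCTURE_MODELS = {
    "wired": "SLANeXt_wired",
    "wireless": "SLANeXt_wireless",
}
CELL_DETECTION_MODELS = {
    "wired": "RT-DETR-L_wired_table_cell_det",
    "wireless": "RT-DETR-L_wireless_table_cell_det",
}


def select_table_models(table_type):
    """Return the models that would process a table of the given type.

    `table_type` is the (hypothetical) label produced by the classifier.
    """
    if table_type not in STRUCTURE_MODELS:
        raise ValueError(f"unknown table type: {table_type!r}")
    return {
        "classifier": TABLE_CLASSIFIER,
        "structure": STRUCTURE_MODELS[table_type],
        "cell_detection": CELL_DETECTION_MODELS[table_type],
    }


wireless_models = select_table_models("wireless")
```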
PP-StructureV3 integrates formula recognition for converting mathematical expressions to LaTeX:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md685-784 docs/version3.x/pipeline_usage/formula_recognition.md1-40
For seal regions, PP-StructureV3 uses specialized curved text detection:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md785-1114 docs/version3.x/pipeline_usage/seal_recognition.md1-40
PP-StructureV3 includes chart understanding capabilities to extract structured data from visualizations:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1115-1155
Basic usage through the paddleocr command:
Key Configuration Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| layout_model_name | str | Layout detection model name | PP-DocLayout_plus-L |
| use_doc_orientation_classify | bool | Enable orientation correction | True |
| use_doc_unwarping | bool | Enable geometric correction | True |
| use_table_recognition | bool | Enable table parsing | True |
| use_formula_recognition | bool | Enable formula recognition | True |
| use_seal_recognition | bool | Enable seal text recognition | True |
| use_chart_parsing | bool | Enable chart understanding | True |
| output_format | str | Output format: markdown/json/html | markdown |
| recover_reading_order | bool | Enable multi-column order recovery | True |
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1156-1400 docs/version3.x/pipeline_usage/PP-StructureV3.en.md1156-1400
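One way to keep these settings together in application code is a small configuration object. The sketch below mirrors the parameter names and defaults listed in the table above. The `ParseConfig` class and the `enabled_switches` helper are hypothetical and are not provided by PaddleOCR; verify parameter names against the installed version before passing them to the pipeline.

```python
from dataclasses import asdict, dataclass

ALLOWED_OUTPUT_FORMATS = {"markdown", "json", "html"}


@dataclass
class ParseConfig:
    """Hypothetical container; fields and defaults follow the table above."""
    layout_model_name: str = "PP-DocLayout_plus-L"
    use_doc_orientation_classify: bool = True
    use_doc_unwarping: bool = True
    use_table_recognition: bool = True
    use_formula_recognition: bool = True
    use_seal_recognition: bool = True
    use_chart_parsing: bool = True
    output_format: str = "markdown"
    recover_reading_order: bool = True

    def __post_init__(self):
        # Reject formats that the table does not list.
        if self.output_format not in ALLOWED_OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format!r}")


def enabled_switches(config):
    """List the `use_*` flags that are switched on, in declaration order."""
    return [
        name
        for name, value in asdict(config).items()
        if name.startswith("use_") and value is True
    ]


# Example: a text-heavy workload that skips seal and chart processing.
text_only = ParseConfig(use_seal_recognition=False, use_chart_parsing=False)
active = enabled_switches(text_only)
```

Disabling modules that a document set never needs is a simple way to reduce processing time, since each optional module adds its own model inference.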
Using PP-StructureV3 through Python:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1401-1600
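A minimal sketch of Python usage follows, assuming the PaddleOCR 3.x interface (`PPStructureV3`, `predict`, and the result methods `print`, `save_to_json`, and `save_to_markdown`). The wrapper function `parse_document` and its `out_dir` parameter are hypothetical conveniences. Constructor arguments and result methods may differ between releases, so check them against the installed version.

```python
from pathlib import Path


def parse_document(image_path, out_dir="output"):
    """Parse one document image and save JSON and Markdown results.

    Assumes PaddleOCR 3.x is installed; the import is deferred so this
    module can be loaded on machines without it.
    """
    from paddleocr import PPStructureV3

    # Optional modules are toggled with constructor flags; here geometric
    # unwarping is switched off as an example.
    pipeline = PPStructureV3(use_doc_unwarping=False)

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    results = list(pipeline.predict(str(image_path)))
    for res in results:
        res.print()                              # log the parsed structure
        res.save_to_json(save_path=out_dir)      # machine-readable result
        res.save_to_markdown(save_path=out_dir)  # human-readable result
    return results
```

Calling `parse_document("page.png")` would write the results under `output/`; the first call typically downloads model weights, so it takes noticeably longer than later calls.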
PP-StructureV3 converts structured document parsing results into Markdown format with preserved layout hierarchy:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1601-1800
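The conversion from parsed blocks to Markdown can be pictured as a mapping from block type to Markdown syntax, applied in reading order. The sketch below is illustrative only: the block dictionary layout (`type`, `content`), the type names, and the `blocks_to_markdown` function are assumptions, not the pipeline's actual result schema.

```python
def blocks_to_markdown(blocks):
    """Render an ordered list of parsed blocks as a Markdown string.

    Each block is a dict with hypothetical keys "type" and "content".
    """
    parts = []
    for block in blocks:
        kind = block["type"]
        content = block["content"].strip()
        if kind == "title":
            parts.append(f"# {content}")
        elif kind == "heading":
            parts.append(f"## {content}")
        elif kind == "formula":
            # Display math: LaTeX wrapped in $$ delimiters.
            parts.append(f"$$\n{content}\n$$")
        else:
            # Plain text and tables already rendered as HTML pass through as-is.
            parts.append(content)
    # Blank lines between blocks keep Markdown paragraphs separate.
    return "\n\n".join(parts)


document = [
    {"type": "title", "content": "Report"},
    {"type": "text", "content": "Intro."},
    {"type": "formula", "content": "E = mc^2"},
]
markdown = blocks_to_markdown(document)
```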
Selection Criteria:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md21-320
PP-StructureV3 supports deployment across multiple hardware platforms:
| Hardware | Backend | Optimizations | Supported Models |
|---|---|---|---|
| NVIDIA GPU | Paddle Inference, TensorRT | FP16, INT8 quantization | All models |
| CPU | Paddle Inference, MKL-DNN | INT8 quantization, MKLDNN cache | All models |
| Kunlunxin XPU | Paddle Inference | XPU-specific ops | All models |
| Ascend NPU | Paddle Inference | NPU acceleration | All models |
| MLU | Paddle Inference | MLU operators | All models |
| DCU | Paddle Inference | DCU acceleration | All models |
Performance Modes:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md636-698 docs/version3.x/pipeline_usage/OCR.md636-698
PP-StructureV3 supports replacing default models with custom trained models:
For processing multiple documents efficiently:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1800-2000
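For batch processing, a typical pattern is to collect the input files once and feed them to the pipeline in fixed-size groups, so that the pipeline object is created a single time and reused. The sketch below covers only the file collection and grouping; `find_documents`, `batched`, and the suffix list are hypothetical helpers, and the call into the pipeline itself is left out.

```python
from itertools import islice
from pathlib import Path

# File types assumed to be valid pipeline inputs for this sketch.
DOCUMENT_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".pdf"}


def find_documents(root):
    """Return all document files under `root`, sorted for reproducible runs."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in DOCUMENT_SUFFIXES
    )


def batched(items, batch_size):
    """Yield consecutive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


example_batches = list(batched(["a.png", "b.png", "c.pdf", "d.jpg", "e.png"], 2))
```

Each batch can then be passed to a pipeline instance created once outside the loop, which avoids reloading models for every document.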
This page provides comprehensive documentation of the PP-StructureV3 document parsing pipeline. For specific module training and fine-tuning instructions, refer to the individual module documentation pages linked throughout this document.