PP-StructureV3 Document Parsing

Purpose and Scope

This document describes the PP-StructureV3 document parsing pipeline, which extracts structured information from complex document images and outputs results in machine-readable formats (Markdown, JSON, HTML). PP-StructureV3 builds upon the general layout analysis v1 pipeline with enhanced capabilities for layout region detection, table recognition, formula recognition, chart understanding, and multi-column reading order recovery.

Related Pipelines:

  • For basic text extraction without structural analysis, see PP-OCRv5 Universal Text Recognition #2.1
  • For LLM-powered intelligent document understanding, see PP-ChatOCRv4 #2.3
  • For individual module usage outside of pipelines, see Individual Module Usage #2.6

Pipeline Architecture Overview

PP-StructureV3 is a modular pipeline that orchestrates multiple specialized modules to parse complex document layouts. The pipeline processes documents through layout analysis, region-specific recognition, and structured output generation.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1-20 docs/version3.x/pipeline_usage/PP-StructureV3.en.md1-20

Core Modules and Components

7 Module/Sub-pipeline Architecture

PP-StructureV3 integrates the following components, each supporting independent training and inference:

| Module | Purpose | Required/Optional | Models Supported |
|---|---|---|---|
| Layout Detection | Identifies document regions by type | Required | PP-DocLayout_plus-L, PP-DocBlockLayout, PP-DocLayout-L/M/S, PicoDet variants, RT-DETR variants |
| General OCR | Extracts text from text regions | Required | PP-OCRv5, PP-OCRv4, PP-OCRv3 series |
| Document Preprocessing | Corrects orientation and distortion | Optional | PP-LCNet_x1_0_doc_ori, UVDoc |
| Table Recognition | Parses table structure and content | Optional | SLANeXt_wired/wireless, SLANet_plus, SLANet |
| Seal Recognition | Recognizes curved seal text | Optional | PP-OCRv4_server_seal_det, seal text detectors |
| Formula Recognition | Converts formulas to LaTeX | Optional | UniMERNet, PP-FormulaNet_L/base |
| Chart Parsing | Extracts data from charts | Optional | Chart2Table |

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md11-19 docs/version3.x/pipeline_usage/PP-StructureV3.en.md11-19

Layout Detection Models and Categories

The layout detection module supports multiple model variants with different category sets:

Category Mapping:

  • PP-DocLayout_plus-L (20 categories): Document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, figure-table title, seal, chart, sidebar text, reference content
  • PP-DocBlockLayout (1 category): Block - for detecting sub-article regions in multi-column newspapers and magazines
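  • PP-DocLayout-L/M/S (23 categories): Document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, figure caption, table, table caption, seal, figure title, figure, header image, footer image, sidebar text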

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md79-182 docs/version3.x/pipeline_usage/PP-StructureV3.en.md80-178

Processing Workflow

End-to-End Pipeline Execution

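The pipeline runs in three stages. Optional document preprocessing corrects orientation and distortion; layout detection then splits the page into typed regions, which are routed to the matching recognizer (general OCR, table, formula, seal, or chart); finally the per-region results are assembled into structured output.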

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1-27 docs/version3.x/pipeline_usage/PP-StructureV3.en.md1-27

Multi-Column Reading Order Recovery

PP-StructureV3 includes specialized logic for recovering the correct reading order in multi-column documents.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md110-137
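As an illustration of the general idea only, the sketch below orders layout boxes on a multi-column page by grouping them into columns by horizontal overlap, then reading each column top to bottom, left column first. This is not the algorithm PP-StructureV3 uses; the `Box` type and the `reading_order` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned layout region (hypothetical type for this sketch)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def reading_order(boxes):
    """Return boxes column by column (left to right), top to bottom within each."""
    columns = []  # each column is a list of boxes that overlap horizontally
    for box in sorted(boxes, key=lambda b: b.x0):
        for col in columns:
            col_x0 = min(b.x0 for b in col)
            col_x1 = max(b.x1 for b in col)
            overlap = min(box.x1, col_x1) - max(box.x0, col_x0)
            # Join a column when more than half of the box width lies inside it.
            if overlap > 0.5 * (box.x1 - box.x0):
                col.append(box)
                break
        else:
            columns.append([box])
    columns.sort(key=lambda col: min(b.x0 for b in col))
    return [b for col in columns for b in sorted(col, key=lambda b: b.y0)]
```

A full-width element such as a page title spans every column and would merge them under this simple rule, so a practical implementation would need to separate such elements first.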

Table Recognition Pipeline

The table recognition sub-pipeline in PP-StructureV3 supports multiple processing strategies:

Table Recognition Architecture

Table Recognition Models:

| Component | Model | Purpose | Accuracy |
|---|---|---|---|
| Structure Recognition | SLANeXt_wired | Detects table structure for wired tables | 69.65% |
| Structure Recognition | SLANeXt_wireless | Detects table structure for wireless tables | 69.65% |
| Table Classification | PP-LCNet_x1_0_table_cls | Classifies wired vs wireless tables | 94.2% Top-1 |
| Cell Detection | RT-DETR-L_wired_table_cell_det | Detects individual cells in wired tables | 82.7% mAP |
| Cell Detection | RT-DETR-L_wireless_table_cell_det | Detects individual cells in wireless tables | 82.7% mAP |

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md322-392 docs/version3.x/pipeline_usage/table_recognition_v2.md1-76
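The model list above implies a routing step: classify the table as wired or wireless, then pick the matching structure-recognition and cell-detection models. A minimal sketch of that selection follows. It assumes the classifier result has already been reduced to the string "wired" or "wireless"; those label strings and the `select_table_models` helper are hypothetical, and only the model names come from the list above.

```python
# Model that decides which branch a table takes (name from the list above).
TABLE_CLASSIFIER = "PP-LCNet_x1_0_table_cls"

# Per-branch models; the "wired"/"wireless" keys are assumed label strings.
TABLE_MODELS = {
    "wired": {
        "structure": "SLANeXt_wired",
        "cell_detection": "RT-DETR-L_wired_table_cell_det",
    },
    "wireless": {
        "structure": "SLANeXt_wireless",
        "cell_detection": "RT-DETR-L_wireless_table_cell_det",
    },
}


def select_table_models(table_type):
    """Return the structure and cell-detection model names for a table type."""
    if table_type not in TABLE_MODELS:
        raise ValueError(f"unknown table type: {table_type!r}")
    return dict(TABLE_MODELS[table_type])  # copy so callers cannot mutate the map
```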

Formula and Seal Recognition

Formula Recognition Integration

PP-StructureV3 integrates formula recognition to convert mathematical expressions to LaTeX, using the UniMERNet or PP-FormulaNet_L/base models.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md685-784 docs/version3.x/pipeline_usage/formula_recognition.md1-40

Seal Text Recognition

For seal regions, PP-StructureV3 uses specialized curved-text detection models such as PP-OCRv4_server_seal_det.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md785-1114 docs/version3.x/pipeline_usage/seal_recognition.md1-40

Chart Parsing Module

PP-StructureV3 includes chart understanding capabilities that extract structured data from charts, using the Chart2Table model.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1115-1155

Configuration and Usage

Command-Line Interface

Basic usage goes through the paddleocr command.

Key Configuration Parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| layout_model_name | str | Layout detection model name | PP-DocLayout_plus-L |
| use_doc_orientation_classify | bool | Enable orientation correction | True |
| use_doc_unwarping | bool | Enable geometric correction | True |
| use_table_recognition | bool | Enable table parsing | True |
| use_formula_recognition | bool | Enable formula recognition | True |
| use_seal_recognition | bool | Enable seal text recognition | True |
| use_chart_parsing | bool | Enable chart understanding | True |
| output_format | str | Output format: markdown/json/html | markdown |
| recover_reading_order | bool | Enable multi-column order recovery | True |

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1156-1400 docs/version3.x/pipeline_usage/PP-StructureV3.en.md1156-1400
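To make the parameter list concrete, here is a small sketch that assembles a paddleocr command line from keyword options. It assumes the `pp_structurev3` subcommand, the `-i` input flag, and boolean values written as `True`/`False`; the option names are taken from the parameter list above. The `build_command` helper is hypothetical, and the exact flag spellings are worth checking against the help output of the installed version.

```python
import shlex


def build_command(input_path, **options):
    """Build an argv list for the paddleocr CLI (flag names are assumptions)."""
    argv = ["paddleocr", "pp_structurev3", "-i", str(input_path)]
    for name, value in options.items():
        # Each option becomes "--name value"; booleans are rendered as True/False.
        argv += [f"--{name}", str(value)]
    return argv


cmd = build_command(
    "doc.png",
    use_doc_orientation_classify=False,
    use_table_recognition=True,
)
# shlex.join quotes arguments safely for display or for pasting into a shell.
print(shlex.join(cmd))
```

Returning a list rather than a single string means the command can be passed directly to `subprocess.run` without shell quoting problems.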

Python API

PP-StructureV3 can also be used through its Python API.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1401-1600
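A minimal sketch of a Python call sequence is shown below. It assumes the `PPStructureV3` class exported by the paddleocr package, whose `predict` method yields one result per page and whose results offer `save_to_json` and `save_to_markdown`. The `parse_document` wrapper is a hypothetical helper that takes the pipeline as an argument, so the same code runs against a stand-in object when PaddleOCR is not installed.

```python
def parse_document(pipeline, input_path, out_dir="output"):
    """Run the pipeline on one input, save every page result, return the page count."""
    pages = 0
    for res in pipeline.predict(input_path):
        res.save_to_json(save_path=out_dir)      # structured result
        res.save_to_markdown(save_path=out_dir)  # Markdown rendering
        pages += 1
    return pages


# With PaddleOCR installed, usage would look roughly like this
# (constructor argument shown as an example, not a recommendation):
#
#   from paddleocr import PPStructureV3
#   pipeline = PPStructureV3(use_doc_orientation_classify=False)
#   parse_document(pipeline, "document.png")
```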

Output Formats

Markdown Generation

PP-StructureV3 converts structured document parsing results into Markdown while preserving the layout hierarchy.

JSON Structure

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1601-1800
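To illustrate how a block-level JSON result can map onto Markdown with its hierarchy preserved, the sketch below renders a list of labeled blocks. The JSON schema here (a list of objects with `label` and `content` keys) is a simplified, hypothetical stand-in and not the pipeline's actual output format; the label names are borrowed from the layout categories listed earlier.

```python
import json


def blocks_to_markdown(blocks):
    """Render labeled blocks as Markdown, mapping titles to heading levels."""
    parts = []
    for block in blocks:
        label = block["label"]
        content = block["content"].strip()
        if label == "document title":
            parts.append(f"# {content}")
        elif label == "paragraph title":
            parts.append(f"## {content}")
        elif label == "formula":
            parts.append(f"$$\n{content}\n$$")  # display-math block
        else:
            parts.append(content)  # plain text and anything unrecognized
    return "\n\n".join(parts)


# Hypothetical page result, already in reading order.
page = json.loads("""
[
  {"label": "document title", "content": "Report"},
  {"label": "paragraph title", "content": "Method"},
  {"label": "text", "content": "We measure energy."},
  {"label": "formula", "content": "E = mc^2"}
]
""")
print(blocks_to_markdown(page))
```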

Model Selection Guidelines

Performance vs. Accuracy Trade-offs

Selection Criteria:

  1. High Accuracy Priority: Use PP-DocLayout_plus-L or PP-DocLayout-L with SLANeXt and UniMERNet for best results
  2. Balanced Performance: Use PP-DocLayout-M + SLANet_plus + PP-FormulaNet_L for production
  3. Speed Priority: Use PP-DocLayout-S + SLANet + PP-FormulaNet_base for edge devices

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md21-320
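The three selection criteria above can be captured as a small lookup table, which keeps the model choice in one place. The preset keys and the `choose_models` helper are hypothetical names for this sketch; the model names are those given in the list, with SLANeXt standing for the wired/wireless pair.

```python
# Presets transcribed from the selection criteria above.
MODEL_PRESETS = {
    "accuracy": {
        "layout": "PP-DocLayout_plus-L",
        "table": "SLANeXt",
        "formula": "UniMERNet",
    },
    "balanced": {
        "layout": "PP-DocLayout-M",
        "table": "SLANet_plus",
        "formula": "PP-FormulaNet_L",
    },
    "speed": {
        "layout": "PP-DocLayout-S",
        "table": "SLANet",
        "formula": "PP-FormulaNet_base",
    },
}


def choose_models(priority="balanced"):
    """Return the layout, table, and formula model names for a priority."""
    if priority not in MODEL_PRESETS:
        valid = ", ".join(sorted(MODEL_PRESETS))
        raise ValueError(f"priority must be one of: {valid}")
    return dict(MODEL_PRESETS[priority])
```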

Hardware and Deployment Support

PP-StructureV3 supports deployment across multiple hardware platforms:

| Hardware | Backend | Optimizations | Supported Models |
|---|---|---|---|
| NVIDIA GPU | Paddle Inference, TensorRT | FP16, INT8 quantization | All models |
| CPU | Paddle Inference, MKL-DNN | INT8 quantization, MKLDNN cache | All models |
| Kunlunxin XPU | Paddle Inference | XPU-specific ops | All models |
| Ascend NPU | Paddle Inference | NPU acceleration | All models |
| MLU | Paddle Inference | MLU operators | All models |
| DCU | Paddle Inference | DCU acceleration | All models |

Performance Modes:

  • Standard Mode: FP32 precision, basic optimizations
  • High-Performance Mode: Best available precision (FP16/INT8), TensorRT/MKLDNN enabled, optimized batch sizes

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md636-698 docs/version3.x/pipeline_usage/OCR.md636-698

Advanced Features

Custom Model Integration

PP-StructureV3 supports replacing the default models with custom-trained models.

Batch Processing

Multiple documents can be processed in a single run.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md1800-2000
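One way to process a folder of documents is sketched below: collect the supported files, run the pipeline on each, and record failures instead of stopping at the first bad file. The `collect_inputs` and `run_batch` helpers and the extension list are assumptions made for this example. The pipeline object is only expected to provide `predict`, with results that offer `save_to_markdown`, as the `PPStructureV3` class in the paddleocr package does.

```python
from pathlib import Path

# File types to pick up; adjust to what the installed pipeline accepts.
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}


def collect_inputs(root):
    """Return supported files under root (recursively), in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def run_batch(pipeline, root, out_dir="output"):
    """Process every supported file; return (done, failed) path lists."""
    done, failed = [], []
    for path in collect_inputs(root):
        try:
            for res in pipeline.predict(str(path)):
                res.save_to_markdown(save_path=out_dir)
            done.append(str(path))
        except Exception as exc:  # keep going so one bad file does not end the run
            failed.append((str(path), repr(exc)))
    return done, failed
```

Creating the pipeline once and reusing it for every file avoids reloading the models for each document.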


This page provides comprehensive documentation of the PP-StructureV3 document parsing pipeline. For specific module training and fine-tuning instructions, refer to the individual module documentation pages linked throughout this document.