PP-StructureV3 Document Parsing


Purpose and Scope

This document describes the PP-StructureV3 document parsing pipeline, which extracts structured information from complex document images and outputs results in machine-readable formats (Markdown, JSON, HTML). PP-StructureV3 builds upon the general layout analysis v1 pipeline with enhanced capabilities for layout region detection, table recognition, formula recognition, chart understanding, and multi-column reading order recovery.

Related Pipelines:

For basic text extraction without structural analysis, see PP-OCRv5 Universal Text Recognition #2.1
For LLM-powered intelligent document understanding, see PP-ChatOCRv4 #2.3
For individual module usage outside of pipelines, see Individual Module Usage #2.6

Pipeline Architecture Overview

PP-StructureV3 is a modular pipeline that orchestrates multiple specialized modules to parse complex document layouts. The pipeline processes documents through layout analysis, region-specific recognition, and structured output generation.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1-20, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:1-20

Core Modules and Components

7 Module/Sub-pipeline Architecture

PP-StructureV3 integrates the following components, each supporting independent training and inference:

Module	Purpose	Required/Optional	Models Supported
Layout Detection	Identifies document regions by type	Required	PP-DocLayout_plus-L, PP-DocBlockLayout, PP-DocLayout-L/M/S, PicoDet variants, RT-DETR variants
General OCR	Extracts text from text regions	Required	PP-OCRv5, PP-OCRv4, PP-OCRv3 series
Document Preprocessing	Corrects orientation and distortion	Optional	PP-LCNet_x1_0_doc_ori, UVDoc
Table Recognition	Parses table structure and content	Optional	SLANeXt_wired/wireless, SLANet_plus, SLANet
Seal Recognition	Recognizes curved seal text	Optional	PP-OCRv4_server_seal_det, PP-OCRv4_mobile_seal_det
Formula Recognition	Converts formulas to LaTeX	Optional	UniMERNet, PP-FormulaNet_L/base
Chart Parsing	Extracts data from charts	Optional	Chart2Table

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:11-19, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:11-19

Layout Detection Models and Categories

The layout detection module supports multiple model variants with different category sets:

Category Mapping:

PP-DocLayout_plus-L (20 categories): Document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, figure-table title, seal, chart, sidebar text, reference content
PP-DocBlockLayout (1 category): Block - for detecting sub-article regions in multi-column newspapers and magazines
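PP-DocLayout-L/M/S (23 categories): Document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, figure caption, table, table caption, seal, figure title, figure, header image, footer image, sidebar text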

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:79-182, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:80-178

Processing Workflow

End-to-End Pipeline Execution

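A document passes through the pipeline in this order: optional document preprocessing (orientation classification and unwarping), layout detection, region-specific recognition (general OCR, tables, formulas, seals, charts), reading order recovery, and structured output generation.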
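The dispatch step of this flow, where each detected region is handed to the module that matches its layout label, can be sketched as follows. This is a minimal illustration and not the library's implementation: the `Region` dictionary shape, the `make_stub` helper, and the routing table are all hypothetical, and real recognizers are replaced by stubs that only tag their input.

```python
from typing import Callable, Dict, List

Region = Dict[str, object]  # hypothetical shape: {"label": "<layout category>", ...}


def make_stub(module_name: str) -> Callable[[Region], str]:
    """Return a placeholder recognizer that only tags the region it receives."""
    return lambda region: f"{module_name}:{region['label']}"


# Specialized labels go to their own module (names follow the module table above).
ROUTES: Dict[str, Callable[[Region], str]] = {
    "table": make_stub("table_recognition"),
    "formula": make_stub("formula_recognition"),
    "seal": make_stub("seal_recognition"),
    "chart": make_stub("chart_parsing"),
}
# In this simplified sketch every other label falls back to general OCR.
DEFAULT_ROUTE = make_stub("general_ocr")


def parse_page(regions: List[Region]) -> List[str]:
    """Dispatch each detected layout region to the matching recognizer."""
    return [ROUTES.get(str(region["label"]), DEFAULT_ROUTE)(region) for region in regions]


if __name__ == "__main__":
    page = [{"label": "text"}, {"label": "table"}, {"label": "formula"}]
    print(parse_page(page))
```

Keeping the routing in a dictionary makes it easy to see which optional modules are involved and to disable one by removing its entry.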

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1-27, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:1-27

Multi-Column Reading Order Recovery

PP-StructureV3 includes specialized logic for recovering the correct reading order in multi-column documents.
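To give a feel for the problem, the sketch below orders layout boxes with a deliberately simple heuristic: assign each box to a column by its horizontal center, then read columns left to right and boxes top to bottom. This is an assumption made for illustration, not the algorithm PP-StructureV3 uses. It requires the column count up front and ignores blocks that span several columns. The `reading_order` function and its parameters are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def reading_order(boxes: List[Box], page_width: float, n_columns: int = 2) -> List[int]:
    """Return box indices sorted column by column, top to bottom within a column."""
    column_width = page_width / n_columns

    def sort_key(index: int) -> Tuple[int, float, float]:
        x_min, y_min, x_max, _ = boxes[index]
        x_center = (x_min + x_max) / 2
        # Clamp so a box touching the right page edge stays in the last column.
        column = min(int(x_center // column_width), n_columns - 1)
        return (column, y_min, x_min)

    return sorted(range(len(boxes)), key=sort_key)


if __name__ == "__main__":
    two_column_page = [
        (520, 100, 980, 300),  # right column, upper block
        (20, 400, 480, 600),   # left column, lower block
        (20, 100, 480, 300),   # left column, upper block
        (520, 400, 980, 600),  # right column, lower block
    ]
    print(reading_order(two_column_page, page_width=1000))
```

A plain top-to-bottom sort would interleave the two columns here, which is the failure mode that reading order recovery is meant to avoid.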

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:110-137

Table Recognition Pipeline

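The table recognition sub-pipeline in PP-StructureV3 first classifies each table as wired or wireless, then applies the structure recognition and cell detection models that match that table type.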

Table Recognition Architecture

Table Recognition Models:

Component	Model	Purpose	Accuracy
Structure Recognition	SLANeXt_wired	Detects table structure for wired tables	69.65%
Structure Recognition	SLANeXt_wireless	Detects table structure for wireless tables	69.65%
Table Classification	PP-LCNet_x1_0_table_cls	Classifies wired vs wireless tables	94.2% Top-1
Cell Detection	RT-DETR-L_wired_table_cell_det	Detects individual cells in wired tables	82.7% mAP
Cell Detection	RT-DETR-L_wireless_table_cell_det	Detects individual cells in wireless tables	82.7% mAP
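The pairing of table type and models in the table above can be expressed as a small lookup. The sketch below is illustrative only: the model names are copied from the table, while `select_table_models` and the dictionary layout are hypothetical and do not correspond to a PaddleOCR API.

```python
from typing import Dict

# Classifier that decides between the two table types (from the table above).
TABLE_CLASSIFIER = "PP-LCNet_x1_0_table_cls"

MODELS_BY_TABLE_TYPE: Dict[str, Dict[str, str]] = {
    "wired": {
        "structure": "SLANeXt_wired",
        "cell_detection": "RT-DETR-L_wired_table_cell_det",
    },
    "wireless": {
        "structure": "SLANeXt_wireless",
        "cell_detection": "RT-DETR-L_wireless_table_cell_det",
    },
}


def select_table_models(table_type: str) -> Dict[str, str]:
    """Return the structure and cell detection models for a classified table."""
    try:
        return MODELS_BY_TABLE_TYPE[table_type]
    except KeyError:
        raise ValueError(f"unknown table type: {table_type!r}") from None


if __name__ == "__main__":
    print(select_table_models("wired"))
```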

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:322-392, docs/version3.x/pipeline_usage/table_recognition_v2.md:1-76

Formula and Seal Recognition

Formula Recognition Integration

PP-StructureV3 integrates formula recognition, which converts mathematical expressions in formula regions to LaTeX.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:685-784, docs/version3.x/pipeline_usage/formula_recognition.md:1-40

Seal Text Recognition

For seal regions, PP-StructureV3 uses specialized curved-text detection.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:785-1114, docs/version3.x/pipeline_usage/seal_recognition.md:1-40

Chart Parsing Module

PP-StructureV3 includes a chart parsing module that extracts structured data from charts.

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1115-1155

Configuration and Usage

Command-Line Interface

The pipeline can be run from the shell through the paddleocr command.
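As a sketch, the snippet below assembles such a command line in Python so that it can be logged or passed to `subprocess.run`. The `pp_structurev3` subcommand and the `-i` input flag follow the PaddleOCR 3.x CLI as far as we are aware; the `build_command` helper is hypothetical, and option names should be checked against `paddleocr pp_structurev3 --help` for the installed version.

```python
import shlex
from typing import Dict, List


def build_command(input_path: str, options: Dict[str, object]) -> List[str]:
    """Assemble the argument vector for a PP-StructureV3 CLI run."""
    argv = ["paddleocr", "pp_structurev3", "-i", input_path]
    for name, value in options.items():
        # Each option becomes "--name value"; booleans are passed as True/False.
        argv += [f"--{name}", str(value)]
    return argv


if __name__ == "__main__":
    command = build_command(
        "document.png",
        {"use_doc_orientation_classify": False, "use_doc_unwarping": False},
    )
    print(shlex.join(command))
    # To execute (requires paddleocr to be installed):
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Building the argument list instead of a single string avoids shell quoting problems when input paths contain spaces.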

Key Configuration Parameters:

Parameter	Type	Description	Default
`layout_detection_model_name`	str	Layout detection model name	`PP-DocLayout_plus-L`
`use_doc_orientation_classify`	bool	Enable orientation correction	`True`
`use_doc_unwarping`	bool	Enable geometric correction	`True`
`use_table_recognition`	bool	Enable table parsing	`True`
`use_formula_recognition`	bool	Enable formula recognition	`True`
`use_seal_recognition`	bool	Enable seal text recognition	`True`
`use_chart_recognition`	bool	Enable chart understanding	`True`
`output_format`	str	Output format: markdown/json/html	`markdown`
`recover_reading_order`	bool	Enable multi-column order recovery	`True`

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1156-1400, docs/version3.x/pipeline_usage/PP-StructureV3.en.md:1156-1400

Python API

PP-StructureV3 can be used directly from Python.
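A minimal sketch of a Python call is shown below. It assumes the `paddleocr` 3.x package, its `PPStructureV3` class, and the `predict`, `print`, `save_to_json`, and `save_to_markdown` methods on the pipeline and its results; the `parse_document` wrapper and the file names are hypothetical. The import is placed inside the function so that the module loads even where `paddleocr` is not installed.

```python
from pathlib import Path


def parse_document(input_path: str, save_dir: str = "output") -> None:
    """Parse one document and save the results as JSON and Markdown."""
    # Imported lazily: requires the paddleocr package (3.x) at call time.
    from paddleocr import PPStructureV3

    pipeline = PPStructureV3(
        use_doc_orientation_classify=False,  # skip orientation correction
        use_doc_unwarping=False,             # skip geometric correction
    )
    Path(save_dir).mkdir(parents=True, exist_ok=True)

    for result in pipeline.predict(input=input_path):
        result.print()                               # print the parsed structure
        result.save_to_json(save_path=save_dir)      # machine-readable output
        result.save_to_markdown(save_path=save_dir)  # human-readable output


# Example call (downloads models on first use):
#   parse_document("document.png", save_dir="output")
```

Disabling the two preprocessing steps is a common choice for clean, upright scans; leave them enabled for photographed or skewed pages.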

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1401-1600

Output Formats

Markdown Generation

PP-StructureV3 converts the parsed document structure into Markdown while preserving the layout hierarchy.
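The idea can be sketched with a small converter that maps layout labels to Markdown constructs. This is an illustrative assumption, not the pipeline's actual renderer: the block dictionaries, the label names, and `blocks_to_markdown` are hypothetical, and blocks are expected to arrive already in reading order.

```python
from typing import Dict, List

# Hypothetical mapping from title-like layout labels to Markdown heading levels.
HEADING_PREFIX = {"doc_title": "# ", "paragraph_title": "## "}


def blocks_to_markdown(blocks: List[Dict[str, str]]) -> str:
    """Render ordered layout blocks as a Markdown string."""
    parts = []
    for block in blocks:
        label = block["label"]
        content = block["content"].strip()
        if label in HEADING_PREFIX:
            parts.append(HEADING_PREFIX[label] + content)
        elif label == "formula":
            # LaTeX from formula recognition becomes a display-math block.
            parts.append(f"$$\n{content}\n$$")
        else:
            # Plain text, and tables already serialized as HTML, pass through.
            parts.append(content)
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(blocks_to_markdown([
        {"label": "doc_title", "content": "Report"},
        {"label": "text", "content": "Hello."},
        {"label": "formula", "content": "E = mc^2"},
    ]))
```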

JSON Structure
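The JSON output carries the same information in machine-readable form. The example below shows what such a result might look like and how it round-trips through Python's `json` module. The field names (`input_path`, `blocks`, `label`, `bbox`, `content`) are assumptions chosen for illustration; inspect the output of a real run for the actual schema.

```python
import json

# Hypothetical result for a one-page document; field names are illustrative.
example_result = {
    "input_path": "document.png",
    "blocks": [
        {"label": "doc_title", "bbox": [40, 30, 960, 90], "content": "Quarterly Report"},
        {"label": "text", "bbox": [40, 120, 480, 400], "content": "Revenue grew in Q2."},
        {"label": "formula", "bbox": [520, 120, 960, 200], "content": "E = mc^2"},
    ],
}

# Serialize without escaping non-ASCII text, then parse it back.
serialized = json.dumps(example_result, indent=2, ensure_ascii=False)
restored = json.loads(serialized)

# Collect the layout labels in document order.
labels = [block["label"] for block in restored["blocks"]]

if __name__ == "__main__":
    print(serialized)
    print(labels)
```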

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1601-1800

Model Selection Guidelines

Performance vs. Accuracy Trade-offs

Selection Criteria:

High Accuracy Priority: Use PP-DocLayout_plus-L (or PP-DocLayout-L) + SLANeXt + UniMERNet when accuracy matters more than speed
Balanced Performance: Use PP-DocLayout-M + SLANet_plus + PP-FormulaNet_L for production
Speed Priority: Use PP-DocLayout-S + SLANet + PP-FormulaNet_base for edge devices
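These three combinations can be captured as named presets. The sketch below is a hypothetical helper, not part of PaddleOCR: the model strings are copied verbatim from the list above and are not validated against the model zoo, and `SLANeXt` stands for the wired/wireless model family.

```python
from typing import Dict

# Presets taken from the selection criteria above (layout + table + formula).
PRESETS: Dict[str, Dict[str, str]] = {
    "accuracy": {
        "layout": "PP-DocLayout_plus-L",
        "table": "SLANeXt",
        "formula": "UniMERNet",
    },
    "balanced": {
        "layout": "PP-DocLayout-M",
        "table": "SLANet_plus",
        "formula": "PP-FormulaNet_L",
    },
    "speed": {
        "layout": "PP-DocLayout-S",
        "table": "SLANet",
        "formula": "PP-FormulaNet_base",
    },
}


def choose_models(priority: str) -> Dict[str, str]:
    """Return the model combination for 'accuracy', 'balanced', or 'speed'."""
    if priority not in PRESETS:
        raise ValueError(f"priority must be one of {sorted(PRESETS)}, got {priority!r}")
    return dict(PRESETS[priority])  # copy so callers cannot mutate the preset


if __name__ == "__main__":
    print(choose_models("balanced"))
```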

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:21-320

Hardware and Deployment Support

PP-StructureV3 supports deployment across multiple hardware platforms:

Hardware	Backend	Optimizations	Supported Models
NVIDIA GPU	Paddle Inference, TensorRT	FP16, INT8 quantization	All models
CPU	Paddle Inference, MKL-DNN	INT8 quantization, MKL-DNN cache	All models
Kunlunxin XPU	Paddle Inference	XPU-specific ops	All models
Ascend NPU	Paddle Inference	NPU acceleration	All models
MLU	Paddle Inference	MLU operators	All models
DCU	Paddle Inference	DCU acceleration	All models

Performance Modes:

Standard Mode: FP32 precision, basic optimizations
High-Performance Mode: Best available precision (FP16/INT8), TensorRT/MKL-DNN enabled, optimized batch sizes

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:636-698, docs/version3.x/pipeline_usage/OCR.md:636-698

Advanced Features

Custom Model Integration

PP-StructureV3 supports replacing the default models with custom-trained models.
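One way to wire in a custom layout model is sketched below. The keyword names `layout_detection_model_name` and `layout_detection_model_dir` are assumptions based on the PaddleOCR 3.x parameter naming scheme and should be confirmed against the installed version; `custom_layout_kwargs` is a hypothetical helper that only validates the directory and builds the keyword arguments.

```python
from pathlib import Path
from typing import Dict


def custom_layout_kwargs(model_name: str, model_dir: str) -> Dict[str, str]:
    """Build pipeline keyword arguments that point at a custom layout model."""
    path = Path(model_dir)
    if not path.is_dir():
        # Fail early instead of letting model loading fail later.
        raise FileNotFoundError(f"model directory not found: {model_dir}")
    return {
        "layout_detection_model_name": model_name,    # assumed parameter name
        "layout_detection_model_dir": str(path),      # assumed parameter name
    }


# Usage (requires paddleocr and an exported inference model on disk):
#   from paddleocr import PPStructureV3
#   pipeline = PPStructureV3(**custom_layout_kwargs("PP-DocLayout-L", "./my_layout_model"))
```

The model name should still identify the architecture the custom weights were trained with, since the directory only supplies the weights and configuration.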

Batch Processing

Multiple documents can be processed efficiently in batches.
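A generic batching loop is sketched below. It is written against an arbitrary `predict` callable that maps a list of input paths to a list of results, so it makes no claim about how PP-StructureV3 batches internally; `batched` and `process_all` are hypothetical helpers, and the default batch size of 4 is an arbitrary choice.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def process_all(
    paths: Sequence[str],
    predict: Callable[[List[str]], List[object]],
    batch_size: int = 4,
) -> List[object]:
    """Run predict on each batch and collect results in input order."""
    results: List[object] = []
    for batch in batched(paths, batch_size):
        results.extend(predict(batch))
    return results


if __name__ == "__main__":
    # Stand-in for a real pipeline call: tag each path instead of parsing it.
    fake_predict = lambda batch: [f"parsed:{path}" for path in batch]
    print(process_all(["a.png", "b.png", "c.png"], fake_predict, batch_size=2))
```

Reusing one pipeline instance across all batches avoids reloading models, which usually dominates the cost of short runs.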

Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md:1800-2000

For module-specific training and fine-tuning instructions, refer to the documentation of the individual modules.