
Overview


PaddleOCR is a production-ready OCR and document AI engine built on PaddlePaddle 3.0, providing end-to-end solutions from text extraction to intelligent document understanding. The system converts images and PDFs into structured, AI-friendly data (JSON/Markdown) through a modular architecture supporting 109+ languages, multiple text types (printed, handwritten, mixed scripts), and heterogeneous hardware platforms including domestic accelerators (Kunlunxin XPU, Ascend NPU).

Key Project Statistics:

  • 60,000+ GitHub stars
  • 100+ million PyPI downloads
  • Integrated into 6,000+ repositories including MinerU, RAGFlow, pathway, cherry-studio
  • Supports Python 3.8-3.12 on Linux, Windows, macOS

The codebase is distributed as a Python package via pip, with functionality exposed through both a command-line interface and a Python API.

This page introduces PaddleOCR's purpose, architecture, and project structure. For installation, see page 1.2. For migration from 2.x, see page 1.3. For pipeline details, see section 2.

Sources: README.md:1-50, docs/index.en.md:1-60, pyproject.toml:1-105, paddleocr/__init__.py:1-50, setup.py:1-19


System Purpose and Key Features

PaddleOCR 3.0 addresses document understanding through four flagship pipelines:

  1. PP-OCRv5 Universal Text Recognition (paddleocr.PaddleOCR class): Single model supports 5 text types (Simplified/Traditional Chinese, English, Japanese, Pinyin) plus handwriting; 13% accuracy improvement over v4
  2. PaddleOCR-VL Multilingual Parsing (paddleocr.PaddleOCRVL class): 0.9B parameter vision-language model supporting 109 languages with unified recognition of text, tables, formulas, charts
  3. PP-StructureV3 Document Parsing (paddleocr.PPStructureV3 class): Converts complex PDFs to Markdown/JSON preserving layout; outperforms commercial solutions on OmniDocBench benchmark
  4. PP-ChatOCRv4 Intelligent Extraction (paddleocr.PPChatOCRv4Doc class): Integrates ERNIE 4.5 LLM for intelligent information extraction; 15% accuracy improvement over v3
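
The four pipelines above map directly to importable classes. A minimal sketch (constructor arguments omitted; each pipeline downloads its default models on first use):

```python
# Sketch: the flagship pipelines are exposed as top-level classes in the
# paddleocr package; class names follow the list above.
from paddleocr import PaddleOCR, PaddleOCRVL, PPStructureV3, PPChatOCRv4Doc

# Instantiating a pipeline prepares its default models (auto-downloaded).
ocr = PaddleOCR()  # PP-OCRv5 universal text recognition
```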

Development Lifecycle Support:

Project Structure:

```
PaddleOCR/
├── paddleocr/               # Main package (Python API exports)
│   ├── __init__.py          # Exports PaddleOCR, PPStructureV3, etc.
│   ├── _cli.py              # CLI implementation
│   └── _abstract.py         # Base classes and interfaces
├── tools/                   # Training and evaluation scripts
│   ├── train.py             # Model training entry point
│   ├── eval.py              # Model evaluation
│   └── export_model.py      # Inference model export
├── deploy/                  # Deployment implementations
│   ├── cpp_infer/           # C++ inference
│   └── lite/                # Mobile deployment (Paddle-Lite)
├── test_tipc/               # Testing framework (TIPC)
├── benchmark/               # Performance benchmarking
├── pyproject.toml           # Package configuration
└── setup.py                 # Build system entry point
```

Sources: README.md:48-75, docs/index.en.md:17-42, pyproject.toml:1-105, paddleocr/__init__.py:1-50, tools/train.py:1-50, tools/eval.py:1-50, tools/export_model.py:1-100


Core Pipelines

PaddleOCR 3.0 organizes its functionality into five flagship pipelines that build upon each other through hierarchical module composition:

Diagram: Core Pipeline Class Hierarchy and Module Composition

Pipeline Characteristics:

| Pipeline | Python Class | Primary Purpose | Key Features |
|---|---|---|---|
| PP-OCRv5 | PaddleOCR | Universal scene text recognition | 13% accuracy improvement; single model supports 5 text types + handwriting |
| PaddleOCR-VL | PaddleOCRVL | End-to-end VLM document parsing | SOTA 0.9B parameter model; 109 languages; recognizes text, tables, formulas, charts |
| PP-StructureV3 | PPStructureV3 | Complex document parsing | Outputs Markdown/JSON; outperforms commercial solutions on OmniDocBench |
| PP-ChatOCRv4 | PPChatOCRv4Doc | LLM-enhanced information extraction | 15% accuracy improvement; native ERNIE 4.5 integration with RAG |
| PP-DocTranslation | PPDocTranslation | Intelligent document translation | Preserves layout; based on PP-StructureV3 + ERNIE 4.5 |

Architecture Design Principles:

  • Modular Composition: Basic modules (text detection, recognition) in individual classes serve as reusable building blocks for complex pipelines
  • Progressive Enhancement: Simpler pipelines (PP-OCRv5) provide foundational OCR; advanced pipelines (PP-ChatOCRv4) add LLM intelligence
  • Dual Paradigm Support: Traditional pipeline models (composing detection + recognition) coexist with unified VLM approach (PaddleOCR-VL)
  • Deployment Flexibility: Same model code runs on CPU, GPU, XPU, NPU through unified inference interface
  • Unified Interface: All pipelines inherit from PaddleXPipelineWrapper base class providing consistent predict() API
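
As a rough illustration of the unified interface, the sketch below assumes both pipelines expose the same predict() entry point through the shared wrapper; the input file name is illustrative:

```python
# Sketch: two different pipelines, one calling convention.
from paddleocr import PaddleOCR, PPStructureV3

for pipeline in (PaddleOCR(), PPStructureV3()):
    for result in pipeline.predict("invoice.png"):  # same predict() API
        result.print()  # result objects expose print()/save_* helpers
```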

Sources: README.md:48-75, docs/index.en.md:17-42, pyproject.toml:81-98, paddleocr/__init__.py:1-50, docs/update/upgrade_notes.en.md:15-50


High-Level System Design

PaddleOCR 3.0 employs a layered, modular architecture designed for flexibility, reusability, and production deployment. The system is organized into distinct layers, each with clear responsibilities:

Diagram: System Architecture Layers

Key Architectural Principles:

  1. Separation of Concerns: User interfaces (CLI/API) are decoupled from pipeline logic, which is decoupled from inference execution
  2. Hierarchical Composition: Complex pipelines compose basic modules; each layer builds on the layer below
  3. Unified Inference: PaddleX wrapper provides consistent inference interface across all models and hardware
  4. Plugin-Based Design: New modules and pipelines can be added without modifying existing code
  5. Hardware Abstraction: Same code runs across heterogeneous platforms (CPU, GPU, domestic accelerators)

For detailed architecture documentation including module interactions, data flow, and component diagrams, see page 1.1 (System Architecture and Components).

Sources: README.md:58-70, docs/update/upgrade_notes.en.md:15-21, pyproject.toml:81-98, paddleocr/__main__.py:18-39, paddleocr/__init__.py


Installation and Quick Start

Installation

PaddleOCR 3.0 requires PaddlePaddle 3.0+ (installable via pip install paddlepaddle==3.2.0 or paddlepaddle-gpu==3.2.0). The package provides optional dependency groups defined in pyproject.toml:50-78:

| Dependency Group | Installation Command | Features | Defined in pyproject.toml |
|---|---|---|---|
| Core (default) | pip install paddleocr | Basic text recognition (PP-OCR series) | Base dependencies |
| doc-parser | pip install "paddleocr[doc-parser]" | Document parsing (PP-StructureV3) | Lines 52-56 |
| ie | pip install "paddleocr[ie]" | Information extraction (PP-ChatOCRv4) | Lines 57-61 |
| trans | pip install "paddleocr[trans]" | Document translation (PP-DocTranslation) | Lines 62-64 |
| all | pip install "paddleocr[all]" | Complete functionality | Lines 65-78 |

The build system uses setup.py:1-19, which delegates to setuptools.setup() with configuration from pyproject.toml:1-98.

Sources: README.md:211-230, docs/quick_start.en.md:7-35, setup.py:1-19, pyproject.toml:50-78, docs/version3.x/installation.en.md:97-131

Entry Points

PaddleOCR provides two primary interfaces:

1. Command-Line Interface (CLI)

The CLI is implemented via paddleocr/__main__.py:21-35, which calls console_entry(), wrapping the main CLI logic in paddleocr/_cli.py. Subcommands are registered in pyproject.toml:81-98 under [project.scripts].

Each CLI subcommand maps to a class implementing the CLISubcommandExecutor abstract interface defined in paddleocr/_abstract.py:18-25.

2. Python API

The Python API exposes classes from paddleocr/__init__.py. All classes inherit from PaddleXPipelineWrapper (see page 3.2 for integration details).

Other importable classes include PPStructureV3, PPChatOCRv4Doc, PPDocTranslation, PaddleOCRVL, TextDetection, TextRecognition, DocPreprocessor, and specialized modules (see page 2.6 for complete list).
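
A minimal quick-start sketch for the PP-OCRv5 pipeline; the preprocessing flags and output helpers shown follow the documented quick-start pattern, and the file paths are illustrative:

```python
from paddleocr import PaddleOCR

# Optional flags disable the document-preprocessing steps for plain images.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,  # skip whole-document rotation correction
    use_doc_unwarping=False,             # skip distortion/unwarping correction
    use_textline_orientation=False,      # skip text-line rotation detection
)

results = ocr.predict("sample_image.png")
for res in results:
    res.print()                  # recognized text and confidence scores
    res.save_to_img("output/")   # visualization with detected boxes
    res.save_to_json("output/")  # structured JSON result
```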

Sources: README.md:231-265, docs/quick_start.en.md:37-196, paddleocr/__init__.py, paddleocr/_abstract.py:18-25


Hardware and Deployment Support

Supported Hardware Platforms

PaddleOCR 3.0 supports deployment across heterogeneous hardware, including CPU, GPU, and domestic accelerators such as Kunlunxin XPU and Ascend NPU, as well as MLU and DCU devices.

Sources: README.md:14-16, README.md:74-79, docs/version3.x/paddlex/overview.en.md:140-165

Deployment Modes

PaddleOCR provides six deployment paths from trained models:

| Deployment Mode | Implementation | Configuration | Use Case |
|---|---|---|---|
| Python Inference | pip install paddleocr | Default Paddle Inference or ONNX Runtime backend | Development, prototyping, simple applications |
| High-Performance Inference | enable_hpi=True parameter | TensorRT (GPU), MKL-DNN (CPU), CINN compiler | Production servers requiring maximum throughput |
| Service Deployment | PaddleX Serving (HTTP/gRPC) | Docker containers, multi-language clients | Microservices, cloud applications |
| On-Device Deployment | Paddle-Lite (.nb models) | ARM CPU/GPU/NPU optimization | Android, iOS, embedded systems |
| C++ Inference | Native binary via CMake | Links Paddle C++ library | High-performance local applications |
| ONNX Export | paddle2onnx conversion | Cross-platform ONNX Runtime | Platform-agnostic deployment |

All modes support multiple hardware backends (CPU, GPU, XPU, NPU, MLU, DCU) through PaddlePaddle's unified inference API. Model export is handled by tools/export_model.py.
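
For example, high-performance inference is toggled through the enable_hpi parameter listed in the table above; a hedged sketch (availability of TensorRT/MKL-DNN backends depends on the installed extras and hardware):

```python
from paddleocr import PaddleOCR

# enable_hpi=True asks the PaddleX wrapper to pick an accelerated backend
# (e.g. TensorRT on GPU, MKL-DNN on CPU) when one is available.
ocr = PaddleOCR(enable_hpi=True)
results = ocr.predict("sample_image.png")
```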

Model Weight Distribution:

  • Default source: HuggingFace (configurable via the PADDLE_PDX_MODEL_SOURCE environment variable; see the sketch after this list)
  • Alternative sources: BOS (Baidu Object Storage), AIStudio, ModelScope
  • Auto-download on first inference; cached locally in ~/.paddlex/
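
A sketch of overriding the weight source before constructing a pipeline; the accepted values shown in the comment are assumptions, so check the upstream documentation for the exact spelling:

```python
import os

# Assumed values: "huggingface" (default), "bos", "aistudio", "modelscope".
os.environ["PADDLE_PDX_MODEL_SOURCE"] = "bos"  # e.g. use the Baidu Object Storage mirror

from paddleocr import PaddleOCR

ocr = PaddleOCR()  # weights now download from the configured source into ~/.paddlex/
```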

Sources: README.md:372-376, docs/version3.x/deployment/high_performance_inference.en.md:1-50, tools/export_model.py:1-100, README.md:170-185


Integration with PaddleX

PaddleOCR 3.0 deeply integrates with PaddleX, a low-code development platform.

PaddleX provides:

  • One-click model inference for all PaddleOCR pipelines
  • No-code training via cloud-based GUI at AI Studio
  • Unified deployment APIs across multiple hardware platforms
  • 200+ additional models for image classification, object detection, segmentation, time series

Sources: docs/version3.x/paddlex/overview.en.md:1-21, README.md:57, docs/update/upgrade_notes.en.md:20


Version 3.0 Architecture Evolution

PaddleOCR 3.0 represents a complete architectural redesign from 2.x:

Key Changes:

| Aspect | PaddleOCR 2.x | PaddleOCR 3.x |
|---|---|---|
| Architecture | Monolithic with feature branches | Modular, plugin-based |
| Interfaces | Mixed, inconsistent APIs | Unified CLI and Python API |
| Pipeline Design | Single PPStructure class | Separate pipelines: PPStructureV3, PPChatOCRv4Doc, etc. |
| Deployment | Limited to PaddleServing | High-performance, service, on-device, C++ |
| Framework | PaddlePaddle 2.x | PaddlePaddle 3.0 with CINN compiler |
| LLM Integration | None | Native ERNIE 4.5 support in PP-ChatOCRv4 |

Breaking Changes:

  • The PaddleOCR.ocr() method no longer accepts the det and rec parameters; use the dedicated TextDetection and TextRecognition classes instead, as shown in the sketch after this list
  • PPStructure class removed (replaced by PPStructureV3)
  • show_log parameter replaced by comprehensive logging system
  • use_onnx parameter replaced by high-performance inference configuration
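
A migration sketch for the first breaking change, assuming the dedicated module classes follow the same predict() convention as the pipelines; file names are illustrative:

```python
# PaddleOCR 2.x style such as ocr.ocr(img, rec=False) is no longer supported.
from paddleocr import TextDetection, TextRecognition

det = TextDetection()
boxes = det.predict("receipt.png")       # detection only (2.x: ocr(img, rec=False))

rec = TextRecognition()
texts = rec.predict("cropped_line.png")  # recognition only (2.x: ocr(img, det=False))
```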

For detailed migration guidance, see Version 2.x to 3.x Migration.

Sources: docs/update/upgrade_notes.en.md:1-83, docs/update/upgrade_notes.md:1-84


Model Capabilities and Supported Scenarios

PaddleOCR 3.0 provides comprehensive model support across text recognition, document analysis, and preprocessing:

Text Recognition:

  • PP-OCRv5 (Main): Single model supporting 5 text types (Simplified Chinese, Traditional Chinese, English, Japanese, Pinyin) plus handwriting
  • PP-OCRv5 (Multilingual): 37+ languages including French, Spanish, Portuguese, Russian, Korean, Latin scripts, Cyrillic, Arabic, Devanagari, Telugu, and Tamil, covering 109 languages in total
  • PP-OCRv5 (Specialized): Dedicated high-accuracy models for English, Thai, Greek
  • PaddleOCR-VL: 0.9B parameter vision-language model supporting 109 languages with unified element recognition

Document Structure Analysis:

  • Layout Detection: PP-DocLayout series (L/M/S) supporting 23 layout categories (text blocks, titles, tables, figures, formulas, etc.)
  • Table Recognition: SLANeXt models for wired/wireless tables; RT-DETR for table cell detection; supports nested formulas and images
  • Formula Recognition: PP-FormulaNet (L/S) with 50,000 LaTeX vocabulary for printed and handwritten formulas
  • Seal Recognition: Specialized curved text detection for official stamps
  • Chart Recognition: PP-Chart2Table for converting charts to structured table format

Document Preprocessing:

  • Orientation Classification: PP-LCNet models for document and text line rotation detection
  • Document Unwarping: UVDoc model for correcting distorted/curved documents
  • Text Line Orientation: Ultra-lightweight (0.3M parameters) classification model

Comprehensive Pipeline Models:

  • PP-StructureV3: Combines multiple modules for end-to-end document parsing to Markdown/JSON (see the sketch after this list)
  • PP-ChatOCRv4: Integrates PP-DocBee2 multimodal model with ERNIE 4.5 for intelligent extraction
  • PP-DocTranslation: Document translation preserving layout using PP-StructureV3 + ERNIE 4.5
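
A sketch of PP-StructureV3 parsing a document to Markdown/JSON; save_to_markdown is assumed from the pipeline's described output formats, and the input path is illustrative:

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
for res in pipeline.predict("annual_report.pdf"):
    res.save_to_json("output/")      # structured JSON per page
    res.save_to_markdown("output/")  # layout-preserving Markdown
```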

Sources: README.md:76-97, README.md:158-172, docs/update/update.en.md:159-172


Development Workflow and Tools

PaddleOCR 3.0 provides a complete toolkit for the AI development lifecycle:

Diagram: Development Lifecycle Workflow

Key Development Features:

  1. Data Annotation: PPOCRLabel tool (separate repository) for text detection, recognition, table, and KIE annotation
  2. Model Training: Unified training via tools/train.py accepting YAML configs, supporting distributed training, AMP, and multiple optimizers
  3. Model Evaluation: tools/eval.py calculates task-specific metrics (DetMetric for detection, RecMetric for recognition)
  4. Model Export: tools/export_model.py converts training checkpoints (.pdparams) to inference format (.pdmodel + .pdiparams)
  5. Quality Assurance: TIPC framework in test_tipc/ validates training, inference, and deployment across configurations using test_tipc/prepare.sh and test_tipc/test_train_inference_python.sh
  6. Performance Analysis: Benchmark tools (benchmark_train.sh, scripts/analysis.py) measure end-to-end and per-module latency
  7. Multiple Deployment Paths: Python (via paddleocr/__init__.py), C++ (deploy/cpp_infer/), mobile (deploy/lite/), service (PaddleX Serving)

For detailed training documentation, see section 4 (Model Training System). For deployment options, see section 5 (Deployment and Inference). For quality assurance details, see section 6.1 (Testing Framework).

Sources: README.md:228-244, .github/workflows/python-publish.yml:1-40, test_tipc/


Community and Contribution Guidelines

Documentation Resources

Official Documentation:

Online Resources:

Getting Support

GitHub Issue Templates:

Issue Lifecycle:

  • Inactive issues are automatically closed after 30 days (managed by .github/workflows/close_inactive_issues.yaml)
  • Stale issues receive warning labels after 7 days of inactivity
  • Community members can reopen closed issues with updates

Contributing to PaddleOCR

Contribution Workflow:

  1. Fork and Clone: Fork the repository and clone your fork locally
  2. Environment Setup: Follow docs/version3.x/installation.en.md for development environment
  3. Create Branch: Create a feature branch from main or dygraph
  4. Code Development: Follow PaddlePaddle coding standards and add tests in test_tipc/
  5. Testing: Run TIPC tests via test_tipc/test_train_inference_python.sh
  6. Pull Request: Submit PR using template in .github/PULL_REQUEST_TEMPLATE.md

Contribution Areas:

  • New Models: Add models to paddleocr/ following existing class structure
  • Bug Fixes: Address issues in GitHub Issues
  • Documentation: Improve docs in docs/ directory (multi-language support welcomed)
  • Deployment: Enhance deployment tools in deploy/
  • Testing: Expand test coverage in test_tipc/

Community Engagement:

Quality Standards:

  • All new models must include training configs in configs/
  • Deployment code must support CPU and at least one GPU platform
  • Documentation must be provided in both English and Chinese
  • Tests must pass in TIPC framework before merge

Sources: README.md:408-455, docs/index.en.md:89-96, docs/index.md:80-88, .github/ISSUE_TEMPLATE/bug-report.yml:1-100, .github/workflows/close_inactive_issues.yaml:1-24


Next Steps

To begin using PaddleOCR:

  1. Installation: Follow the Installation and Setup guide to install PaddleOCR with the appropriate dependency groups for your use case
  2. Quick Start: Try the Quick Start examples in Quick Start to run inference with pre-trained models
  3. Choose Your Pipeline: Review the Core Pipelines section to select the pipeline that matches your requirements
  4. Deployment: Plan your deployment strategy using the Deployment and Inference documentation
  5. Customization: For custom model training, see Model Architecture and Training

For migration from PaddleOCR 2.x, start with Version 2.x to 3.x Migration.

Sources: README.md:205-244, docs/quick_start.en.md:1-197