PaddleOCR is a production-ready OCR and document AI engine built on PaddlePaddle 3.0, providing end-to-end solutions from text extraction to intelligent document understanding. The system converts images and PDFs into structured, AI-friendly data (JSON/Markdown) through a modular architecture supporting 109+ languages, multiple text types (printed, handwritten, mixed scripts), and heterogeneous hardware platforms including domestic accelerators (Kunlunxin XPU, Ascend NPU).
Key Project Statistics:
The codebase is distributed as a Python package via pip, with functionality exposed through:
- CLI: the paddleocr command, with subcommands registered in pyproject.toml:81-98
- Python API: importable classes (PaddleOCR, PPStructureV3, PPChatOCRv4Doc, etc.)

This page introduces PaddleOCR's purpose, architecture, and project structure. For installation, see page 1.2. For migration from 2.x, see page 1.3. For pipeline details, see section 2.
Sources: README.md1-50 docs/index.en.md1-60 pyproject.toml1-105 paddleocr/__init__.py1-50 setup.py1-19
PaddleOCR 3.0 addresses document understanding through four flagship pipelines:
- PP-OCRv5 (paddleocr.PaddleOCR class): A single model supports 5 text types (Simplified/Traditional Chinese, English, Japanese, Pinyin) plus handwriting; 13% accuracy improvement over v4
- PaddleOCR-VL (paddleocr.PaddleOCRVL class): 0.9B parameter vision-language model supporting 109 languages with unified recognition of text, tables, formulas, and charts
- PP-StructureV3 (paddleocr.PPStructureV3 class): Converts complex PDFs to Markdown/JSON while preserving layout; outperforms commercial solutions on the OmniDocBench benchmark
- PP-ChatOCRv4 (paddleocr.PPChatOCRv4Doc class): Integrates the ERNIE 4.5 LLM for intelligent information extraction; 15% accuracy improvement over v3

Development Lifecycle Support:
Project Structure:

    PaddleOCR/
    ├── paddleocr/              # Main package (Python API exports)
    │   ├── __init__.py         # Exports PaddleOCR, PPStructureV3, etc.
    │   ├── _cli.py             # CLI implementation
    │   └── _abstract.py        # Base classes and interfaces
    ├── tools/                  # Training and evaluation scripts
    │   ├── train.py            # Model training entry point
    │   ├── eval.py             # Model evaluation
    │   └── export_model.py     # Inference model export
    ├── deploy/                 # Deployment implementations
    │   ├── cpp_infer/          # C++ inference
    │   └── lite/               # Mobile deployment (Paddle-Lite)
    ├── test_tipc/              # Testing framework (TIPC)
    ├── benchmark/              # Performance benchmarking
    ├── pyproject.toml          # Package configuration
    └── setup.py                # Build system entry point

Sources: README.md48-75 docs/index.en.md17-42 pyproject.toml1-105 paddleocr/__init__.py1-50 tools/train.py1-50 tools/eval.py1-50 tools/export_model.py1-100
PaddleOCR 3.0 organizes its functionality into five flagship pipelines that build upon each other through hierarchical module composition:
Diagram: Core Pipeline Class Hierarchy and Module Composition
Pipeline Characteristics:
| Pipeline | Python Class | Primary Purpose | Key Features |
|---|---|---|---|
| PP-OCRv5 | PaddleOCR | Universal scene text recognition | 13% accuracy improvement; single model supports 5 text types + handwriting |
| PaddleOCR-VL | PaddleOCRVL | End-to-end VLM document parsing | SOTA 0.9B parameter model; 109 languages; recognizes text, tables, formulas, charts |
| PP-StructureV3 | PPStructureV3 | Complex document parsing | Outputs Markdown/JSON; outperforms commercial solutions on OmniDocBench |
| PP-ChatOCRv4 | PPChatOCRv4Doc | LLM-enhanced information extraction | 15% accuracy improvement; native ERNIE 4.5 integration with RAG |
| PP-DocTranslation | PPDocTranslation | Intelligent document translation | Preserves layout; based on PP-StructureV3 + ERNIE 4.5 |
Architecture Design Principles:
- PaddleXPipelineWrapper base class providing a consistent predict() API

Sources: README.md48-75 docs/index.en.md17-42 pyproject.toml81-98 paddleocr/__init__.py1-50 docs/update/upgrade_notes.en.md15-50
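The pipeline-to-class mapping in the table above can be captured in a small lookup. This is only an illustrative sketch: `PIPELINE_CLASSES` and `resolve_pipeline_class` are hypothetical helper names that do not exist in PaddleOCR, and the import is deferred so the mapping can be inspected without the package installed.

```python
import importlib

# Pipeline name -> Python class name, copied from the table above.
PIPELINE_CLASSES = {
    "PP-OCRv5": "PaddleOCR",
    "PaddleOCR-VL": "PaddleOCRVL",
    "PP-StructureV3": "PPStructureV3",
    "PP-ChatOCRv4": "PPChatOCRv4Doc",
    "PP-DocTranslation": "PPDocTranslation",
}


def resolve_pipeline_class(pipeline_name):
    """Return the paddleocr class for a pipeline name (hypothetical helper)."""
    try:
        class_name = PIPELINE_CLASSES[pipeline_name]
    except KeyError:
        known = ", ".join(sorted(PIPELINE_CLASSES))
        raise ValueError(
            f"Unknown pipeline {pipeline_name!r}; expected one of: {known}"
        ) from None
    # Imported lazily: this step needs `pip install paddleocr`
    # (and possibly an optional dependency group for some pipelines).
    module = importlib.import_module("paddleocr")
    return getattr(module, class_name)
```

Calling `resolve_pipeline_class("PP-StructureV3")` would return the `PPStructureV3` class when the package is installed; an unknown name fails early with a readable error rather than an `AttributeError`.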
PaddleOCR 3.0 employs a layered, modular architecture designed for flexibility, reusability, and production deployment. The system is organized into distinct layers, each with clear responsibilities:
Diagram: System Architecture Layers
Key Architectural Principles:
For detailed architecture documentation including module interactions, data flow, and component diagrams, see page 1.1 (System Architecture and Components).
Sources: README.md58-70 docs/update/upgrade_notes.en.md15-21 pyproject.toml81-98 paddleocr/__main__.py18-39 paddleocr/__init__.py
PaddleOCR 3.0 requires PaddlePaddle 3.0+ (installable via pip install paddlepaddle==3.2.0 or paddlepaddle-gpu==3.2.0). The package provides optional dependency groups defined in pyproject.toml50-78:
| Dependency Group | Installation Command | Features | Defined in pyproject.toml |
|---|---|---|---|
| Core (default) | pip install paddleocr | Basic text recognition (PP-OCR series) | Base dependencies |
| doc-parser | pip install "paddleocr[doc-parser]" | Document parsing (PP-StructureV3) | Lines 52-56 |
| ie | pip install "paddleocr[ie]" | Information extraction (PP-ChatOCRv4) | Lines 57-61 |
| trans | pip install "paddleocr[trans]" | Document translation (PP-DocTranslation) | Lines 62-64 |
| all | pip install "paddleocr[all]" | Complete functionality | Lines 65-78 |
The build system uses setup.py:1-19, which delegates to setuptools.setup() with configuration from pyproject.toml:1-98.
Sources: README.md211-230 docs/quick_start.en.md7-35 setup.py1-19 pyproject.toml50-78 docs/version3.x/installation.en.md97-131
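As a rough illustration of the dependency groups in the table above, the following sketch builds the matching install command and checks whether a package is already present. The helper names `pip_install_command` and `installed_version` are hypothetical; only the group names and command strings come from the table.

```python
from importlib import metadata

# Optional dependency groups listed in the table above.
EXTRAS = ("doc-parser", "ie", "trans", "all")


def pip_install_command(extra=None):
    """Return the pip command for the core package or one optional group."""
    if extra is None:
        return "pip install paddleocr"
    if extra not in EXTRAS:
        raise ValueError(f"Unknown dependency group: {extra!r}")
    return f'pip install "paddleocr[{extra}]"'


def installed_version(package="paddleocr"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

A setup script might call `installed_version()` first and print `pip_install_command("doc-parser")` only when the result is `None`.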
PaddleOCR provides two primary interfaces:
1. Command-Line Interface (CLI)
The CLI is implemented via paddleocr/__main__.py:21-35, which calls console_entry(), a wrapper around the main CLI logic in paddleocr/_cli.py. Subcommands are registered in pyproject.toml:81-98 under [project.scripts].
Each CLI subcommand maps to a class implementing the CLISubcommandExecutor abstract interface defined in paddleocr/_abstract.py:18-25.
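The subcommand-executor pattern described above can be sketched with the standard library alone. This is not PaddleOCR's code: `SubcommandExecutor`, `EchoExecutor`, the method names `add_subparser` and `execute_with_args`, and the `echo` subcommand are all assumptions made for illustration of how an abstract interface can back each CLI subcommand.

```python
import abc
import argparse


class SubcommandExecutor(abc.ABC):
    """Hypothetical stand-in for an abstract CLI subcommand interface."""

    @abc.abstractmethod
    def add_subparser(self, subparsers):
        """Register this subcommand and its arguments."""

    @abc.abstractmethod
    def execute_with_args(self, args):
        """Run the subcommand with parsed arguments."""


class EchoExecutor(SubcommandExecutor):
    """Toy subcommand that only reports which input it would process."""

    def add_subparser(self, subparsers):
        parser = subparsers.add_parser("echo", help="Report the input path")
        parser.add_argument("-i", "--input", required=True)
        # Remember which executor handles this subcommand.
        parser.set_defaults(executor=self.execute_with_args)
        return parser

    def execute_with_args(self, args):
        return f"would process {args.input}"


def build_parser(executors):
    """Create a top-level parser and let each executor register itself."""
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    for executor in executors:
        executor.add_subparser(subparsers)
    return parser


def main(argv):
    """Parse argv and dispatch to the executor chosen by the subcommand."""
    args = build_parser([EchoExecutor()]).parse_args(argv)
    return args.executor(args)
```

With this layout, adding a subcommand means adding one class and passing an instance to `build_parser`; the dispatch code in `main` does not change.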
2. Python API
The Python API exposes classes from paddleocr/__init__.py. All classes inherit from PaddleXPipelineWrapper (see page 3.2 for integration details).
Besides PaddleOCR, importable classes include PPStructureV3, PPChatOCRv4Doc, PPDocTranslation, PaddleOCRVL, TextDetection, TextRecognition, DocPreprocessor, and specialized modules (see page 2.6 for complete list).
Sources: README.md231-265 docs/quick_start.en.md37-196 paddleocr/__init__.py paddleocr/_abstract.py18-25
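A minimal sketch of the Python API follows, assuming the package is installed and the 3.x interface in which `PaddleOCR.predict()` returns result objects exposing `print()` and `save_to_json()`. The wrapper `run_ocr`, the image path, and the output directory are hypothetical, and the constructor flags (which turn off optional preprocessing steps) should be checked against the installed version.

```python
def run_ocr(image_path, output_dir="output"):
    """Run the general OCR pipeline on one image (illustrative wrapper)."""
    # Imported lazily so this module loads even without paddleocr installed.
    from paddleocr import PaddleOCR

    ocr = PaddleOCR(
        use_doc_orientation_classify=False,  # skip document orientation model
        use_doc_unwarping=False,             # skip document unwarping model
        use_textline_orientation=False,      # skip text line orientation model
    )
    results = ocr.predict(image_path)
    for res in results:
        res.print()                    # log the recognized text and boxes
        res.save_to_json(output_dir)   # persist the structured result
    return results
```

A call such as `run_ocr("invoice.png")` would download the default models on first use, so the first run is noticeably slower than later ones.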
PaddleOCR 3.0 supports deployment across heterogeneous hardware:
Sources: README.md14-16 README.md74-79 docs/version3.x/paddlex/overview.en.md140-165
PaddleOCR provides six deployment paths from trained models:
| Deployment Mode | Implementation | Configuration | Use Case |
|---|---|---|---|
| Python Inference | pip install paddleocr | Default Paddle Inference or ONNX Runtime backend | Development, prototyping, simple applications |
| High-Performance Inference | enable_hpi=True parameter | TensorRT (GPU), MKL-DNN (CPU), CINN compiler | Production servers requiring maximum throughput |
| Service Deployment | PaddleX Serving (HTTP/gRPC) | Docker containers, multi-language clients | Microservices, cloud applications |
| On-Device Deployment | Paddle-Lite (.nb models) | ARM CPU/GPU/NPU optimization | Android, iOS, embedded systems |
| C++ Inference | Native binary via CMake | Links Paddle C++ library | High-performance local applications |
| ONNX Export | paddle2onnx conversion | Cross-platform ONNX Runtime | Platform-agnostic deployment |
All modes support multiple hardware backends (CPU, GPU, XPU, NPU, MLU, DCU) through PaddlePaddle's unified inference API. Model export is handled by tools/export_model.py.
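Switching between plain Python inference and high-performance inference can be sketched as a matter of constructor options. In the example below, `enable_hpi` comes from the table above, while the `device` keyword and the helper names `inference_options` and `create_ocr` are assumptions for illustration; high-performance inference also needs its own plugin dependencies installed.

```python
def inference_options(device="cpu", high_performance=False):
    """Build keyword arguments for a pipeline constructor (hypothetical helper)."""
    return {
        "device": device,                # assumed keyword, e.g. "cpu" or "gpu:0"
        "enable_hpi": high_performance,  # high-performance inference switch
    }


def create_ocr(**options):
    """Instantiate the OCR pipeline; requires paddleocr to be installed."""
    from paddleocr import PaddleOCR  # lazy import keeps the helper importable

    return PaddleOCR(**options)
```

For example, `create_ocr(**inference_options("gpu:0", high_performance=True))` would request the accelerated backend on the first GPU, assuming the environment supports it.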
Model Weight Distribution:
- The model download source is configurable via the PADDLE_PDX_MODEL_SOURCE environment variable
- Downloaded weights are cached under ~/.paddlex/

Sources: README.md372-376 docs/version3.x/deployment/high_performance_inference.en.md1-50 tools/export_model.py1-100 README.md170-185
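The environment variable and cache location mentioned above might be handled as in the following sketch. The helper names are hypothetical, and the accepted values of `PADDLE_PDX_MODEL_SOURCE` are not listed in this page, so the `"BOS"` value in the usage note is an assumption to verify against the official documentation.

```python
import os
from pathlib import Path


def configure_model_source(source):
    """Select the model download source; call this before importing paddleocr."""
    os.environ["PADDLE_PDX_MODEL_SOURCE"] = source


def default_cache_dir():
    """Return the directory under which downloaded weights are cached."""
    return Path.home() / ".paddlex"
```

A script would typically call `configure_model_source("BOS")` at the very top, since the variable is presumably read when models are first resolved.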
PaddleOCR 3.0 deeply integrates with PaddleX, a low-code development platform:
PaddleX provides:
Sources: docs/version3.x/paddlex/overview.en.md1-21 README.md57 docs/update/upgrade_notes.en.md20
PaddleOCR 3.0 represents a complete architectural redesign from 2.x:
Key Changes:
| Aspect | PaddleOCR 2.x | PaddleOCR 3.x |
|---|---|---|
| Architecture | Monolithic with feature branches | Modular, plugin-based |
| Interfaces | Mixed, inconsistent APIs | Unified CLI and Python API |
| Pipeline Design | Single PPStructure class | Separate pipelines: PPStructureV3, PPChatOCRv4Doc, etc. |
| Deployment | Limited to PaddleServing | High-performance, service, on-device, C++ |
| Framework | PaddlePaddle 2.x | PaddlePaddle 3.0 with CINN compiler |
| LLM Integration | None | Native ERNIE 4.5 support in PP-ChatOCRv4 |
Breaking Changes:
- PaddleOCR.ocr() method no longer accepts det, rec parameters (use dedicated TextDetection, TextRecognition classes instead)
- PPStructure class removed (replaced by PPStructureV3)
- show_log parameter replaced by comprehensive logging system
- use_onnx parameter replaced by high-performance inference configuration

For detailed migration guidance, see Version 2.x to 3.x Migration.
Sources: docs/update/upgrade_notes.en.md1-83 docs/update/upgrade_notes.md1-84
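When porting 2.x code, the breaking changes listed above can be turned into a simple check over the keyword arguments a script still passes. This is a hypothetical migration aid, not a tool shipped with PaddleOCR; the replacement hints paraphrase the list above.

```python
# Legacy 2.x argument -> hint taken from the breaking-changes list above.
LEGACY_REPLACEMENTS = {
    "det": "use the dedicated TextDetection class",
    "rec": "use the dedicated TextRecognition class",
    "show_log": "configure the logging system instead",
    "use_onnx": "use the high-performance inference configuration instead",
}


def find_legacy_arguments(kwargs):
    """Return a hint for every 2.x-only argument found in kwargs."""
    return {
        name: LEGACY_REPLACEMENTS[name]
        for name in kwargs
        if name in LEGACY_REPLACEMENTS
    }
```

For instance, `find_legacy_arguments({"det": True, "lang": "en"})` reports only `det`, leaving arguments that are not in the list untouched.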
PaddleOCR 3.0 provides comprehensive model support across text recognition, document analysis, and preprocessing:
Text Recognition:
Document Structure Analysis:
Document Preprocessing:
Comprehensive Pipeline Models:
Sources: README.md76-97 README.md158-172 docs/update/update.en.md159-172
PaddleOCR 3.0 provides a complete toolkit for the AI development lifecycle:
Diagram: Development Lifecycle Workflow
Key Development Features:
- Model export: convert trained weights (.pdparams) to inference format (.pdmodel + .pdiparams)

For detailed training documentation, see section 4 (Model Training System). For deployment options, see section 5 (Deployment and Inference). For quality assurance details, see section 6.1 (Testing Framework).
Sources: README.md228-244 .github/workflows/python-publish.yml1-40 TIPC testing diagram, test_tipc/
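The export step could be driven from Python by assembling the command line for tools/export_model.py. The sketch below assumes the script takes a YAML config via `-c` and overrides via `-o` with `Global.pretrained_model` and `Global.save_inference_dir` keys; these flags, the helper name, and all paths are assumptions to confirm against the training documentation.

```python
def build_export_command(config_path, checkpoint_path, output_dir):
    """Assemble the argument list for exporting a trained model (illustrative)."""
    return [
        "python",
        "tools/export_model.py",
        "-c", config_path,  # assumed: training config used for the model
        "-o",               # assumed: key=value overrides follow
        f"Global.pretrained_model={checkpoint_path}",
        f"Global.save_inference_dir={output_dir}",
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the repository root; building it as a list avoids shell quoting problems with paths that contain spaces.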
Official Documentation:
Online Resources:
GitHub Issue Templates:
Issue Lifecycle:
Contribution Workflow:
- Open pull requests against main or dygraph

Contribution Areas:
Community Engagement:
Quality Standards:
Sources: README.md408-455 docs/index.en.md89-96 docs/index.md80-88 .github/ISSUE_TEMPLATE/bug-report.yml1-100 .github/workflows/close_inactive_issues.yaml1-24
To begin using PaddleOCR:
For migration from PaddleOCR 2.x, start with Version 2.x to 3.x Migration.