
Overview


PaddleOCR is a production-ready OCR and document AI engine built on PaddlePaddle 3.0, providing end-to-end solutions from text extraction to intelligent document understanding. The system converts images and PDFs into structured, AI-friendly data (JSON/Markdown) through a modular architecture supporting 109+ languages, multiple text types (printed, handwritten, mixed scripts), and heterogeneous hardware platforms including domestic accelerators (Kunlunxin XPU, Ascend NPU).

Key Project Statistics:

  • 60,000+ GitHub stars
  • 100+ million PyPI downloads
  • Integrated into 6,000+ repositories including MinerU, RAGFlow, pathway, cherry-studio
  • Supports Python 3.8-3.12 on Linux, Windows, macOS

The codebase is distributed as a Python package via pip, with functionality exposed through both a command-line interface and a Python API.

This page introduces PaddleOCR's purpose, architecture, and project structure. For installation, see page 1.2. For migration from 2.x, see page 1.3. For pipeline details, see section 2.

Sources: README.md:1-50, docs/index.en.md:1-60, pyproject.toml:1-105, paddleocr/__init__.py:1-50, setup.py:1-19


System Purpose and Key Features

PaddleOCR 3.0 addresses document understanding through four flagship pipelines:

  1. PP-OCRv5 Universal Text Recognition (paddleocr.PaddleOCR class): Single model supports 5 text types (Simplified/Traditional Chinese, English, Japanese, Pinyin) plus handwriting; 13% accuracy improvement over v4
  2. PaddleOCR-VL Multilingual Parsing (paddleocr.PaddleOCRVL class): 0.9B parameter vision-language model supporting 109 languages with unified recognition of text, tables, formulas, charts
  3. PP-StructureV3 Document Parsing (paddleocr.PPStructureV3 class): Converts complex PDFs to Markdown/JSON preserving layout; outperforms commercial solutions on OmniDocBench benchmark
  4. PP-ChatOCRv4 Intelligent Extraction (paddleocr.PPChatOCRv4Doc class): Integrates ERNIE 4.5 LLM for intelligent information extraction; 15% accuracy improvement over v3
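
The four pipelines above map directly to importable classes. A minimal sketch (constructor arguments omitted; each pipeline downloads its default models on first use):

```python
# Sketch: the flagship pipelines are exposed as top-level classes in the
# paddleocr package; class names follow the list above.
from paddleocr import PaddleOCR, PaddleOCRVL, PPStructureV3, PPChatOCRv4Doc

# Instantiating a pipeline prepares its default models (auto-downloaded).
ocr = PaddleOCR()  # PP-OCRv5 universal text recognition
```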

Development Lifecycle Support:

Project Structure:

```
PaddleOCR/
├── paddleocr/               # Main package (Python API exports)
│   ├── __init__.py          # Exports PaddleOCR, PPStructureV3, etc.
│   ├── _cli.py              # CLI implementation
│   └── _abstract.py         # Base classes and interfaces
├── tools/                   # Training and evaluation scripts
│   ├── train.py             # Model training entry point
│   ├── eval.py              # Model evaluation
│   └── export_model.py      # Inference model export
├── deploy/                  # Deployment implementations
│   ├── cpp_infer/           # C++ inference
│   └── lite/                # Mobile deployment (Paddle-Lite)
├── test_tipc/               # Testing framework (TIPC)
├── benchmark/               # Performance benchmarking
├── pyproject.toml           # Package configuration
└── setup.py                 # Build system entry point
```

Sources: README.md:48-75, docs/index.en.md:17-42, pyproject.toml:1-105, paddleocr/__init__.py:1-50, tools/train.py:1-50, tools/eval.py:1-50, tools/export_model.py:1-100


Core Pipelines

PaddleOCR 3.0 organizes its functionality into five flagship pipelines that build upon each other through hierarchical module composition:

Diagram: Core Pipeline Class Hierarchy and Module Composition

Pipeline Characteristics:

| Pipeline | Python Class | Primary Purpose | Key Features |
|---|---|---|---|
| PP-OCRv5 | PaddleOCR | Universal scene text recognition | 13% accuracy improvement; single model supports 5 text types + handwriting |
| PaddleOCR-VL | PaddleOCRVL | End-to-end VLM document parsing | SOTA 0.9B parameter model; 109 languages; recognizes text, tables, formulas, charts |
| PP-StructureV3 | PPStructureV3 | Complex document parsing | Outputs Markdown/JSON; outperforms commercial solutions on OmniDocBench |
| PP-ChatOCRv4 | PPChatOCRv4Doc | LLM-enhanced information extraction | 15% accuracy improvement; native ERNIE 4.5 integration with RAG |
| PP-DocTranslation | PPDocTranslation | Intelligent document translation | Preserves layout; based on PP-StructureV3 + ERNIE 4.5 |

Architecture Design Principles:

  • Modular Composition: Basic modules (text detection, recognition) in individual classes serve as reusable building blocks for complex pipelines
  • Progressive Enhancement: Simpler pipelines (PP-OCRv5) provide foundational OCR; advanced pipelines (PP-ChatOCRv4) add LLM intelligence
  • Dual Paradigm Support: Traditional pipeline models (composing detection + recognition) coexist with unified VLM approach (PaddleOCR-VL)
  • Deployment Flexibility: Same model code runs on CPU, GPU, XPU, NPU through unified inference interface
  • Unified Interface: All pipelines inherit from PaddleXPipelineWrapper base class providing consistent predict() API
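
As a rough illustration of the unified interface, the sketch below assumes both pipelines expose the same predict() entry point through the shared wrapper; the input file name is illustrative:

```python
# Sketch: two different pipelines, one calling convention.
from paddleocr import PaddleOCR, PPStructureV3

for pipeline in (PaddleOCR(), PPStructureV3()):
    for result in pipeline.predict("invoice.png"):  # same predict() API
        result.print()  # result objects expose print()/save_* helpers
```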

Sources: README.md:48-75, docs/index.en.md:17-42, pyproject.toml:81-98, paddleocr/__init__.py:1-50, docs/update/upgrade_notes.en.md:15-50


High-Level System Design

PaddleOCR 3.0 employs a layered, modular architecture designed for flexibility, reusability, and production deployment. The system is organized into distinct layers, each with clear responsibilities:

Diagram: System Architecture Layers

Key Architectural Principles:

  1. Separation of Concerns: User interfaces (CLI/API) are decoupled from pipeline logic, which is decoupled from inference execution
  2. Hierarchical Composition: Complex pipelines compose basic modules; each layer builds on the layer below
  3. Unified Inference: PaddleX wrapper provides consistent inference interface across all models and hardware
  4. Plugin-Based Design: New modules and pipelines can be added without modifying existing code
  5. Hardware Abstraction: Same code runs across heterogeneous platforms (CPU, GPU, domestic accelerators)

For detailed architecture documentation including module interactions, data flow, and component diagrams, see page 1.1 (System Architecture and Components).

Sources: README.md:58-70, docs/update/upgrade_notes.en.md:15-21, pyproject.toml:81-98, paddleocr/__main__.py:18-39, paddleocr/__init__.py


Installation and Quick Start

Installation

PaddleOCR 3.0 requires PaddlePaddle 3.0+ (installable via pip install paddlepaddle==3.2.0 or paddlepaddle-gpu==3.2.0). The package provides optional dependency groups defined in pyproject.toml:50-78:

| Dependency Group | Installation Command | Features | Defined in pyproject.toml |
|---|---|---|---|
| Core (default) | pip install paddleocr | Basic text recognition (PP-OCR series) | Base dependencies |
| doc-parser | pip install "paddleocr[doc-parser]" | Document parsing (PP-StructureV3) | Lines 52-56 |
| ie | pip install "paddleocr[ie]" | Information extraction (PP-ChatOCRv4) | Lines 57-61 |
| trans | pip install "paddleocr[trans]" | Document translation (PP-DocTranslation) | Lines 62-64 |
| all | pip install "paddleocr[all]" | Complete functionality | Lines 65-78 |

The build system uses setup.py:1-19, which delegates to setuptools.setup() with configuration from pyproject.toml:1-98.

Sources: README.md:211-230, docs/quick_start.en.md:7-35, setup.py:1-19, pyproject.toml:50-78, docs/version3.x/installation.en.md:97-131

Entry Points

PaddleOCR provides two primary interfaces:

1. Command-Line Interface (CLI)

The CLI is implemented via paddleocr/__main__.py:21-35, which calls console_entry(), wrapping the main CLI logic in paddleocr/_cli.py. Subcommands are registered in pyproject.toml:81-98 under [project.scripts].

Each CLI subcommand maps to a class implementing the CLISubcommandExecutor abstract interface defined in paddleocr/_abstract.py:18-25.

2. Python API

The Python API exposes classes from paddleocr/__init__.py. All classes inherit from PaddleXPipelineWrapper (see page 3.2 for integration details).

Other importable classes include PPStructureV3, PPChatOCRv4Doc, PPDocTranslation, PaddleOCRVL, TextDetection, TextRecognition, DocPreprocessor, and specialized modules (see page 2.6 for complete list).
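
A minimal quick-start sketch for the PP-OCRv5 pipeline; the preprocessing flags and output helpers shown follow the documented quick-start pattern, and the file paths are illustrative:

```python
from paddleocr import PaddleOCR

# Optional flags disable the document-preprocessing steps for plain images.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,  # skip whole-document rotation correction
    use_doc_unwarping=False,             # skip distortion/unwarping correction
    use_textline_orientation=False,      # skip text-line rotation detection
)

results = ocr.predict("sample_image.png")
for res in results:
    res.print()                  # recognized text and confidence scores
    res.save_to_img("output/")   # visualization with detected boxes
    res.save_to_json("output/")  # structured JSON result
```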

Sources: README.md:231-265, docs/quick_start.en.md:37-196, paddleocr/__init__.py, paddleocr/_abstract.py:18-25


Hardware and Deployment Support

Supported Hardware Platforms

PaddleOCR 3.0 supports deployment across heterogeneous hardware, including CPU, GPU, and domestic accelerators such as Kunlunxin XPU and Ascend NPU, as well as MLU and DCU devices.

Sources: README.md:14-16, README.md:74-79, docs/version3.x/paddlex/overview.en.md:140-165

Deployment Modes

PaddleOCR provides six deployment paths from trained models:

| Deployment Mode | Implementation | Configuration | Use Case |
|---|---|---|---|
| Python Inference | pip install paddleocr | Default Paddle Inference or ONNX Runtime backend | Development, prototyping, simple applications |
| High-Performance Inference | enable_hpi=True parameter | TensorRT (GPU), MKL-DNN (CPU), CINN compiler | Production servers requiring maximum throughput |
| Service Deployment | PaddleX Serving (HTTP/gRPC) | Docker containers, multi-language clients | Microservices, cloud applications |
| On-Device Deployment | Paddle-Lite (.nb models) | ARM CPU/GPU/NPU optimization | Android, iOS, embedded systems |
| C++ Inference | Native binary via CMake | Links Paddle C++ library | High-performance local applications |
| ONNX Export | paddle2onnx conversion | Cross-platform ONNX Runtime | Platform-agnostic deployment |

All modes support multiple hardware backends (CPU, GPU, XPU, NPU, MLU, DCU) through PaddlePaddle's unified inference API. Model export is handled by tools/export_model.py.
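
For example, high-performance inference is toggled through the enable_hpi parameter listed in the table above; a hedged sketch (availability of TensorRT/MKL-DNN backends depends on the installed extras and hardware):

```python
from paddleocr import PaddleOCR

# enable_hpi=True asks the PaddleX wrapper to pick an accelerated backend
# (e.g. TensorRT on GPU, MKL-DNN on CPU) when one is available.
ocr = PaddleOCR(enable_hpi=True)
results = ocr.predict("sample_image.png")
```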

Model Weight Distribution:

  • Default source: HuggingFace (configurable via the PADDLE_PDX_MODEL_SOURCE environment variable; see the sketch after this list)
  • Alternative sources: BOS (Baidu Object Storage), AIStudio, ModelScope
  • Auto-download on first inference; cached locally in ~/.paddlex/
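
A sketch of overriding the weight source before constructing a pipeline; the accepted values shown in the comment are assumptions, so check the upstream documentation for the exact spelling:

```python
import os

# Assumed values: "huggingface" (default), "bos", "aistudio", "modelscope".
os.environ["PADDLE_PDX_MODEL_SOURCE"] = "bos"  # e.g. use the Baidu Object Storage mirror

from paddleocr import PaddleOCR

ocr = PaddleOCR()  # weights now download from the configured source into ~/.paddlex/
```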

Sources: README.md:372-376, docs/version3.x/deployment/high_performance_inference.en.md:1-50, tools/export_model.py:1-100, README.md:170-185


Integration with PaddleX

PaddleOCR 3.0 deeply integrates with PaddleX, a low-code development platform.

PaddleX provides:

  • One-click model inference for all PaddleOCR pipelines
  • No-code training via cloud-based GUI at AI Studio
  • Unified deployment APIs across multiple hardware platforms
  • 200+ additional models for image classification, object detection, segmentation, time series

Sources: docs/version3.x/paddlex/overview.en.md:1-21, README.md:57, docs/update/upgrade_notes.en.md:20


Version 3.0 Architecture Evolution

PaddleOCR 3.0 represents a complete architectural redesign from 2.x:

Key Changes:

| Aspect | PaddleOCR 2.x | PaddleOCR 3.x |
|---|---|---|
| Architecture | Monolithic with feature branches | Modular, plugin-based |
| Interfaces | Mixed, inconsistent APIs | Unified CLI and Python API |
| Pipeline Design | Single PPStructure class | Separate pipelines: PPStructureV3, PPChatOCRv4Doc, etc. |
| Deployment | Limited to PaddleServing | High-performance, service, on-device, C++ |
| Framework | PaddlePaddle 2.x | PaddlePaddle 3.0 with CINN compiler |
| LLM Integration | None | Native ERNIE 4.5 support in PP-ChatOCRv4 |

Breaking Changes:

  • The PaddleOCR.ocr() method no longer accepts the det and rec parameters; use the dedicated TextDetection and TextRecognition classes instead, as shown in the sketch after this list
  • PPStructure class removed (replaced by PPStructureV3)
  • show_log parameter replaced by comprehensive logging system
  • use_onnx parameter replaced by high-performance inference configuration
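
A migration sketch for the first breaking change, assuming the dedicated module classes follow the same predict() convention as the pipelines; file names are illustrative:

```python
# PaddleOCR 2.x style such as ocr.ocr(img, rec=False) is no longer supported.
from paddleocr import TextDetection, TextRecognition

det = TextDetection()
boxes = det.predict("receipt.png")       # detection only (2.x: ocr(img, rec=False))

rec = TextRecognition()
texts = rec.predict("cropped_line.png")  # recognition only (2.x: ocr(img, det=False))
```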

For detailed migration guidance, see Version 2.x to 3.x Migration.

Sources: docs/update/upgrade_notes.en.md:1-83, docs/update/upgrade_notes.md:1-84


Model Capabilities and Supported Scenarios

PaddleOCR 3.0 provides comprehensive model support across text recognition, document analysis, and preprocessing:

Text Recognition:

  • PP-OCRv5 (Main): Single model supporting 5 text types (Simplified Chinese, Traditional Chinese, English, Japanese, Pinyin) plus handwriting
  • PP-OCRv5 (Multilingual): 37+ languages including French, Spanish, Portuguese, Russian, Korean, Latin scripts, Cyrillic, Arabic, Devanagari, Telugu, and Tamil, covering 109 languages in total
  • PP-OCRv5 (Specialized): Dedicated high-accuracy models for English, Thai, Greek
  • PaddleOCR-VL: 0.9B parameter vision-language model supporting 109 languages with unified element recognition

Document Structure Analysis:

  • Layout Detection: PP-DocLayout series (L/M/S) supporting 23 layout categories (text blocks, titles, tables, figures, formulas, etc.)
  • Table Recognition: SLANeXt models for wired/wireless tables; RT-DETR for table cell detection; supports nested formulas and images
  • Formula Recognition: PP-FormulaNet (L/S) with 50,000 LaTeX vocabulary for printed and handwritten formulas
  • Seal Recognition: Specialized curved text detection for official stamps
  • Chart Recognition: PP-Chart2Table for converting charts to structured table format

Document Preprocessing:

  • Orientation Classification: PP-LCNet models for document and text line rotation detection
  • Document Unwarping: UVDoc model for correcting distorted/curved documents
  • Text Line Orientation: Ultra-lightweight (0.3M parameters) classification model

Comprehensive Pipeline Models:

  • PP-StructureV3: Combines multiple modules for end-to-end document parsing to Markdown/JSON (see the sketch after this list)
  • PP-ChatOCRv4: Integrates PP-DocBee2 multimodal model with ERNIE 4.5 for intelligent extraction
  • PP-DocTranslation: Document translation preserving layout using PP-StructureV3 + ERNIE 4.5
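
A sketch of PP-StructureV3 parsing a document to Markdown/JSON; save_to_markdown is assumed from the pipeline's described output formats, and the input path is illustrative:

```python
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
for res in pipeline.predict("annual_report.pdf"):
    res.save_to_json("output/")      # structured JSON per page
    res.save_to_markdown("output/")  # layout-preserving Markdown
```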

Sources: README.md:76-97, README.md:158-172, docs/update/update.en.md:159-172


Development Workflow and Tools

PaddleOCR 3.0 provides a complete toolkit for the AI development lifecycle:

Diagram: Development Lifecycle Workflow

Key Development Features:

  1. Data Annotation: PPOCRLabel tool (separate repository) for text detection, recognition, table, and KIE annotation
  2. Model Training: Unified training via tools/train.py accepting YAML configs, supporting distributed training, AMP, and multiple optimizers
  3. Model Evaluation: tools/eval.py calculates task-specific metrics (DetMetric for detection, RecMetric for recognition)
  4. Model Export: tools/export_model.py converts training checkpoints (.pdparams) to inference format (.pdmodel + .pdiparams)
  5. Quality Assurance: TIPC framework in test_tipc/ validates training, inference, and deployment across configurations using test_tipc/prepare.sh and test_tipc/test_train_inference_python.sh
  6. Performance Analysis: Benchmark tools (benchmark_train.sh, scripts/analysis.py) measure end-to-end and per-module latency
  7. Multiple Deployment Paths: Python (via paddleocr/__init__.py), C++ (deploy/cpp_infer/), mobile (deploy/lite/), service (PaddleX Serving)

For detailed training documentation, see section 4 (Model Training System). For deployment options, see section 5 (Deployment and Inference). For quality assurance details, see section 6.1 (Testing Framework).

Sources: README.md:228-244, .github/workflows/python-publish.yml:1-40, test_tipc/


Community and Contribution Guidelines

Documentation Resources

Official Documentation:

Online Resources:

Getting Support

GitHub Issue Templates:

Issue Lifecycle:

  • Inactive issues are automatically closed after 30 days (managed by .github/workflows/close_inactive_issues.yaml)
  • Stale issues receive warning labels after 7 days of inactivity
  • Community members can reopen closed issues with updates

Contributing to PaddleOCR

Contribution Workflow:

  1. Fork and Clone: Fork the repository and clone your fork locally
  2. Environment Setup: Follow docs/version3.x/installation.en.md for development environment
  3. Create Branch: Create a feature branch from main or dygraph
  4. Code Development: Follow PaddlePaddle coding standards and add tests in test_tipc/
  5. Testing: Run TIPC tests via test_tipc/test_train_inference_python.sh
  6. Pull Request: Submit PR using template in .github/PULL_REQUEST_TEMPLATE.md

Contribution Areas:

  • New Models: Add models to paddleocr/ following existing class structure
  • Bug Fixes: Address issues in GitHub Issues
  • Documentation: Improve docs in docs/ directory (multi-language support welcomed)
  • Deployment: Enhance deployment tools in deploy/
  • Testing: Expand test coverage in test_tipc/

Community Engagement:

Quality Standards:

  • All new models must include training configs in configs/
  • Deployment code must support CPU and at least one GPU platform
  • Documentation must be provided in both English and Chinese
  • Tests must pass in TIPC framework before merge

Sources: README.md:408-455, docs/index.en.md:89-96, docs/index.md:80-88, .github/ISSUE_TEMPLATE/bug-report.yml:1-100, .github/workflows/close_inactive_issues.yaml:1-24


Next Steps

To begin using PaddleOCR:

  1. Installation: Follow the Installation and Setup guide to install PaddleOCR with the appropriate dependency groups for your use case
  2. Quick Start: Try the Quick Start examples in Quick Start to run inference with pre-trained models
  3. Choose Your Pipeline: Review the Core Pipelines section to select the pipeline that matches your requirements
  4. Deployment: Plan your deployment strategy using the Deployment and Inference documentation
  5. Customization: For custom model training, see Model Architecture and Training

For migration from PaddleOCR 2.x, start with Version 2.x to 3.x Migration.

Sources: README.md:205-244, docs/quick_start.en.md:1-197