PP-OCRv5 Universal Text Recognition is the core pipeline in PaddleOCR 3.x for general-purpose text detection and recognition tasks. This pipeline extracts text from images and outputs it in editable text format, supporting diverse text types including printed, handwritten, vertical, and text with rare characters across multiple languages (Simplified Chinese, Traditional Chinese, English, Japanese).
PP-OCRv5 improves accuracy by 13 percentage points over PP-OCRv4 in multi-scenario benchmarks while maintaining efficient inference performance. The pipeline is designed for universal scene text recognition, covering street scenes, web images, documents, and handwritten content.
Related Pipelines:
Sources: docs/version3.x/pipeline_usage/OCR.md1-23 docs/version3.x/pipeline_usage/OCR.en.md1-25
PP-OCRv5 follows a modular architecture with two required modules and three optional preprocessing modules:
Sources: docs/version3.x/pipeline_usage/OCR.md15-21 docs/version3.x/pipeline_usage/OCR.en.md15-21
| Module | Purpose | Model Options | Default Enabled |
|---|---|---|---|
| Document Orientation Classification | Detects document rotation (0°, 90°, 180°, 270°) and corrects orientation | PP-LCNet_x1_0_doc_ori | Yes |
| Text Image Unwarping | Corrects geometric distortions from photography/scanning | UVDoc | Yes |
| Text Detection | Locates text regions with bounding boxes | PP-OCRv5_server_det, PP-OCRv5_mobile_det | Required |
| Text Line Orientation | Identifies inverted text lines (0° vs 180°) | PP-LCNet_x0_25_textline_ori, PP-LCNet_x1_0_textline_ori | Yes |
| Text Recognition | Recognizes characters within detected regions | PP-OCRv5_server_rec, PP-OCRv5_mobile_rec | Required |
Sources: docs/version3.x/pipeline_usage/OCR.md27-116 docs/version3.x/pipeline_usage/OCR.en.md27-115
PP-OCRv5 provides two detection model variants optimized for different deployment scenarios:
| Model | Hmean (%) | GPU Time (ms, Standard / High-Performance) | CPU Time (ms, Standard / High-Performance) | Size (MB) | Use Case |
|---|---|---|---|---|---|
| PP-OCRv5_server_det | 83.8 | 89.55 / 70.19 | 383.15 | 84.3 | High-accuracy server deployment |
| PP-OCRv5_mobile_det | 79.0 | 10.67 / 6.36 | 57.77 / 28.15 | 4.7 | Edge device deployment |
Sources: docs/version3.x/pipeline_usage/OCR.md133-148 docs/version3.x/pipeline_usage/OCR.en.md132-147
PP-OCRv5 recognition models support multi-language and multi-scenario text:
| Model | Avg Acc (%) | Chinese | English | Traditional Chinese | Japanese | GPU Time (ms, Standard / High-Performance) | CPU Time (ms, Standard / High-Performance) | Size (MB) |
|---|---|---|---|---|---|---|---|---|
| PP-OCRv5_server_rec | 86.38 | 86.38 | 64.70 | 93.29 | 60.35 | 8.46 / 2.36 | 31.21 | 81 |
| PP-OCRv5_mobile_rec | 81.29 | 81.29 | 66.00 | 83.55 | 54.65 | 5.43 / 1.46 | 21.20 / 5.32 | 16 |
Key Features:
Sources: docs/version3.x/pipeline_usage/OCR.md184-284 docs/version3.x/pipeline_usage/OCR.en.md183-283
Sources: docs/version3.x/pipeline_usage/OCR.md190-191 docs/version3.x/pipeline_usage/OCR.en.md190-191
PP-OCRv5 models natively support:
Additional language-specific models are available for Korean, Latin, Cyrillic, Arabic, Devanagari, Thai, Greek, and other scripts.
Sources: docs/version3.x/pipeline_usage/OCR.md246-632 docs/version3.x/pipeline_usage/OCR.en.md245-631
The PP-OCRv5 pipeline can be accessed through multiple interfaces:
Sources: docs/version3.x/pipeline_usage/OCR.md709-1057 docs/version3.x/pipeline_usage/OCR.en.md708-1056
Basic invocation with default PP-OCRv5 models:
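A minimal sketch of the basic call, assuming the `paddleocr` 3.x Python package is installed. `PaddleOCR()` and `predict()` follow the package's documented usage; the `extract_text` wrapper and its `ocr` parameter are illustrative names, not part of the library.

```python
def extract_text(image_path, ocr=None):
    """Run OCR on one image and return the list of result objects.

    `ocr` is a hypothetical injection point so the wrapper can be exercised
    without the real pipeline. When omitted, a PaddleOCR pipeline with its
    default models is created.
    """
    if ocr is None:
        # Imported lazily so this module loads even without paddleocr.
        from paddleocr import PaddleOCR
        ocr = PaddleOCR()
    results = list(ocr.predict(image_path))
    for res in results:
        res.print()  # write the recognized text and boxes to stdout
    return results
```

Calling `extract_text("path/to/image.png")` would build the default pipeline on first use; the image path here is a placeholder.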
Version selection:
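Version selection can be sketched as follows, assuming the constructor's `ocr_version` argument. The helper names and the explicit version check are illustrative additions; the three version strings are the ones this page mentions.

```python
SUPPORTED_VERSIONS = ("PP-OCRv5", "PP-OCRv4", "PP-OCRv3")


def version_kwargs(ocr_version="PP-OCRv5"):
    """Build constructor arguments for a given PP-OCR generation."""
    if ocr_version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported ocr_version: {ocr_version!r}")
    return {"ocr_version": ocr_version}


def create_pipeline(ocr_version="PP-OCRv5"):
    # Lazy import: paddleocr is only needed when a pipeline is built.
    from paddleocr import PaddleOCR
    return PaddleOCR(**version_kwargs(ocr_version))
```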
Sources: docs/version3.x/pipeline_usage/OCR.md711-724 docs/version3.x/pipeline_usage/OCR.en.md710-723
Configuration Parameters:
- text_detection_model_name: Model name (e.g., PP-OCRv5_server_det)
- text_recognition_model_name: Model name (e.g., PP-OCRv5_server_rec)
- text_det_limit_side_len: Input size limit (default: 64)
- text_det_thresh: Pixel threshold (default: 0.3)
- text_det_box_thresh: Box threshold (default: 0.6)
- text_rec_score_thresh: Recognition confidence threshold (default: 0.0)
Sources: docs/version3.x/pipeline_usage/OCR.md727-1057 docs/version3.x/pipeline_usage/OCR.en.md726-1056
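The sketch below collects the defaults listed above and illustrates what a recognition score threshold does. `with_overrides` and `keep_confident` are hypothetical helpers written for this example; the library applies its thresholds internally, and its exact comparison may differ.

```python
# Defaults as listed in the parameter summary above.
DOCUMENTED_DEFAULTS = {
    "text_det_limit_side_len": 64,
    "text_det_thresh": 0.3,
    "text_det_box_thresh": 0.6,
    "text_rec_score_thresh": 0.0,
}


def with_overrides(**overrides):
    """Return the defaults with selected values replaced."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


def keep_confident(texts, scores, thresh):
    """Illustration only: drop recognized lines scoring below `thresh`."""
    return [t for t, s in zip(texts, scores) if s >= thresh]
```

With the default threshold of 0.0 every recognized line is kept; raising it trades recall for fewer low-confidence strings.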
PP-OCRv5 detection models use the Differentiable Binarization (DB) architecture:
Key Components:
- unclip_ratio
Sources: docs/version3.x/pipeline_usage/OCR.md841-876
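In DB-style post-processing, a detected (shrunk) text polygon is commonly expanded outward by an offset D = A × r / L, where A is the polygon area, L its perimeter, and r the unclip ratio. The following is a pure-Python sketch of that offset computation only; the function names are illustrative, and the actual polygon expansion step is not shown.

```python
import math


def polygon_area(points):
    """Absolute polygon area via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths, closing the polygon back to the first point."""
    return sum(
        math.dist(p, q) for p, q in zip(points, points[1:] + points[:1])
    )


def unclip_distance(points, unclip_ratio):
    """Offset by which a shrunk text region is expanded: A * r / L."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)
```

For a 100 × 20 box and an example ratio of 2.0, the offset is 2000 × 2.0 / 240, roughly 16.7 pixels; a larger ratio yields looser boxes around each text line.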
PP-OCRv5 recognition models use the SVTR (Scene Text Recognition with a Single Visual Model) architecture:
Character Set:
Sources: docs/version3.x/pipeline_usage/OCR.md184-284 docs/version3.x/pipeline_usage/OCR.en.md183-283
PP-OCRv5 supports two inference modes with different performance characteristics:
| Mode | Configuration | Use Case |
|---|---|---|
| Standard Mode | FP32 precision, no acceleration | Development, debugging |
| High-Performance Mode | TensorRT (GPU) / MKL-DNN (CPU), optimal precision | Production deployment |
Enabling High-Performance Mode:
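A minimal sketch, assuming the constructor's `enable_hpi` flag is the switch for this mode. The helper names are illustrative, and high-performance inference typically requires its extra dependencies to be installed separately.

```python
def hpi_kwargs(enabled=True):
    """Constructor arguments that switch high-performance inference on or off."""
    return {"enable_hpi": bool(enabled)}


def create_pipeline(enabled=True):
    # Lazy import: paddleocr is only needed when a pipeline is built.
    from paddleocr import PaddleOCR
    return PaddleOCR(**hpi_kwargs(enabled))
```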
Sources: docs/version3.x/pipeline_usage/OCR.md670-696 docs/version3.x/pipeline_usage/OCR.en.md669-695
The recognition module supports batch processing for improved throughput:
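The idea can be sketched in plain Python: detected text crops are grouped into fixed-size batches before recognition. `batched` is an illustrative helper, not the library's implementation; `text_recognition_batch_size` is assumed to be the constructor argument that controls the batch size.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def batch_kwargs(batch_size):
    """Constructor arguments selecting the recognition batch size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return {"text_recognition_batch_size": batch_size}
```

Seven crops with a batch size of three are processed as groups of 3, 3, and 1.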
Trade-offs:
Sources: docs/version3.x/pipeline_usage/OCR.md799-820
Supported acceleration backends:
Configuration:
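A hedged sketch of backend selection by device. The argument names (`device`, `use_tensorrt`, `precision`, `enable_mkldnn`, `cpu_threads`) reflect my understanding of the 3.x constructor and should be checked against the installed version; the helper and its default thread count are illustrative.

```python
def acceleration_kwargs(device, precision="fp32", cpu_threads=8):
    """Pick acceleration settings for a device string such as 'gpu:0' or 'cpu'."""
    if precision not in ("fp32", "fp16"):
        raise ValueError(f"unsupported precision: {precision!r}")
    if device.startswith("gpu"):
        # TensorRT acceleration applies to GPU inference.
        return {"device": device, "use_tensorrt": True, "precision": precision}
    if device == "cpu":
        # MKL-DNN acceleration applies to CPU inference.
        return {"device": "cpu", "enable_mkldnn": True, "cpu_threads": cpu_threads}
    raise ValueError(f"unsupported device: {device!r}")


def create_pipeline(device):
    # Lazy import: paddleocr is only needed when a pipeline is built.
    from paddleocr import PaddleOCR
    return PaddleOCR(**acceleration_kwargs(device))
```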
Sources: docs/version3.x/pipeline_usage/OCR.md995-1050 docs/version3.x/pipeline_usage/OCR.en.md994-1049
| Metric | PP-OCRv4 | PP-OCRv5 | Improvement (percentage points) |
|---|---|---|---|
| Detection Hmean (server) | 69.2% | 83.8% | +14.6 |
| Detection Hmean (mobile) | 63.8% | 79.0% | +15.2 |
| Recognition Accuracy (server) | 85.19% | 86.38% | +1.19 |
| Recognition Accuracy (mobile) | 78.74% | 81.29% | +2.55 |
| Overall End-to-End | Baseline | +13 percentage points | - |
Sources: docs/version3.x/pipeline_usage/OCR.md11 docs/version3.x/pipeline_usage/OCR.en.md11
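The improvement figures in the table are absolute differences between the two scores (percentage points), not relative gains. A short check using the table's own numbers:

```python
# (PP-OCRv4 score, PP-OCRv5 score) in percent, copied from the table above.
SCORES = {
    "Detection Hmean (server)": (69.2, 83.8),
    "Detection Hmean (mobile)": (63.8, 79.0),
    "Recognition Accuracy (server)": (85.19, 86.38),
    "Recognition Accuracy (mobile)": (78.74, 81.29),
}


def improvement_pp(v4, v5):
    """Absolute gain in percentage points, rounded to two decimals."""
    return round(v5 - v4, 2)


def relative_gain(v4, v5):
    """Relative gain in percent, shown only for contrast."""
    return round((v5 - v4) / v4 * 100, 2)


IMPROVEMENTS = {name: improvement_pp(*pair) for name, pair in SCORES.items()}
```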
Recommendations:
- PP-OCRv5_server_det + PP-OCRv5_server_rec
- PP-OCRv5_mobile_det + PP-OCRv5_mobile_rec
- PP-OCRv4 or PP-OCRv3 versions
Sources: docs/version3.x/pipeline_usage/OCR.md701-702
PP-StructureV3 uses PP-OCRv5 as a sub-pipeline for text extraction within document regions:
Sources: docs/version3.x/pipeline_usage/PP-StructureV3.md11-20 docs/version3.x/pipeline_usage/PP-StructureV3.en.md11-20
PP-ChatOCRv4 leverages PP-OCRv5 for text extraction before LLM processing:
Sources: docs/version3.x/pipeline_usage/PP-ChatOCRv4.md15-26 docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md13-24
Hardware:
Software:
Test Datasets:
Sources: docs/version3.x/pipeline_usage/OCR.md637-667 docs/version3.x/pipeline_usage/OCR.en.md636-666
Inference Time Breakdown:
| Component | Server Model (ms) | Mobile Model (ms) |
|---|---|---|
| Document Orientation | 2.62 / 0.59 | 2.62 / 0.59 |
| Text Unwarping | 19.05 | 19.05 |
| Text Detection | 89.55 / 70.19 | 10.67 / 6.36 |
| Line Orientation | 2.16 / 0.41 | 2.16 / 0.41 |
| Text Recognition (per region) | 8.46 / 2.36 | 5.43 / 1.46 |
Format: Standard Mode / High-Performance Mode
Sources: docs/version3.x/pipeline_usage/OCR.md29-240 docs/version3.x/pipeline_usage/OCR.en.md28-239
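As a rough back-of-the-envelope aid, the Standard Mode figures above can be summed into a per-image estimate. This is a simplification under stated assumptions: each module was benchmarked separately, recognition is charged once per text region with no batching benefit, and line orientation is counted once per image. The function and dictionary names are illustrative.

```python
# Standard Mode timings in milliseconds, copied from the table above.
STANDARD_MS = {
    "server": {"doc_ori": 2.62, "unwarp": 19.05, "det": 89.55,
               "line_ori": 2.16, "rec": 8.46},
    "mobile": {"doc_ori": 2.62, "unwarp": 19.05, "det": 10.67,
               "line_ori": 2.16, "rec": 5.43},
}


def estimate_ms(profile, n_regions, use_preprocessing=True):
    """Naive sum: detection + per-region recognition (+ optional modules)."""
    t = STANDARD_MS[profile]
    total = t["det"] + n_regions * t["rec"]
    if use_preprocessing:
        total += t["doc_ori"] + t["unwarp"] + t["line_ori"]
    return round(total, 2)
```

Under these assumptions, ten text regions with the server models and all optional modules come to about 198 ms, while the mobile models without preprocessing come to about 65 ms.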
For basic text extraction with minimal preprocessing:
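A minimal sketch that turns off the three optional preprocessing modules, assuming the constructor flags `use_doc_orientation_classify`, `use_doc_unwarping`, and `use_textline_orientation`. The helper names are illustrative.

```python
def minimal_kwargs():
    """Disable document orientation, unwarping, and text line orientation."""
    return {
        "use_doc_orientation_classify": False,
        "use_doc_unwarping": False,
        "use_textline_orientation": False,
    }


def create_minimal_pipeline():
    # Lazy import: paddleocr is only needed when a pipeline is built.
    from paddleocr import PaddleOCR
    return PaddleOCR(**minimal_kwargs())
```

This leaves only the two required modules, detection and recognition, in the pipeline.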
Sources: docs/version3.x/pipeline_usage/OCR.md715-720
For maximum accuracy on server hardware:
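A sketch that selects the server detection and recognition models by name, using the `text_detection_model_name` and `text_recognition_model_name` arguments described in the configuration parameters. The helper names are illustrative.

```python
def server_kwargs():
    """Model selection favoring accuracy over speed and size."""
    return {
        "text_detection_model_name": "PP-OCRv5_server_det",
        "text_recognition_model_name": "PP-OCRv5_server_rec",
    }


def create_server_pipeline():
    # Lazy import: paddleocr is only needed when a pipeline is built.
    from paddleocr import PaddleOCR
    return PaddleOCR(**server_kwargs())
```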
Sources: docs/version3.x/pipeline_usage/OCR.md750-890
For deployment on resource-constrained devices:
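A sketch that selects the mobile models and compares the combined model footprint using the sizes from the tables above. The helper names are illustrative, and the footprint is a simple sum of the detection and recognition model sizes only.

```python
# Model sizes in MB, copied from the detection and recognition tables.
MODEL_SIZE_MB = {
    "PP-OCRv5_server_det": 84.3,
    "PP-OCRv5_server_rec": 81.0,
    "PP-OCRv5_mobile_det": 4.7,
    "PP-OCRv5_mobile_rec": 16.0,
}


def mobile_kwargs():
    """Model selection favoring small size and low latency."""
    return {
        "text_detection_model_name": "PP-OCRv5_mobile_det",
        "text_recognition_model_name": "PP-OCRv5_mobile_rec",
    }


def footprint_mb(kwargs):
    """Combined size of the selected detection and recognition models."""
    names = (kwargs["text_detection_model_name"],
             kwargs["text_recognition_model_name"])
    return round(sum(MODEL_SIZE_MB[n] for n in names), 1)
```

By this sum the mobile pair occupies about 20.7 MB, compared with about 165.3 MB for the server pair.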
Sources: docs/version3.x/pipeline_usage/OCR.md142-148
For specific language support:
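A hedged sketch of language selection through the constructor's `lang` argument. The mapping below covers only a few language codes I am reasonably sure of and is not exhaustive; the helper names are illustrative, and the codes should be verified against the installed version.

```python
# Assumed language codes; check the installed version for the full list.
LANG_CODES = {
    "Simplified Chinese": "ch",
    "Traditional Chinese": "chinese_cht",
    "English": "en",
    "Japanese": "japan",
    "Korean": "korean",
}


def language_kwargs(language):
    """Translate a human-readable language name into constructor arguments."""
    try:
        return {"lang": LANG_CODES[language]}
    except KeyError:
        raise ValueError(f"no code listed for {language!r}") from None


def create_language_pipeline(language):
    # Lazy import: paddleocr is only needed when a pipeline is built.
    from paddleocr import PaddleOCR
    return PaddleOCR(**language_kwargs(language))
```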
Sources: docs/version3.x/pipeline_usage/OCR.md904-920 docs/version3.x/pipeline_usage/OCR.md432-632