This document describes PaddleOCR's configuration system for inference, focusing on common arguments that control runtime behavior, hardware acceleration, and model execution. These configurations apply across all PaddleOCR pipelines (PP-OCRv5, PP-StructureV3, PP-ChatOCRv4, etc.) and determine how models are loaded, optimized, and executed.
Topics covered:
- Common inference arguments (device, enable_hpi, precision, etc.)
- The PaddlePredictorOption abstraction and its relationship to PaddleX
- The paddlex_config parameter for advanced configuration
PaddleOCR's configuration system separates user-facing parameters from underlying inference engine configuration. User parameters flow through multiple abstraction layers before reaching the PaddleX inference engine.
Configuration Flow:
device="gpu:0", enable_hpi=True)get_paddlex_predictor_option() (paddleocr/_common_args.py88-151)Sources: paddleocr/_common_args.py1-151 docs/version3.x/pipeline_usage/OCR.md995-1056 Diagram 6 from system architecture
The following table summarizes common inference arguments supported across all PaddleOCR pipelines:
| Argument | Type | Default | Description |
|---|---|---|---|
| device | str | Auto-detected | Target hardware: cpu, gpu:0, npu:0, xpu:0, mlu:0, dcu:0 |
| enable_hpi | bool | False | Enable High Performance Inference mode (auto-selects optimal backend) |
| precision | str | "fp32" | Computation precision: fp32, fp16, int8 |
| use_tensorrt | bool | False | Enable TensorRT subgraph engine (GPU only) |
| enable_mkldnn | bool | True | Enable MKL-DNN acceleration (CPU only) |
| mkldnn_cache_capacity | int | 10 | MKL-DNN cache size for operator optimization |
| cpu_threads | int | 8 | Thread count for CPU inference |
| paddlex_config | str | None | Path to advanced PaddleX configuration file |
Sources: docs/version3.x/pipeline_usage/OCR.md 995-1056, paddleocr/_common_args.py 23-86
The device parameter specifies the target hardware for inference. PaddleOCR supports a wide range of devices, including cpu, gpu, npu, xpu, mlu, and dcu (see the table above).
Device Auto-detection: If device is not specified, PaddleOCR attempts to use the first available GPU (gpu:0). If no GPU is available, it falls back to CPU.
Implementation: Device parsing occurs in CommonArguments.__post_init__() (paddleocr/_common_args.py 42-61), which extracts the device type (e.g., "gpu") and device ID (e.g., 0) for downstream use.
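A hedged sketch of that parsing step; parse_device is a hypothetical stand-in for the logic inside __post_init__():

```python
from typing import Optional, Tuple

# Hypothetical helper mirroring the split performed in __post_init__():
def parse_device(device: str) -> Tuple[str, Optional[int]]:
    """Split a device string like "gpu:0" into ("gpu", 0); bare "cpu" has no ID."""
    device_type, sep, device_id = device.partition(":")
    return device_type, int(device_id) if sep else None

assert parse_device("gpu:0") == ("gpu", 0)
assert parse_device("cpu") == ("cpu", None)
```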
Sources: docs/version3.x/pipeline_usage/OCR.md 995-1008, paddleocr/_common_args.py 42-61
The enable_hpi flag activates PaddleOCR's High Performance Inference mode, which automatically selects the optimal inference backend and precision for the target hardware.
Behavior: When enabled, the High Performance Plugin automatically selects the inference backend, precision, and acceleration technologies best suited to the detected hardware (see the High Performance Plugin section below).
Trade-offs: Auto-selection can add initialization overhead on the first inference (e.g., backend engine compilation) in exchange for lower steady-state latency.
Performance Impact: Documentation shows "High-Performance Mode" inference times in benchmark tables (e.g., docs/version3.x/pipeline_usage/OCR.md 34-35). For PP-OCRv5_server_det on GPU, high-performance mode reduces inference time from 89.55 ms to 70.19 ms.
Sources: docs/version3.x/pipeline_usage/OCR.md 1010-1014, docs/version3.x/pipeline_usage/OCR.md 673-696
The precision parameter controls the numerical precision used during model inference:
| Precision | Description | Use Case |
|---|---|---|
| fp32 | 32-bit floating point (full precision) | Maximum accuracy, default mode |
| fp16 | 16-bit floating point (half precision) | GPU acceleration, reduced memory |
| int8 | 8-bit integer quantization | Extreme performance, edge deployment |
Hardware Compatibility:
- fp16: Requires a GPU with Tensor Cores (V100, A100, etc.) or specialized hardware
- int8: Requires a calibration dataset for post-training quantization

Integration: The precision setting is passed to PaddlePredictorOption and affects model loading and operator execution (paddleocr/_common_args.py 117-119).
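A minimal usage sketch, assuming a GPU with fp16 support:

```python
from paddleocr import PaddleOCR

# Request half precision; assumes a GPU with Tensor Core (or equivalent) support.
ocr = PaddleOCR(device="gpu:0", precision="fp16")
```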
Sources: docs/version3.x/pipeline_usage/OCR.md 1025-1029, paddleocr/_common_args.py 117-119
Additional flags control specific acceleration technologies:
TensorRT (GPU):
- use_tensorrt: Enables the TensorRT subgraph engine (default: False); described in detail below

MKL-DNN (CPU):
- enable_mkldnn: Enables MKL-DNN acceleration (default: True)
- mkldnn_cache_capacity: Controls operator cache size (default: 10)

CPU Threading:
- cpu_threads: Sets the thread count for CPU inference (default: 8)
Sources: docs/version3.x/pipeline_usage/OCR.md 1016-1050, paddleocr/_common_args.py 63-85
PaddlePredictorOption is the bridge between PaddleOCR's user-facing configuration and PaddleX's inference engine. It encapsulates all predictor settings in a PaddleX-compatible format.
The get_paddlex_predictor_option() method (paddleocr/_common_args.py 88-151) performs the following transformations:
device="gpu:0" to device_type="gpu" and device_id=0enable_hpi=True to appropriate run_mode valueuse_tensorrt, enable_mkldnn to PaddleX formatcpu_threads settingCode Structure:
The run_mode field in PaddlePredictorOption determines the inference backend:
| Run Mode | Description | Trigger |
|---|---|---|
"paddle" | Standard Paddle Inference | Default CPU/GPU mode |
"trt_fp32" | TensorRT with FP32 | use_tensorrt=True + precision="fp32" |
"trt_fp16" | TensorRT with FP16 | use_tensorrt=True + precision="fp16" |
"trt_int8" | TensorRT with INT8 | use_tensorrt=True + precision="int8" |
| Auto-selected | Optimal backend | enable_hpi=True |
Implementation: Run mode determination logic is in the _determine_run_mode() helper method within get_paddlex_predictor_option().
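A hedged sketch that mirrors the table above; the real helper is _determine_run_mode(), and "auto" is only a placeholder label here:

```python
# Mirrors the run-mode table; not the actual implementation.
def determine_run_mode(use_tensorrt: bool, precision: str, enable_hpi: bool) -> str:
    if enable_hpi:
        return "auto"  # placeholder: the High Performance Plugin picks the backend
    if use_tensorrt:
        return {"fp32": "trt_fp32", "fp16": "trt_fp16", "int8": "trt_int8"}[precision]
    return "paddle"
```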
Sources: paddleocr/_common_args.py 88-151, Diagram 6 from system architecture
For complex scenarios requiring fine-grained control, the paddlex_config parameter accepts a path to a JSON/YAML configuration file that directly configures the underlying PaddleX pipeline.
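A brief usage sketch, assuming a PP-StructureV3 pipeline and an illustrative file path:

```python
from paddleocr import PPStructureV3

# Hand the pipeline a full PaddleX config file for fine-grained control
# (the file name is illustrative).
pipeline = PPStructureV3(paddlex_config="PP-StructureV3.yaml")
```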
The configuration merging process follows this priority order:
Priority Rules:
1. Pipeline-specific overrides (_get_paddlex_config_overrides()) override defaults
2. Defaults (_get_default_paddlex_config()) provide the baseline configuration
Each pipeline can implement _get_paddlex_config_overrides() to inject pipeline-specific configuration:
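A hypothetical illustration of the hook; the real override keys differ per pipeline and are not reproduced here:

```python
# Sketch of a pipeline class exposing the override hook; the returned dict is
# merged into the PaddleX pipeline configuration (keys here are illustrative).
class PipelineSketch:
    def __init__(self, use_table_recognition: bool = True):
        self.use_table_recognition = use_table_recognition

    def _get_paddlex_config_overrides(self) -> dict:
        # Translate a user-friendly flag into PaddleX config entries.
        return {"use_table_recognition": self.use_table_recognition}
```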
This mechanism allows pipelines to translate user-friendly parameters (e.g., use_table_recognition=True) into detailed PaddleX configuration.
Sources: docs/version3.x/pipeline_usage/OCR.md 1052-1056, paddleocr/_pipelines/pp_structurev3.py 1-500
Overview: TensorRT is NVIDIA's high-performance deep learning inference optimizer. It fuses layers, selects optimal kernels, and performs precision calibration to maximize GPU throughput.
Enabling TensorRT:
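A minimal sketch combining the relevant flags:

```python
from paddleocr import PaddleOCR

# Enable the TensorRT subgraph engine with half precision on an NVIDIA GPU.
ocr = PaddleOCR(device="gpu:0", use_tensorrt=True, precision="fp16")
```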
Requirements: TensorRT acceleration requires an NVIDIA GPU and a TensorRT installation compatible with the installed Paddle Inference build.
Performance Impact (from documentation): Benchmark tables report inference times for PP-OCRv5_server_det on an NVIDIA Tesla T4 with and without acceleration (see docs/version3.x/pipeline_usage/OCR.md).
Initialization Overhead: First inference requires ~10-60 seconds for TensorRT engine compilation. Subsequent inferences use cached engines.
Precision Calibration:
- fp32: No calibration needed, minimal accuracy loss
- fp16: Automatic mixed precision, ~0.5% accuracy loss
- int8: Requires a calibration dataset, up to 2% accuracy loss

Implementation Details: When use_tensorrt=True, the run_mode in PaddlePredictorOption is set to "trt_fp32", "trt_fp16", or "trt_int8" based on the precision setting (paddleocr/_common_args.py 88-151).
Sources: docs/version3.x/pipeline_usage/OCR.md 1016-1023, docs/version3.x/pipeline_usage/OCR.md 656-667, paddleocr/_common_args.py 88-151
Overview: Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), now called oneDNN, provides optimized CPU operators for deep learning inference. It's automatically enabled for CPU inference.
Enabling MKL-DNN:
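A minimal sketch; MKL-DNN is on by default for CPU, so the flag is shown only for clarity:

```python
from paddleocr import PaddleOCR

# CPU inference with MKL-DNN (oneDNN) acceleration and explicit threading.
ocr = PaddleOCR(device="cpu", enable_mkldnn=True, cpu_threads=8)
```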
Key Features:
- Optimized CPU operator implementations via oneDNN
- mkldnn_cache_capacity controls the operator compilation cache size

Performance Impact (from documentation): Benchmark tables report inference times for PP-OCRv5_mobile_det on an Intel Xeon Gold 6271C with MKL-DNN enabled (see docs/version3.x/pipeline_usage/OCR.md).
Cache Capacity: The mkldnn_cache_capacity parameter (default: 10) controls how many compiled operators are cached. Higher values reduce recompilation at the cost of memory:
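A brief sketch with an illustrative cache value:

```python
from paddleocr import PaddleOCR

# A larger cache trades memory for fewer operator recompilations, which can
# help when input shapes vary widely (the value here is illustrative).
ocr = PaddleOCR(device="cpu", mkldnn_cache_capacity=20)
```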
Compatibility: MKL-DNN applies to CPU inference only; it has no effect when running on GPU or other accelerators (see the enable_mkldnn entry in the table above).
Sources: docs/version3.x/pipeline_usage/OCR.md 1031-1044, paddleocr/_common_args.py 63-85, docs/version3.x/pipeline_usage/OCR.md 673-696
Overview: CINN (Compiler Infrastructure for Neural Networks) is a deep learning compiler that generates optimized kernels for models. It's automatically enabled with enable_hpi=True.
Features: As a deep learning compiler, CINN analyzes the model graph and generates optimized, fused kernels for the target hardware.
Enabling CINN: CINN is typically enabled automatically as part of High Performance Inference (enable_hpi=True) rather than through a dedicated flag.
Performance Characteristics: As with other compiled backends, the first inference incurs compilation overhead, after which the generated kernels are reused.
Sources: docs/version3.x/pipeline_usage/OCR.md 1010-1014, Diagram 6 from system architecture
Overview: The High Performance Plugin is PaddleX's auto-optimization framework that selects the best combination of backend, precision, and acceleration technologies for the target hardware.
Activation:
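A minimal sketch:

```python
from paddleocr import PaddleOCR

# A single flag activates the High Performance Plugin, which selects the
# backend and precision automatically for the detected hardware.
ocr = PaddleOCR(enable_hpi=True)
```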
Auto-Selection Logic: The plugin inspects the target hardware and the model, then chooses the backend, precision, and acceleration technologies expected to perform best.
Trade-offs: Auto-selection adds initialization overhead on the first run (e.g., backend engine compilation) in exchange for lower steady-state inference latency.
Sources: docs/version3.x/pipeline_usage/OCR.md 1010-1014, docs/version3.x/pipeline_usage/OCR.md 673-696, Diagram 6 from system architecture
Sources: docs/version3.x/pipeline_usage/OCR.md 995-1056, paddleocr/_common_args.py 1-151