Configuration System and Common Arguments

Purpose and Scope

This document describes PaddleOCR's configuration system for inference, focusing on common arguments that control runtime behavior, hardware acceleration, and model execution. These configurations apply across all PaddleOCR pipelines (PP-OCRv5, PP-StructureV3, PP-ChatOCRv4, etc.) and determine how models are loaded, optimized, and executed.

Topics covered:

  • Common inference arguments (device, enable_hpi, precision, etc.)
  • Hardware acceleration configuration (TensorRT, MKL-DNN, CINN)
  • The PaddlePredictorOption abstraction and its relationship to PaddleX
  • Advanced configuration via the paddlex_config parameter

Related topics:

  • For pipeline-specific parameters (detection thresholds, batch sizes, etc.), see individual pipeline documentation (2.1, 2.2, 2.3)
  • For training configuration (optimizers, loss functions, data augmentation), see 4
  • For deployment-specific setup (C++ builds, Paddle-Lite optimization), see 5
  • For the PaddleX integration layer, see 3.2

Configuration System Architecture

PaddleOCR's configuration system separates user-facing parameters from underlying inference engine configuration. User parameters flow through multiple abstraction layers before reaching the PaddleX inference engine.

Configuration Flow:

  1. User Parameters: Raw arguments from Python API or CLI (e.g., device="gpu:0", enable_hpi=True)
  2. CommonArguments: Validated dataclass instance holding common inference settings (paddleocr/_common_args.py23-86)
  3. PaddlePredictorOption: PaddleX-compatible configuration object created via get_paddlex_predictor_option() (paddleocr/_common_args.py88-151)
  4. Pipeline Overrides: Pipeline-specific configuration tweaks merged into final config
  5. PaddleX Pipeline: Instantiated with merged configuration, managing model execution

Sources: paddleocr/_common_args.py1-151 docs/version3.x/pipeline_usage/OCR.md995-1056 Diagram 6 from system architecture


Common Inference Arguments

Overview Table

The following table summarizes common inference arguments supported across all PaddleOCR pipelines:

| Argument | Type | Default | Description |
|---|---|---|---|
| device | str | Auto-detected | Target hardware: cpu, gpu:0, npu:0, xpu:0, mlu:0, dcu:0 |
| enable_hpi | bool | False | Enable High Performance Inference mode (auto-selects optimal backend) |
| precision | str | "fp32" | Computation precision: fp32, fp16, int8 |
| use_tensorrt | bool | False | Enable TensorRT subgraph engine (GPU only) |
| enable_mkldnn | bool | True | Enable MKL-DNN acceleration (CPU only) |
| mkldnn_cache_capacity | int | 10 | MKL-DNN cache size for operator optimization |
| cpu_threads | int | 8 | Thread count for CPU inference |
| paddlex_config | str | None | Path to advanced PaddleX configuration file |
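
These arguments are passed as keyword arguments to the pipeline constructor. A minimal sketch (values are illustrative, and the input path is a placeholder):

```python
from paddleocr import PaddleOCR

# Common inference arguments are passed as keyword arguments to the
# pipeline constructor. The values below are illustrative, not recommendations.
ocr = PaddleOCR(
    device="gpu:0",            # target hardware
    enable_hpi=False,          # High Performance Inference mode
    precision="fp32",          # computation precision
    use_tensorrt=False,        # TensorRT subgraph engine (GPU only)
    enable_mkldnn=True,        # MKL-DNN acceleration (CPU only)
    mkldnn_cache_capacity=10,  # MKL-DNN operator cache size
    cpu_threads=8,             # CPU inference thread count
)

result = ocr.predict("sample.png")  # placeholder input image
```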

Sources: docs/version3.x/pipeline_usage/OCR.md995-1056 paddleocr/_common_args.py23-86

Device Selection

The device parameter specifies the target hardware for inference. PaddleOCR supports a range of devices; the supported device strings (cpu, gpu, npu, xpu, mlu, dcu) are listed in the overview table above.

Device Auto-detection: If device is not specified, PaddleOCR attempts to use the first available GPU (gpu:0). If no GPU is available, it falls back to CPU.

Implementation: Device parsing occurs in CommonArguments.__post_init__() (paddleocr/_common_args.py42-61), which extracts the device type (e.g., "gpu") and device ID (e.g., 0) for downstream use.
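
A short sketch of explicit device selection (the accelerator example assumes a Paddle build with support for that device):

```python
from paddleocr import PaddleOCR

ocr_cpu = PaddleOCR(device="cpu")    # CPU inference
ocr_gpu = PaddleOCR(device="gpu:0")  # first NVIDIA GPU
ocr_npu = PaddleOCR(device="npu:0")  # NPU (requires a Paddle build with NPU support)
```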

Sources: docs/version3.x/pipeline_usage/OCR.md995-1008 paddleocr/_common_args.py42-61

High Performance Inference (HPI)

The enable_hpi flag activates PaddleOCR's High Performance Inference mode, which automatically selects the optimal inference backend and precision for the target hardware.

Behavior:

  • GPU: Attempts to use TensorRT with mixed precision (fp16/int8 if supported)
  • CPU: Enables MKL-DNN with operator fusion and quantization
  • Backend Selection: May use OpenVINO, TensorRT, or other accelerators depending on availability

Trade-offs:

  • Pros: Maximum throughput, reduced latency, optimal resource utilization
  • Cons: Longer initialization time (model compilation), potential precision loss

Performance Impact: Documentation shows "High-Performance Mode" inference times in benchmark tables (e.g., docs/version3.x/pipeline_usage/OCR.md34-35). For PP-OCRv5_server_det on GPU, high-performance mode reduces inference time from 89.55ms to 70.19ms.

Sources: docs/version3.x/pipeline_usage/OCR.md1010-1014 docs/version3.x/pipeline_usage/OCR.md673-696

Precision Control

The precision parameter controls numerical precision used during model inference:

| Precision | Description | Use Case |
|---|---|---|
| fp32 | 32-bit floating point (full precision) | Maximum accuracy, default mode |
| fp16 | 16-bit floating point (half precision) | GPU acceleration, reduced memory |
| int8 | 8-bit integer quantization | Extreme performance, edge deployment |

Hardware Compatibility:

  • fp16: Requires GPU with Tensor Cores (V100, A100, etc.) or specialized hardware
  • int8: Requires calibration dataset for post-training quantization

Integration: The precision setting is passed to PaddlePredictorOption and affects model loading and operator execution (paddleocr/_common_args.py117-119).

Sources: docs/version3.x/pipeline_usage/OCR.md1025-1029 paddleocr/_common_args.py117-119

Hardware Acceleration Flags

Additional flags control specific acceleration technologies:

TensorRT (GPU):

  • Enables NVIDIA TensorRT subgraph engine
  • Compatible with CUDA 11.8 + TensorRT 8.6.1.6
  • See TensorRT Configuration for details

MKL-DNN (CPU):

  • Enables Intel MKL-DNN (oneDNN) operator acceleration
  • mkldnn_cache_capacity: Controls operator cache size (default: 10)
  • Active by default on CPU

CPU Threading:

  • Controls parallelism for CPU inference
  • Default: 8 threads
  • Higher values may improve throughput on multi-core systems

Sources: docs/version3.x/pipeline_usage/OCR.md1016-1050 paddleocr/_common_args.py63-85


PaddlePredictorOption

PaddlePredictorOption is the bridge between PaddleOCR's user-facing configuration and PaddleX's inference engine. It encapsulates all predictor settings in a PaddleX-compatible format.

Construction Process

The get_paddlex_predictor_option() method (paddleocr/_common_args.py88-151) performs the following transformations:

  1. Device Mapping: Converts device="gpu:0" to device_type="gpu" and device_id=0
  2. Run Mode Selection: Maps enable_hpi=True to appropriate run_mode value
  3. Acceleration Flags: Translates use_tensorrt, enable_mkldnn to PaddleX format
  4. Thread Configuration: Passes through cpu_threads setting

Code Structure:
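
The actual implementation lives at paddleocr/_common_args.py88-151; the following is a simplified, hypothetical sketch of the transformation (the import path and attribute names on the option object are assumptions for illustration):

```python
# Simplified, hypothetical sketch -- not the actual implementation.
def get_paddlex_predictor_option(self):
    from paddlex.inference import PaddlePredictorOption  # assumed import path

    option = PaddlePredictorOption()

    # 1. Device mapping: "gpu:0" -> device_type="gpu", device_id=0
    option.device_type = self.device_type
    option.device_id = self.device_id

    # 2./3. Run mode selection from acceleration flags and precision
    if self.use_tensorrt:
        option.run_mode = f"trt_{self.precision}"  # e.g. "trt_fp16"
    else:
        option.run_mode = "paddle"

    # 4. Thread configuration
    option.cpu_threads = self.cpu_threads
    return option
```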

Run Mode Determination

The run_mode field in PaddlePredictorOption determines the inference backend:

| Run Mode | Description | Trigger |
|---|---|---|
| "paddle" | Standard Paddle Inference | Default CPU/GPU mode |
| "trt_fp32" | TensorRT with FP32 | use_tensorrt=True + precision="fp32" |
| "trt_fp16" | TensorRT with FP16 | use_tensorrt=True + precision="fp16" |
| "trt_int8" | TensorRT with INT8 | use_tensorrt=True + precision="int8" |
| Auto-selected | Optimal backend | enable_hpi=True |

Implementation: Run mode determination logic is in the _determine_run_mode() helper method within get_paddlex_predictor_option().

Sources: paddleocr/_common_args.py88-151 Diagram 6 from system architecture


Advanced Configuration via paddlex_config

For complex scenarios requiring fine-grained control, the paddlex_config parameter accepts a path to a JSON/YAML configuration file that directly configures the underlying PaddleX pipeline.

Usage
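
A minimal sketch, assuming a configuration file saved at ./ocr_config.yaml:

```python
from paddleocr import PaddleOCR

# Explicit keyword arguments still take precedence over values in the file.
ocr = PaddleOCR(paddlex_config="./ocr_config.yaml")  # hypothetical path
result = ocr.predict("sample.png")
```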

Configuration Override Mechanism

The configuration merging process follows this priority order:

  1. User parameters (CLI/API arguments) override all other sources
  2. paddlex_config file overrides pipeline defaults
  3. Pipeline-specific overrides (_get_paddlex_config_overrides()) override defaults
  4. Pipeline defaults (_get_default_paddlex_config()) provide baseline configuration

Example Configuration File
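
The exact schema is defined by the underlying PaddleX pipeline; the fragment below is only an illustrative sketch of the kind of structure such a file contains (keys and model names are examples, not authoritative):

```yaml
# Illustrative structure only -- consult the PaddleX pipeline configuration
# for the authoritative keys of the pipeline you are running.
pipeline_name: OCR
SubModules:
  TextDetection:
    model_name: PP-OCRv5_server_det
    batch_size: 1
  TextRecognition:
    model_name: PP-OCRv5_server_rec
    batch_size: 8
```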

Use Cases:

  • Multi-model pipelines: Configure different models for different sub-tasks
  • Advanced acceleration: Combine multiple acceleration strategies (TensorRT + CINN)
  • Production deployment: Version-controlled configuration files
  • Batch processing: Optimize batch sizes for throughput

Pipeline-Specific Overrides

Each pipeline can implement _get_paddlex_config_overrides() to inject pipeline-specific configuration:
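
A hypothetical sketch of such a hook (the method name comes from the source; the flag and override key below are invented for illustration and do not reflect the real PaddleX schema):

```python
# Hypothetical sketch of a pipeline-specific override hook.
def _get_paddlex_config_overrides(self):
    overrides = {}
    if not self._use_table_recognition:
        # Translate a user-friendly flag into detailed pipeline configuration.
        overrides["SubPipelines.TableRecognition.enabled"] = False
    return overrides
```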

This mechanism allows pipelines to translate user-friendly parameters (e.g., use_table_recognition=True) into detailed PaddleX configuration.

Sources: docs/version3.x/pipeline_usage/OCR.md1052-1056 paddleocr/_pipelines/pp_structurev3.py1-500


Acceleration Technologies Deep Dive

TensorRT Configuration

Overview: TensorRT is NVIDIA's high-performance deep learning inference optimizer. It fuses layers, selects optimal kernels, and performs precision calibration to maximize GPU throughput.

Enabling TensorRT:
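
For example (requires a CUDA-enabled Paddle installation with TensorRT available):

```python
from paddleocr import PaddleOCR

# TensorRT subgraph engine with half precision on the first GPU.
ocr = PaddleOCR(
    device="gpu:0",
    use_tensorrt=True,
    precision="fp16",
)
result = ocr.predict("sample.png")
```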

Requirements:

  • NVIDIA GPU with Compute Capability ≥ 6.1
  • CUDA 11.8 or later
  • TensorRT 8.6.1.6 (recommended for Paddle 3.0)

Performance Impact (from documentation): For PP-OCRv5_server_det on NVIDIA Tesla T4:

  • Standard mode: 89.55ms
  • High-performance mode (TensorRT fp16): 70.19ms
  • Speedup: 1.28x

Initialization Overhead: First inference requires ~10-60 seconds for TensorRT engine compilation. Subsequent inferences use cached engines.

Precision Calibration:

  • fp32: No calibration needed, minimal accuracy loss
  • fp16: Automatic mixed precision, ~0.5% accuracy loss
  • int8: Requires calibration dataset, up to 2% accuracy loss

Implementation Details: When use_tensorrt=True, the run_mode in PaddlePredictorOption is set to "trt_fp32", "trt_fp16", or "trt_int8" based on the precision setting (paddleocr/_common_args.py88-151).

Sources: docs/version3.x/pipeline_usage/OCR.md1016-1023 docs/version3.x/pipeline_usage/OCR.md656-667 paddleocr/_common_args.py88-151

MKL-DNN (oneDNN) Configuration

Overview: Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), now called oneDNN, provides optimized CPU operators for deep learning inference. It's automatically enabled for CPU inference.

Enabling MKL-DNN:
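
For example:

```python
from paddleocr import PaddleOCR

# MKL-DNN (oneDNN) acceleration for CPU inference.
ocr = PaddleOCR(
    device="cpu",
    enable_mkldnn=True,        # enabled by default on CPU
    mkldnn_cache_capacity=10,  # operator cache size
    cpu_threads=8,
)
```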

Key Features:

  • Operator Fusion: Combines multiple operations (e.g., Conv + BN + ReLU) into single kernels
  • Vectorization: Uses AVX-512, AVX2, SSE instructions for SIMD parallelism
  • Memory Optimization: Reduces memory bandwidth requirements
  • Cache Management: mkldnn_cache_capacity controls operator compilation cache size

Performance Impact (from documentation): For PP-OCRv5_mobile_det on Intel Xeon Gold 6271C:

  • Without MKL-DNN: ~80ms (estimated)
  • With MKL-DNN: 28.15ms (high-performance mode)
  • Speedup: ~2.8x

Cache Capacity: The mkldnn_cache_capacity parameter (default: 10) controls how many compiled operators are cached. Higher values reduce recompilation at the cost of memory:

  • Small models: 10 sufficient
  • Large models or batch processing: 20-50 recommended

Compatibility:

  • Required: x86 CPUs with AVX2 or later
  • Optimal: Intel CPUs with AVX-512 (Skylake, Cascade Lake, Ice Lake)
  • Fallback: Works on AMD CPUs with reduced performance

Sources: docs/version3.x/pipeline_usage/OCR.md1031-1044 paddleocr/_common_args.py63-85 docs/version3.x/pipeline_usage/OCR.md673-696

CINN Compiler

Overview: CINN (Compiler Infrastructure for Neural Networks) is a deep learning compiler that generates optimized kernels for models. It's automatically enabled with enable_hpi=True.

Features:

  • Operator Fusion: Fuses multiple operations into single kernels
  • Memory Optimization: Reduces memory allocation overhead
  • Auto-tuning: Searches for optimal kernel configurations

Enabling CINN: CINN is typically enabled automatically with High Performance Inference:
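
A minimal sketch:

```python
from paddleocr import PaddleOCR

# CINN is applied as part of High Performance Inference rather than
# through a dedicated common argument.
ocr = PaddleOCR(device="gpu:0", enable_hpi=True)
```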

Performance Characteristics:

  • First Run: Longer initialization (kernel compilation and tuning)
  • Subsequent Runs: Faster execution using cached kernels
  • Best For: Production scenarios with repeated inference

Sources: docs/version3.x/pipeline_usage/OCR.md1010-1014 Diagram 6 from system architecture

High Performance Plugin (HPI)

Overview: The High Performance Plugin is PaddleX's auto-optimization framework that selects the best combination of backend, precision, and acceleration technologies for the target hardware.

Activation:
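
For example:

```python
from paddleocr import PaddleOCR

# Let the plugin choose the backend and precision for the current hardware.
ocr = PaddleOCR(enable_hpi=True)
result = ocr.predict("sample.png")
```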

Auto-Selection Logic: when enable_hpi=True, the plugin applies the following decision criteria:

  1. Hardware Detection: Identifies available accelerators (GPU, NPU, XPU, etc.)
  2. TensorRT Check: Tests TensorRT availability and model compatibility
  3. Precision Selection: Chooses optimal precision based on hardware capabilities
  4. Fallback Strategy: Gracefully degrades if advanced features are unavailable

Trade-offs:

  • Pros: Zero-configuration optimization, production-ready defaults
  • Cons: Selected backend and precision vary across hardware, longer initialization

Sources: docs/version3.x/pipeline_usage/OCR.md1010-1014 docs/version3.x/pipeline_usage/OCR.md673-696 Diagram 6 from system architecture


Configuration Examples by Scenario

Development and Debugging
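
An illustrative setup for local development (values are examples, not tuned recommendations):

```python
from paddleocr import PaddleOCR

# Plain CPU inference, full precision, no backend auto-tuning:
# slower, but predictable and easy to debug.
ocr = PaddleOCR(
    device="cpu",
    enable_hpi=False,
    precision="fp32",
    cpu_threads=4,
)
```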

Production Server (GPU)
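
An illustrative GPU server setup (assumes CUDA and TensorRT are installed):

```python
from paddleocr import PaddleOCR

# GPU server: TensorRT with FP16 for higher throughput.
ocr = PaddleOCR(
    device="gpu:0",
    use_tensorrt=True,
    precision="fp16",
)
```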

Production Server (CPU)
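
An illustrative CPU server setup (thread and cache values are examples to adapt to your hardware):

```python
from paddleocr import PaddleOCR

# CPU server: MKL-DNN with a larger operator cache and more threads.
ocr = PaddleOCR(
    device="cpu",
    enable_mkldnn=True,
    mkldnn_cache_capacity=20,
    cpu_threads=16,
)
```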

Edge Device
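
An illustrative resource-constrained setup (mobile-sized models are selected separately via pipeline-specific arguments):

```python
from paddleocr import PaddleOCR

# Edge device: few threads, default cache size.
ocr = PaddleOCR(
    device="cpu",
    cpu_threads=2,
)
```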

Domestic Hardware
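
An illustrative sketch for domestic accelerators (each requires a Paddle build with support for that device):

```python
from paddleocr import PaddleOCR

# Domestic accelerators use the same device string convention.
ocr_npu = PaddleOCR(device="npu:0")  # Ascend NPU
ocr_xpu = PaddleOCR(device="xpu:0")  # Kunlunxin XPU
```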

Custom Advanced Configuration
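
An illustrative setup combining common arguments with a hand-maintained PaddleX configuration file (the path is a placeholder):

```python
from paddleocr import PaddleOCR

# Fine-grained control via a version-controlled PaddleX configuration file.
ocr = PaddleOCR(
    device="gpu:0",
    paddlex_config="./my_pipeline_config.yaml",  # hypothetical path
)
```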

Sources: docs/version3.x/pipeline_usage/OCR.md995-1056 paddleocr/_common_args.py1-151