Configuration System and Common Arguments

Purpose and Scope

This document describes PaddleOCR's configuration system for inference, focusing on common arguments that control runtime behavior, hardware acceleration, and model execution. These configurations apply across all PaddleOCR pipelines (PP-OCRv5, PP-StructureV3, PP-ChatOCRv4, etc.) and determine how models are loaded, optimized, and executed.

Topics covered:

  • Common inference arguments (device, enable_hpi, precision, etc.)
  • Hardware acceleration configuration (TensorRT, MKL-DNN, CINN)
  • The PaddlePredictorOption abstraction and its relationship to PaddleX
  • Advanced configuration via the paddlex_config parameter

Related topics:

  • For pipeline-specific parameters (detection thresholds, batch sizes, etc.), see individual pipeline documentation (2.1, 2.2, 2.3)
  • For training configuration (optimizers, loss functions, data augmentation), see 4
  • For deployment-specific setup (C++ builds, Paddle-Lite optimization), see 5
  • For the PaddleX integration layer, see 3.2

Configuration System Architecture

PaddleOCR's configuration system separates user-facing parameters from underlying inference engine configuration. User parameters flow through multiple abstraction layers before reaching the PaddleX inference engine.

Configuration Flow:

  1. User Parameters: Raw arguments from Python API or CLI (e.g., device="gpu:0", enable_hpi=True)
  2. CommonArguments: Validated dataclass instance holding common inference settings (paddleocr/_common_args.py23-86)
  3. PaddlePredictorOption: PaddleX-compatible configuration object created via get_paddlex_predictor_option() (paddleocr/_common_args.py88-151)
  4. Pipeline Overrides: Pipeline-specific configuration tweaks merged into final config
  5. PaddleX Pipeline: Instantiated with merged configuration, managing model execution

Sources: paddleocr/_common_args.py1-151 docs/version3.x/pipeline_usage/OCR.md995-1056 Diagram 6 from system architecture


Common Inference Arguments

Overview Table

The following table summarizes common inference arguments supported across all PaddleOCR pipelines:

| Argument | Type | Default | Description |
|---|---|---|---|
| device | str | Auto-detected | Target hardware: cpu, gpu:0, npu:0, xpu:0, mlu:0, dcu:0 |
| enable_hpi | bool | False | Enable High Performance Inference mode (auto-selects optimal backend) |
| precision | str | "fp32" | Computation precision: fp32, fp16, int8 |
| use_tensorrt | bool | False | Enable TensorRT subgraph engine (GPU only) |
| enable_mkldnn | bool | True | Enable MKL-DNN acceleration (CPU only) |
| mkldnn_cache_capacity | int | 10 | MKL-DNN cache size for operator optimization |
| cpu_threads | int | 8 | Thread count for CPU inference |
| paddlex_config | str | None | Path to advanced PaddleX configuration file |
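
These arguments are passed as keyword arguments to the pipeline constructor. A minimal sketch (values are illustrative, and the input path is a placeholder):

```python
from paddleocr import PaddleOCR

# Common inference arguments are passed as keyword arguments to the
# pipeline constructor. The values below are illustrative, not recommendations.
ocr = PaddleOCR(
    device="gpu:0",            # target hardware
    enable_hpi=False,          # High Performance Inference mode
    precision="fp32",          # computation precision
    use_tensorrt=False,        # TensorRT subgraph engine (GPU only)
    enable_mkldnn=True,        # MKL-DNN acceleration (CPU only)
    mkldnn_cache_capacity=10,  # MKL-DNN operator cache size
    cpu_threads=8,             # CPU inference thread count
)

result = ocr.predict("sample.png")  # placeholder input image
```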

Sources: docs/version3.x/pipeline_usage/OCR.md995-1056 paddleocr/_common_args.py23-86

Device Selection

The device parameter specifies the target hardware for inference. PaddleOCR supports a range of devices; the supported device strings (cpu, gpu, npu, xpu, mlu, dcu) are listed in the overview table above.

Device Auto-detection: If device is not specified, PaddleOCR attempts to use the first available GPU (gpu:0). If no GPU is available, it falls back to CPU.

Implementation: Device parsing occurs in CommonArguments.__post_init__() (paddleocr/_common_args.py42-61), which extracts the device type (e.g., "gpu") and device ID (e.g., 0) for downstream use.
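
A short sketch of explicit device selection (the accelerator example assumes a Paddle build with support for that device):

```python
from paddleocr import PaddleOCR

ocr_cpu = PaddleOCR(device="cpu")    # CPU inference
ocr_gpu = PaddleOCR(device="gpu:0")  # first NVIDIA GPU
ocr_npu = PaddleOCR(device="npu:0")  # NPU (requires a Paddle build with NPU support)
```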

Sources: docs/version3.x/pipeline_usage/OCR.md995-1008 paddleocr/_common_args.py42-61

High Performance Inference (HPI)

The enable_hpi flag activates PaddleOCR's High Performance Inference mode, which automatically selects the optimal inference backend and precision for the target hardware.

Behavior:

  • GPU: Attempts to use TensorRT with mixed precision (fp16/int8 if supported)
  • CPU: Enables MKL-DNN with operator fusion and quantization
  • Backend Selection: May use OpenVINO, TensorRT, or other accelerators depending on availability

Trade-offs:

  • Pros: Maximum throughput, reduced latency, optimal resource utilization
  • Cons: Longer initialization time (model compilation), potential precision loss

Performance Impact: Documentation shows "High-Performance Mode" inference times in benchmark tables (e.g., docs/version3.x/pipeline_usage/OCR.md34-35). For PP-OCRv5_server_det on GPU, high-performance mode reduces inference time from 89.55ms to 70.19ms.

Sources: docs/version3.x/pipeline_usage/OCR.md1010-1014 docs/version3.x/pipeline_usage/OCR.md673-696

Precision Control

The precision parameter controls numerical precision used during model inference:

| Precision | Description | Use Case |
|---|---|---|
| fp32 | 32-bit floating point (full precision) | Maximum accuracy, default mode |
| fp16 | 16-bit floating point (half precision) | GPU acceleration, reduced memory |
| int8 | 8-bit integer quantization | Extreme performance, edge deployment |

Hardware Compatibility:

  • fp16: Requires GPU with Tensor Cores (V100, A100, etc.) or specialized hardware
  • int8: Requires calibration dataset for post-training quantization

Integration: The precision setting is passed to PaddlePredictorOption and affects model loading and operator execution (paddleocr/_common_args.py117-119).

Sources: docs/version3.x/pipeline_usage/OCR.md1025-1029 paddleocr/_common_args.py117-119

Hardware Acceleration Flags

Additional flags control specific acceleration technologies:

TensorRT (GPU):

  • Enables NVIDIA TensorRT subgraph engine
  • Compatible with CUDA 11.8 + TensorRT 8.6.1.6
  • See TensorRT Configuration for details

MKL-DNN (CPU):

  • Enables Intel MKL-DNN (oneDNN) operator acceleration
  • mkldnn_cache_capacity: Controls operator cache size (default: 10)
  • Active by default on CPU

CPU Threading:

  • Controls parallelism for CPU inference
  • Default: 8 threads
  • Higher values may improve throughput on multi-core systems

Sources: docs/version3.x/pipeline_usage/OCR.md1016-1050 paddleocr/_common_args.py63-85


PaddlePredictorOption

PaddlePredictorOption is the bridge between PaddleOCR's user-facing configuration and PaddleX's inference engine. It encapsulates all predictor settings in a PaddleX-compatible format.

Construction Process

The get_paddlex_predictor_option() method (paddleocr/_common_args.py88-151) performs the following transformations:

  1. Device Mapping: Converts device="gpu:0" to device_type="gpu" and device_id=0
  2. Run Mode Selection: Maps enable_hpi=True to appropriate run_mode value
  3. Acceleration Flags: Translates use_tensorrt, enable_mkldnn to PaddleX format
  4. Thread Configuration: Passes through cpu_threads setting

Code Structure:
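
The actual implementation lives at paddleocr/_common_args.py88-151; the following is a simplified, hypothetical sketch of the transformation (the import path and attribute names on the option object are assumptions for illustration):

```python
# Simplified, hypothetical sketch -- not the actual implementation.
def get_paddlex_predictor_option(self):
    from paddlex.inference import PaddlePredictorOption  # assumed import path

    option = PaddlePredictorOption()

    # 1. Device mapping: "gpu:0" -> device_type="gpu", device_id=0
    option.device_type = self.device_type
    option.device_id = self.device_id

    # 2./3. Run mode selection from acceleration flags and precision
    if self.use_tensorrt:
        option.run_mode = f"trt_{self.precision}"  # e.g. "trt_fp16"
    else:
        option.run_mode = "paddle"

    # 4. Thread configuration
    option.cpu_threads = self.cpu_threads
    return option
```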

Run Mode Determination

The run_mode field in PaddlePredictorOption determines the inference backend:

| Run Mode | Description | Trigger |
|---|---|---|
| "paddle" | Standard Paddle Inference | Default CPU/GPU mode |
| "trt_fp32" | TensorRT with FP32 | use_tensorrt=True + precision="fp32" |
| "trt_fp16" | TensorRT with FP16 | use_tensorrt=True + precision="fp16" |
| "trt_int8" | TensorRT with INT8 | use_tensorrt=True + precision="int8" |
| Auto-selected | Optimal backend | enable_hpi=True |

Implementation: Run mode determination logic is in the _determine_run_mode() helper method within get_paddlex_predictor_option().

Sources: paddleocr/_common_args.py88-151 Diagram 6 from system architecture


Advanced Configuration via paddlex_config

For complex scenarios requiring fine-grained control, the paddlex_config parameter accepts a path to a JSON/YAML configuration file that directly configures the underlying PaddleX pipeline.

Usage
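
A minimal sketch, assuming a configuration file saved at ./ocr_config.yaml:

```python
from paddleocr import PaddleOCR

# Explicit keyword arguments still take precedence over values in the file.
ocr = PaddleOCR(paddlex_config="./ocr_config.yaml")  # hypothetical path
result = ocr.predict("sample.png")
```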

Configuration Override Mechanism

The configuration merging process follows this priority order:

  1. User parameters (CLI/API arguments) override all other sources
  2. paddlex_config file overrides pipeline defaults
  3. Pipeline-specific overrides (_get_paddlex_config_overrides()) override defaults
  4. Pipeline defaults (_get_default_paddlex_config()) provide baseline configuration

Example Configuration File
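
The exact schema is defined by the underlying PaddleX pipeline; the fragment below is only an illustrative sketch of the kind of structure such a file contains (keys and model names are examples, not authoritative):

```yaml
# Illustrative structure only -- consult the PaddleX pipeline configuration
# for the authoritative keys of the pipeline you are running.
pipeline_name: OCR
SubModules:
  TextDetection:
    model_name: PP-OCRv5_server_det
    batch_size: 1
  TextRecognition:
    model_name: PP-OCRv5_server_rec
    batch_size: 8
```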

Use Cases:

  • Multi-model pipelines: Configure different models for different sub-tasks
  • Advanced acceleration: Combine multiple acceleration strategies (TensorRT + CINN)
  • Production deployment: Version-controlled configuration files
  • Batch processing: Optimize batch sizes for throughput

Pipeline-Specific Overrides

Each pipeline can implement _get_paddlex_config_overrides() to inject pipeline-specific configuration:
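
A hypothetical sketch of such a hook (the method name comes from the source; the flag and override key below are invented for illustration and do not reflect the real PaddleX schema):

```python
# Hypothetical sketch of a pipeline-specific override hook.
def _get_paddlex_config_overrides(self):
    overrides = {}
    if not self._use_table_recognition:
        # Translate a user-friendly flag into detailed pipeline configuration.
        overrides["SubPipelines.TableRecognition.enabled"] = False
    return overrides
```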

This mechanism allows pipelines to translate user-friendly parameters (e.g., use_table_recognition=True) into detailed PaddleX configuration.

Sources: docs/version3.x/pipeline_usage/OCR.md1052-1056 paddleocr/_pipelines/pp_structurev3.py1-500


Acceleration Technologies Deep Dive

TensorRT Configuration

Overview: TensorRT is NVIDIA's high-performance deep learning inference optimizer. It fuses layers, selects optimal kernels, and performs precision calibration to maximize GPU throughput.

Enabling TensorRT:
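
For example (requires a CUDA-enabled Paddle installation with TensorRT available):

```python
from paddleocr import PaddleOCR

# TensorRT subgraph engine with half precision on the first GPU.
ocr = PaddleOCR(
    device="gpu:0",
    use_tensorrt=True,
    precision="fp16",
)
result = ocr.predict("sample.png")
```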

Requirements:

  • NVIDIA GPU with Compute Capability ≥ 6.1
  • CUDA 11.8 or later
  • TensorRT 8.6.1.6 (recommended for Paddle 3.0)

Performance Impact (from documentation): For PP-OCRv5_server_det on NVIDIA Tesla T4:

  • Standard mode: 89.55ms
  • High-performance mode (TensorRT fp16): 70.19ms
  • Speedup: 1.28x

Initialization Overhead: First inference requires ~10-60 seconds for TensorRT engine compilation. Subsequent inferences use cached engines.

Precision Calibration:

  • fp32: No calibration needed, minimal accuracy loss
  • fp16: Automatic mixed precision, ~0.5% accuracy loss
  • int8: Requires calibration dataset, up to 2% accuracy loss

Implementation Details: When use_tensorrt=True, the run_mode in PaddlePredictorOption is set to "trt_fp32", "trt_fp16", or "trt_int8" based on the precision setting (paddleocr/_common_args.py88-151).

Sources: docs/version3.x/pipeline_usage/OCR.md1016-1023 docs/version3.x/pipeline_usage/OCR.md656-667 paddleocr/_common_args.py88-151

MKL-DNN (oneDNN) Configuration

Overview: Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), now called oneDNN, provides optimized CPU operators for deep learning inference. It's automatically enabled for CPU inference.

Enabling MKL-DNN:
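
For example:

```python
from paddleocr import PaddleOCR

# MKL-DNN (oneDNN) acceleration for CPU inference.
ocr = PaddleOCR(
    device="cpu",
    enable_mkldnn=True,        # enabled by default on CPU
    mkldnn_cache_capacity=10,  # operator cache size
    cpu_threads=8,
)
```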

Key Features:

  • Operator Fusion: Combines multiple operations (e.g., Conv + BN + ReLU) into single kernels
  • Vectorization: Uses AVX-512, AVX2, SSE instructions for SIMD parallelism
  • Memory Optimization: Reduces memory bandwidth requirements
  • Cache Management: mkldnn_cache_capacity controls operator compilation cache size

Performance Impact (from documentation): For PP-OCRv5_mobile_det on Intel Xeon Gold 6271C:

  • Without MKL-DNN: ~80ms (estimated)
  • With MKL-DNN: 28.15ms (high-performance mode)
  • Speedup: ~2.8x

Cache Capacity: The mkldnn_cache_capacity parameter (default: 10) controls how many compiled operators are cached. Higher values reduce recompilation at the cost of memory:

  • Small models: 10 sufficient
  • Large models or batch processing: 20-50 recommended

Compatibility:

  • Required: x86 CPUs with AVX2 or later
  • Optimal: Intel CPUs with AVX-512 (Skylake, Cascade Lake, Ice Lake)
  • Fallback: Works on AMD CPUs with reduced performance

Sources: docs/version3.x/pipeline_usage/OCR.md1031-1044 paddleocr/_common_args.py63-85 docs/version3.x/pipeline_usage/OCR.md673-696

CINN Compiler

Overview: CINN (Compiler Infrastructure for Neural Networks) is a deep learning compiler that generates optimized kernels for models. It's automatically enabled with enable_hpi=True.

Features:

  • Operator Fusion: Fuses multiple operations into single kernels
  • Memory Optimization: Reduces memory allocation overhead
  • Auto-tuning: Searches for optimal kernel configurations

Enabling CINN: CINN is typically enabled automatically with High Performance Inference:
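
A minimal sketch:

```python
from paddleocr import PaddleOCR

# CINN is applied as part of High Performance Inference rather than
# through a dedicated common argument.
ocr = PaddleOCR(device="gpu:0", enable_hpi=True)
```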

Performance Characteristics:

  • First Run: Longer initialization (kernel compilation and tuning)
  • Subsequent Runs: Faster execution using cached kernels
  • Best For: Production scenarios with repeated inference

Sources: docs/version3.x/pipeline_usage/OCR.md1010-1014 Diagram 6 from system architecture

High Performance Plugin (HPI)

Overview: The High Performance Plugin is PaddleX's auto-optimization framework that selects the best combination of backend, precision, and acceleration technologies for the target hardware.

Activation:
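
For example:

```python
from paddleocr import PaddleOCR

# Let the plugin choose the backend and precision for the current hardware.
ocr = PaddleOCR(enable_hpi=True)
result = ocr.predict("sample.png")
```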

Auto-Selection Logic: when enable_hpi=True, the plugin applies the following decision criteria:

  1. Hardware Detection: Identifies available accelerators (GPU, NPU, XPU, etc.)
  2. TensorRT Check: Tests TensorRT availability and model compatibility
  3. Precision Selection: Chooses optimal precision based on hardware capabilities
  4. Fallback Strategy: Gracefully degrades if advanced features are unavailable

Trade-offs:

  • Pros: Zero-configuration optimization, production-ready defaults
  • Cons: Selected backend and precision vary across hardware, longer initialization

Sources: docs/version3.x/pipeline_usage/OCR.md1010-1014 docs/version3.x/pipeline_usage/OCR.md673-696 Diagram 6 from system architecture


Configuration Examples by Scenario

Development and Debugging
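
An illustrative setup for local development (values are examples, not tuned recommendations):

```python
from paddleocr import PaddleOCR

# Plain CPU inference, full precision, no backend auto-tuning:
# slower, but predictable and easy to debug.
ocr = PaddleOCR(
    device="cpu",
    enable_hpi=False,
    precision="fp32",
    cpu_threads=4,
)
```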

Production Server (GPU)
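
An illustrative GPU server setup (assumes CUDA and TensorRT are installed):

```python
from paddleocr import PaddleOCR

# GPU server: TensorRT with FP16 for higher throughput.
ocr = PaddleOCR(
    device="gpu:0",
    use_tensorrt=True,
    precision="fp16",
)
```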

Production Server (CPU)
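
An illustrative CPU server setup (thread and cache values are examples to adapt to your hardware):

```python
from paddleocr import PaddleOCR

# CPU server: MKL-DNN with a larger operator cache and more threads.
ocr = PaddleOCR(
    device="cpu",
    enable_mkldnn=True,
    mkldnn_cache_capacity=20,
    cpu_threads=16,
)
```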

Edge Device
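
An illustrative resource-constrained setup (mobile-sized models are selected separately via pipeline-specific arguments):

```python
from paddleocr import PaddleOCR

# Edge device: few threads, default cache size.
ocr = PaddleOCR(
    device="cpu",
    cpu_threads=2,
)
```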

Domestic Hardware
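
An illustrative sketch for domestic accelerators (each requires a Paddle build with support for that device):

```python
from paddleocr import PaddleOCR

# Domestic accelerators use the same device string convention.
ocr_npu = PaddleOCR(device="npu:0")  # Ascend NPU
ocr_xpu = PaddleOCR(device="xpu:0")  # Kunlunxin XPU
```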

Custom Advanced Configuration
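
An illustrative setup combining common arguments with a hand-maintained PaddleX configuration file (the path is a placeholder):

```python
from paddleocr import PaddleOCR

# Fine-grained control via a version-controlled PaddleX configuration file.
ocr = PaddleOCR(
    device="gpu:0",
    paddlex_config="./my_pipeline_config.yaml",  # hypothetical path
)
```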

Sources: docs/version3.x/pipeline_usage/OCR.md995-1056 paddleocr/_common_args.py1-151