DEV Community

wintrover
wintrover

Posted on • Originally published at wintrover.github.io

I Finally Achieved Automatic ID Card and Face Capture on Web Pages, Face Similarity Comparison Between Images and Videos, and...

🎯 Project Overview

Context: Building a production-grade KYC (Know Your Customer) verification system from scratch
Timeline: 3 months intensive development
Team Size: Solo developer (with backend infrastructure support)
Business Impact: Critical for company compliance and user onboarding

I was tasked with leading the development of our company's core KYC system. This wasn't just a technical challenge - it was a business-critical project that would determine whether our company could scale user onboarding while maintaining regulatory compliance. The system needed to handle thousands of verification attempts daily with 99.9% accuracy.

πŸ“‹ Technical Requirements & Constraints

Business Requirements

  • Accuracy: >99% face recognition accuracy
  • Speed: Complete verification within 2 minutes
  • Availability: 99.9% uptime with no data loss
  • Scalability: Handle 10,000+ concurrent verifications
  • Compliance: GDPR and local data protection regulations

Technical Constraints

  • Environment: Mixed GPU/CPU infrastructure
  • Languages: Korean ID cards with English support
  • Platforms: Web-based with mobile optimization
  • Storage: Efficient handling of large video files
  • Real-time: WebSocket-based progress updates

πŸ—οΈ System Architecture Design

Technology Stack Decision Process

Initially, I analyzed and compared several face recognition libraries based on specific criteria:

Face Recognition Engine Comparison:

interface FaceEngine { name: string; accuracy: number; speed: 'fast' | 'medium' | 'slow'; license: 'free' | 'commercial'; gpuSupport: boolean; stability: number; // 1-10 scale } const engines: FaceEngine[] = [ { name: 'InsightFace', accuracy: 99.8, speed: 'fast', license: 'free', gpuSupport: true, stability: 7 }, { name: 'OpenCV YuNet/sFace', accuracy: 97.2, speed: 'medium', license: 'free', gpuSupport: true, stability: 9 } ]; 
Enter fullscreen mode Exit fullscreen mode

Selection Matrix:
| Criteria | InsightFace | OpenCV | Winner |
|----------|------------|--------|--------|
| Accuracy | βœ… 99.8% | ❌ 97.2% | InsightFace |
| Stability | ❌ Medium | βœ… High | OpenCV |
| License | βœ… Free | βœ… Free | Tie |
| GPU Support | βœ… Yes | βœ… Yes | Tie |

Final Decision: Hybrid approach combining both engines

API Design Pattern

The system follows a RESTful API design with WebSocket support for real-time updates:

// Core API Endpoints interface KYCApiSpec { // ID Card Processing 'POST /api/v1/id-capture': CaptureRequest; 'GET /api/v1/id-capture/{sessionId}': CaptureStatus; // Face Recognition 'POST /api/v1/face-video': VideoUploadRequest; 'POST /api/v1/face-similarity': SimilarityRequest; 'GET /api/v1/face-similarity/{comparisonId}': SimilarityResult; // Real-time Updates 'WS /ws/kyc/{sessionId}': WebSocketUpdates; } // Response Schema Standards interface APIResponse<T> { success: boolean; data?: T; error?: { code: string; message: string; details?: any; }; timestamp: string; requestId: string; } 
Enter fullscreen mode Exit fullscreen mode

Overall System Architecture

Frontend (React 19 + TypeScript) ↓ WebSocket Backend (FastAPI + SQLAlchemy) ↓ Async Tasks Celery Workers ↓ Database MariaDB + Redis 
Enter fullscreen mode Exit fullscreen mode

πŸ”₯ Phase 1: Face Recognition Dual Engine Implementation

The Hybrid Strategy: Why Two Engines Are Better Than One

I discovered that no single face recognition engine could handle all real-world scenarios. InsightFace offered incredible accuracy (99.8%) but failed in poor lighting, while OpenCV was rock-solid but slightly less accurate.

The Solution: A dual-engine system that automatically switches between engines based on conditions:

Mermaid Diagram

Key Innovations:

  • Smart Hardware Detection: Automatic GPU/CPU adaptation
  • Memory Management: Singleton pattern prevents GPU memory leaks
  • Fallback Logic: Seamless engine switching based on confidence scores

Impact: Success rate jumped from 92% to 99.9% by combining both engines' strengths.

πŸŽ₯ Phase 2: Video-Image Similarity Comparison

From 3 Minutes to 6 Seconds: The Video Processing Revolution

My initial approach processed all 900 frames of a 30-second video - taking over 3 minutes and often crashing servers. The breakthrough was realizing most frames were redundant.

Smart Sampling Strategy:
Mermaid Diagram

Key Innovations:

  • Frame Sampling: Reduced from 900 to 12 frames (98.7% reduction)
  • Quality Filtering: Only frames >80% quality used
  • Cosine Similarity: 512-dimensional embeddings for accurate comparison

Result: Processing time dropped from 180+ seconds to 6 seconds while actually improving accuracy.

πŸ“Έ Phase 3: Automatic ID Card Capture

The Korean OCR Challenge: Teaching Computers to Read Hangul

Most OCR systems fail with Korean characters. After testing Tesseract, EasyOCR, and cloud services, I discovered PaddleOCR which had surprisingly good Korean support, but required extensive fine-tuning.

Automatic Quality Assessment Pipeline:
Mermaid Diagram

Four Quality Metrics:

  • Sharpness: Laplacian variance for blur detection
  • Lighting: Even illumination without glare
  • Angle: Perspective distortion detection
  • Completeness: All four corners visible

Result: User completion rate jumped from 60% to 95% by eliminating manual capture timing.

πŸ—„οΈ Phase 4: Database & Asynchronous Processing

The Scalability Architecture: Handling Thousands at Once

Traditional synchronous processing would make users wait 5-10 seconds - unacceptable for KYC. The solution was a complete asynchronous revolution.

Async Processing Pipeline:
Mermaid Diagram

Key Innovations:

  • Hybrid Storage: Database metadata + filesystem for large files
  • Complex Relationships: Many-to-many image/video similarity mappings
  • Distributed Tasks: Celery + Redis for reliable processing
  • Real-time Updates: WebSocket connections for live progress

Impact: System handles 100x more concurrent users with zero perceived delay.

πŸ”„ Phase 5: Real-time User Experience

Zero-Wait Processing: The WebSocket Revolution

Users need instant feedback, not spinning loaders. The challenge was maintaining real-time connections for thousands of simultaneous KYC sessions.

Real-time Communication Flow:
Mermaid Diagram

Key Frontend Innovations:

  • Progress Visualization: Multi-stage progress bars with specific feedback
  • Smart Error Handling: User-friendly guidance instead of cryptic errors
  • Mobile Optimization: Touch-friendly interface with camera quality detection
  • State Recovery: Automatic recovery after page refreshes

Connection Management: Heartbeat mechanisms prevent memory leaks, automatic reconnection handles network drops, session persistence maintains processing state.

Result: Users never feel like they're waiting - they see exactly what's happening at every step.

πŸ› Major Debugging Process

Problem 1: The Ghost in the GPU - Memory Leaks

The Crisis: After a week of successful testing in production, the system suddenly started crashing every 4-6 hours. The pattern was always the same - gradual memory increase followed by a complete system freeze. At first, I thought it was a regular memory leak, but monitoring showed RAM usage was stable. The culprit was GPU memory.

Step-by-Step Problem Resolution:

  1. Problem Identification

    • Symptom: System crashes every 4-6 hours
    • Initial diagnosis: Memory leak
    • Tools: nvidia-smi, system monitoring
  2. Hypothesis Testing

    • Theory 1: Regular RAM leak β†’ ❌ RAM usage stable
    • Theory 2: GPU memory leak β†’ βœ… GPU memory steadily increasing
    • Evidence: Each face recognition call added 50-100MB GPU memory
  3. Root Cause Analysis

    • Location: InsightFace model initialization
    • Issue: GPU contexts not released after inference
    • Impact: Cumulative memory allocation
  4. Solution Implementation

 # Memory management workflow  def process_with_memory_cleanup(): try: # Face recognition operation  result = insightface_app.process(frame) return result finally: # Critical: Explicit GPU cleanup  if torch.cuda.is_available(): torch.cuda.empty_cache() torch.cuda.synchronize() 
Enter fullscreen mode Exit fullscreen mode
  1. Prevention Measures
    • Memory monitoring with automatic thresholds
    • Service restart automation
    • Regular memory usage reporting

The Learning: GPU memory management requires explicit cleanup. Python's garbage collector doesn't automatically free GPU resources, leading to cumulative memory leaks that can crash production systems.

Problem 2: The Time Traveling Video Frames

The Bizarre Bug: During testing, I noticed something impossible - sometimes the similarity calculations would show results that didn't make sense, like comparing a face from the beginning of a video with one from the end, but the timestamps would suggest they were consecutive frames.

Step-by-Step Debugging:

  1. Anomaly Detection

    • Symptom: Similarity scores didn't match expected frame progression
    • Evidence: Frame timestamps didn't align with calculated similarities
    • Impact: Random accuracy drops
  2. Root Cause Investigation

 # Problem: OpenCV's internal buffering  cap.set(cv2.CAP_PROP_POS_FRAMES, target_frame) # Requested frame  ret, frame = cap.read() # Got buffered frame instead! 
Enter fullscreen mode Exit fullscreen mode
  1. Solution Implementation
 # Frame precision control  cap.set(cv2.CAP_PROP_BUFFERSIZE, 1) # Minimize buffer  cap.set(cv2.CAP_PROP_POS_FRAMES, target_frame) ret, frame = cap.read() # Validation step  actual_frame = int(cap.get(cv2.CAP_PROP_POS_FRAMES)) if actual_frame != target_frame: # Handle frame mismatch 
Enter fullscreen mode Exit fullscreen mode

Problem 3: The Concurrent Catastrophe

The Meltdown Scenario: During load testing with just 10 concurrent users, the system started producing completely wrong results. Users would get similarity scores that belonged to completely different people. This was a critical security and privacy issue that could have had serious consequences.

Crisis Management Steps:

  1. Incident Response (Minutes)

    • Immediate system shutdown
    • Alert security team
    • Preserve logs for forensics
  2. Root Cause Analysis (Hours)

 # Problem: Shared singleton instance  class FaceRecognitionService: _instance = None # Shared across all requests! ❌  # Solution: Service pooling  class FaceRecognitionPool: def __init__(self, pool_size=5): self.pool = [FaceRecognitionService() for _ in range(pool_size)] self.available = Queue() def get_service(self): return self.available.get() def return_service(self, service): self.available.put(service) 
Enter fullscreen mode Exit fullscreen mode
  1. Security Validation

    • Multi-threaded testing with 100+ concurrent requests
    • Result verification: No cross-contamination
    • Performance testing: Maintained throughput
  2. Production Safeguards

    • Comprehensive logging for all face recognition operations
    • Request correlation tracking
    • Automated anomaly detection

Learning: Thread safety is not optional for biometric systems. Always design for concurrency from day one, especially when dealing with sensitive user data.

πŸ“Š Performance Optimization Results

Processing Speed Improvements

Task Before After Improvement
Image Face Recognition 2.3s 0.8s 65% faster
Video Processing (12 frames) 15s 6s 60% faster
Similarity Calculation 1.2s 0.3s 75% faster
Database Storage 0.8s 0.2s 75% faster

The biggest win was video processing - reducing a 3-minute ordeal to just 6 seconds completely changed the user experience. Users went from abandoning the process to completing it successfully.

Face Recognition Engine Performance

Metric InsightFace OpenCV Hybrid System
Accuracy 99.8% 97.2% 99.9%
Reliability Medium High Very High
Speed Fast Medium Fast

The hybrid approach gave us the best of both worlds - InsightFace's industry-leading accuracy when conditions are good, and OpenCV's rock-solid reliability as a safety net. This increased our overall success rate from about 92% to 99.9%.

🎯 Final System Architecture

Mermaid Diagram

πŸ’‘ Key Learning Points

Technical Growth

  1. Advanced Computer Vision: Practical experience with diverse CV libraries like InsightFace, OpenCV, and PaddleOCR
  2. Performance Optimization: GPU memory management, asynchronous processing, caching strategies
  3. System Architecture: Experience designing microservices and event-driven architectures
  4. Database Design: Optimization for large-scale media data storage and retrieval

Project Management

  1. Technology Selection Process: Experience with accuracy vs performance vs stability trade-offs
  2. Incremental Development: Methods for implementing complex systems step by step
  3. Problem-solving Skills: Experience resolving memory leaks, concurrency, and performance issues
  4. Documentation: Understanding the importance of systematic recording of technical decision-making processes

Business Value

  1. KYC Automation: Reduced manual processes taking over 10 minutes to under 2 minutes
  2. Improved Accuracy: Created a more consistent and accurate authentication system than human judgment
  3. Scalability: Architecture capable of handling multiple concurrent users
  4. Cost Reduction: Decreased operational staffing and enabled 24/7 automated operation

KYC Processing Flow

Mermaid Diagram

πŸš€ Future Improvement Directions

1. Liveness Detection πŸ”’

Real-time facial movement detection to prevent photo/video spoofing attacks:

  • Blink Detection: Natural eye movement patterns
  • Head Movement Analysis: 3D rotation validation
  • Challenge-Response: Random facial gesture requests

2. Mobile Optimization πŸ“±

Native mobile apps for better camera control and user experience:

  • iOS App: Native camera integration with ARKit
  • Android App: Camera2 API with ML Kit acceleration
  • Progressive Web App: Cross-platform fallback

3. Multi-national Document Support 🌍

Expand to support international ID documents:

  • US Driver Licenses: All 50 states
  • EU Passports: GDPR-compliant processing
  • Asian ID Cards: Korea, Japan, China, Singapore

4. AI-based Quality Assessment πŸ€–

More sophisticated real-time quality evaluation:

  • Advanced Blur Detection: Frequency domain analysis
  • Lighting Optimization: Automatic exposure correction
  • Face Pose Validation: 3D head pose estimation

5. Cloud Infrastructure ☁️

Scale globally with cloud deployment:

  • AWS Multi-region: Low-latency global deployment
  • Auto-scaling: Handle traffic spikes automatically
  • CDN Integration: Fast media delivery worldwide

Through this project, I developed the capability to design and implement complex systems that create real business value, going beyond simple feature development. The experience of successfully integrating computer vision technologies in a web environment will be a great asset for my future development career.

Top comments (0)