Organizations

@didmathematikoi

KernelOverseer/README.md

Aymane Biri — Senior Software/AI Engineer

LLM/DL inference optimization • Deployment orchestration • Systems engineering
Agadir, Morocco · aymanebiri@gmail.com · +212 696 408 522


👋 About me

I build reliable, high-performance AI systems from first principles—bridging low-level engineering with pragmatic product delivery.
Currently at Omniops (KSA), focusing on LLM/DL inference performance (latency/throughput/cost) and deployment orchestration at scale.

🚀 What I work on

  • Serving pipelines for LLMs & DL models: token streaming, concurrency control, batching, KV cache/memory efficiency
  • Orchestrating inference across clusters: Kubernetes + queues + autoscaling + observability
  • Production toolchains: Python/TS, FastAPI/Flask, React, Docker, K8s, Postgres, Redis, RabbitMQ, Celery
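
As an illustration of the batching and concurrency-control item above, here is a minimal micro-batching sketch in Python. It is not code from any of the systems mentioned here: `MicroBatcher`, `run_batch`, `max_batch`, and `max_wait_s` are hypothetical names, and the "model" in the demo is a stand-in that upper-cases its input.

```python
import asyncio
import contextlib


class MicroBatcher:
    """Groups concurrent requests into batches of at most `max_batch` items.

    Hypothetical helper for illustration only. `run_batch` is assumed to be
    an async callable that maps a list of prompts to a list of outputs.
    """

    def __init__(self, run_batch, max_batch=4, max_wait_s=0.01):
        self.run_batch = run_batch
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()

    async def submit(self, prompt):
        # Each caller gets a future that the worker resolves later.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def worker(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request arrives.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            # Keep filling the batch until it is full or the wait budget is spent.
            while len(batch) < self.max_batch:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            outputs = await self.run_batch([prompt for prompt, _ in batch])
            for (_, fut), out in zip(batch, outputs):
                if not fut.done():
                    fut.set_result(out)


async def demo():
    batch_sizes = []

    async def run_batch(prompts):
        # Stand-in for a real model call; records how large each batch was.
        batch_sizes.append(len(prompts))
        return [p.upper() for p in prompts]

    batcher = MicroBatcher(run_batch, max_batch=4, max_wait_s=0.05)
    task = asyncio.create_task(batcher.worker())
    results = await asyncio.gather(*(batcher.submit(f"req-{i}") for i in range(6)))
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return results, batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The two knobs expose the usual trade-off: a larger `max_batch` or `max_wait_s` tends to improve throughput at the cost of per-request latency.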

🧠 Highlights

  • Led 1337 AI Exploration Lab (8 → 16 engineers): CV for industrial inspection, chemical process modeling, HR RAG chatbot, SFM stock tracking
  • Built iOS Bluetooth plugin + proximity algorithm for Wiqaytna (Moroccan COVID tracing app)
  • Security background (web audits, IR tooling) and microservices ERP (auth, BI, i18n, automation)
  • Bronze — MCPC 2020, first to solve Problem C; 1st place OpenSourceDays 2019 & 2021

🛠️ Core stack

Languages: C/C++, Python, JS/TS
AI/Serving: vLLM, RAG patterns (ops: batching, streaming, caching, tracing)
Backend/Infra: FastAPI/Flask, Docker, Kubernetes, Celery, RabbitMQ, Redis, Postgres
Frontend: React
Domains: Inference perf, systems programming, reliability engineering

🔩 Principles I optimize for

  • Latency/Throughput/Cost trade-offs with measurable SLOs
  • Determinism & debuggability via structured logs, traces, and health signals
  • Simple-by-default architectures that scale without heroics
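
One way to make a latency SLO measurable is to compare a percentile of observed latencies against a budget. The sketch below uses the nearest-rank percentile definition; the function names, sample values, and budgets are made up for illustration and do not come from any real service.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_slo(samples_ms, p, budget_ms):
    """True if the p-th percentile latency is within the budget."""
    return percentile(samples_ms, p) <= budget_ms


# Hypothetical request latencies in milliseconds.
latencies_ms = [120, 80, 95, 300, 110, 105, 90, 100, 130, 85]

if __name__ == "__main__":
    print("p50:", percentile(latencies_ms, 50))
    print("p95:", percentile(latencies_ms, 95))
    print("p95 within 250 ms:", meets_slo(latencies_ms, 95, 250))
```

With these samples the median looks healthy while the single 300 ms outlier dominates the tail, which is why tail percentiles are the usual choice for latency SLOs.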

📌 Selected work

  • caLLMe — Voice-first real-time LLM assistant (VAD → STT → Gen → TTS with interruptibility) — link
  • K8s Inference Orchestrator — Queue-routed tasks, autoscaling, backpressure, observability
  • HR RAG Chatbot — Policy/benefits QA with retrieval + structured outputs
  • Industrial CV — Inspection + predictive maintenance pipelines
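
The queue-routed autoscaling and backpressure ideas in the orchestrator entry can be sketched as two small decision functions. This is a simplified illustration under assumed names and thresholds (`desired_replicas`, `should_accept`, `target_per_replica`, `max_queue_depth`), not the orchestrator's actual logic.

```python
import math


def desired_replicas(queue_depth, target_per_replica, min_replicas=1, max_replicas=10):
    """Scale workers so each one handles roughly `target_per_replica`
    queued tasks, clamped to [min_replicas, max_replicas]."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(queue_depth / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


def should_accept(queue_depth, max_queue_depth):
    """Backpressure: reject new work once the queue is full, so callers
    can retry or shed load instead of growing the backlog without bound."""
    return queue_depth < max_queue_depth


if __name__ == "__main__":
    for depth in (0, 23, 1000):
        print(depth, "->", desired_replicas(depth, target_per_replica=5))
```

Clamping to a maximum keeps a traffic spike from requesting more workers than the cluster can schedule, and the admission check covers the period before new replicas become ready.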

📫 Reach me


Languages: Arabic (native), English (professional), French (very good) · Hobbies: Electronics, Psychology, Guitar & Guembri

Pinned

  1. auto-graph (Public)

    A web tool for creating and visualizing graphs and trees, and trying out code.

    TypeScript · 20 stars · 1 fork

  2. RT (Public)

    Forked from Pinkyboi/RT

    A ray tracer written from scratch in C, with complex shapes, texture mapping, soft shadows, multiple lights, and fractals.

    C · 27 stars · 2 forks

  3. KSICARDOOM (Public)

    A ray-casting game in the style of DOOM and Duke Nukem 3D, written from scratch in C, featuring a level editor and multiplayer.

    C · 61 stars

  4. BnademOverflow/libCplus (Public archive)

    Wonderful library with lots of useful functions, algorithms and data structures in C

    C · 51 stars · 5 forks

  5. beginners_guide_to_raycasting (Public)

    Code for my video guide on raycasting: https://www.youtube.com/watch?v=DFZnzCbmlng

    C · 10 stars

  6. caLLMe (Public)

    Real-time voice conversation with LLMs using an asynchronous voice-to-text-to-voice pipeline.

    Python · 21 stars · 2 forks