DEV Community

Cover image for Building a Production-Ready RAG System: Zero to Hero with TypeScript, Docker, Google Gemini & LangChain.js
Glaucia Lemos
Glaucia Lemos

Posted on

Building a Production-Ready RAG System: Zero to Hero with TypeScript, Docker, Google Gemini & LangChain.js

🚀 Building a Production-Ready RAG System: Zero to Hero with TypeScript, Docker, Google Gemini & LangChain.js

GitHub logo glaucia86 / rag-search-ingestion-langchainjs-gemini

A PDF search ingestion RAG application with Docker + LangChain.js + Gemini

🤖 RAG Search Ingestion - LangChain.js + Docker + Gemini

Node.js TypeScript LangChain Google Gemini PostgreSQL pgVector Docker License

Uma aplicação completa de Retrieval-Augmented Generation (RAG) para busca inteligente em documentos PDF, construída com TypeScript, Node.js e tecnologias modernas de IA.

📋 Índice

🎯 Visão Geral

Este projeto implementa um sistema RAG completo que permite fazer perguntas em linguagem natural sobre o conteúdo de documentos PDF. O sistema processa documentos, cria embeddings vetoriais, armazena em um banco de dados PostgreSQL com pgVector e responde perguntas usando Google Gemini.

Como Funciona

  1. Ingestão: O sistema carrega e processa documentos PDF, dividindo-os em chunks
  2. Vetorização: Cada chunk é convertido em embeddings usando Google Gemini
  3. Armazenamento: Os embeddings são armazenados no PostgreSQL com extensão pgVector
  4. Busca: Quando você faz uma pergunta, o sistema encontra os chunks mais relevantes

Have you ever wondered how to build an AI system that can answer questions about your specific documents without hallucinating? Welcome to the world of Retrieval-Augmented Generation (RAG) - the game-changing architecture that's revolutionizing how we interact with AI systems!

In this comprehensive tutorial, I'll walk you through building a complete, production-ready RAG system from scratch using modern technologies that every developer should know about.

Full Tutorial HERE

🎯 What You'll Learn

By the end of this tutorial, you'll have a fully functional RAG system that can:

  • ✅ Process PDF documents intelligently
  • ✅ Answer natural language questions with precision
  • ✅ Provide source-grounded responses (no more hallucinations!)
  • ✅ Scale to production environments
  • ✅ Run everything in Docker containers

🔧 Our Tech Stack

We're building this with cutting-edge technologies:

  • TypeScript - For type-safe, maintainable code
  • Docker - For containerized, scalable deployment
  • Google Gemini - For powerful AI embeddings and generation
  • LangChain.js - For seamless AI application orchestration
  • PostgreSQL + pgVector - For efficient vector storage and similarity search
  • Node.js - For robust backend runtime

🧠 Why RAG? The Problem with Traditional LLMs

Large Language Models like GPT, Claude, and Gemini are incredibly powerful, but they have some critical limitations:

The Challenges:

  • Static Knowledge: Limited to training data cutoff dates
  • Hallucinations: Tendency to invent information when uncertain
  • No Domain Context: Can't access your private documents or databases
  • Update Limitations: Can't learn new facts without expensive retraining

The RAG Solution:

RAG elegantly solves these problems by combining two powerful components:

  1. Retrieval Component: Intelligently searches for relevant information in your knowledge base
  2. Generation Component: Uses an LLM to generate responses based exclusively on retrieved context

This ensures your AI responses are always grounded in verifiable sources!

🏗️ System Architecture Overview

Our RAG system follows this intelligent pipeline:

PDF Document → Text Extraction → Smart Chunking → Vector Embeddings → PostgreSQL Storage → Semantic Search → Context Assembly → AI Response Generation 
Enter fullscreen mode Exit fullscreen mode

🚀 Quick Start Guide

Prerequisites

Make sure you have these installed:

  • Node.js 22.0.0+
  • Docker 24.0.0+
  • Git 2.40.0+

1. Project Setup

mkdir rag-system-typescript && cd rag-system-typescript mkdir src npm init -y 
Enter fullscreen mode Exit fullscreen mode

2. Install Dependencies

Production dependencies:

npm install @google/generative-ai @langchain/core @langchain/community @langchain/textsplitters dotenv pg uuid 
Enter fullscreen mode Exit fullscreen mode

Development dependencies:

npm install -D @types/node @types/pg @types/pdf-parse tsx typescript 
Enter fullscreen mode Exit fullscreen mode

3. TypeScript Configuration

Create a tsconfig.json with optimized settings:

{ "compilerOptions": { "target": "ES2022", "module": "ESNext", "moduleResolution": "node", "outDir": "./dist", "rootDir": "./src", "strict": true, "esModuleInterop": true, "skipLibCheck": true, "forceConsistentCasingInFileNames": true } } 
Enter fullscreen mode Exit fullscreen mode

4. Docker Infrastructure

Set up PostgreSQL with pgVector using this docker-compose.yml:

services: postgres: image: pgvector/pgvector:pg17 container_name: postgres_rag_ts environment: POSTGRES_USER: postgres POSTGRES_PASSWORD: postgres POSTGRES_DB: rag ports: - "5432:5432" volumes: - postgres_data:/var/lib/postgresql/data healthcheck: test: ["CMD-SHELL", "pg_isready -U postgres -d rag"] interval: 10s timeout: 5s retries: 5 bootstrap_vector_ext: image: pgvector/pgvector:pg17 depends_on: postgres: condition: service_healthy entrypoint: ["/bin/sh", "-c"] command: > PGPASSWORD=postgres psql "postgresql://postgres@postgres:5432/rag" -v ON_ERROR_STOP=1 -c "CREATE EXTENSION IF NOT EXISTS vector;" restart: "no" volumes: postgres_data: 
Enter fullscreen mode Exit fullscreen mode

🤖 Google Gemini Integration

Here's how we create a robust Google client:

import { GoogleGenerativeAI } from '@google/generative-ai'; export class GoogleClient { private genAI: GoogleGenerativeAI; constructor() { const apiKey = process.env.GOOGLE_API_KEY; if (!apiKey) { throw new Error('Google API key is required!'); } this.genAI = new GoogleGenerativeAI(apiKey); } async getEmbeddings(texts: string[]): Promise<number[][]> { const embeddings: number[][] = []; for (const text of texts) { try { const model = this.genAI.getGenerativeModel({ model: 'embedding-001' }); const result = await model.embedContent(text); if (result.embedding?.values) { embeddings.push(result.embedding.values); } } catch (error) { console.error('Error generating embedding:', error); // Fallback to zero vector embeddings.push(new Array(768).fill(0)); } } return embeddings; } } 
Enter fullscreen mode Exit fullscreen mode

🎯 The Magic of Embeddings

What are embeddings? Think of them as numerical "fingerprints" of text that capture semantic meaning:

"cat" → [0.1, 0.3, 0.5, ..., 0.8] // 768 dimensions "dog" → [0.2, 0.4, 0.6, ..., 0.7] // Similar to "cat" 
Enter fullscreen mode Exit fullscreen mode

When vectors are close in mathematical space, the concepts are semantically similar!

📄 Smart Document Processing

Our chunking strategy is crucial for RAG performance:

import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters'; const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 400, // Optimal for tabular data chunkOverlap: 0, // No overlap needed for tables }); 
Enter fullscreen mode Exit fullscreen mode

For tabular PDFs (like our use case), we break documents line by line, preserving table headers in each chunk for maximum semantic clarity.

💫 HNSW: The Secret Sauce of Fast Vector Search

Our system uses Hierarchical Navigable Small World (HNSW) indexing - think of it as a GPS for vector space:

  • Hierarchical Structure: Multiple levels for efficient navigation
  • Fast Searches: Millisecond responses even with millions of vectors
  • Scalable: Handles large datasets without performance degradation
-- Automatic index creation by pgVector CREATE INDEX ON pdf_documents USING hnsw (vector vector_cosine_ops); 
Enter fullscreen mode Exit fullscreen mode

🎨 Interactive CLI Experience

We've built a user-friendly CLI that includes:

  • Real-time Processing: See your questions being processed
  • System Status: Health checks for all components
  • Smart Commands: help, status, clear, exit
  • Error Handling: Graceful degradation with helpful messages
// Special commands for better UX if (['exit', 'quit', 'q'].includes(command)) { console.log('Thank you for using RAG Chat. Goodbye!'); break; } if (['help', 'h'].includes(command)) { printHelp(); continue; } 
Enter fullscreen mode Exit fullscreen mode

🔐 Environment Configuration

Keep your secrets safe with proper environment management:

# .env file GOOGLE_API_KEY=your_google_api_key_here GOOGLE_EMBEDDING_MODEL=models/embedding-001 GOOGLE_CHAT_MODEL=gemini-2.0-flash DATABASE_URL=postgresql://postgres:postgres@localhost:5432/rag PG_VECTOR_COLLECTION_NAME=pdf_documents PDF_PATH=./document.pdf 
Enter fullscreen mode Exit fullscreen mode

🚀 Running Your RAG System

  1. Start Infrastructure:
docker-compose up -d 
Enter fullscreen mode Exit fullscreen mode
  1. Ingest Your PDF:
npm run dev:ingest 
Enter fullscreen mode Exit fullscreen mode
  1. Start Chatting:
npm run dev:chat 
Enter fullscreen mode Exit fullscreen mode

🎯 Production-Ready Features

Our system includes enterprise-grade features:

  • Batch Processing: Optimized API calls with rate limiting
  • Connection Pooling: Efficient database connections
  • Error Recovery: Graceful handling of failures
  • Health Monitoring: System status checks
  • Scalable Architecture: Ready for horizontal scaling

🔍 Performance Metrics

Real-world performance you can expect:

  • Ingestion: 50-page PDF processed in ~30 seconds
  • Query Response: 2-3 seconds per question
  • Throughput: 100+ questions per minute
  • Accuracy: Source-grounded responses (no hallucinations!)

🛡️ Anti-Hallucination Strategies

We implement several techniques to ensure factual responses:

  • Context-Only Responses: AI only uses retrieved information
  • Low Temperature: Reduces creative/speculative responses
  • Fallback Handling: "I don't know" when information isn't available
  • Source Attribution: Always trace back to original documents

🔮 Future Roadmap

Exciting enhancements planned:

  • REST API: Easy integration with web applications
  • React Dashboard: Modern web interface
  • Multi-tenancy: Support multiple users and document sets
  • Redis Caching: Faster response times
  • OpenTelemetry: Complete observability

🎓 Want to Learn More?

🚀 Get the Complete Source Code - Clone the repository and start building your own RAG system today!

📚 Additional Resources

🤝 Connect & Learn Together

Building AI systems is more fun with a community! Let's connect:

💭 What's Next?

This RAG system is just the beginning! Here are some exciting directions to explore:

  1. Multi-modal RAG: Add support for images and audio
  2. Real-time Updates: Implement live document synchronization
  3. Advanced Retrieval: Experiment with hybrid search strategies
  4. Custom Models: Fine-tune embeddings for your specific domain

🏆 Key Takeaways

Building a production-ready RAG system involves:

  • Smart Architecture: Thoughtful component design
  • Robust Infrastructure: Docker + PostgreSQL + pgVector
  • Quality Implementation: TypeScript + LangChain.js
  • Performance Optimization: HNSW indexing + batch processing
  • User Experience: Intuitive interfaces and error handling

🎯 Ready to Build? Start your RAG journey now with the complete source code and step-by-step guide!


Questions or feedback? Drop a comment below! I love discussing AI architecture and helping fellow developers build amazing systems.

Found this helpful? Give it a ❤️ and share it with your developer friends who are interested in AI and TypeScript!

Happy coding, and welcome to the future of intelligent document interaction! 🚀✨

Top comments (0)