🚀 Building a Production-Ready RAG System: Zero to Hero with TypeScript, Docker, Google Gemini & LangChain.js
glaucia86 / rag-search-ingestion-langchainjs-gemini
A PDF search ingestion RAG application with Docker + LangChain.js + Gemini
🤖 RAG Search Ingestion - LangChain.js + Docker + Gemini
Uma aplicação completa de Retrieval-Augmented Generation (RAG) para busca inteligente em documentos PDF, construída com TypeScript, Node.js e tecnologias modernas de IA.
📋 Índice
- Visão Geral
- Tecnologias Utilizadas
- Arquitetura
- Pré-requisitos
- Configuração
- Como Executar
- Como Usar
- Exemplos de Perguntas
- Estrutura do Projeto
- Funcionalidades
- Troubleshooting
- Tutorial Completo
🎯 Visão Geral
Este projeto implementa um sistema RAG completo que permite fazer perguntas em linguagem natural sobre o conteúdo de documentos PDF. O sistema processa documentos, cria embeddings vetoriais, armazena em um banco de dados PostgreSQL com pgVector e responde perguntas usando Google Gemini.
Como Funciona
- Ingestão: O sistema carrega e processa documentos PDF, dividindo-os em chunks
- Vetorização: Cada chunk é convertido em embeddings usando Google Gemini
- Armazenamento: Os embeddings são armazenados no PostgreSQL com extensão pgVector
- Busca: Quando você faz uma pergunta, o sistema encontra os chunks mais relevantes
- …
Have you ever wondered how to build an AI system that can answer questions about your specific documents without hallucinating? Welcome to the world of Retrieval-Augmented Generation (RAG) - the game-changing architecture that's revolutionizing how we interact with AI systems!
In this comprehensive tutorial, I'll walk you through building a complete, production-ready RAG system from scratch using modern technologies that every developer should know about.
Full Tutorial HERE
🎯 What You'll Learn
By the end of this tutorial, you'll have a fully functional RAG system that can:
- ✅ Process PDF documents intelligently
- ✅ Answer natural language questions with precision
- ✅ Provide source-grounded responses (no more hallucinations!)
- ✅ Scale to production environments
- ✅ Run everything in Docker containers
🔧 Our Tech Stack
We're building this with cutting-edge technologies:
- TypeScript - For type-safe, maintainable code
- Docker - For containerized, scalable deployment
- Google Gemini - For powerful AI embeddings and generation
- LangChain.js - For seamless AI application orchestration
- PostgreSQL + pgVector - For efficient vector storage and similarity search
- Node.js - For robust backend runtime
🧠 Why RAG? The Problem with Traditional LLMs
Large Language Models like GPT, Claude, and Gemini are incredibly powerful, but they have some critical limitations:
The Challenges:
- Static Knowledge: Limited to training data cutoff dates
- Hallucinations: Tendency to invent information when uncertain
- No Domain Context: Can't access your private documents or databases
- Update Limitations: Can't learn new facts without expensive retraining
The RAG Solution:
RAG elegantly solves these problems by combining two powerful components:
- Retrieval Component: Intelligently searches for relevant information in your knowledge base
- Generation Component: Uses an LLM to generate responses based exclusively on retrieved context
This ensures your AI responses are always grounded in verifiable sources!
🏗️ System Architecture Overview
Our RAG system follows this intelligent pipeline:
PDF Document → Text Extraction → Smart Chunking → Vector Embeddings → PostgreSQL Storage → Semantic Search → Context Assembly → AI Response Generation
🚀 Quick Start Guide
Prerequisites
Make sure you have these installed:
- Node.js 22.0.0+
- Docker 24.0.0+
- Git 2.40.0+
1. Project Setup
mkdir rag-system-typescript && cd rag-system-typescript mkdir src npm init -y
2. Install Dependencies
Production dependencies:
npm install @google/generative-ai @langchain/core @langchain/community @langchain/textsplitters dotenv pg uuid
Development dependencies:
npm install -D @types/node @types/pg @types/pdf-parse tsx typescript
3. TypeScript Configuration
Create a tsconfig.json
with optimized settings:
{ "compilerOptions": { "target": "ES2022", "module": "ESNext", "moduleResolution": "node", "outDir": "./dist", "rootDir": "./src", "strict": true, "esModuleInterop": true, "skipLibCheck": true, "forceConsistentCasingInFileNames": true } }
4. Docker Infrastructure
Set up PostgreSQL with pgVector using this docker-compose.yml
:
services: postgres: image: pgvector/pgvector:pg17 container_name: postgres_rag_ts environment: POSTGRES_USER: postgres POSTGRES_PASSWORD: postgres POSTGRES_DB: rag ports: - "5432:5432" volumes: - postgres_data:/var/lib/postgresql/data healthcheck: test: ["CMD-SHELL", "pg_isready -U postgres -d rag"] interval: 10s timeout: 5s retries: 5 bootstrap_vector_ext: image: pgvector/pgvector:pg17 depends_on: postgres: condition: service_healthy entrypoint: ["/bin/sh", "-c"] command: > PGPASSWORD=postgres psql "postgresql://postgres@postgres:5432/rag" -v ON_ERROR_STOP=1 -c "CREATE EXTENSION IF NOT EXISTS vector;" restart: "no" volumes: postgres_data:
🤖 Google Gemini Integration
Here's how we create a robust Google client:
import { GoogleGenerativeAI } from '@google/generative-ai'; export class GoogleClient { private genAI: GoogleGenerativeAI; constructor() { const apiKey = process.env.GOOGLE_API_KEY; if (!apiKey) { throw new Error('Google API key is required!'); } this.genAI = new GoogleGenerativeAI(apiKey); } async getEmbeddings(texts: string[]): Promise<number[][]> { const embeddings: number[][] = []; for (const text of texts) { try { const model = this.genAI.getGenerativeModel({ model: 'embedding-001' }); const result = await model.embedContent(text); if (result.embedding?.values) { embeddings.push(result.embedding.values); } } catch (error) { console.error('Error generating embedding:', error); // Fallback to zero vector embeddings.push(new Array(768).fill(0)); } } return embeddings; } }
🎯 The Magic of Embeddings
What are embeddings? Think of them as numerical "fingerprints" of text that capture semantic meaning:
"cat" → [0.1, 0.3, 0.5, ..., 0.8] // 768 dimensions "dog" → [0.2, 0.4, 0.6, ..., 0.7] // Similar to "cat"
When vectors are close in mathematical space, the concepts are semantically similar!
📄 Smart Document Processing
Our chunking strategy is crucial for RAG performance:
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters'; const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 400, // Optimal for tabular data chunkOverlap: 0, // No overlap needed for tables });
For tabular PDFs (like our use case), we break documents line by line, preserving table headers in each chunk for maximum semantic clarity.
💫 HNSW: The Secret Sauce of Fast Vector Search
Our system uses Hierarchical Navigable Small World (HNSW) indexing - think of it as a GPS for vector space:
- Hierarchical Structure: Multiple levels for efficient navigation
- Fast Searches: Millisecond responses even with millions of vectors
- Scalable: Handles large datasets without performance degradation
-- Automatic index creation by pgVector CREATE INDEX ON pdf_documents USING hnsw (vector vector_cosine_ops);
🎨 Interactive CLI Experience
We've built a user-friendly CLI that includes:
- Real-time Processing: See your questions being processed
- System Status: Health checks for all components
- Smart Commands:
help
,status
,clear
,exit
- Error Handling: Graceful degradation with helpful messages
// Special commands for better UX if (['exit', 'quit', 'q'].includes(command)) { console.log('Thank you for using RAG Chat. Goodbye!'); break; } if (['help', 'h'].includes(command)) { printHelp(); continue; }
🔐 Environment Configuration
Keep your secrets safe with proper environment management:
# .env file GOOGLE_API_KEY=your_google_api_key_here GOOGLE_EMBEDDING_MODEL=models/embedding-001 GOOGLE_CHAT_MODEL=gemini-2.0-flash DATABASE_URL=postgresql://postgres:postgres@localhost:5432/rag PG_VECTOR_COLLECTION_NAME=pdf_documents PDF_PATH=./document.pdf
🚀 Running Your RAG System
- Start Infrastructure:
docker-compose up -d
- Ingest Your PDF:
npm run dev:ingest
- Start Chatting:
npm run dev:chat
🎯 Production-Ready Features
Our system includes enterprise-grade features:
- Batch Processing: Optimized API calls with rate limiting
- Connection Pooling: Efficient database connections
- Error Recovery: Graceful handling of failures
- Health Monitoring: System status checks
- Scalable Architecture: Ready for horizontal scaling
🔍 Performance Metrics
Real-world performance you can expect:
- Ingestion: 50-page PDF processed in ~30 seconds
- Query Response: 2-3 seconds per question
- Throughput: 100+ questions per minute
- Accuracy: Source-grounded responses (no hallucinations!)
🛡️ Anti-Hallucination Strategies
We implement several techniques to ensure factual responses:
- Context-Only Responses: AI only uses retrieved information
- Low Temperature: Reduces creative/speculative responses
- Fallback Handling: "I don't know" when information isn't available
- Source Attribution: Always trace back to original documents
🔮 Future Roadmap
Exciting enhancements planned:
- REST API: Easy integration with web applications
- React Dashboard: Modern web interface
- Multi-tenancy: Support multiple users and document sets
- Redis Caching: Faster response times
- OpenTelemetry: Complete observability
🎓 Want to Learn More?
🚀 Get the Complete Source Code - Clone the repository and start building your own RAG system today!
📚 Additional Resources
- Complete Tutorial Article - Deep dive into every implementation detail
- LangChain.js Documentation - Master the AI orchestration framework
- Google Gemini API Docs - Explore advanced AI capabilities
- pgVector Guide - Vector database mastery
🤝 Connect & Learn Together
Building AI systems is more fun with a community! Let's connect:
- GitHub: @glaucia86
- Twitter: @glaucia86
- LinkedIn: Glaucia Lemos
- YouTube: @GlauciaLemos
💭 What's Next?
This RAG system is just the beginning! Here are some exciting directions to explore:
- Multi-modal RAG: Add support for images and audio
- Real-time Updates: Implement live document synchronization
- Advanced Retrieval: Experiment with hybrid search strategies
- Custom Models: Fine-tune embeddings for your specific domain
🏆 Key Takeaways
Building a production-ready RAG system involves:
- ✅ Smart Architecture: Thoughtful component design
- ✅ Robust Infrastructure: Docker + PostgreSQL + pgVector
- ✅ Quality Implementation: TypeScript + LangChain.js
- ✅ Performance Optimization: HNSW indexing + batch processing
- ✅ User Experience: Intuitive interfaces and error handling
🎯 Ready to Build? Start your RAG journey now with the complete source code and step-by-step guide!
Questions or feedback? Drop a comment below! I love discussing AI architecture and helping fellow developers build amazing systems.
Found this helpful? Give it a ❤️ and share it with your developer friends who are interested in AI and TypeScript!
Happy coding, and welcome to the future of intelligent document interaction! 🚀✨
Top comments (0)