Stephen Nwankwo

Building an MCP Server with RAG Capabilities for Developer Tools

As the demand for intelligent, context-aware assistants grows, the need for flexible and modular agent protocols has never been more critical. With this article and project, I hope to demonstrate how easy it is to combine MCP + RAG to produce knowledge-grounded responses while working across development environments such as VS Code, Claude, Cursor, etc.

This article is a very high-level overview of the project, the architecture, and how this hybrid system can be embedded into developer tools seamlessly.


What is MCP?

MCP (Model Context Protocol), created by a team at Anthropic, is an emerging standard for enabling consistent interaction between AI agents and their environments. Designed to separate environment logic from model behavior, MCP allows tools like IDEs and terminal clients to interact with LLMs in a structured way, similar to how the LSP (Language Server Protocol) standardized code intelligence or how REST standardized client-API interaction. A lot of people like to refer to MCP as the USB-C of agent-to-external-tool interaction; I hope you get the idea.

There are two major parts that make up an MCP setup: the Server and the Host (aka Client). The former houses the interface that connects to all the external/internal tools, while the Host houses the interfaces responsible for connecting to the server and translating user input into commands to be sent to the server. The image below is an illustration of what a basic MCP architecture looks like.

MCP Architecture
figure(1)
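To make the server side concrete, here is a minimal sketch of what such a server can look like using the official Python MCP SDK's FastMCP helper. The server name and the `ask_docs` tool are hypothetical stand-ins, not this project's actual code:

```python
# minimal_server.py -- a minimal MCP server sketch (names are hypothetical)
from mcp.server.fastmcp import FastMCP

# Create a named MCP server instance
mcp = FastMCP("drf-docs-rag")

@mcp.tool()
def ask_docs(question: str) -> str:
    """Answer a question about the indexed documentation."""
    # In the real project this is where the RAG pipeline is invoked;
    # here we return a placeholder so the sketch stays self-contained.
    return f"(retrieved answer for: {question})"

if __name__ == "__main__":
    # stdio transport lets hosts like VS Code spawn the server as a subprocess
    mcp.run(transport="stdio")
```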

Connecting the Server to the Host typically involves writing a JSON entry into the host's MCP configuration, or settings.json in the case of VS Code (what I used). The typical structure of the JSON configuration looks like this:

"name-of-mcp-server": { "type": "protocol", "command": "run_command", "args": [ "url_or_file_path_to_server" ] } 
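For example, a local Python server launched over stdio might be registered like this (the server name and file path are hypothetical placeholders):

```json
"drf-docs-rag": {
  "type": "stdio",
  "command": "python",
  "args": ["/path/to/your/server.py"]
}
```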

What is RAG?

Retrieval-Augmented Generation (RAG) combines a language model with an external knowledge retriever. Instead of relying solely on the model's training data, RAG systems pull relevant documents or facts in real time to augment the prompt. This improves response accuracy, reduces hallucinations, and makes models useful in specialized domains.
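In Python terms, the core loop is short. The `retriever` and `llm` objects below are hypothetical stand-ins for whatever vector store and model client you use:

```python
def rag_answer(question: str, retriever, llm, k: int = 4) -> str:
    """Retrieve relevant chunks, then ask the LLM with them in the prompt."""
    # 1. Retrieval: fetch the k most relevant document chunks
    chunks = retriever.search(question, top_k=k)

    # 2. Augmentation: splice the retrieved context into the prompt
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Generation: the model answers grounded in the retrieved context
    return llm.generate(prompt)
```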


Project Overview

This project involves building an MCP server that wraps a RAG pipeline. Any client or host that supports MCP can interface with this server to ask questions, perform searches, or generate context-aware completions.

The goal is to provide local or self-hosted intelligence that can adapt to your documents, codebase, or domain knowledge without relying on external APIs.


Architecture

MCP-RAG workflow
figure(2)

The MCP server handles structured requests, transforms them into context-aware prompts using the RAG system, and returns the enriched responses to the client.
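Wiring the two sketches above together, the tool handler on the server simply delegates to the RAG function (again, `retriever` and `llm` are hypothetical stand-ins):

```python
@mcp.tool()
def ask_docs(question: str) -> str:
    """MCP entry point: delegate to the RAG pipeline."""
    return rag_answer(question, retriever=retriever, llm=llm)
```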


Implementation

For this project, I built an MCP-RAG server that helps answer questions related to documenting Django Rest Framework (DRF) APIs.

Here's a breakdown of the tech stack:

  • Major libraries: Google ADK, LangChain, the Python MCP SDK, and Chroma (for the vector store).

  • MCP Protocol: Custom implementation based on the official spec from Anthropic.

  • RAG Stack:
    Google ADK with Gemini handles embedding and retrieval, LangChain is used for chunking text, and the vector store is Chroma DB (see the ingestion sketch below).

  • Client Interface: VS Code configured using the standard MCP JSON config.
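Here is a sketch of what the ingestion side can look like with that stack. For brevity I use the google-generativeai client for embeddings rather than the full Google ADK, and the source file, API key, and collection name are hypothetical placeholders:

```python
import chromadb
import google.generativeai as genai
from langchain_text_splitters import RecursiveCharacterTextSplitter

genai.configure(api_key="YOUR_API_KEY")  # hypothetical placeholder

# 1. Chunk the source document with LangChain
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
text = open("drf_docs.txt").read()  # hypothetical source file
chunks = splitter.split_text(text)

# 2. Embed each chunk with a Gemini embedding model
embeddings = [
    genai.embed_content(model="models/text-embedding-004", content=c)["embedding"]
    for c in chunks
]

# 3. Store chunks + embeddings in a persistent Chroma collection
client = chromadb.PersistentClient(path="./chroma_store")
collection = client.get_or_create_collection("drf_docs")
collection.add(
    ids=[str(i) for i in range(len(chunks))],
    documents=chunks,
    embeddings=embeddings,
)
```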


Here's what the file structure looks like:

File Structure
figure(3)

Here's an example exchange inside VS Code using the MCP-RAG server:

VsCode chat UI
figure(4)

Use Cases

  • Ask domain-specific questions, get code explanations, or retrieve documentation instantly.
  • Query internal knowledge bases while chatting.
  • Integrate organization-specific coding guidelines and FAQs into Cursor's or Copilot's LLM responses. All of this, without leaving your code editor.

This project is a step toward plug-and-play intelligence across your developer stack. By combining the modular power of MCP with the contextual strength of RAG, you can build assistants that actually understand your codebase, your docs, your workflows.


