A tool to analyze a codebase based on user direction. This tool allows users to "ask the codebase" questions and receive accurate responses that leverage the codebase itself as the source of information.
- Code indexing and processing
- Vector database for semantic search using Chroma
- Efficient similarity search with HNSW algorithm
- Query engine for effective searches
- Integration with Ollama LLM (llama3.2)
- Conversation management
- Mock mode for testing without Ollama
# Clone the repository git clone https://github.com/yourusername/codebase-convo.git cd codebase-convo # Create and activate virtual environment python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate # Install dependencies pip install -r requirements.txt # Online installation - Pull the required Ollama model for embeddings ollama pull nomic-embed-text
For offline environments, follow these steps:
-
Download the GGUF model file from Hugging Face:
- Go to: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF
- Download the
nomic-embed-text-v1.5.Q4_K_M.gguf
file (84.1 MB)
-
Move the downloaded file to your offline machine
-
Create a modelfile named
nomic-embed-text.modelfile
with the following content:FROM nomic-embed-text-v1.5.Q4_K_M.gguf PARAMETER temperature 0.0 PARAMETER embedding true PARAMETER mirostat 0 PARAMETER num_ctx 2048
-
Import the model to Ollama:
# Make sure both the GGUF file and modelfile are in the same directory ollama create nomic-embed-text -f nomic-embed-text.modelfile
-
Verify the model is available:
ollama list
# Basic usage python main.py --codebase-path /path/to/your/codebase # Rebuild the index python main.py --codebase-path /path/to/your/codebase --rebuild-index # Use mock mode (no Ollama required) python main.py --codebase-path /path/to/your/codebase --mock-mode
The project includes test scripts to verify functionality:
# Test the Chroma vector database implementation python test_chroma.py # Test the application with mock data python test_app.py
codebase-convo/ ├── src/ │ ├── indexing/ # Code indexing & processing │ ├── vector_db/ # Chroma vector database implementation │ ├── query_engine/ # Query processing and search │ ├── llm_interface/ # Ollama LLM integration │ └── conversation/ # Conversation management ├── tests/ # Unit and integration tests ├── main.py # Application entry point ├── requirements.txt # Project dependencies ├── test_chroma.py # Test script for Chroma implementation ├── test_app.py # Test script for application └── README.md # Project documentation
The application uses Chroma, a specialized vector database, for storing and retrieving code embeddings:
- Efficient Similarity Search: Uses HNSW (Hierarchical Navigable Small World) algorithm for fast nearest-neighbor search
- Persistent Storage: Embeddings are stored on disk for persistence between runs
- Metadata Management: Stores code chunks with associated metadata for rich retrieval
- Cosine Similarity: Uses cosine similarity for comparing embeddings
MIT