Codebase Convo

A tool for analyzing a codebase through user-directed questions. Users can "ask the codebase" and receive answers that are grounded in the code itself rather than in outside sources.

Features

Code indexing and processing
Vector database for semantic search using Chroma
Efficient similarity search with HNSW algorithm
Query engine for effective searches
Integration with Ollama LLM (llama3.2)
Conversation management
Mock mode for testing without Ollama

Installation

# Clone the repository
git clone https://github.com/yourusername/codebase-convo.git
cd codebase-convo

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Online installation - Pull the required Ollama model for embeddings
ollama pull nomic-embed-text

Offline Installation

For offline environments, follow these steps:

Download the GGUF model file from Hugging Face:
- Go to: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF
- Download the nomic-embed-text-v1.5.Q4_K_M.gguf file (84.1 MB)
Move the downloaded file to your offline machine

Create a modelfile named nomic-embed-text.modelfile with the following content:

FROM nomic-embed-text-v1.5.Q4_K_M.gguf
PARAMETER temperature 0.0
PARAMETER embedding true
PARAMETER mirostat 0
PARAMETER num_ctx 2048

Import the model to Ollama:

# Make sure both the GGUF file and modelfile are in the same directory
ollama create nomic-embed-text -f nomic-embed-text.modelfile

Verify the model is available:
```
ollama list
```
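
The same check can be scripted. The sketch below assumes Ollama is serving its HTTP API on the default address `http://localhost:11434` and relies on the `/api/tags` endpoint, which lists locally available models. The helper names `fetch_local_models` and `has_model` are illustrative and are not part of this project.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default Ollama address


def fetch_local_models(base_url: str = OLLAMA_URL) -> dict:
    """Return the parsed JSON from Ollama's /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def has_model(tags: dict, name: str) -> bool:
    """Check whether `name` appears in a /api/tags response, ignoring the tag suffix."""
    for model in tags.get("models", []):
        # Names are reported with a tag, e.g. "nomic-embed-text:latest".
        if model.get("name", "").split(":")[0] == name:
            return True
    return False
```

With the Ollama server running, `has_model(fetch_local_models(), "nomic-embed-text")` should return `True` once the import has succeeded.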

Usage

# Basic usage
python main.py --codebase-path /path/to/your/codebase

# Rebuild the index
python main.py --codebase-path /path/to/your/codebase --rebuild-index

# Use mock mode (no Ollama required)
python main.py --codebase-path /path/to/your/codebase --mock-mode
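
For orientation, the three flags above could be parsed as in the sketch below. This is an assumption about how `main.py` handles its arguments, not a copy of its code; only the flag names are taken from the commands shown, and `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Ask questions about a codebase.")
    parser.add_argument(
        "--codebase-path",
        required=True,
        help="Directory of the codebase to index and query",
    )
    parser.add_argument(
        "--rebuild-index",
        action="store_true",
        help="Discard the stored index and build it again",
    )
    parser.add_argument(
        "--mock-mode",
        action="store_true",
        help="Run without contacting Ollama",
    )
    return parser
```

Both boolean flags default to `False`, so the basic invocation only needs `--codebase-path`.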

Testing

The project includes test scripts to verify functionality:

# Test the Chroma vector database implementation
python test_chroma.py

# Test the application with mock data
python test_app.py

Project Structure

codebase-convo/
├── src/
│   ├── indexing/         # Code indexing & processing
│   ├── vector_db/        # Chroma vector database implementation
│   ├── query_engine/     # Query processing and search
│   ├── llm_interface/    # Ollama LLM integration
│   └── conversation/     # Conversation management
├── tests/                # Unit and integration tests
├── main.py               # Application entry point
├── requirements.txt      # Project dependencies
├── test_chroma.py        # Test script for Chroma implementation
├── test_app.py           # Test script for application
└── README.md             # Project documentation

Vector Database

The application uses Chroma, a specialized vector database, for storing and retrieving code embeddings:

Efficient Similarity Search: Uses HNSW (Hierarchical Navigable Small World) algorithm for fast nearest-neighbor search
Persistent Storage: Embeddings are stored on disk for persistence between runs
Metadata Management: Stores code chunks with associated metadata for rich retrieval
Cosine Similarity: Uses cosine similarity for comparing embeddings
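
To make the cosine-similarity ranking concrete, here is a minimal pure-Python sketch. It scans every stored vector (brute force) rather than using an HNSW index, it does not call Chroma, and the three-dimensional vectors and file names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=2):
    """Return the k (score, chunk) pairs most similar to the query."""
    scored = [(cosine_similarity(query, c["embedding"]), c) for c in chunks]
    # Sort on the score only, so the chunk dicts are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical code chunks: an embedding plus metadata, as described above.
chunks = [
    {"embedding": [1.0, 0.0, 0.0], "metadata": {"file": "auth.py"}},
    {"embedding": [0.0, 1.0, 0.0], "metadata": {"file": "db.py"}},
    {"embedding": [0.9, 0.1, 0.0], "metadata": {"file": "login.py"}},
]

results = top_k([1.0, 0.0, 0.0], chunks, k=2)
for score, chunk in results:
    print(f"{chunk['metadata']['file']}: {score:.3f}")
```

An HNSW index aims to return approximately the same ranking without comparing the query against every stored vector, which is what keeps search fast as the number of chunks grows.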

License

MIT

AIAfterDark/codebase-convo
