Merged
Changes from 1 commit
33 changes: 33 additions & 0 deletions .docker/minio/setup.sh
@@ -0,0 +1,33 @@
#!/bin/sh

# Simple script to set up MinIO bucket and user
# Based on example from MinIO issues

# Normalize the bucket name (lowercase, underscores to hyphens) for S3 naming compatibility
BUCKET_NAME=$(echo "${S3_BUCKET_NAME}" | tr '[:upper:]' '[:lower:]' | tr '_' '-')

# Configure MinIO client
mc alias set myminio http://minio:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}

# Remove the bucket and its contents if it exists (for a clean setup)
mc rb --force myminio/${BUCKET_NAME} || true

# Create bucket
mc mb myminio/${BUCKET_NAME}

# Set bucket policy to allow downloads
mc anonymous set download myminio/${BUCKET_NAME}

# Create user with access and secret keys
mc admin user add myminio ${S3_ACCESS_KEY} ${S3_SECRET_KEY} || echo "User already exists"

# Create policy for the bucket
echo '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":["s3:*"],"Resource":["arn:aws:s3:::'${BUCKET_NAME}'/*","arn:aws:s3:::'${BUCKET_NAME}'"]}]}' > /tmp/policy.json

# Apply policy
mc admin policy create myminio gitingest-policy /tmp/policy.json || echo "Policy already exists"
mc admin policy attach myminio gitingest-policy --user ${S3_ACCESS_KEY}

echo "MinIO setup completed successfully"
echo "Bucket: ${BUCKET_NAME}"
echo "Access via console: http://localhost:9001"
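The `tr` pipeline at the top of the script can be mirrored in Python. This is an illustrative sketch of the same normalization rule (lowercase, underscores to hyphens), not code from the PR:

```python
# Sketch: the same normalization the script's `tr` pipeline performs.
# S3/MinIO bucket names must be lowercase and may not contain underscores.
def normalize_bucket_name(name: str) -> str:
    return name.lower().replace("_", "-")


if __name__ == "__main__":
    print(normalize_bucket_name("My_Bucket_Name"))  # my-bucket-name
```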
23 changes: 23 additions & 0 deletions .env.example
@@ -33,3 +33,26 @@ GITINGEST_SENTRY_PROFILE_LIFECYCLE=trace
GITINGEST_SENTRY_SEND_DEFAULT_PII=true
# Environment name for Sentry (default: "")
GITINGEST_SENTRY_ENVIRONMENT=development

# MinIO Configuration (for development)
# Root user credentials for MinIO admin access
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin

# S3 Configuration (for application)
# Set to "true" to enable S3 storage for digests
# S3_ENABLED=true
# Endpoint URL for the S3 service (MinIO in development)
S3_ENDPOINT=http://minio:9000
# Access key for the S3 bucket (created automatically in development)
S3_ACCESS_KEY=gitingest
# Secret key for the S3 bucket (created automatically in development)
S3_SECRET_KEY=gitingest123
# Name of the S3 bucket (created automatically in development)
S3_BUCKET_NAME=gitingest-bucket
# Region for the S3 bucket (default for MinIO)
S3_REGION=us-east-1
# Public URL/CDN for accessing S3 resources
S3_ALIAS_HOST=127.0.0.1:9000/gitingest-bucket
# Optional prefix for S3 file paths (if set, prefixes all S3 paths with this value)
# S3_DIRECTORY_PREFIX=my-prefix
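Assuming the application wires these variables into boto3 (the SDK this PR adds), client construction would look roughly like the following. The helper name and defaults are illustrative; `boto3.client("s3", **kwargs)` is the real call they would feed:

```python
import os

# Sketch: assemble boto3 S3 client arguments from the environment
# variables above. Defaults mirror the development values shown in
# .env.example; the real application may read its config differently.
def s3_client_kwargs(env: dict) -> dict:
    return {
        "endpoint_url": env.get("S3_ENDPOINT", "http://minio:9000"),
        "aws_access_key_id": env.get("S3_ACCESS_KEY", "gitingest"),
        "aws_secret_access_key": env.get("S3_SECRET_KEY", "gitingest123"),
        "region_name": env.get("S3_REGION", "us-east-1"),
    }


# These kwargs would be passed to boto3.client("s3", **kwargs).
kwargs = s3_client_kwargs(dict(os.environ))
```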
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -113,6 +113,7 @@ repos:
       files: ^src/
       additional_dependencies:
         [
+          boto3>=1.28.0,
           click>=8.0.0,
           'fastapi[standard]>=0.109.1',
           httpx,
@@ -138,6 +139,7 @@
         - --rcfile=tests/.pylintrc
       additional_dependencies:
         [
+          boto3>=1.28.0,
           click>=8.0.0,
           'fastapi[standard]>=0.109.1',
           httpx,
85 changes: 85 additions & 0 deletions README.md
Expand Up @@ -204,6 +204,8 @@ This is because Jupyter notebooks are asynchronous by default.

## 🐳 Self-host

### Using Docker

1. Build the image:

``` bash
@@ -239,6 +241,89 @@ The application can be configured using the following environment variables:
- **GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE**: Sampling rate for profile sessions (default: "1.0", range: 0.0-1.0)
- **GITINGEST_SENTRY_PROFILE_LIFECYCLE**: Profile lifecycle mode (default: "trace")
- **GITINGEST_SENTRY_SEND_DEFAULT_PII**: Send default personally identifiable information (default: "true")
- **S3_ALIAS_HOST**: Public URL/CDN for accessing S3 resources (default: "127.0.0.1:9000/gitingest-bucket")
- **S3_DIRECTORY_PREFIX**: Optional prefix for S3 file paths (if set, prefixes all S3 paths with this value)

### Using Docker Compose

The project includes a `compose.yml` file for running the application in both development and production environments.

#### Compose File Structure

The `compose.yml` file uses YAML anchoring with `&app-base` and `<<: *app-base` to define common configuration that is shared between services:

```yaml
# Common base configuration for all services
x-app-base: &app-base
  ports:
    - "${APP_WEB_BIND:-8000}:8000"  # Main application port
    - "${GITINGEST_METRICS_HOST:-127.0.0.1}:${GITINGEST_METRICS_PORT:-9090}:9090"  # Metrics port
  # ... other common configuration (environment, user, command)
```

#### Services

The file defines four services:

1. **app**: Production service configuration
   - Uses the `prod` profile
   - Sets the Sentry environment to "production"
   - Configured for stable operation with `restart: unless-stopped`

2. **app-dev**: Development service configuration
   - Uses the `dev` profile
   - Enables debug mode
   - Mounts the source code for live development
   - Uses hot reloading for faster development

3. **minio**: S3-compatible object storage for development
   - Uses the `dev` profile (only available in development mode)
   - Provides S3-compatible storage for local development
   - Accessible via:
     - API: Port 9000 ([localhost:9000](http://localhost:9000))
     - Web Console: Port 9001 ([localhost:9001](http://localhost:9001))
   - Default admin credentials:
     - Username: `minioadmin`
     - Password: `minioadmin`
   - Configurable via environment variables:
     - `MINIO_ROOT_USER`: Custom admin username (default: minioadmin)
     - `MINIO_ROOT_PASSWORD`: Custom admin password (default: minioadmin)
   - Includes persistent storage via Docker volume
   - Auto-creates a bucket and application-specific credentials:
     - Bucket name: `gitingest-bucket` (configurable via `S3_BUCKET_NAME`)
     - Access key: `gitingest` (configurable via `S3_ACCESS_KEY`)
     - Secret key: `gitingest123` (configurable via `S3_SECRET_KEY`)
   - These credentials are automatically passed to the app-dev service via environment variables:
     - `S3_ENDPOINT`: URL of the MinIO server
     - `S3_ACCESS_KEY`: Access key for the S3 bucket
     - `S3_SECRET_KEY`: Secret key for the S3 bucket
     - `S3_BUCKET_NAME`: Name of the S3 bucket
     - `S3_REGION`: Region for the S3 bucket (default: us-east-1)
     - `S3_ALIAS_HOST`: Public URL/CDN for accessing S3 resources (default: "127.0.0.1:9000/gitingest-bucket")

4. **minio-setup**: One-shot provisioning service for MinIO
   - Uses the `dev` profile
   - Waits for MinIO to pass its health check, then runs `.docker/minio/setup.sh`
   - Creates the bucket, the application user, and the bucket policy, then exits

#### Usage Examples

To run the application in development mode:

```bash
docker compose --profile dev up
```

To run the application in production mode:

```bash
docker compose --profile prod up -d
```

To build and run the application:

```bash
docker compose --profile prod build
docker compose --profile prod up -d
```

## 🤝 Contributing

110 changes: 110 additions & 0 deletions compose.yml
@@ -0,0 +1,110 @@
# Common base configuration for all services
x-app-base: &app-base
  ports:
    - "${APP_WEB_BIND:-8000}:8000"  # Main application port
    - "${GITINGEST_METRICS_HOST:-127.0.0.1}:${GITINGEST_METRICS_PORT:-9090}:9090"  # Metrics port
  environment:
    # Python Configuration
    - PYTHONUNBUFFERED=1
    - PYTHONDONTWRITEBYTECODE=1
    # Host Configuration
    - ALLOWED_HOSTS=${ALLOWED_HOSTS:-gitingest.com,*.gitingest.com,localhost,127.0.0.1}
    # Metrics Configuration
    - GITINGEST_METRICS_ENABLED=${GITINGEST_METRICS_ENABLED:-true}
    - GITINGEST_METRICS_HOST=${GITINGEST_METRICS_HOST:-127.0.0.1}
    - GITINGEST_METRICS_PORT=${GITINGEST_METRICS_PORT:-9090}
    # Sentry Configuration
    - GITINGEST_SENTRY_ENABLED=${GITINGEST_SENTRY_ENABLED:-false}
    - GITINGEST_SENTRY_DSN=${GITINGEST_SENTRY_DSN:-}
    - GITINGEST_SENTRY_TRACES_SAMPLE_RATE=${GITINGEST_SENTRY_TRACES_SAMPLE_RATE:-1.0}
    - GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE=${GITINGEST_SENTRY_PROFILE_SESSION_SAMPLE_RATE:-1.0}
    - GITINGEST_SENTRY_PROFILE_LIFECYCLE=${GITINGEST_SENTRY_PROFILE_LIFECYCLE:-trace}
    - GITINGEST_SENTRY_SEND_DEFAULT_PII=${GITINGEST_SENTRY_SEND_DEFAULT_PII:-true}
  user: "1000:1000"
  command: ["python", "-m", "uvicorn", "server.main:app", "--host", "0.0.0.0", "--port", "8000"]

services:
  # Production service configuration
  app:
    <<: *app-base
    image: ghcr.io/coderamp-labs/gitingest:latest
    profiles:
      - prod
    environment:
      - GITINGEST_SENTRY_ENVIRONMENT=${GITINGEST_SENTRY_ENVIRONMENT:-production}
    restart: unless-stopped

  # Development service configuration
  app-dev:
    <<: *app-base
    build:
      context: .
      dockerfile: Dockerfile
    profiles:
      - dev
    environment:
      - DEBUG=true
      - GITINGEST_SENTRY_ENVIRONMENT=${GITINGEST_SENTRY_ENVIRONMENT:-development}
      # S3 Configuration
      - S3_ENABLED=true
      - S3_ENDPOINT=http://minio:9000
      - S3_ACCESS_KEY=${S3_ACCESS_KEY:-gitingest}
      - S3_SECRET_KEY=${S3_SECRET_KEY:-gitingest123}
      # Use lowercase bucket name to ensure compatibility with MinIO
      - S3_BUCKET_NAME=${S3_BUCKET_NAME:-gitingest-bucket}
      - S3_REGION=${S3_REGION:-us-east-1}
      # Public URL for S3 resources
      - S3_ALIAS_HOST=${S3_ALIAS_HOST:-http://127.0.0.1:9000/${S3_BUCKET_NAME:-gitingest-bucket}}
    volumes:
      # Mount source code for live development
      - ./src:/app:ro
    # Use --reload flag for hot reloading during development
    command: ["python", "-m", "uvicorn", "server.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload"]
    depends_on:
      minio-setup:
        condition: service_completed_successfully

  # MinIO S3-compatible object storage for development
  minio:
    image: minio/minio:latest
    profiles:
      - dev
    ports:
      - "9000:9000"  # API port
      - "9001:9001"  # Console port
    environment:
      - MINIO_ROOT_USER=${MINIO_ROOT_USER:-minioadmin}
      - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-minioadmin}
    volumes:
      - minio-data:/data
    command: server /data --console-address ":9001"
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 30s
      start_period: 30s
      start_interval: 1s

  # MinIO setup service to create bucket and user
  minio-setup:
    image: minio/mc
    profiles:
      - dev
    depends_on:
      minio:
        condition: service_healthy
    environment:
      - MINIO_ROOT_USER=${MINIO_ROOT_USER:-minioadmin}
      - MINIO_ROOT_PASSWORD=${MINIO_ROOT_PASSWORD:-minioadmin}
      - S3_ACCESS_KEY=${S3_ACCESS_KEY:-gitingest}
      - S3_SECRET_KEY=${S3_SECRET_KEY:-gitingest123}
      - S3_BUCKET_NAME=${S3_BUCKET_NAME:-gitingest-bucket}
    volumes:
      - ./.docker/minio/setup.sh:/setup.sh:ro
    entrypoint: sh
    command: -c /setup.sh

volumes:
  minio-data:
    driver: local
1 change: 1 addition & 0 deletions pyproject.toml
@@ -44,6 +44,7 @@ dev = [
 ]
 
 server = [
+    "boto3>=1.28.0",  # AWS SDK for S3 support
     "fastapi[standard]>=0.109.1",  # Minimum safe release (https://osv.dev/vulnerability/PYSEC-2024-38)
     "prometheus-client",
     "sentry-sdk[fastapi]",
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,3 +1,4 @@
+boto3>=1.28.0  # AWS SDK for S3 support
 click>=8.0.0
 fastapi[standard]>=0.109.1  # Minimum safe release (https://osv.dev/vulnerability/PYSEC-2024-38)
 httpx
Expand Down
6 changes: 3 additions & 3 deletions src/gitingest/query_parser.py
@@ -44,9 +44,9 @@ async def parse_remote_repo(source: str, token: str | None = None) -> IngestionQ
     host = parsed_url.netloc
     user, repo = _get_user_and_repo_from_path(parsed_url.path)
 
-    _id = str(uuid.uuid4())
+    _id = uuid.uuid4()
     slug = f"{user}-{repo}"
-    local_path = TMP_BASE_PATH / _id / slug
+    local_path = TMP_BASE_PATH / str(_id) / slug
     url = f"https://{host}/{user}/{repo}"
 
     query = IngestionQuery(
@@ -132,7 +132,7 @@ def parse_local_dir_path(path_str: str) -> IngestionQuery:
     """
     path_obj = Path(path_str).resolve()
     slug = path_obj.name if path_str == "." else path_str.strip("/")
-    return IngestionQuery(local_path=path_obj, slug=slug, id=str(uuid.uuid4()))
+    return IngestionQuery(local_path=path_obj, slug=slug, id=uuid.uuid4())
 
 
 async def _configure_branch_or_tag(
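The query_parser change keeps `_id` as a `UUID` object and converts it only where the filesystem path is built. A small sketch of why the explicit `str()` matters:

```python
import uuid
from pathlib import Path

# Path's / operator accepts str (or os.PathLike), not uuid.UUID,
# so the UUID must be converted at the point the path is joined.
_id = uuid.uuid4()

local_path = Path("/tmp/gitingest") / str(_id) / "user-repo"
assert local_path.name == "user-repo"

# Joining the UUID object directly raises TypeError:
try:
    Path("/tmp/gitingest") / _id  # type: ignore[operator]
except TypeError:
    print("Path / UUID is not supported; use str(uuid)")
```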
8 changes: 6 additions & 2 deletions src/gitingest/schemas/ingestion.py
@@ -3,6 +3,7 @@
 from __future__ import annotations
 
 from pathlib import Path  # noqa: TC003 (typing-only-standard-library-import) needed for type checking (pydantic)
+from uuid import UUID  # noqa: TC003 (typing-only-standard-library-import) needed for type checking (pydantic)
 
 from pydantic import BaseModel, Field
 
@@ -27,7 +28,7 @@ class IngestionQuery(BaseModel):  # pylint: disable=too-many-instance-attributes
         The URL of the repository.
     slug : str
         The slug of the repository.
-    id : str
+    id : UUID
         The ID of the repository.
     subpath : str
         The subpath to the repository or file (default: ``"/"``).
@@ -47,6 +48,8 @@ class IngestionQuery(BaseModel):  # pylint: disable=too-many-instance-attributes
         The patterns to include.
     include_submodules : bool
         Whether to include all Git submodules within the repository. (default: ``False``)
+    s3_url : str | None
+        The S3 URL where the digest is stored if S3 is enabled.
 
     """

@@ -56,7 +59,7 @@ class IngestionQuery(BaseModel):  # pylint: disable=too-many-instance-attributes
     local_path: Path
     url: str | None = None
     slug: str
-    id: str
+    id: UUID
     subpath: str = Field(default="/")
     type: str | None = None
     branch: str | None = None
@@ -66,6 +69,7 @@ class IngestionQuery(BaseModel):  # pylint: disable=too-many-instance-attributes
     ignore_patterns: set[str] = Field(default_factory=set)  # TODO: same type for ignore_* and include_* patterns
     include_patterns: set[str] | None = None
     include_submodules: bool = Field(default=False)
+    s3_url: str | None = None
 
     def extract_clone_config(self) -> CloneConfig:
         """Extract the relevant fields for the CloneConfig object.
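With the annotation changed from `str` to `UUID`, pydantic now validates and coerces incoming values. As a rough stdlib-only stand-in for that behavior (illustrative; `IngestionQuery` itself relies on pydantic for this):

```python
import uuid
from dataclasses import dataclass


# Sketch: a stdlib stand-in for what a pydantic UUID field does,
# accepting either a UUID or a UUID-shaped string and rejecting
# anything else. The class below is illustrative only.
@dataclass
class QueryId:
    id: uuid.UUID

    def __post_init__(self) -> None:
        if isinstance(self.id, str):
            self.id = uuid.UUID(self.id)  # raises ValueError if malformed
        elif not isinstance(self.id, uuid.UUID):
            raise TypeError(f"id must be UUID or str, got {type(self.id)!r}")


q = QueryId("12345678-1234-5678-1234-567812345678")
assert isinstance(q.id, uuid.UUID)
```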
6 changes: 3 additions & 3 deletions src/server/models.py
@@ -71,8 +71,8 @@ class IngestSuccessResponse(BaseModel):
         Short form of repository URL (user/repo).
     summary : str
         Summary of the ingestion process including token estimates.
-    ingest_id : str
-        Ingestion id used to download full context.
+    digest_url : str
+        URL to download the full digest content (either S3 URL or local download endpoint).
     tree : str
         File tree structure of the repository.
     content : str
@@ -89,7 +89,7 @@ class IngestSuccessResponse(BaseModel):
     repo_url: str = Field(..., description="Original repository URL")
     short_repo_url: str = Field(..., description="Short repository URL (user/repo)")
     summary: str = Field(..., description="Ingestion summary with token estimates")
-    ingest_id: str = Field(..., description="Ingestion id used to download full context")
+    digest_url: str = Field(..., description="URL to download the full digest content")
     tree: str = Field(..., description="File tree structure")
     content: str = Field(..., description="Processed file content")
     default_max_file_size: int = Field(..., description="File size slider position used")
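The renamed `digest_url` field carries a ready-to-use URL instead of an id. A hedged sketch of how such a URL might be chosen; the helper and the local `/api/download/` route shape are assumptions, not the PR's actual code:

```python
# Sketch: pick the digest URL for the response. Use the S3 URL when
# S3 storage produced one, otherwise fall back to a local download
# endpoint. The route shape below is hypothetical.
def resolve_digest_url(s3_url, ingest_id: str) -> str:
    if s3_url:
        return s3_url
    return f"/api/download/{ingest_id}"


assert resolve_digest_url(None, "abc") == "/api/download/abc"
```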