Under construction
Graph-augmented semantic search for academic literature
Current frontend with mock data: semantic results + graph view
Features:
- Fetch papers from the arXiv API
- Queue papers for embedding (deferred, async)
- Track paper ingestion state in a SQLite index
- Index embeddings into Qdrant
- Search Qdrant with fastembed support
- Merge vector hits with (planned) graph hits
- Type-safe, testable, modular pipeline
- Health checks, typed interfaces, and task runner setup
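The merge step in the feature list (vector hits combined with planned graph hits) could look roughly like the sketch below. It is a minimal illustration, not the project's implementation. The names `Hit`, `merge_hits`, and `graph_weight`, the down-weighting of graph scores, and the "keep the higher score" rule are all hypothetical assumptions. The paper IDs are made-up examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical search hit: a paper id, a relevance score, and its source."""
    paper_id: str
    score: float
    source: str  # "vector" or "graph"


def merge_hits(
    vector_hits: list[Hit],
    graph_hits: list[Hit],
    graph_weight: float = 0.5,  # assumed down-weighting for graph-only evidence
    limit: int = 10,
) -> list[Hit]:
    """Merge vector and graph hits into one ranked, de-duplicated list.

    Assumed scheme: graph scores are multiplied by `graph_weight`, and a paper
    found by both sources keeps whichever of its two scores is higher.
    """
    best: dict[str, Hit] = {hit.paper_id: hit for hit in vector_hits}
    for hit in graph_hits:
        weighted = Hit(hit.paper_id, hit.score * graph_weight, "graph")
        current = best.get(hit.paper_id)
        if current is None or weighted.score > current.score:
            best[hit.paper_id] = weighted
    # Sort by descending score; break ties by paper id for a stable order.
    ranked = sorted(best.values(), key=lambda h: (-h.score, h.paper_id))
    return ranked[:limit]


if __name__ == "__main__":
    vector = [Hit("2401.00001", 0.92, "vector"), Hit("2401.00002", 0.80, "vector")]
    graph = [Hit("2401.00002", 0.90, "graph"), Hit("2401.00003", 0.70, "graph")]
    merged = merge_hits(vector, graph)
    print([h.paper_id for h in merged])
    # ['2401.00001', '2401.00002', '2401.00003']
```

Taking the maximum score keeps a paper's rank from being inflated just because both sources returned it. Summing or reciprocal-rank fusion are equally plausible choices once the graph store exists.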
Requirements:
- Python 3.12+
- Node.js 20+
- Docker + Docker Compose
```bash
make up-full              # Run the full stack on CPU
make up-full USE_GPU=1    # Run with GPU (requires the NVIDIA runtime)
```

Services:
- API → http://localhost:8000/docs
- Qdrant UI → http://localhost:6333/dashboard
- Redis → localhost:6379 (connect with redis-cli)
```bash
poe server      # Run the FastAPI backend
poe pipeline    # Run the end-to-end pipeline (logs each step)
poe test        # Run the tests
```

System architecture diagram:
```mermaid
flowchart TD
    subgraph API
        A1[GET /search] --> P["run_pipeline()"]
    end

    subgraph Pipeline
        P --> F[Discover papers from ArXiv]
        F --> I[Update PaperIndex]
        I --> Q[Enqueue papers if not embedded]
        Q --> S[Semantic Search]
        S --> G[Get related from GraphStore]
        G --> M[Merge vector + graph results]
        M --> R[Return SearchResults]
    end

    subgraph Vector Store
        V1["Qdrant (hosted/local)"]
    end

    subgraph Embedding Worker
        W1["Reads Redis queue"]
        W1 --> E[Embed papers]
        E --> V[Upsert to Qdrant]
        V --> U[Update PaperIndex status]
    end

    subgraph Graph Store
        G1["(Planned) Neo4j / in-memory graph"]
    end

    S -->|vector hits| V1
    G -->|edges| G1
    G1 -->|related| G
```

Stack highlights:
- Backend: FastAPI + Pydantic + Poetry (with the poethepoet task runner)
- Queue: Redis (Upstash or local)
- Vector DB: Qdrant
- Frontend: React + Vite + Tailwind
- Infra: Docker Compose + Fly.io
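The deferred embedding flow is split across the pipeline and the worker. The pipeline records papers in the SQLite index and enqueues only the ones not yet embedded. The worker drains the queue, embeds and upserts, then updates the index status. The sketch below shows that flow under stated assumptions, using only the standard library.

- A `collections.deque` stands in for the Redis list. In a real deployment a Redis list would play this role, for example via `LPUSH`/`BRPOP`.
- `embed_and_upsert` is a placeholder callback for the fastembed + Qdrant step.
- The table schema, the status values, and the names `PaperIndex`, `enqueue_unembedded`, and `run_worker` are all hypothetical. This is not the project's actual code.

```python
import sqlite3
from collections import deque
from typing import Callable, Iterable, Optional


class PaperIndex:
    """Hypothetical SQLite-backed record of each paper's ingestion status."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS papers ("
            " paper_id TEXT PRIMARY KEY,"
            " status TEXT NOT NULL DEFAULT 'discovered')"
        )

    def upsert_discovered(self, paper_id: str) -> None:
        # INSERT OR IGNORE leaves the status of already-known papers untouched.
        self.conn.execute("INSERT OR IGNORE INTO papers (paper_id) VALUES (?)", (paper_id,))
        self.conn.commit()

    def set_status(self, paper_id: str, status: str) -> None:
        self.conn.execute("UPDATE papers SET status = ? WHERE paper_id = ?", (status, paper_id))
        self.conn.commit()

    def status(self, paper_id: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT status FROM papers WHERE paper_id = ?", (paper_id,)
        ).fetchone()
        return row[0] if row else None


def enqueue_unembedded(index: PaperIndex, queue: deque, paper_ids: Iterable[str]) -> int:
    """Record discovered papers; enqueue only those not already queued or embedded."""
    enqueued = 0
    for pid in paper_ids:
        index.upsert_discovered(pid)
        if index.status(pid) == "discovered":
            index.set_status(pid, "queued")
            queue.append(pid)  # stand-in for pushing onto a Redis list
            enqueued += 1
    return enqueued


def run_worker(index: PaperIndex, queue: deque, embed_and_upsert: Callable[[str], None]) -> None:
    """Drain the queue in FIFO order: embed/upsert each paper, then mark it embedded."""
    while queue:
        pid = queue.popleft()  # stand-in for popping from the Redis list
        embed_and_upsert(pid)  # placeholder for fastembed + Qdrant upsert
        index.set_status(pid, "embedded")
```

Storing the status in the index makes re-running discovery idempotent. A paper that is already `queued` or `embedded` is never pushed onto the queue a second time.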
Thank you to arXiv for use of its open access interoperability.
License: MIT