chunking-algorithm

Here are 17 public repositories matching this topic...

SmartChunk is a lightweight, structure-aware semantic chunking toolkit for RAG (Retrieval-Augmented Generation) and LLM pipelines. Unlike naive splitters that break text at arbitrary positions, SmartChunk respects document structure (headings, lists, tables, code blocks) and semantic flow, producing cleaner, more coherent chunks.

  • Updated Oct 10, 2025
  • Python
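
To illustrate what structure-aware chunking means in general, here is a minimal Python sketch. It is not SmartChunk's code or API: the names `split_blocks`, `chunk_blocks`, and `max_chars` are hypothetical, and the rules (start a new chunk at each heading, never split a fenced code block, pack whole blocks up to a size budget) are assumptions chosen for the example. It covers only headings, paragraphs, and fenced code, not lists, tables, or semantic similarity.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example self-contained


def split_blocks(text):
    """Split Markdown-like text into blocks: headings, fenced code, paragraphs."""
    blocks, current, in_code = [], [], False
    for line in text.splitlines():
        if line.startswith(FENCE):
            if in_code:
                # Closing fence: emit the whole code block as one unit.
                current.append(line)
                blocks.append("\n".join(current))
                current, in_code = [], False
            else:
                # Opening fence: flush any pending paragraph first.
                if current:
                    blocks.append("\n".join(current))
                current, in_code = [line], True
        elif in_code:
            # Inside code, keep every line (including blank ones) verbatim.
            current.append(line)
        elif not line.strip():
            # A blank line ends the current paragraph.
            if current:
                blocks.append("\n".join(current))
                current = []
        elif line.startswith("#"):
            # A heading is its own block.
            if current:
                blocks.append("\n".join(current))
                current = []
            blocks.append(line)
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk_blocks(blocks, max_chars=500):
    """Pack whole blocks into chunks; headings always start a new chunk."""
    chunks, current, size = [], [], 0
    for block in blocks:
        starts_section = block.startswith("#")
        too_big = bool(current) and size + len(block) > max_chars
        if current and (starts_section or too_big):
            chunks.append("\n\n".join(current))
            current, size = [], 0
        # A block larger than max_chars is kept whole rather than split.
        current.append(block)
        size += len(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Calling `chunk_blocks(split_blocks(document))` on a Markdown string returns a list of chunk strings. Under these assumptions a chunk can exceed `max_chars` when a single block is larger than the budget, which trades strict size limits for intact code blocks.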

Implementation of an interactive chatbot for summarizing legal and policy documents. Includes data preprocessing (cleaning, tokenization, chunking), extractive summarization baselines, and fine-tuned abstractive models (PEGASUS and LED). Integrates a retrieval layer for document relevance and uses ROUGE, BLEU, and cosine similarity for evaluation.

  • Updated Oct 18, 2025
  • Jupyter Notebook
