Summary / Motivation

This pull request introduces COFT (COarse-to-Fine highlighTing) to torch_geometric.llm: a modular component designed to reduce hallucinations in retrieval-augmented and knowledge-grounded LLM workflows.

While PyG already provides various utilities for LLM integration, there is currently no built-in mechanism for context selection, entity-driven scoring, or highlight-based grounding. COFT fills this gap by offering a plug-and-play module that:

  • identifies key entities using a graph-aware recall step,
  • scores them via contextual weighting,
  • and highlights important spans in the reference text at different granularities (paragraph, sentence, word).

This follows the methodology proposed in the COFT paper and enables more accurate downstream LLM reasoning.
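
For intuition, here is a minimal, self-contained toy version of this coarse-to-fine flow (purely illustrative; the actual recaller, scorer, and selector in this PR are far more capable, and none of the names below are part of the module's API):

# Toy coarse-to-fine highlighting: recall entities, score sentences, select spans.
def highlight(query: str, reference: str, alias: dict[str, list[str]]) -> str:
    # 1) Recall: keep entities whose canonical name or aliases appear in the
    #    query or the reference text.
    hits = {e for e, names in alias.items()
            if any(a in query or a in reference for a in [e, *names])}
    # 2) Score: weight each sentence by how many recalled entities it mentions.
    sentences = reference.split('. ')
    scores = [sum(s.count(e) for e in hits) for s in sentences]
    # 3) Select: highlight sentences scoring above the mean (a crude stand-in
    #    for the dynamic threshold used by the real selector).
    thr = sum(scores) / max(len(scores), 1)
    return '. '.join(f'**{s}**' if sc > thr else s
                     for s, sc in zip(sentences, scores))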


What This PR Adds

New modules

  • COFT: main highlighting pipeline with recaller, scorer, and selector
  • Graph-based “recaller” using entity alias dictionaries
  • Contextual weight “scorer” integrating TF-ISF with language-model self-information (a rough sketch follows this list)
  • Dynamic threshold “selector” supporting paragraph, sentence, and word-level highlighting
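
As a rough illustration of the two scoring signals (not the PR's implementation; the function names and smoothing choices below are assumptions):

import math

# TF-ISF (term frequency-inverse sentence frequency): a term that is frequent
# within a sentence but rare across sentences gets a high weight, analogous to
# TF-IDF at sentence granularity.
def tf_isf(term: str, sentence: list[str], sentences: list[list[str]]) -> float:
    tf = sentence.count(term) / max(len(sentence), 1)
    n_containing = sum(1 for s in sentences if term in s)
    isf = math.log(len(sentences) / (1 + n_containing)) + 1.0
    return tf * isf

# Self-information of a token under a language model: rarer (lower-probability)
# tokens carry more information and thus contribute more weight.
def self_information(token_prob: float) -> float:
    return -math.log(max(token_prob, 1e-12))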

New example script

examples/coft.py

Demonstrates full end-to-end usage with torch_geometric.llm.LLM.

New unit tests

Located under test/llm/models/test_coft.py, covering the cases below (a sketch of one such test follows the list):

  • candidate recall
  • word/sentence/paragraph-level highlighting
  • consistent behavior across the three granularities
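
A hypothetical shape for one of these tests, reusing the public API from the Example Usage section below (the fixtures and assertions in the actual test file may differ):

import pytest
from torch_geometric.llm.models import COFT, Granularity, LLM

@pytest.mark.parametrize('granularity', [
    Granularity.WORD, Granularity.SENTENCE, Granularity.PARAGRAPH,
])
def test_highlighting_granularities(granularity):
    # Hypothetical fixtures; the real tests may construct these differently.
    llm = LLM('Qwen/Qwen2.5-0.5B-Instruct')
    triplets = [('apple', 'is_rich_in', 'vitamin C')]
    entity_alias = {'apple': ['apples']}
    coft = COFT(llm, triplets, entity_alias)
    out = coft(query='What nutrients do apples provide?',
               reference='Apples provide vitamin C and fiber.',
               granularity=granularity)
    assert out  # highlighted text should be returned at every granularity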

Why This Is Useful

LLMs reasoning over long contexts can hallucinate when irrelevant text overwhelms the model.

COFT significantly improves robustness by:

  • reducing distractions in long contexts
  • grounding LLM reasoning to graph-derived key entities
  • improving interpretability (highlighted spans act as an attention bottleneck)
  • supporting CPU-friendly scoring models when running lightweight LLMs

This module complements PyG’s direction toward graph-assisted LLMs and aligns well with existing efforts such as RAG, graph prompting, and KG-augmented workflows.


Breaking Changes

No breaking changes introduced.

COFT is self-contained and does not modify existing LLM APIs.


Example Usage

from torch_geometric.llm.models import LLM, Granularity, COFT

llm = LLM("Qwen/Qwen2.5-0.5B-Instruct")
coft = COFT(llm, triplets, entity_alias)

highlighted = coft(
    query="What nutrients do apples provide?",
    reference=text,
    granularity=Granularity.SENTENCE,
)
print(highlighted)
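
Here, triplets, entity_alias, and text are assumed to have roughly the following shapes (hypothetical values for illustration; see examples/coft.py for the actual construction):

# Hypothetical inputs; the example script builds these from a knowledge graph.
triplets = [("apple", "is_rich_in", "vitamin C"), ("apple", "contains", "fiber")]
entity_alias = {"apple": ["apples", "Malus domestica"]}
text = "Apples are rich in vitamin C, fiber, and various antioxidants."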

Test Plan

All tests pass:

pytest test/llm/models/test_coft.py -v 

Manual validation:

python examples/coft.py 

Both example results and unit tests confirm consistent highlighting behavior.
