
feat(retriever): add token-aware context truncation to cap documents passed to LLM#139

Open
GovindhKishore wants to merge 1 commit into reactome:main from GovindhKishore:feat/token-aware-context-truncation


@GovindhKishore

Summary

Adds token-aware context truncation to HybridRetriever to cap the number of documents and tokens passed to the LLM. Fixes the unbounded document passing problem described in #138.

Motivation

create_stuff_documents_chain in rag_chain.py stuffs ALL retrieved documents into the LLM context with zero token awareness. With two databases installed (Reactome + UniProt), this results in 60-100 documents per query reaching the LLM, causing:

  • Verbose, unfocused answers from too much noisy context
  • High token cost, scaling with per-document token counts and current OpenAI API pricing
  • "Lost in the middle" degradation - LLMs tend to ignore documents in the middle of long contexts

Changes

New - src/util/context_truncator.py

  • truncate_to_token_limit() - truncates a ranked document list to fit within a token budget and a document-count limit (see the sketch after this list)
  • Uses tiktoken for exact GPT-4o token counting - already installed via langchain-openai, so no new dependencies
  • Processes documents in ranked order (best first) and stops when either limit is hit, so the least relevant documents are always removed first
  • Guarantees that at least one document is returned even if it exceeds the token budget, preventing an empty-context edge case
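
A minimal sketch of what the helper could look like, assuming langchain_core's Document type; the exact signature, defaults, and encoding lookup are assumptions, not the code in this PR:

```python
from typing import List

import tiktoken
from langchain_core.documents import Document


def truncate_to_token_limit(
    docs: List[Document],
    max_docs: int = 15,
    max_tokens: int = 12_000,
) -> List[Document]:
    """Trim a ranked document list to fit document-count and token budgets.

    Documents are consumed best-first, so the least relevant ones are
    dropped first. At least one document is always returned, even if it
    alone exceeds the token budget.
    """
    # gpt-4o maps to the o200k_base encoding in recent tiktoken versions.
    encoding = tiktoken.encoding_for_model("gpt-4o")

    kept: List[Document] = []
    used_tokens = 0
    for doc in docs:
        doc_tokens = len(encoding.encode(doc.page_content))
        # Keep the first document unconditionally; stop once either limit hits.
        if kept and (len(kept) >= max_docs or used_tokens + doc_tokens > max_tokens):
            break
        kept.append(doc)
        used_tokens += doc_tokens
    return kept
```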

Modified - src/retrievers/csv_chroma.py

  • retrieve_documents() - returns truncate_to_token_limit(subdirectory_docs) instead of the raw subdirectory_docs (sketched below)
  • aretrieve_documents() - same change applied to the async retrieval path
  • Added an import of truncate_to_token_limit from util.context_truncator
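
For illustration, the change amounts to wrapping the existing return values. Here _rank_and_rerank and _arank_and_rerank are hypothetical stand-ins for the pre-existing retrieval and ranking code, which this PR does not touch:

```python
from util.context_truncator import truncate_to_token_limit


class HybridRetriever:
    def retrieve_documents(self, query: str):
        # Existing BM25 + vector + WRR + FlashRank pipeline (name is a stand-in).
        subdirectory_docs = self._rank_and_rerank(query)
        # Was: return subdirectory_docs
        return truncate_to_token_limit(subdirectory_docs)

    async def aretrieve_documents(self, query: str):
        subdirectory_docs = await self._arank_and_rerank(query)
        # Same truncation applied on the async path.
        return truncate_to_token_limit(subdirectory_docs)
```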

Modified - config_default.yml

  • Added a retriever.context_truncation block with max_docs: 15 and max_tokens: 12000 (shown below)
  • Config wiring to HybridRetriever can be added later - for now the defaults are hardcoded in the truncator
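
The new block in config_default.yml (exact nesting inferred from the key path above):

```yaml
retriever:
  context_truncation:
    max_docs: 15       # hard cap on documents passed to the LLM
    max_tokens: 12000  # token budget for retrieved context
```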

Why These Default Values

  • max_docs: 15 - fifteen high-quality ranked docs provide sufficient context for any question; beyond this, quality degrades due to the lost-in-the-middle effect
  • max_tokens: 12000 - leaves room for the system prompt, chat history, and answer within GPT-4o's 128k context window (see the arithmetic below)
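
A rough illustration of the budget arithmetic (the average chunk size is an assumption, not a measurement):

```python
CONTEXT_WINDOW = 128_000  # GPT-4o context window
DOC_BUDGET = 12_000       # max_tokens for retrieved documents

# Retrieved context uses under 10% of the window, leaving 116,000
# tokens for the system prompt, chat history, and the model's answer.
remaining = CONTEXT_WINDOW - DOC_BUDGET  # 116,000

# If chunks average ~800 tokens (assumption), the two caps coincide:
# 15 docs * 800 tokens = 12,000 tokens.
```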

Expected Impact

Limits context passed to the LLM to a maximum of 15 documents and 12,000 tokens per query, down from an unbounded 60-100 documents. This reduces token usage significantly and avoids the "lost in the middle" quality degradation that occurs with excessively long contexts.

Note: Exact token counts per document depend on installed database versions and chunk sizes. A follow-up evaluation with real embeddings will quantify the precise reduction.

Note: GPT-4o supports a 128k context window, but the 12,000-token limit is intentional: it reserves space for the system prompt, chat history, and model output, and keeps the context short enough to avoid lost-in-the-middle degradation.

Interaction With Other PRs

Truncation runs at the end of retrieve_documents() and aretrieve_documents(), after WRR ranking and after FlashRank reranking (PR #116). This means truncation always drops documents from the tail of the ranked list, preserving the quality ordering established by both ranking stages:

BM25 + Vector retrieval
  -> weighted_reciprocal_rank
  -> FlashRank reranking (PR #116)
  -> truncate_to_token_limit()
  -> create_stuff_documents_chain

Related

#138 - fix(retriever): HybridRetriever passes unlimited documents to LLM with no token budget
