[Feature] Improve retrieval quality by adding reranking layer to HybridRetriever #115

## Problem

The current `HybridRetriever` in `csv_chroma.py` retrieves documents 
from multiple subdirectories using BM25 + SelfQuery + MultiQuery 
expansion, resulting in ~90 documents being passed to 
`create_stuff_documents_chain`.

All retrieved documents are stuffed directly into the LLM prompt 
with no relevance filtering across subdirectories. As a result:

- Responses grow longer as more data sources are added
- There is no cross-subdirectory relevance ranking: a low-relevance 
  document from one subdirectory is treated the same as a 
  high-relevance document from another
- The LLM receives noisy context, which reduces answer precision

## Proposed Solution

Add a reranking layer after `weighted_reciprocal_rank` and before 
returning documents in both `retrieve_documents()` and 
`aretrieve_documents()`.

The reranker scores all retrieved documents against the original user 
query using a cross-encoder model and returns only the top N most 
relevant documents regardless of which subdirectory they came from.
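
The rerank step described above could look roughly like the sketch below. It is an illustration only: `Doc` is a stand-in for LangChain's `Document`, `rerank` and `overlap_score` are hypothetical names, the `subdir` metadata key is assumed, and the toy token-overlap scorer takes the place of a real cross-encoder so the example runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    """Stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rerank(query: str, docs: list[Doc],
           score_fn: Callable[[str, str], float], top_n: int = 10) -> list[Doc]:
    """Score every document against the query and keep the top_n best.

    The source subdirectory plays no role: all documents compete in one pool.
    """
    scored = [(score_fn(query, d.page_content), i, d) for i, d in enumerate(docs)]
    # Highest score first; ties keep the incoming (already fused) order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [d for _, _, d in scored[:top_n]]


def overlap_score(query: str, text: str) -> float:
    """Toy scorer: fraction of query tokens present in the text.

    NOT a cross-encoder; a real scorer would run the (query, text) pair
    through a model.
    """
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0
```

In `retrieve_documents()` the call would sit between the `weighted_reciprocal_rank` output and the return statement, with the real model-backed scorer passed as `score_fn`.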

**Implementation:**
- Add `src/retrievers/reranker.py` using FlashRank 
  (`ms-marco-MiniLM-L-12-v2`)
- Modify return statements in both sync and async retrieve methods 
  in `csv_chroma.py`
- Add reranker configuration to `config_default.yml`
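
One possible shape for the reranker configuration is sketched below. The key names (`enabled`, `model_name`, `top_n`), the default of 10, and the `RerankerConfig` class are all assumptions for illustration; the actual layout of `config_default.yml` is not specified in this issue.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical YAML section this class would be built from:
#
#   reranker:
#     enabled: true
#     model_name: ms-marco-MiniLM-L-12-v2
#     top_n: 10


@dataclass(frozen=True)
class RerankerConfig:
    enabled: bool = True
    model_name: str = "ms-marco-MiniLM-L-12-v2"
    top_n: int = 10  # assumed default, not taken from the project

    @classmethod
    def from_dict(cls, raw: Optional[dict]) -> "RerankerConfig":
        """Build a config from the parsed `reranker` section (may be missing)."""
        raw = raw or {}
        known = ("enabled", "model_name", "top_n")
        cfg = cls(**{k: raw[k] for k in known if k in raw})
        if cfg.top_n < 1:
            raise ValueError("reranker.top_n must be >= 1")
        return cfg
```

Falling back to defaults when the section is absent would let existing deployments pick up reranking without editing their config files.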

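**Why FlashRank:**
- Runs locally - no API key required
- CPU only - no GPU needed
- Lightweight (`ms-marco-MiniLM-L-12-v2` is ~34MB; the ~4MB figure 
  applies to FlashRank's default `ms-marco-TinyBERT-L-2-v2` model)
- Works with the existing `list[Document]` pipeline once documents 
  are converted to FlashRank passages and back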
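
A minimal sketch of what `src/retrievers/reranker.py` might contain is shown below. The function names (`to_passages`, `pick_top`, `flashrank_rerank`) are hypothetical. The FlashRank calls used are `Ranker`, `RerankRequest`, and `Ranker.rerank`, which takes passages as dicts with `id`, `text`, and `meta` keys and returns them with a `score` added. The import is done lazily so the conversion helpers can be exercised without FlashRank installed.

```python
from typing import Any


def to_passages(docs: list[Any]) -> list[dict]:
    """Convert Document-like objects into FlashRank passage dicts."""
    return [
        {"id": i, "text": d.page_content, "meta": dict(d.metadata)}
        for i, d in enumerate(docs)
    ]


def pick_top(results: list[dict], docs: list[Any], top_n: int) -> list[Any]:
    """Map scored passages back to the original objects, best first."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [docs[r["id"]] for r in ranked[:top_n]]


def flashrank_rerank(query: str, docs: list[Any], top_n: int = 10,
                     model_name: str = "ms-marco-MiniLM-L-12-v2") -> list[Any]:
    """Rerank docs against the query with FlashRank and keep the top_n."""
    # Lazy import: only needed when the reranker is actually invoked.
    from flashrank import Ranker, RerankRequest

    # Sketch only: a real implementation would build the Ranker once and
    # reuse it, since construction loads (and on first use downloads) the model.
    ranker = Ranker(model_name=model_name)
    request = RerankRequest(query=query, passages=to_passages(docs))
    return pick_top(ranker.rerank(request), docs, top_n)
```

Because `pick_top` returns the original objects rather than rebuilding them, the caller still receives the same `Document` instances it passed in, only fewer of them and in a new order.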

## Impact

- Applies automatically to both Reactome and UniProt retrievers 
  since `csv_chroma.py` is shared
- Any future database integration gets reranking for free
- Response length is controlled directly by the `top_n` config 
  parameter
- No changes to the downstream pipeline: the same `list[Document]` 
  type is returned throughout


