
feat(retriever): add hybrid retrieval engine pipeline with lexical + semantic ranking#23

Open
Chethan-Regala wants to merge 11 commits into AOSSIE-Org:main from Chethan-Regala:feat/hybrid-retrieval-engine

Conversation


@Chethan-Regala Chethan-Regala commented Mar 13, 2026

Overview

This PR introduces a modular hybrid retrieval engine for Smart Notes.

The goal is to establish a retrieval architecture capable of combining lexical search (e.g. SQLite FTS5) with semantic similarity for future AI-assisted features.

The design follows an adapter-based architecture so the retrieval pipeline remains independent of specific storage or embedding implementations.


Architecture


The implemented pipeline follows this flow:

Query
→ lexical candidate search
→ embedding loading
→ cosine similarity scoring
→ hybrid ranking
→ top-k results

Hybrid ranking blends lexical and semantic signals using a weighted scoring function:

final_score = α * semantic_score + β * normalized_lexical_score

Where:

  • α = semantic weight
  • β = lexical weight

Lexical scores are normalized before blending to avoid scale mismatch with cosine similarity.
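The flow and scoring formula above can be sketched end to end as follows. This is a simplified, synchronous, in-memory sketch: the PR's actual RetrievalEngine is async and adapter-based, and all names here are illustrative rather than the real implementation.

```typescript
// Illustrative sketch of the hybrid pipeline: cosine similarity for the
// semantic signal, max-normalized lexical scores, weighted blend, top-k.
type ScoredChunk = { chunkId: string; lexicalScore: number; embedding: number[] };

function searchSketch(
  queryEmbedding: number[],
  lexicalCandidates: ScoredChunk[], // output of the lexical candidate search
  alpha: number, // semantic weight
  beta: number,  // lexical weight
  k: number
): { chunkId: string; finalScore: number }[] {
  // Cosine similarity between the query and a candidate embedding.
  const cosine = (a: number[], b: number[]): number => {
    let dot = 0, ma = 0, mb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      ma += a[i] * a[i];
      mb += b[i] * b[i];
    }
    const denom = Math.sqrt(ma) * Math.sqrt(mb);
    return denom === 0 ? 0 : dot / denom;
  };

  // Normalize lexical scores so they share a scale with cosine similarity.
  const maxLex = Math.max(...lexicalCandidates.map(c => c.lexicalScore), 0);

  return lexicalCandidates
    .map(c => ({
      chunkId: c.chunkId,
      finalScore:
        alpha * cosine(queryEmbedding, c.embedding) +
        beta * (maxLex > 0 ? c.lexicalScore / maxLex : 0),
    }))
    .sort((a, b) => b.finalScore - a.finalScore)
    .slice(0, k); // top-k results
}
```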


Key Components

RetrievalEngine

Orchestrates the full retrieval pipeline while remaining independent of storage implementations.

HybridScorer

Combines lexical and semantic signals into a final ranking score.

CosineSimilarity

Efficient vector similarity utility used for semantic scoring.
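A minimal version of such a utility might look like the following. The validation, single-pass accumulation, zero-magnitude handling, and clamping mirror the behavior described in the review walkthrough for CosineSimilarity.ts; the exact code here is an assumption.

```typescript
// Cosine similarity sketch: validates input, accumulates dot product and
// magnitudes in a single pass, returns 0 for zero-magnitude vectors, and
// clamps to [-1, 1] to guard against floating-point drift.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new RangeError("vectors must be non-empty and of equal length");
  }
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  const denom = Math.sqrt(magA) * Math.sqrt(magB);
  if (denom === 0) return 0; // zero-magnitude vectors carry no direction
  return Math.min(1, Math.max(-1, dot / denom));
}
```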

Types & Adapter Contracts

Defines clear interfaces for:

  • RetrievalStore
  • QueryEmbeddingProvider
  • SearchCandidate / SearchResult

This allows future integration with:

  • SQLite FTS5
  • embedding workers
  • RAG context assembly
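One plausible shape for these contracts is sketched below. The interface names come from the PR; the field names and method signatures are assumptions, and the real definitions live in apps/retriever/src/types.ts.

```typescript
// Hypothetical adapter contracts for the retrieval pipeline.
interface SearchCandidate {
  chunkId: string;
  content: string;
  lexicalScore: number;
}

interface SearchResult extends SearchCandidate {
  semanticScore: number;
  finalScore: number;
}

interface HybridScoreWeights {
  alpha: number; // semantic weight
  beta: number;  // lexical weight
}

// Storage adapter: any backend (e.g. SQLite FTS5) can implement this.
interface RetrievalStore {
  searchLexical(query: string, limit: number): Promise<SearchCandidate[]>;
  loadEmbeddings(chunkIds: string[]): Promise<number[][]>;
}

// Embedding adapter: any embedder (e.g. a worker) can implement this.
interface QueryEmbeddingProvider {
  embedQuery(query: string): Promise<number[]>;
}
```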

Improvements After Code Audit

Based on automated review feedback, the following robustness improvements were added:

  • lexical score normalization before hybrid ranking
  • clarified weight contract (α = semantic, β = lexical)
  • widened lexical candidate pool for reranking (limit * 5)
  • strict validation for hybrid scoring weights
  • validation for limit input
  • strengthened embedding alignment contract
  • cleaned public API exports

Demo

A demo runner is included to demonstrate the pipeline without external dependencies.

Run:


npm run build
node dist/demo/RetrievalDemo.js
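For illustration, store mocks of roughly this shape can exercise the pipeline without external dependencies. This is a hypothetical sketch shown synchronously for brevity; the actual MockStore and MockEmbedder live in RetrievalDemo.ts and may differ.

```typescript
// Mock store sketch honoring the expected cardinality and ordering:
// at most `limit` candidates, and exactly one embedding per requested id.
const CANDIDATES = [
  { chunkId: "c1", lexicalScore: 3 },
  { chunkId: "c2", lexicalScore: 2 },
  { chunkId: "c3", lexicalScore: 1 },
];

const EMBEDDINGS: Record<string, number[]> = {
  c1: [1, 0],
  c2: [0, 1],
  c3: [0.5, 0.5],
};

function mockSearchLexical(_query: string, limit: number) {
  // Return at most `limit` candidates, preserving deterministic order.
  return CANDIDATES.slice(0, Math.max(0, limit));
}

function mockLoadEmbeddings(chunkIds: string[]): number[][] {
  // One vector per requested id, in request order; throw on unknown ids.
  return chunkIds.map(id => {
    const v = EMBEDDINGS[id];
    if (!v) throw new Error(`unknown chunkId: ${id}`);
    return v;
  });
}
```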


Why This Matters

This module lays the foundation for future AI capabilities in Smart Notes, including:

  • semantic search
  • hybrid retrieval
  • context assembly for RAG
  • AI-assisted note exploration

The design keeps the system:

  • offline-first
  • modular
  • storage-agnostic

Summary by CodeRabbit

  • New Features

    • Added a hybrid search engine combining semantic and keyword-based ranking with configurable scoring weights, cosine-similarity utility, and a unified public API surface.
  • Chores

    • Added a new retriever app with project config, package manifest, and TypeScript setup; updated top-level ignore rules.
  • Documentation / Demos

    • Included an end-to-end demo illustrating the retrieval pipeline and example results.


coderabbitai bot commented Mar 13, 2026

Walkthrough

Adds a new TypeScript Smart Notes hybrid retrieval app: project config and tsconfig, core types and public barrel, cosine-similarity and hybrid-scoring utilities, a RetrievalEngine orchestration class, a demo with mock store/embedder, and updates to .gitignore and package.json.

Changes

Cohort / File(s) / Summary
Configuration & Setup
/.gitignore, apps/retriever/package.json, apps/retriever/tsconfig.json
Adds ignore rules for dist/ and node_modules/, a new package.json for the retriever app (build script, devDeps), and a TypeScript project config (outDir dist, strict checks, declaration generation).
Types & Public API
apps/retriever/src/types.ts, apps/retriever/src/index.ts
Introduces core interfaces (SearchCandidate, SearchResult, RetrievalStore, QueryEmbeddingProvider, HybridScoreWeights) and a barrel file re-exporting RetrievalEngine, cosineSimilarity, scoreHybridResults, and types.
Core Utilities
apps/retriever/src/CosineSimilarity.ts
New exported functions vectorMagnitude and cosineSimilarity with input validation, single-pass dot/magnitude accumulation, zero-magnitude handling, and clamping to [-1,1].
Hybrid Scoring
apps/retriever/src/HybridScorer.ts
New scoreHybridResults exported function: validates inputs and weights, normalizes lexical scores, blends semantic and lexical signals using alpha/beta, and returns results sorted by final score.
Retrieval Orchestration
apps/retriever/src/RetrievalEngine.ts
New RetrievalEngine class (constructor: RetrievalStore, QueryEmbeddingProvider, HybridScoreWeights) implementing the pipeline: lexical search (pooling), embed query, batch load embeddings, compute cosine similarities, hybrid scoring, sort and truncate to requested limit.
Demo Implementation
apps/retriever/src/demo/RetrievalDemo.ts
Adds an end-to-end demo with MockStore and MockEmbedder to exercise the pipeline and print formatted results; runs a sample search using the new engine.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant RetrievalEngine
    participant RetrievalStore
    participant QueryEmbeddingProvider
    participant CosineSimilarity
    participant HybridScorer

    Client->>RetrievalEngine: search(query, limit)
    RetrievalEngine->>RetrievalStore: searchLexical(query, limit * 5)
    RetrievalStore-->>RetrievalEngine: candidates[]
    RetrievalEngine->>QueryEmbeddingProvider: embedQuery(query)
    QueryEmbeddingProvider-->>RetrievalEngine: queryEmbedding
    RetrievalEngine->>RetrievalStore: loadEmbeddings(chunkIds)
    RetrievalStore-->>RetrievalEngine: chunkEmbeddings[][]
    RetrievalEngine->>CosineSimilarity: cosineSimilarity(queryEmbedding, chunkEmbedding)
    CosineSimilarity-->>RetrievalEngine: semanticScores[]
    RetrievalEngine->>HybridScorer: scoreHybridResults(candidates, semanticScores, weights)
    HybridScorer-->>RetrievalEngine: SearchResult[] (sorted)
    RetrievalEngine->>RetrievalEngine: truncate to limit
    RetrievalEngine-->>Client: SearchResult[]

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

Typescript Lang

Poem

🐰 I hop through vectors, tiny and spry,

I mix the words with meaning in my eye.
Chunks align, the scores take flight,
Rankings bloom in morning light.
A little rabbit cheers — retrieval's right.

🚥 Pre-merge checks (2 passed)
  • Description check — ✅ Passed (check skipped: CodeRabbit’s high-level summary is enabled).
  • Title check — ✅ Passed: the title clearly and specifically describes the main change, introducing a hybrid retrieval engine that combines lexical and semantic ranking.


@github-actions github-actions bot added size/XL and removed size/XL labels Mar 13, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/retriever/src/demo/RetrievalDemo.ts`:
- Around line 37-43: The mock currently ignores the RetrievalStore contract:
update searchLexical to honor the incoming limit by returning at most limit
candidates (preserving deterministic ordering of the hard-coded candidate set)
and update loadEmbeddings to return embeddings only for the requested chunkIds
in the same order as chunkIds (preserving one embedding per chunkId); locate the
hard-coded candidates array in RetrievalDemo.ts and change searchLexical to
slice it by Math.max(0, limit) and change loadEmbeddings to map over the
incoming chunkIds to produce exactly one vector per id (or throw/return null for
missing ids) so cardinality and ordering match the real RetrievalStore behavior.
- Around line 15-20: The imports RetrievalStore, QueryEmbeddingProvider,
HybridScoreWeights, and SearchCandidate are only used as type annotations;
change the import to be type-only by using an "import type" statement for those
symbols from "../types" (so the runtime require is not emitted) and keep the
rest of the file unchanged.

In `@apps/retriever/src/HybridScorer.ts`:
- Line 10: The import in HybridScorer.ts should be type-only: update the import
of SearchCandidate, SearchResult, and HybridScoreWeights to use a type-only
import so the compiled output doesn't include a runtime require("./types");
locate the top-level import statement that currently imports these symbols and
change it to an `import type` form referencing SearchCandidate, SearchResult,
and HybridScoreWeights.
- Around line 99-113: The current lexical normalization uses reduce(..., 0) to
compute maxLexicalScore and falls back to 0 when max <= 0, which zeroes out
lexical signal for all-negative score sets; update the logic in the block that
computes maxLexicalScore and normalizedLexicalScore (around candidates.reduce,
maxLexicalScore, normalizedLexicalScore, finalScore) to either validate adapters
emit non-negative lexicalScore or—preferably—compute both minLexicalScore and
maxLexicalScore and normalize each candidate via (lexicalScore -
minLexicalScore) / (maxLexicalScore - minLexicalScore) when range > 0 (otherwise
fall back to 0.0), then use that normalized value in finalScore
(weights.alpha/weights.beta unchanged).
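The min/max normalization suggested in this comment can be sketched as follows (an illustrative helper, not the PR's exact code):

```typescript
// Min/max-normalize lexical scores into [0, 1] so all-negative score sets
// keep their relative ordering instead of collapsing to zero.
function normalizeLexicalScores(scores: number[]): number[] {
  if (scores.length === 0) return [];
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  const range = max - min;
  // A degenerate range (all scores equal) carries no ranking signal.
  if (range <= 0) return scores.map(() => 0);
  return scores.map(s => (s - min) / range);
}
```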

In `@apps/retriever/src/index.ts`:
- Around line 9-13: The example import in the docblock uses the wrong package
name; update the example to import from "retriever" (matching
apps/retriever/package.json) instead of "@smart-notes/retriever". Edit the
import lines that reference RetrievalEngine, cosineSimilarity,
scoreHybridResults and the type imports SearchResult, HybridScoreWeights in the
example at the top of apps/retriever/src/index.ts so they use "retriever", and
ensure both example import and type import use the same package string.

In `@apps/retriever/src/RetrievalEngine.ts`:
- Around line 10-15: The import statement is pulling in type-only symbols at
runtime; change the import to a type-only import by using "import type" for
RetrievalStore, QueryEmbeddingProvider, HybridScoreWeights, and SearchResult in
RetrievalEngine.ts so the compiler only emits types and no runtime import
side-effects—update the existing import line that currently reads "import {
RetrievalStore, QueryEmbeddingProvider, HybridScoreWeights, SearchResult } from
'./types';" to use "import type" for those symbols and leave other non-type
imports unchanged.
- Around line 120-129: The semanticScores array currently uses raw
cosineSimilarity values (range [-1,1]) which must be normalized to [0,1] before
hybrid blending; update the computation that builds semanticScores (the variable
computed from cosineSimilarity) to normalize each score via (score + 1) / 2,
then pass those normalized values into scoreHybridResults so HybridScorer's
blend (weights.alpha * semanticScore + weights.beta * normalizedLexicalScore)
uses aligned scales for correct ranking of candidates.
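The scale alignment described above is a simple affine map from [-1, 1] to [0, 1] (an illustrative helper; the variable names in the PR may differ):

```typescript
// Map cosine similarity from [-1, 1] onto [0, 1] so it blends on the same
// scale as the normalized lexical score.
function normalizeCosine(score: number): number {
  return (score + 1) / 2;
}
```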

In `@apps/retriever/src/types.ts`:
- Around line 67-72: Update the JSDoc for finalScore to match the implementation
in scoreHybridResults: state that lexicalScore is normalized before blending,
that scoreHybridResults enforces/assumes alpha + beta ≈ 1 (throws if not), and
that finalScore is the weighted blend of the normalized lexical score and
semanticScore computed after normalization; apply the same clarification to the
other comment block referenced (lines around the second occurrence) and
reference scoreHybridResults, finalScore, lexicalScore, alpha and beta so
adapter authors understand the exact contract.

In `@apps/retriever/tsconfig.json`:
- Around line 2-11: Enable TypeScript declaration emission by updating
tsconfig.json's compilerOptions to include "declaration": true (and optionally
"declarationMap": true) so .d.ts files are emitted into the existing "outDir":
"dist"; ensure "rootDir": "src" and "include": ["src"] still cover your public
API surface (types like SearchResult, HybridScoreWeights, RetrievalStore), and
add "types": "dist/index.d.ts" to package.json to point consumers to the
generated declaration entry if applicable.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b784b02a-c6af-4b05-9807-3d13788d91f5

📥 Commits

Reviewing files that changed from the base of the PR and between a3ccb2b and 90b4fee.

⛔ Files ignored due to path filters (1)
  • apps/retriever/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (9)
  • .gitignore
  • apps/retriever/package.json
  • apps/retriever/src/CosineSimilarity.ts
  • apps/retriever/src/HybridScorer.ts
  • apps/retriever/src/RetrievalEngine.ts
  • apps/retriever/src/demo/RetrievalDemo.ts
  • apps/retriever/src/index.ts
  • apps/retriever/src/types.ts
  • apps/retriever/tsconfig.json

@github-actions github-actions bot added size/XL and removed size/XL labels Mar 14, 2026
@Chethan-Regala
Author

Thanks for the suggestion! I've updated the implementation to address this and pushed the fix in the latest commit.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
apps/retriever/src/HybridScorer.ts (1)

39-60: ⚠️ Potential issue | 🟡 Minor

Publish the actual scoring contract in this docblock.

This export still documents blending raw lexicalScore, says non-1 weight sums only warn, and only lists a length-mismatch RangeError. The implementation now min/max normalizes lexical scores, throws on invalid weights, and relies on callers to pass semanticScores on the expected scale, so this docblock is no longer an accurate API contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/retriever/src/HybridScorer.ts` around lines 39 - 60, The JSDoc for
HybridScorer's scoring API is outdated: update the docblock on HybridScorer (the
scoring function that takes candidates, semanticScores, and weights) to
accurately state that lexicalScore values are min/max normalized before
blending, that weights (weights.alpha and weights.beta) must be valid (e.g., sum
to 1 or meet the function's validation) and the function now throws on invalid
weights, that semanticScores are expected to already be cosine-similarity-scaled
by the caller, and that the function throws on candidates/semanticScores length
mismatch; also keep the note that results are returned sorted by finalScore
descending and document the exact finalScore formula used after normalization.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/retriever/src/HybridScorer.ts`:
- Around line 84-97: The validation currently in scoreHybridResults lets
NaN/Infinity through because comparisons with < and Math.abs(NaN - 1) are false;
add an explicit finite-number check using Number.isFinite for weights.alpha and
weights.beta (reject if not finite) before the existing non-negative check so
you throw a RangeError early; keep the existing checks for non-negativity and
sum-to-1 (using WEIGHT_SUM_TOLERANCE) after this new finite validation to avoid
producing NaN finalScore values later.
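The suggested ordering of checks — finiteness first, then sign, then sum-to-one — can be sketched as follows (names like WEIGHT_SUM_TOLERANCE follow the comment; the exact implementation is assumed):

```typescript
// Weight validation sketch: reject NaN/Infinity before the sign and
// sum-to-one checks, since comparisons against NaN are always false.
const WEIGHT_SUM_TOLERANCE = 1e-6;

function validateWeights(alpha: number, beta: number): void {
  if (!Number.isFinite(alpha) || !Number.isFinite(beta)) {
    throw new RangeError("weights must be finite numbers");
  }
  if (alpha < 0 || beta < 0) {
    throw new RangeError("weights must be non-negative");
  }
  if (Math.abs(alpha + beta - 1) > WEIGHT_SUM_TOLERANCE) {
    throw new RangeError("weights must sum to 1");
  }
}
```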

In `@apps/retriever/src/RetrievalEngine.ts`:
- Around line 66-67: Update the public doc comment for the RetrievalEngine
`limit` parameter to accurately state that the engine requests a widened
candidate pool from the upstream lexical search (currently expands to limit * 5
candidates before reranking) so callers and store adapters understand the actual
query load; reference the RetrievalEngine class and the `limit` parameter and
explicitly mention the candidate expansion multiplier (5) and that the final
ranked output is trimmed to `limit`.

In `@apps/retriever/tsconfig.json`:
- Around line 5-13: The base tsconfig currently hardcodes "rootDir": "src" and
"include": ["src"], which breaks when package-level .ts files exist; remove
"rootDir" (and any emit-only settings like "outDir" or
"declaration"/"declarationMap") from the base tsconfig and create a new
build-only tsconfig (e.g., tsconfig.build.json) that sets "rootDir": "src",
"outDir": "dist", and other emit options; update the base tsconfig "include" to
["*.ts", "src"] (or simply ["**/*.ts"]) so files like vitest.config.ts and
eslint.config.ts are type-checked while keeping compilation settings isolated in
tsconfig.build.json referenced by build scripts or "extends" as appropriate.

---

Duplicate comments:
In `@apps/retriever/src/HybridScorer.ts`:
- Around line 39-60: The JSDoc for HybridScorer's scoring API is outdated:
update the docblock on HybridScorer (the scoring function that takes candidates,
semanticScores, and weights) to accurately state that lexicalScore values are
min/max normalized before blending, that weights (weights.alpha and
weights.beta) must be valid (e.g., sum to 1 or meet the function's validation)
and the function now throws on invalid weights, that semanticScores are expected
to already be cosine-similarity-scaled by the caller, and that the function
throws on candidates/semanticScores length mismatch; also keep the note that
results are returned sorted by finalScore descending and document the exact
finalScore formula used after normalization.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 663714ab-04af-483f-a450-e2ea86ab3728

📥 Commits

Reviewing files that changed from the base of the PR and between 90b4fee and 58ad570.

📒 Files selected for processing (6)
  • apps/retriever/src/HybridScorer.ts
  • apps/retriever/src/RetrievalEngine.ts
  • apps/retriever/src/demo/RetrievalDemo.ts
  • apps/retriever/src/index.ts
  • apps/retriever/src/types.ts
  • apps/retriever/tsconfig.json

@github-actions github-actions bot added size/XL and removed size/XL labels Mar 14, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/retriever/src/RetrievalEngine.ts`:
- Around line 106-114: Wrap the embedding calls in RetrievalEngine (the call to
this.embedder.embedQuery and this.store.loadEmbeddings) with explicit try/catch
so failures are caught and annotated with context; on error, log or rethrow a
new Error that includes which operation failed (embedQuery vs loadEmbeddings),
the query or chunkIds involved (or their counts/ids), and the original error to
preserve the stack, and consider returning a sensible fallback (empty
embeddings/empty candidates) or letting the caller handle the failure. Ensure
you update the code paths that use queryEmbedding and embeddings to handle a
thrown or fallback result.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ee3361b-9ecf-493c-af0d-b950cd3f9ebd

📥 Commits

Reviewing files that changed from the base of the PR and between 58ad570 and ad44737.

📒 Files selected for processing (2)
  • apps/retriever/src/HybridScorer.ts
  • apps/retriever/src/RetrievalEngine.ts

@github-actions github-actions bot added size/XL and removed size/XL labels Mar 15, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/retriever/src/RetrievalEngine.ts`:
- Around line 127-131: The error handler in RetrievalEngine currently injects
raw user query text into wrapOperationError in the embedQuery block (and
similarly includes full chunk ID lists elsewhere); change these to avoid leaking
sensitive data by passing redacted values (e.g., replace query with "<redacted>"
or a fixed-length hash/summary) and replace explicit ID arrays with safe
metadata such as the count of chunks or hashed/trimmed IDs; update the calls to
wrapOperationError in embedQuery and the other location (the block handling
chunk IDs) to include only non-sensitive context (e.g., queryLength or
chunkCount and optionally a deterministic short hash) instead of the full query
or full ID lists.
- Around line 102-106: The check in RetrievalEngine.search currently only
enforces limit > 0 but needs an upper bound to avoid resource amplification (the
code later multiplies limit by 5); add a MAX_LIMIT constant (e.g.
MAX_SEARCH_LIMIT) and enforce that limit is an integer between 1 and MAX_LIMIT
in the existing validation for limit (update the RangeError message
accordingly), and ensure any subsequent calculation (the multiplication at the
adjusted-limit step where limit is multiplied by 5) uses the validated/capped
value so callers cannot trigger excessive lexical lookups, embeddings, or
ranking work.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4ecbd3b5-1fb9-4b57-aa30-636e29d3e9e3

📥 Commits

Reviewing files that changed from the base of the PR and between ad44737 and fdab2e5.

📒 Files selected for processing (1)
  • apps/retriever/src/RetrievalEngine.ts

Comment on lines +102 to +106
if (!Number.isInteger(limit) || limit <= 0) {
  throw new RangeError(
    `RetrievalEngine.search: limit must be a positive integer (got ${limit}).`
  );
}

⚠️ Potential issue | 🟠 Major

Add an upper bound for limit to prevent resource amplification.

Line 114 multiplies caller input by 5. With no max cap at Line 102-106, a large limit can trigger expensive lexical lookup, embedding loads, and ranking work.

⚙️ Proposed fix
+const MAX_LIMIT = 100;
...
-    if (!Number.isInteger(limit) || limit <= 0) {
+    if (!Number.isInteger(limit) || limit <= 0 || limit > MAX_LIMIT) {
       throw new RangeError(
-        `RetrievalEngine.search: limit must be a positive integer (got ${limit}).`
+        `RetrievalEngine.search: limit must be an integer between 1 and ${MAX_LIMIT} (got ${limit}).`
       );
     }

As per coding guidelines: "The code adheres to best practices recommended for performance".

Also applies to: 114-115

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/retriever/src/RetrievalEngine.ts` around lines 102 - 106, The check in
RetrievalEngine.search currently only enforces limit > 0 but needs an upper
bound to avoid resource amplification (the code later multiplies limit by 5);
add a MAX_LIMIT constant (e.g. MAX_SEARCH_LIMIT) and enforce that limit is an
integer between 1 and MAX_LIMIT in the existing validation for limit (update the
RangeError message accordingly), and ensure any subsequent calculation (the
multiplication at the adjusted-limit step where limit is multiplied by 5) uses
the validated/capped value so callers cannot trigger excessive lexical lookups,
embeddings, or ranking work.

Comment on lines +127 to +131
try {
  queryEmbedding = await this.embedder.embedQuery(query);
} catch (error: unknown) {
  throw wrapOperationError("embedQuery", `query="${query}"`, error);
}

⚠️ Potential issue | 🟠 Major

Avoid logging raw query text and full chunk ID lists in thrown errors.

Line 130 and Line 145 embed potentially sensitive payloads directly into error messages. This can leak user input/internal identifiers via logs and telemetry.

🔒 Proposed fix
-      throw wrapOperationError("embedQuery", `query="${query}"`, error);
+      throw wrapOperationError("embedQuery", `queryLength=${query.length}`, error);
...
-        `chunkCount=${chunkIds.length}, chunkIds=[${chunkIds.join(", ")}]`,
+        `chunkCount=${chunkIds.length}`,

As per coding guidelines: "No exposed API keys or sensitive data".

Also applies to: 143-146

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/retriever/src/RetrievalEngine.ts` around lines 127 - 131, The error
handler in RetrievalEngine currently injects raw user query text into
wrapOperationError in the embedQuery block (and similarly includes full chunk ID
lists elsewhere); change these to avoid leaking sensitive data by passing
redacted values (e.g., replace query with "<redacted>" or a fixed-length
hash/summary) and replace explicit ID arrays with safe metadata such as the
count of chunks or hashed/trimmed IDs; update the calls to wrapOperationError in
embedQuery and the other location (the block handling chunk IDs) to include only
non-sensitive context (e.g., queryLength or chunkCount and optionally a
deterministic short hash) instead of the full query or full ID lists.
