
feat(platform): improve web tool modes, search UX, and context builder #1551

Merged
larryro merged 2 commits into main from feat/web-tool-search-improvements on Apr 16, 2026

@larryro (Collaborator) commented Apr 16, 2026

Summary

  • Web tool discriminated union: Refactored the web tool from URL auto-detection to an explicit mode parameter (fetch / search), making the API clearer and more predictable for the LLM.
  • Similarity threshold tuning: Raised the default similarity threshold from 0.4 to 0.51 across RAG search, web search, and web context queries to reduce low-relevance results.
  • No-results UX: When search returns no results, the response now lists available indexed websites so the agent can guide users toward what's searchable or suggest fetch mode instead.
  • Context builder prompt message fix: Pass promptMessageId through to the structured context builder so it skips the exact message used as the prompt, rather than always dropping the last user message — fixes context loss when the prompt is a system message (e.g. location response).
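To make the first bullet concrete, here is a minimal TypeScript sketch of a `mode`-discriminated argument type. `mode`, `url`, and `query` are taken from the PR and the review; the `domain` field for the search filter and the `describeCall` helper are hypothetical, and the tool's real schema declaration is not shown in this PR page.

```typescript
// Assumed argument shape: `mode`, `url`, and `query` come from the PR/review;
// the `domain` field name for the search filter is hypothetical.
type WebToolArgs =
  | { mode: 'fetch'; url: string; query?: string } // query: optional extraction instruction
  | { mode: 'search'; query: string; domain?: string }; // domain: optional site filter

// Narrowing on `mode` exposes only the fields that exist in that branch,
// so reading `args.url` in search mode is a compile-time error.
function describeCall(args: WebToolArgs): string {
  switch (args.mode) {
    case 'fetch':
      return `fetch ${args.url}` + (args.query ? ` (instruction: ${args.query})` : '');
    case 'search':
      return `search "${args.query}"` + (args.domain ? ` on ${args.domain}` : '');
    default: {
      // Exhaustiveness check: adding a third mode without handling it fails to compile.
      const unreachable: never = args;
      throw new Error(`unknown mode: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(describeCall({ mode: 'fetch', url: 'https://example.com/pricing' }));
console.log(describeCall({ mode: 'search', query: 'enterprise limits', domain: 'example.com' }));
```

Compared with sniffing the query string for a URL, the explicit discriminant lets the LLM state its intent and lets the type checker reject mixed argument sets.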

Test plan

  • Verify web tool works in fetch mode with a direct URL (web page, PDF, image)
  • Verify web tool works in search mode with and without domain filter
  • Confirm no-results messages include the list of indexed websites
  • Confirm RAG search filters out lower-relevance results with the new 0.51 threshold
  • Verify agent context includes the correct user message history (no dropped messages)

Summary by CodeRabbit

  • New Features

    • Added website knowledge base summaries display, showing indexed websites with titles, descriptions, and page counts.
  • Improvements

    • Adjusted search similarity threshold from 0.4 to 0.51 for improved result relevance.
    • Refactored web search tool to use explicit fetch vs search modes for clarity.
    • Enhanced fetch mode to include source citations in responses.
    • Improved context handling for continued agent responses.
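The no-results guidance can be sketched as a plain formatter over website summaries. The `WebsiteSummary` fields follow the review diff (`domain`, `title`) and the summary above (descriptions, page counts); `formatNoResults` and its wording are hypothetical and are not the output of the PR's `format_website_summaries.ts`.

```typescript
// Hypothetical summary record: `domain` and `title` appear in the review diff,
// `description` and `pageCount` are inferred from the PR summary (assumption).
interface WebsiteSummary {
  domain: string;
  title?: string;
  description?: string;
  pageCount?: number;
}

// Lists what *is* searchable so the agent can steer the user toward an indexed
// site or suggest fetch mode instead. The wording is illustrative only.
function formatNoResults(query: string, sites: WebsiteSummary[]): string {
  if (sites.length === 0) {
    return `No results for "${query}". No websites are indexed yet; try fetch mode with a direct URL.`;
  }
  const lines = sites.map((site) => {
    const label = site.title ? `${site.domain} (${site.title})` : site.domain;
    const description = site.description ? `: ${site.description}` : '';
    return `- ${label}${description}, ${site.pageCount ?? 0} pages`;
  });
  return [
    `No results for "${query}". Indexed websites:`,
    ...lines,
    'Rephrase for one of these sites, or use fetch mode with a direct URL.',
  ].join('\n');
}

console.log(formatNoResults('pricing', [{ domain: 'docs.example.com', title: 'Docs', pageCount: 42 }]));
```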

Refactor the web tool to use a discriminated union (fetch/search) instead
of auto-detecting URLs in the query string. Raise the default similarity
threshold from 0.4 to 0.51 across RAG and web search. Show available
indexed websites in no-results messages to help users understand what is
searchable. Fix the structured context builder to skip the correct prompt
message by ID rather than always dropping the last user message.
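The context-builder fix is easiest to see side by side. This sketch uses a hypothetical `Message` shape and two illustrative helpers; it models only the selection rule described in the commit message, not the real `structured_context_builder.ts`.

```typescript
// Minimal message shape for illustration; not the real thread schema.
interface Message {
  id: string;
  role: 'system' | 'user' | 'assistant';
  text: string;
}

// Previous behaviour as described in the PR: assume the last user message is
// the prompt and drop it from the history.
function historyDropLastUser(messages: Message[]): Message[] {
  const lastUser = messages.map((m) => m.role).lastIndexOf('user');
  return messages.filter((_, i) => i !== lastUser);
}

// New behaviour: skip exactly the message identified by promptMessageId.
function historySkipPrompt(messages: Message[], promptMessageId?: string): Message[] {
  return messages.filter((m) => m.id !== promptMessageId);
}

// The prompt here is a system message (e.g. a location response), not a user turn.
const thread: Message[] = [
  { id: 'm1', role: 'user', text: 'Find the nearest store' },
  { id: 'm2', role: 'system', text: 'Location: Zurich' },
];
console.log(historyDropLastUser(thread).map((m) => m.id)); // [ 'm2' ]: the user request is lost
console.log(historySkipPrompt(thread, 'm2').map((m) => m.id)); // [ 'm1' ]
```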

@coderabbitai (bot, Contributor) commented Apr 16, 2026
📝 Walkthrough

This PR updates default similarity thresholds from 0.4 to 0.51 across RAG and web search operations, adds website summary formatting functionality via a new helper module, refactors the web tool's public API from implicit URL detection to explicit mode-based discrimination ('fetch' vs 'search' modes), threads a promptMessageId parameter through agent response generation and context building for improved message tracking, and introduces an internal query for retrieving website summaries by organization ID.
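A sketch of what the threshold change means for result filtering, assuming a score where higher means more similar and an inclusive comparison (both assumptions; the review does not show the comparison itself). `ScoredHit`, `DEFAULT_SIMILARITY_THRESHOLD`, and `filterByThreshold` are illustrative names.

```typescript
// Hypothetical scored hit; the real RAG/web result shapes may differ.
interface ScoredHit {
  url: string;
  score: number; // similarity score, assumed "higher = more similar"
}

// New default from this PR (previously 0.4). The constant name is illustrative.
const DEFAULT_SIMILARITY_THRESHOLD = 0.51;

// Keep hits at or above the threshold; >= versus > is an assumption here.
function filterByThreshold(
  hits: ScoredHit[],
  threshold: number = DEFAULT_SIMILARITY_THRESHOLD,
): ScoredHit[] {
  return hits.filter((hit) => hit.score >= threshold);
}

const hits: ScoredHit[] = [
  { url: 'https://example.com/a', score: 0.45 },
  { url: 'https://example.com/b', score: 0.51 },
  { url: 'https://example.com/c', score: 0.8 },
];
console.log(filterByThreshold(hits, 0.4).length); // 3 under the old default
console.log(filterByThreshold(hits).length); // 2 under the new default
```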

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 57.14%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title clearly and concisely summarizes the primary changes: web tool mode refactoring, search UX improvements, and context builder enhancements. |


@coderabbitai (bot, Contributor) left a review

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@services/platform/convex/agent_tools/web/web_tool.ts`:
- Around line 83-89: The code drops args.query for non-file URLs by only setting
instruction when isFileUrl(args.url) is true; update the fetch branch so
fetchAndExtract always receives the extraction instruction (pass instruction:
args.query or undefined) instead of conditionalizing on isFileUrl, or if you
intend file-only behavior, explicitly enforce/validate that in the fetch-mode
contract; change the call site in the fetch branch that invokes fetchAndExtract
to pass args.query (referencing args.mode, isFileUrl, fetchAndExtract, and
args.query).
- Around line 111-119: Replace the inline type cast "as const" by giving the
citations array an explicit type annotation (e.g. const citations: WebCitation[]
or Citation[] depending on your domain types) and construct the object with
matching property types from result (index, type: 'web', source, url,
relevance). Update the declaration for the variable named citations in
web_tool.ts and ensure the chosen type (WebCitation/Citation) defines type:
'web' as a literal union so no casting is required; remove the "as const" from
the object literal.

In `@services/platform/convex/websites/internal_queries.ts`:
- Around line 93-99: The loop currently only skips 'deleting'/'error' via
excludeStatuses, allowing 'idle'/'scanning' sites to be listed; change the
filter in the websites iterator (the block using
ctx.db.query('websites').withIndex('by_organizationId', ...) and the subsequent
if (website.status ...) check) to only allow truly searchable sites by requiring
website.status === 'active' and, if applicable, website.pageCount > 0 (or
(pageCount ?? 0) > 0); replace the existing excludeStatuses logic with this
explicit active+pageCount check so only indexed/searchable websites are
returned.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d0b013e2-83cc-4c6f-94a3-91a85d3b42b9

📥 Commits

Reviewing files that changed from the base of the PR and between f3c51cc and 7f5823f.

⛔ Files ignored due to path filters (1)
  • services/platform/convex/_generated/api.d.ts is excluded by !**/_generated/**
📒 Files selected for processing (9)
  • services/platform/convex/agent_tools/rag/query_rag_context.ts
  • services/platform/convex/agent_tools/rag/rag_search_tool.ts
  • services/platform/convex/agent_tools/web/helpers/format_website_summaries.ts
  • services/platform/convex/agent_tools/web/helpers/query_web_context.ts
  • services/platform/convex/agent_tools/web/helpers/search_pages.ts
  • services/platform/convex/agent_tools/web/web_tool.ts
  • services/platform/convex/lib/agent_response/generate_response.ts
  • services/platform/convex/lib/context_management/structured_context_builder.ts
  • services/platform/convex/websites/internal_queries.ts

Comment on lines +83 to 89
       if (args.mode === 'fetch') {
         const instruction = isFileUrl(args.url) ? args.query : undefined;
 
         const result = await fetchAndExtract(ctx, {
-          url: targetUrl,
+          url: args.url,
           instruction,
         });


⚠️ Potential issue | 🟠 Major

Don’t drop fetch instructions for normal web pages.

query is documented as the extraction instruction for fetch mode, but Line 84 only forwards it when the URL looks like a file. A call like { mode: "fetch", url: "https://example.com/pricing", query: "extract the enterprise limits" } will ignore the instruction and return an unguided extraction. Pass args.query through for all fetches, or narrow the fetch-mode contract to file-only instructions.

🔧 Proposed fix
-      if (args.mode === 'fetch') {
-        const instruction = isFileUrl(args.url) ? args.query : undefined;
+      if (args.mode === 'fetch') {
+        const instruction = args.query;
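The difference between the current branch and the proposed fix comes down to how the request is built. `isFileUrl` below is a stand-in with an assumed extension check (the repo's real rules are not shown), and `FetchRequest`, `buildRequestBefore`, and `buildRequestAfter` are hypothetical; only the `url`/`instruction` fields mirror the `fetchAndExtract` call in the hunk.

```typescript
// Stand-in for the repo's isFileUrl (assumed rule): a path ending in a common
// document/image extension, optionally followed by ?query or #fragment.
function isFileUrl(url: string): boolean {
  return /\.(pdf|png|jpe?g|gif|webp)(?:[?#]|$)/i.test(url);
}

// Hypothetical request shape mirroring the fetchAndExtract call in the hunk.
interface FetchRequest {
  url: string;
  instruction?: string;
}

// Current PR code: the instruction is forwarded only for file-like URLs.
function buildRequestBefore(url: string, query?: string): FetchRequest {
  return { url, instruction: isFileUrl(url) ? query : undefined };
}

// Reviewer's proposal: forward the instruction for every fetch.
function buildRequestAfter(url: string, query?: string): FetchRequest {
  return { url, instruction: query };
}

const page = 'https://example.com/pricing';
const ask = 'extract the enterprise limits';
console.log(buildRequestBefore(page, ask).instruction); // undefined: instruction dropped
console.log(buildRequestAfter(page, ask).instruction); // extract the enterprise limits
```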

Comment on lines +111 to +119
        const citations = [
          {
            index: 1,
            type: 'web' as const,
            source: result.title ?? result.url,
            url: result.url,
            relevance: 1,
          },
        ];


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file and understand the context
cat -n services/platform/convex/agent_tools/web/web_tool.ts | head -150

Repository: tale-project/tale

Length of output: 5793


🏁 Script executed:

# Check if there are type definitions for citation objects
rg -A 5 -B 5 "citations" services/platform/convex/agent_tools/web/web_tool.ts

Repository: tale-project/tale

Length of output: 962


🏁 Script executed:

# Search for type definitions related to citations in the codebase
fd -t f "\.ts$" -path "*/agent_tools/*" -o -path "*/types/*" | xargs rg -l "citation" -i 2>/dev/null | head -20

Repository: tale-project/tale

Length of output: 1398


🏁 Script executed:

# Look for ToolDefinition type to understand expected return structure
rg -A 10 "type ToolDefinition" services/platform/convex/agent_tools/

Repository: tale-project/tale

Length of output: 43


🏁 Script executed:

# Check the searchPages function to see what citation type it returns
cat -n services/platform/convex/agent_tools/web/helpers/search_pages.ts | head -80

Repository: tale-project/tale

Length of output: 2565


🏁 Script executed:

# Look for citation type definitions
cat -n services/platform/convex/openai_compat/citations.ts

Repository: tale-project/tale

Length of output: 5149


🏁 Script executed:

# Verify the return type of the tool to ensure explicit annotation will work
rg -B 5 -A 15 "createTool" services/platform/convex/agent_tools/web/web_tool.ts | head -40

Repository: tale-project/tale

Length of output: 1768


🏁 Script executed:

# Check if Citation interface is exported from search_pages.ts
rg "export.*Citation" services/platform/convex/agent_tools/web/helpers/search_pages.ts

Repository: tale-project/tale

Length of output: 43


🏁 Script executed:

# Check the full exports and structure of search_pages.ts
tail -50 services/platform/convex/agent_tools/web/helpers/search_pages.ts

Repository: tale-project/tale

Length of output: 1499


🏁 Script executed:

# Check if Citation is defined as interface in search_pages.ts and whether it's exported
grep -n "interface Citation\|export.*Citation" services/platform/convex/agent_tools/web/helpers/search_pages.ts

Repository: tale-project/tale

Length of output: 84


Use explicit type annotation for the citations array instead of adding as const.

The citations array should be explicitly typed to avoid the type casting. This aligns with the coding guideline to not use type casting (as) in TypeScript code.

♻️ Proposed refactor
-        const citations = [
-          {
-            index: 1,
-            type: 'web' as const,
-            source: result.title ?? result.url,
-            url: result.url,
-            relevance: 1,
-          },
-        ];
+        const citations: Array<{
+          index: number;
+          type: 'web';
+          source: string;
+          url: string;
+          relevance: number;
+        }> = [
+          {
+            index: 1,
+            type: 'web',
+            source: result.title ?? result.url,
+            url: result.url,
+            relevance: 1,
+          },
+        ];
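If a shared type is available, a named interface keeps the annotation short and reusable. `WebCitation` and `citationsForFetch` are hypothetical; the shape copies the inline type from the refactor above, and the real `Citation` type in `search_pages.ts` or `citations.ts` may differ.

```typescript
// Hypothetical named type mirroring the inline annotation in the refactor.
interface WebCitation {
  index: number;
  type: 'web';
  source: string;
  url: string;
  relevance: number;
}

// With the return type annotated, the literal 'web' is checked directly
// against `type: 'web'`, so no `as const` is needed.
function citationsForFetch(result: { url: string; title?: string }): WebCitation[] {
  return [
    {
      index: 1,
      type: 'web',
      source: result.title ?? result.url,
      url: result.url,
      relevance: 1,
    },
  ];
}

console.log(citationsForFetch({ url: 'https://example.com/pricing', title: 'Pricing' })[0]?.source);
```

On TypeScript 4.9 or later, writing the array literal with `satisfies WebCitation[]` is another option: it checks the literal against the type without a cast.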

Comment on lines +93 to +99
    const excludeStatuses = new Set(['deleting', 'error']);
    for await (const website of ctx.db
      .query('websites')
      .withIndex('by_organizationId', (q) =>
        q.eq('organizationId', args.organizationId),
      )) {
      if (website.status && excludeStatuses.has(website.status)) continue;


⚠️ Potential issue | 🟠 Major

Only return websites that are actually searchable.

Line 99 only filters out deleting and error, so idle/scanning websites can still be listed as “currently indexed” even though they may not be searchable yet. Restrict this query to indexed websites only (e.g. status === "active" and, if needed, pageCount > 0) so the no-results guidance matches what the web search can really search.

🔧 Proposed fix
-    const excludeStatuses = new Set(['deleting', 'error']);
     for await (const website of ctx.db
       .query('websites')
-      .withIndex('by_organizationId', (q) =>
-        q.eq('organizationId', args.organizationId),
+      .withIndex('by_organizationId_and_status', (q) =>
+        q.eq('organizationId', args.organizationId).eq('status', 'active'),
       )) {
-      if (website.status && excludeStatuses.has(website.status)) continue;
+      if ((website.pageCount ?? 0) === 0) continue;
       results.push({
         domain: website.domain,
         title: website.title,
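The reviewer's filtering rule can be expressed as a small predicate, which also shows why precedence matters in `pageCount ?? 0 > 0`. `WebsiteRow` and `isSearchable` are hypothetical; in Convex the `status === 'active'` part is better pushed into an index range, as the proposed fix does with `by_organizationId_and_status`, assuming that index exists in the schema.

```typescript
// Statuses named in the review; the real schema may define more.
type WebsiteStatus = 'idle' | 'scanning' | 'active' | 'deleting' | 'error';

interface WebsiteRow {
  domain: string;
  status?: WebsiteStatus;
  pageCount?: number;
}

// Reviewer's rule: list a site only if it is active and has indexed pages.
// The parentheses matter: `w.pageCount ?? 0 > 0` would parse as
// `w.pageCount ?? (0 > 0)` because `>` binds tighter than `??`.
function isSearchable(w: WebsiteRow): boolean {
  return w.status === 'active' && (w.pageCount ?? 0) > 0;
}

const rows: WebsiteRow[] = [
  { domain: 'docs.example.com', status: 'active', pageCount: 120 },
  { domain: 'blog.example.com', status: 'scanning', pageCount: 0 },
  { domain: 'new.example.com', status: 'active' }, // active but nothing indexed yet
  { domain: 'old.example.com', status: 'deleting', pageCount: 40 },
];
console.log(rows.filter(isSearchable).map((w) => w.domain)); // [ 'docs.example.com' ]
```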

…JSDoc

Web tool fetch mode now prepends the standard citation format header
so the OpenAI-compat citation parser can extract citations from fetch
results. Also updates the stale similarityThreshold JSDoc default.
@larryro merged commit 2e62d79 into main on Apr 16, 2026
26 checks passed
@larryro deleted the feat/web-tool-search-improvements branch on April 16, 2026 03:22
