feat(platform): improve web tool modes, search UX, and context builder by larryro · Pull Request #1551 · tale-project/tale

larryro · 2026-04-16T02:52:57Z

Summary

Web tool discriminated union: Refactored the web tool from URL auto-detection to an explicit mode parameter (fetch / search), making the API clearer and more predictable for the LLM.
Similarity threshold tuning: Raised the default similarity threshold from 0.4 to 0.51 across RAG search, web search, and web context queries to reduce low-relevance results.
No-results UX: When search returns no results, the response now lists available indexed websites so the agent can guide users toward what's searchable or suggest fetch mode instead.
Context builder prompt message fix: Pass promptMessageId through to the structured context builder so it skips the exact message used as the prompt, rather than always dropping the last user message — fixes context loss when the prompt is a system message (e.g. location response).
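
As a rough illustration of the explicit-mode design, the argument shape could be modeled as a TypeScript discriminated union. This is a minimal sketch, not the schema in web_tool.ts: the type name `WebToolArgs`, the helper `describeWebCall`, and the `domain` field name are assumptions made for the example.

```typescript
// Hypothetical argument shape; the real schema in web_tool.ts may differ.
type WebToolArgs =
  | { mode: 'fetch'; url: string; query?: string } // query = extraction instruction
  | { mode: 'search'; query: string; domain?: string }; // domain = optional filter (assumed name)

// Describe which branch a call takes. Narrowing on `mode` means each branch
// can only touch the fields that belong to that mode.
function describeWebCall(args: WebToolArgs): string {
  if (args.mode === 'fetch') {
    const suffix = args.query ? ` (instruction: ${args.query})` : '';
    return `fetch ${args.url}${suffix}`;
  }
  // Here `args` is narrowed to the search variant.
  const suffix = args.domain ? ` on ${args.domain}` : '';
  return `search "${args.query}"${suffix}`;
}
```

With this shape, a call such as `{ mode: 'search', url: '...' }` fails type checking instead of relying on URL detection at runtime.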

Test plan

Verify web tool works in fetch mode with a direct URL (web page, PDF, image)
Verify web tool works in search mode with and without domain filter
Confirm no-results messages include the list of indexed websites
Confirm RAG search filters out lower-relevance results with the new 0.51 threshold
Verify agent context includes the correct user message history (no dropped messages)

Summary by CodeRabbit

New Features
- Added website knowledge base summaries display, showing indexed websites with titles, descriptions, and page counts.
Improvements
- Adjusted search similarity threshold from 0.4 to 0.51 for improved result relevance.
- Refactored web search tool to use explicit fetch vs search modes for clarity.
- Enhanced fetch mode to include source citations in responses.
- Improved context handling for continued agent responses.

Refactor the web tool to use a discriminated union (fetch/search) instead of auto-detecting URLs in the query string. Raise the default similarity threshold from 0.4 to 0.51 across RAG and web search. Show available indexed websites in no-results messages to help users understand what is searchable. Fix the structured context builder to skip the correct prompt message by ID rather than always dropping the last user message.
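
The context builder change can be sketched by comparing the two strategies on a small history. Everything here is illustrative: the `ContextMessage` shape and both function names are assumptions, and returning the history unchanged when no `promptMessageId` is given is also an assumption rather than the behavior of structured_context_builder.ts.

```typescript
// Hypothetical message shape for illustration only.
interface ContextMessage {
  id: string;
  role: 'user' | 'assistant' | 'system';
  text: string;
}

// Old behavior as described: always drop the last user message.
function dropLastUserMessage(messages: ContextMessage[]): ContextMessage[] {
  const lastUserIndex = messages.map((m) => m.role).lastIndexOf('user');
  return messages.filter((_, i) => i !== lastUserIndex);
}

// New behavior as described: skip only the message whose ID matches the prompt.
// Assumption: with no ID supplied, the history is returned unchanged.
function skipPromptMessage(
  messages: ContextMessage[],
  promptMessageId?: string,
): ContextMessage[] {
  if (promptMessageId === undefined) return messages;
  return messages.filter((m) => m.id !== promptMessageId);
}

// The prompt is a system message (e.g. a location response), not the user message.
const history: ContextMessage[] = [
  { id: 'm1', role: 'user', text: 'Where is the nearest store?' },
  { id: 'm2', role: 'system', text: 'Location: Berlin' },
];

const oldContext = dropLastUserMessage(history); // loses the user question m1
const newContext = skipPromptMessage(history, 'm2'); // keeps m1, skips the prompt m2
```

In this example the old strategy removes the user's question even though the prompt was the system message, which is the context loss the PR describes.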

coderabbitai · 2026-04-16T03:01:43Z

📝 Walkthrough

Walkthrough

This PR updates default similarity thresholds from 0.4 to 0.51 across RAG and web search operations, adds website summary formatting functionality via a new helper module, refactors the web tool's public API from implicit URL detection to explicit mode-based discrimination ('fetch' vs 'search' modes), threads a promptMessageId parameter through agent response generation and context building for improved message tracking, and introduces an internal query for retrieving website summaries by organization ID.
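
The threshold and no-results behavior in the walkthrough could look roughly like the sketch below. Only the 0.51 default comes from the PR; the type names, function names, message wording, and the choice of an inclusive comparison (`>=`) are assumptions for illustration and are not taken from format_website_summaries.ts or search_pages.ts.

```typescript
// Default taken from the PR description; the comparison direction is assumed.
const DEFAULT_SIMILARITY_THRESHOLD = 0.51;

// Hypothetical shapes for illustration only.
interface ScoredPage {
  url: string;
  similarity: number;
}

interface WebsiteSummary {
  domain: string;
  title?: string;
  pageCount?: number;
}

// Keep only pages at or above the threshold.
function filterBySimilarity(
  pages: ScoredPage[],
  threshold: number = DEFAULT_SIMILARITY_THRESHOLD,
): ScoredPage[] {
  return pages.filter((p) => p.similarity >= threshold);
}

// Build a no-results message that lists what is currently searchable.
function formatNoResults(query: string, websites: WebsiteSummary[]): string {
  if (websites.length === 0) {
    return `No results for "${query}". No websites are indexed yet.`;
  }
  const lines = websites.map(
    (w) => `- ${w.title ?? w.domain} (${w.domain}, ${w.pageCount ?? 0} pages)`,
  );
  return [`No results for "${query}". Indexed websites:`, ...lines].join('\n');
}
```

A page scoring 0.45 would have passed the old 0.4 default but is filtered out under 0.51, in which case the agent would receive the website list instead of an empty result.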

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

refactor(platform): consolidate crawler tools into unified web tool #369: Makes overlapping changes to web tool files (web_tool.ts, helpers, and web search consolidation patterns)
fix(convex): harden agent stream cleanup on error #702: Modifies the same file (generate_response.ts) for stream cleanup and error handling
fix(crawler): add vector similarity pre-filter to web search #1530: Updates similarity threshold behavior across RAG and web search paths with similar pattern changes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title clearly and concisely summarizes the primary changes: web tool mode refactoring, search UX improvements, and context builder enhancements.


coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@services/platform/convex/agent_tools/web/web_tool.ts`:
- Around line 83-89: The code drops args.query for non-file URLs by only setting
instruction when isFileUrl(args.url) is true; update the fetch branch so
fetchAndExtract always receives the extraction instruction (pass instruction:
args.query or undefined) instead of conditionalizing on isFileUrl, or if you
intend file-only behavior, explicitly enforce/validate that in the fetch-mode
contract; change the call site in the fetch branch that invokes fetchAndExtract
to pass args.query (referencing args.mode, isFileUrl, fetchAndExtract, and
args.query).
- Around line 111-119: Replace the inline type cast "as const" by giving the
citations array an explicit type annotation (e.g. const citations: WebCitation[]
or Citation[] depending on your domain types) and construct the object with
matching property types from result (index, type: 'web', source, url,
relevance). Update the declaration for the variable named citations in
web_tool.ts and ensure the chosen type (WebCitation/Citation) defines type:
'web' as a literal union so no casting is required; remove the "as const" from
the object literal.

In `@services/platform/convex/websites/internal_queries.ts`:
- Around line 93-99: The loop currently only skips 'deleting'/'error' via
excludeStatuses, allowing 'idle'/'scanning' sites to be listed; change the
filter in the websites iterator (the block using
ctx.db.query('websites').withIndex('by_organizationId', ...) and the subsequent
if (website.status ...) check) to only allow truly searchable sites by requiring
website.status === 'active' and, if applicable, website.pageCount > 0 (or
pageCount ?? 0 > 0); replace the existing excludeStatuses logic with this
explicit active+pageCount check so only indexed/searchable websites are
returned.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d0b013e2-83cc-4c6f-94a3-91a85d3b42b9

📥 Commits

Reviewing files that changed from the base of the PR and between f3c51cc and 7f5823f.

⛔ Files ignored due to path filters (1)

services/platform/convex/_generated/api.d.ts is excluded by !**/_generated/**

📒 Files selected for processing (9)

services/platform/convex/agent_tools/rag/query_rag_context.ts
services/platform/convex/agent_tools/rag/rag_search_tool.ts
services/platform/convex/agent_tools/web/helpers/format_website_summaries.ts
services/platform/convex/agent_tools/web/helpers/query_web_context.ts
services/platform/convex/agent_tools/web/helpers/search_pages.ts
services/platform/convex/agent_tools/web/web_tool.ts
services/platform/convex/lib/agent_response/generate_response.ts
services/platform/convex/lib/context_management/structured_context_builder.ts
services/platform/convex/websites/internal_queries.ts

coderabbitai · 2026-04-16T03:01:46Z

+      if (args.mode === 'fetch') {
+        const instruction = isFileUrl(args.url) ? args.query : undefined;

        const result = await fetchAndExtract(ctx, {
-          url: targetUrl,
+          url: args.url,
          instruction,
        });


⚠️ Potential issue | 🟠 Major

Don’t drop fetch instructions for normal web pages.

query is documented as the extraction instruction for fetch mode, but Line 84 only forwards it when the URL looks like a file. A call like { mode: "fetch", url: "https://example.com/pricing", query: "extract the enterprise limits" } will ignore the instruction and return an unguided extraction. Pass args.query through for all fetches, or narrow the fetch-mode contract to file-only instructions.

🔧 Proposed fix

-      if (args.mode === 'fetch') {
-        const instruction = isFileUrl(args.url) ? args.query : undefined;
+      if (args.mode === 'fetch') {
+        const instruction = args.query;

coderabbitai · 2026-04-16T03:01:46Z

+        const citations = [
+          {
+            index: 1,
+            type: 'web' as const,
+            source: result.title ?? result.url,
+            url: result.url,
+            relevance: 1,
+          },
+        ];


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file and understand the context
cat -n services/platform/convex/agent_tools/web/web_tool.ts | head -150

Repository: tale-project/tale

Length of output: 5793

🏁 Script executed:

# Check if there are type definitions for citation objects
rg -A 5 -B 5 "citations" services/platform/convex/agent_tools/web/web_tool.ts

Repository: tale-project/tale

Length of output: 962

🏁 Script executed:

# Search for type definitions related to citations in the codebase
fd -t f "\.ts$" -path "*/agent_tools/*" -o -path "*/types/*" | xargs rg -l "citation" -i 2>/dev/null | head -20

Repository: tale-project/tale

Length of output: 1398

🏁 Script executed:

# Look for ToolDefinition type to understand expected return structure
rg -A 10 "type ToolDefinition" services/platform/convex/agent_tools/

Repository: tale-project/tale

Length of output: 43

🏁 Script executed:

# Check the searchPages function to see what citation type it returns
cat -n services/platform/convex/agent_tools/web/helpers/search_pages.ts | head -80

Repository: tale-project/tale

Length of output: 2565

🏁 Script executed:

# Look for citation type definitions
cat -n services/platform/convex/openai_compat/citations.ts

Repository: tale-project/tale

Length of output: 5149

🏁 Script executed:

# Verify the return type of the tool to ensure explicit annotation will work
rg -B 5 -A 15 "createTool" services/platform/convex/agent_tools/web/web_tool.ts | head -40

Repository: tale-project/tale

Length of output: 1768

🏁 Script executed:

# Check if Citation interface is exported from search_pages.ts
rg "export.*Citation" services/platform/convex/agent_tools/web/helpers/search_pages.ts

Repository: tale-project/tale

Length of output: 43

🏁 Script executed:

# Check the full exports and structure of search_pages.ts
tail -50 services/platform/convex/agent_tools/web/helpers/search_pages.ts

Repository: tale-project/tale

Length of output: 1499

🏁 Script executed:

# Check if Citation is defined as interface in search_pages.ts and whether it's exported
grep -n "interface Citation\|export.*Citation" services/platform/convex/agent_tools/web/helpers/search_pages.ts

Repository: tale-project/tale

Length of output: 84

Use explicit type annotation for the citations array instead of adding as const.

The citations array should be explicitly typed to avoid the type casting. This aligns with the coding guideline to not use type casting (as) in TypeScript code.

♻️ Proposed refactor

-        const citations = [
-          {
-            index: 1,
-            type: 'web' as const,
-            source: result.title ?? result.url,
-            url: result.url,
-            relevance: 1,
-          },
-        ];
+        const citations: Array<{
+          index: number;
+          type: 'web';
+          source: string;
+          url: string;
+          relevance: number;
+        }> = [
+          {
+            index: 1,
+            type: 'web',
+            source: result.title ?? result.url,
+            url: result.url,
+            relevance: 1,
+          },
+        ];

coderabbitai · 2026-04-16T03:01:46Z

+    const excludeStatuses = new Set(['deleting', 'error']);
+    for await (const website of ctx.db
+      .query('websites')
+      .withIndex('by_organizationId', (q) =>
+        q.eq('organizationId', args.organizationId),
+      )) {
+      if (website.status && excludeStatuses.has(website.status)) continue;


⚠️ Potential issue | 🟠 Major

Only return websites that are actually searchable.

Line 99 only filters out deleting and error, so idle/scanning websites can still be listed as “currently indexed” even though they may not be searchable yet. Restrict this query to indexed websites only (e.g. status === "active" and, if needed, pageCount > 0) so the no-results guidance matches what the web search can really search.

🔧 Proposed fix

-    const excludeStatuses = new Set(['deleting', 'error']);
     for await (const website of ctx.db
       .query('websites')
-      .withIndex('by_organizationId', (q) =>
-        q.eq('organizationId', args.organizationId),
+      .withIndex('by_organizationId_and_status', (q) =>
+        q.eq('organizationId', args.organizationId).eq('status', 'active'),
       )) {
-      if (website.status && excludeStatuses.has(website.status)) continue;
+      if ((website.pageCount ?? 0) === 0) continue;
       results.push({
         domain: website.domain,
         title: website.title,

…JSDoc Web tool fetch mode now prepends the standard citation format header so the OpenAI-compat citation parser can extract citations from fetch results. Also updates the stale similarityThreshold JSDoc default.

coderabbitai Bot requested changes Apr 16, 2026

larryro merged commit 2e62d79 into main Apr 16, 2026
26 checks passed

larryro deleted the feat/web-tool-search-improvements branch April 16, 2026 03:22

coderabbitai Bot mentioned this pull request Apr 21, 2026

fix(platform): default structuredResponsesEnabled to false #1590

Merged

7 tasks

larryro merged 2 commits into main from feat/web-tool-search-improvements
