feat(platform): improve web tool modes, search UX, and context builder #1551
Refactor the web tool to use a discriminated union (fetch/search) instead of auto-detecting URLs in the query string. Raise the default similarity threshold from 0.4 to 0.51 across RAG and web search. Show available indexed websites in no-results messages to help users understand what is searchable. Fix the structured context builder to skip the correct prompt message by ID rather than always dropping the last user message.
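To illustrate the discriminated-union shape described above, here is a minimal sketch. The type names (`FetchArgs`, `SearchArgs`, `WebToolArgs`) and the `describeCall` helper are hypothetical and only model the idea; the actual schema in `web_tool.ts` may differ.

```typescript
// Hypothetical argument shapes; the real validator in web_tool.ts may differ.
type FetchArgs = { mode: 'fetch'; url: string; query?: string };
type SearchArgs = { mode: 'search'; query: string };
type WebToolArgs = FetchArgs | SearchArgs;

// Describe what the tool would do for a given call, without doing any I/O.
function describeCall(args: WebToolArgs): string {
  switch (args.mode) {
    case 'fetch':
      // `url` is only accessible here because `mode` narrowed the union.
      return args.query
        ? `fetch ${args.url} with instruction "${args.query}"`
        : `fetch ${args.url}`;
    case 'search':
      return `search indexed pages for "${args.query}"`;
    default: {
      // Exhaustiveness check: an unhandled new mode fails to compile here.
      const unreachable: never = args;
      return unreachable;
    }
  }
}
```

Because the caller states the mode explicitly, a query that merely contains a URL is no longer silently treated as a fetch.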
📝 Walkthrough
This PR updates default similarity thresholds from 0.4 to 0.51 across RAG and web search operations, adds website summary formatting via a new helper module, refactors the web tool's public API from implicit URL detection to explicit mode-based discrimination ('fetch' vs 'search' modes), threads a promptMessageId parameter through agent response generation and context building for improved message tracking, and introduces an internal query for retrieving website summaries by organization ID.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
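The promptMessageId change can be sketched as follows. This is a simplified model, not the builder's actual code: `ThreadMessage`, `dropLastUserMessage`, and `skipPromptMessage` are hypothetical names, and the "old" function only approximates the behaviour the PR description says it replaced.

```typescript
// Hypothetical message shape; the real builder works on stored thread messages.
interface ThreadMessage {
  id: string;
  role: 'system' | 'user' | 'assistant';
  text: string;
}

// Old behaviour as described in the PR: always drop the last user message.
function dropLastUserMessage(messages: ThreadMessage[]): ThreadMessage[] {
  const lastUserIndex = messages.map((m) => m.role).lastIndexOf('user');
  return messages.filter((_, i) => i !== lastUserIndex);
}

// New behaviour: drop exactly the message that is being used as the prompt.
function skipPromptMessage(
  messages: ThreadMessage[],
  promptMessageId?: string,
): ThreadMessage[] {
  if (promptMessageId === undefined) return messages;
  return messages.filter((m) => m.id !== promptMessageId);
}

// When the prompt is a system message, the old approach loses the user's question.
const thread: ThreadMessage[] = [
  { id: 'u1', role: 'user', text: 'Where is the nearest store?' },
  { id: 's1', role: 'system', text: 'Location: Berlin' }, // this is the prompt
];
const oldContext = dropLastUserMessage(thread); // keeps s1, drops u1
const newContext = skipPromptMessage(thread, 's1'); // keeps u1, drops s1
```

In the example, the old approach removes the user's question while leaving the prompt in the context, which matches the context loss the PR describes.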
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@services/platform/convex/agent_tools/web/web_tool.ts`:
- Around line 83-89: The code drops args.query for non-file URLs by only setting
instruction when isFileUrl(args.url) is true; update the fetch branch so
fetchAndExtract always receives the extraction instruction (pass instruction:
args.query or undefined) instead of conditionalizing on isFileUrl, or if you
intend file-only behavior, explicitly enforce/validate that in the fetch-mode
contract; change the call site in the fetch branch that invokes fetchAndExtract
to pass args.query (referencing args.mode, isFileUrl, fetchAndExtract, and
args.query).
- Around line 111-119: Replace the inline type cast "as const" by giving the
citations array an explicit type annotation (e.g. const citations: WebCitation[]
or Citation[] depending on your domain types) and construct the object with
matching property types from result (index, type: 'web', source, url,
relevance). Update the declaration for the variable named citations in
web_tool.ts and ensure the chosen type (WebCitation/Citation) defines type:
'web' as a literal union so no casting is required; remove the "as const" from
the object literal.
In `@services/platform/convex/websites/internal_queries.ts`:
- Around line 93-99: The loop currently only skips 'deleting'/'error' via
excludeStatuses, allowing 'idle'/'scanning' sites to be listed; change the
filter in the websites iterator (the block using
ctx.db.query('websites').withIndex('by_organizationId', ...) and the subsequent
if (website.status ...) check) to only allow truly searchable sites by requiring
website.status === 'active' and, if applicable, website.pageCount > 0 (or
(pageCount ?? 0) > 0); replace the existing excludeStatuses logic with this
explicit active+pageCount check so only indexed/searchable websites are
returned.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d0b013e2-83cc-4c6f-94a3-91a85d3b42b9
⛔ Files ignored due to path filters (1)
- services/platform/convex/_generated/api.d.ts is excluded by !**/_generated/**
📒 Files selected for processing (9)
- services/platform/convex/agent_tools/rag/query_rag_context.ts
- services/platform/convex/agent_tools/rag/rag_search_tool.ts
- services/platform/convex/agent_tools/web/helpers/format_website_summaries.ts
- services/platform/convex/agent_tools/web/helpers/query_web_context.ts
- services/platform/convex/agent_tools/web/helpers/search_pages.ts
- services/platform/convex/agent_tools/web/web_tool.ts
- services/platform/convex/lib/agent_response/generate_response.ts
- services/platform/convex/lib/context_management/structured_context_builder.ts
- services/platform/convex/websites/internal_queries.ts
   if (args.mode === 'fetch') {
     const instruction = isFileUrl(args.url) ? args.query : undefined;

     const result = await fetchAndExtract(ctx, {
-      url: targetUrl,
+      url: args.url,
       instruction,
     });
Don’t drop fetch instructions for normal web pages.
query is documented as the extraction instruction for fetch mode, but Line 84 only forwards it when the URL looks like a file. A call like { mode: "fetch", url: "https://example.com/pricing", query: "extract the enterprise limits" } will ignore the instruction and return an unguided extraction. Pass args.query through for all fetches, or narrow the fetch-mode contract to file-only instructions.
🔧 Proposed fix
- if (args.mode === 'fetch') {
-   const instruction = isFileUrl(args.url) ? args.query : undefined;
+ if (args.mode === 'fetch') {
+   const instruction = args.query;
const citations = [
  {
    index: 1,
    type: 'web' as const,
    source: result.title ?? result.url,
    url: result.url,
    relevance: 1,
  },
];
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the file and understand the context
cat -n services/platform/convex/agent_tools/web/web_tool.ts | head -150
Repository: tale-project/tale
Length of output: 5793
🏁 Script executed:
# Check if there are type definitions for citation objects
rg -A 5 -B 5 "citations" services/platform/convex/agent_tools/web/web_tool.ts
Repository: tale-project/tale
Length of output: 962
🏁 Script executed:
# Search for type definitions related to citations in the codebase
fd -t f "\.ts$" -path "*/agent_tools/*" -o -path "*/types/*" | xargs rg -l "citation" -i 2>/dev/null | head -20
Repository: tale-project/tale
Length of output: 1398
🏁 Script executed:
# Look for ToolDefinition type to understand expected return structure
rg -A 10 "type ToolDefinition" services/platform/convex/agent_tools/
Repository: tale-project/tale
Length of output: 43
🏁 Script executed:
# Check the searchPages function to see what citation type it returns
cat -n services/platform/convex/agent_tools/web/helpers/search_pages.ts | head -80
Repository: tale-project/tale
Length of output: 2565
🏁 Script executed:
# Look for citation type definitions
cat -n services/platform/convex/openai_compat/citations.ts
Repository: tale-project/tale
Length of output: 5149
🏁 Script executed:
# Verify the return type of the tool to ensure explicit annotation will work
rg -B 5 -A 15 "createTool" services/platform/convex/agent_tools/web/web_tool.ts | head -40
Repository: tale-project/tale
Length of output: 1768
🏁 Script executed:
# Check if Citation interface is exported from search_pages.ts
rg "export.*Citation" services/platform/convex/agent_tools/web/helpers/search_pages.ts
Repository: tale-project/tale
Length of output: 43
🏁 Script executed:
# Check the full exports and structure of search_pages.ts
tail -50 services/platform/convex/agent_tools/web/helpers/search_pages.ts
Repository: tale-project/tale
Length of output: 1499
🏁 Script executed:
# Check if Citation is defined as interface in search_pages.ts and whether it's exported
grep -n "interface Citation\|export.*Citation" services/platform/convex/agent_tools/web/helpers/search_pages.ts
Repository: tale-project/tale
Length of output: 84
Use explicit type annotation for the citations array instead of adding as const.
The citations array should be explicitly typed to avoid the type casting. This aligns with the coding guideline to not use type casting (as) in TypeScript code.
♻️ Proposed refactor
- const citations = [
-   {
-     index: 1,
-     type: 'web' as const,
-     source: result.title ?? result.url,
-     url: result.url,
-     relevance: 1,
-   },
- ];
+ const citations: Array<{
+   index: number;
+   type: 'web';
+   source: string;
+   url: string;
+   relevance: number;
+ }> = [
+   {
+     index: 1,
+     type: 'web',
+     source: result.title ?? result.url,
+     url: result.url,
+     relevance: 1,
+   },
+ ];
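As an alternative to the inline `Array<{...}>` annotation, a named type keeps the literal `'web'` without any assertion. The sketch below assumes a hypothetical `WebCitation` interface and `FetchResult` shape; whether a suitable exported type already exists in the codebase would need to be checked first.

```typescript
// Hypothetical type; the review suggests a name like WebCitation or Citation.
interface WebCitation {
  index: number;
  type: 'web';
  source: string;
  url: string;
  relevance: number;
}

// Hypothetical fetch result shape: the title may be missing.
interface FetchResult {
  url: string;
  title?: string;
}

// The return type contextually types the literal, so 'web' is not widened to string.
function buildFetchCitations(result: FetchResult): WebCitation[] {
  return [
    {
      index: 1,
      type: 'web',
      source: result.title ?? result.url,
      url: result.url,
      relevance: 1,
    },
  ];
}
```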
const excludeStatuses = new Set(['deleting', 'error']);
for await (const website of ctx.db
  .query('websites')
  .withIndex('by_organizationId', (q) =>
    q.eq('organizationId', args.organizationId),
  )) {
  if (website.status && excludeStatuses.has(website.status)) continue;
Only return websites that are actually searchable.
Line 99 only filters out deleting and error, so idle/scanning websites can still be listed as “currently indexed” even though they may not be searchable yet. Restrict this query to indexed websites only (e.g. status === "active" and, if needed, pageCount > 0) so the no-results guidance matches what the web search can really search.
🔧 Proposed fix
- const excludeStatuses = new Set(['deleting', 'error']);
  for await (const website of ctx.db
    .query('websites')
-   .withIndex('by_organizationId', (q) =>
-     q.eq('organizationId', args.organizationId),
+   .withIndex('by_organizationId_and_status', (q) =>
+     q.eq('organizationId', args.organizationId).eq('status', 'active'),
    )) {
-   if (website.status && excludeStatuses.has(website.status)) continue;
+   if ((website.pageCount ?? 0) === 0) continue;
    results.push({
      domain: website.domain,
      title: website.title,
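The filter this comment asks for can be modelled as a pure predicate. This is only a sketch: `WebsiteDoc` is a hypothetical subset of the website document, with the status values taken from the ones named in this review, and it does not touch the Convex query itself.

```typescript
// Hypothetical subset of the website document; field names follow the review comment.
interface WebsiteDoc {
  domain: string;
  title?: string;
  status?: 'idle' | 'scanning' | 'active' | 'error' | 'deleting';
  pageCount?: number;
}

function isSearchable(website: WebsiteDoc): boolean {
  // Parentheses matter: `pageCount ?? 0 > 0` would parse as `pageCount ?? (0 > 0)`.
  return website.status === 'active' && (website.pageCount ?? 0) > 0;
}

function listSearchableDomains(websites: WebsiteDoc[]): string[] {
  return websites.filter(isSearchable).map((w) => w.domain);
}

const sampleWebsites: WebsiteDoc[] = [
  { domain: 'docs.example.com', status: 'active', pageCount: 12 },
  { domain: 'blog.example.com', status: 'scanning', pageCount: 3 },
  { domain: 'empty.example.com', status: 'active' },
  { domain: 'old.example.com', status: 'error', pageCount: 40 },
];
const searchableDomains = listSearchableDomains(sampleWebsites);
```

Only the first sample entry passes both conditions, so a no-results message built from this list would not advertise sites that are still scanning or have no indexed pages.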
…JSDoc Web tool fetch mode now prepends the standard citation format header so the OpenAI-compat citation parser can extract citations from fetch results. Also updates the stale similarityThreshold JSDoc default.
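For context on the similarityThreshold default mentioned here, the effect of raising it from 0.4 to 0.51 can be sketched with a simple filter. The names `ScoredResult` and `filterBySimilarity`, and the use of an inclusive `>=` comparison, are assumptions made for illustration; the actual search helpers may apply the threshold differently.

```typescript
// Default taken from the PR description; everything else here is hypothetical.
const DEFAULT_SIMILARITY_THRESHOLD = 0.51;

interface ScoredResult {
  url: string;
  score: number; // similarity score, assumed to lie in [0, 1]
}

// Keep only results at or above the threshold (inclusive comparison is an assumption).
function filterBySimilarity(
  results: ScoredResult[],
  similarityThreshold: number = DEFAULT_SIMILARITY_THRESHOLD,
): ScoredResult[] {
  return results.filter((r) => r.score >= similarityThreshold);
}

const sampleResults: ScoredResult[] = [
  { url: 'https://example.com/a', score: 0.45 },
  { url: 'https://example.com/b', score: 0.51 },
  { url: 'https://example.com/c', score: 0.8 },
];
const withNewDefault = filterBySimilarity(sampleResults); // drops the 0.45 result
const withOldDefault = filterBySimilarity(sampleResults, 0.4); // keeps all three
```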
Summary
- … `mode` parameter (fetch/search), making the API clearer and more predictable for the LLM.
- … `promptMessageId` through to the structured context builder so it skips the exact message used as the prompt, rather than always dropping the last user message; this fixes context loss when the prompt is a system message (e.g. location response).
Test plan
Summary by CodeRabbit
New Features
Improvements
- … `fetch` vs `search` modes for clarity.