refactor(platform): consolidate crawler tools into unified web tool #369

Merged
larryro merged 7 commits into main from refactor/web-tools on Feb 5, 2026

@larryro larryro commented Feb 5, 2026

Collaborator

Summary

  • Consolidate separate crawler helpers into a unified web tool module
  • Add thinking animation during tool processing to improve UX
  • Preserve user intent when delegating to sub-agents

Changes

Web Tool Consolidation

  • Replace separate crawler helpers (fetch_page_content, fetch_searxng_results, search_and_fetch, search_web) with unified web/ module
  • Add browser_operate and fetch_url_via_pdf helpers for web interactions
  • Move operator service helpers to web/helpers/ directory
  • Add new crawler service web router with page content and search endpoints
  • Update all agents to use new web tool structure
  • Remove deprecated action cache functions

UX Improvements

  • Add isProcessingToolResult state to detect when agent is processing tool results but hasn't resumed streaming text
  • Show thinking animation during the gap between tool completion and agent response
  • Update agent and web_assistant_tool prompts to preserve user's specific questions when delegating to sub-agents
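
A minimal sketch of how a flag like isProcessingToolResult could be derived. The message-part shapes (`MessagePart`, `AssistantTurn`) are hypothetical, since the PR's real chat types are not shown here; only the flag name comes from this PR.

```typescript
// Hypothetical shapes for illustration; the PR's real message types are not shown.
type MessagePart =
  | { kind: "text"; text: string }
  | { kind: "tool"; name: string; status: "running" | "done" };

interface AssistantTurn {
  parts: MessagePart[];
  isStreaming: boolean; // true while the agent run is still active
}

// True when the run is still active and the most recent part is a finished
// tool call, i.e. the agent holds a tool result but has not emitted text yet.
function isProcessingToolResult(turn: AssistantTurn): boolean {
  if (!turn.isStreaming) return false;
  const last = turn.parts[turn.parts.length - 1];
  if (last === undefined) return false;
  return last.kind === "tool" && last.status === "done";
}
```

Under these assumptions the UI would show the thinking animation whenever the function returns true, and hide it as soon as a text part is appended or the run ends.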

Test plan

  • Verify web search functionality works with new unified tool
  • Verify page content fetching works correctly
  • Verify thinking animation displays during tool processing
  • Verify sub-agent delegation preserves original user intent

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added web fetching and browser automation capabilities for agents.
    • Enhanced chat UI to display thinking animation when processing tool results.
    • Improved web content extraction with PDF-based processing.
  • Bug Fixes

    • Refined web tool operations for better content retrieval and browsing.
  • Chores

    • Consolidated web tool architecture; updated agent tool configurations.
    • Updated translations for web tool and browsing operations.

- Replace separate crawler helpers with unified web tool module
- Add browser_operate and fetch_url_via_pdf helpers for web interactions
- Move operator service helpers to web/helpers directory
- Add new crawler service web router with page content and search endpoints
- Update agents to use new web tool structure
- Simplify web assistant tool implementation
- Remove deprecated action cache functions
…serve user intent

Add isProcessingToolResult state to detect when agent is processing tool
results but hasn't resumed streaming text. This fixes the UI gap where
no loading indicator was shown between tool completion and agent response.

Also update agent and web_assistant_tool prompts to preserve user's
specific questions when delegating to sub-agents, instead of reducing
them to generic "Get content from URL" requests.

coderabbitai Bot commented Feb 5, 2026

Contributor
📝 Walkthrough

This PR refactors web content fetching and agent tooling across the platform. It introduces a new FastAPI endpoint /api/v1/web/fetch-and-extract in the crawler service that converts URLs to PDFs and extracts content via Vision OCR. The legacy "web_read" tool is replaced with a unified "web" tool supporting fetch_url and browser_operate operations. The chat UI gains isProcessingToolResult state tracking for improved tool-result processing feedback. Multiple legacy modules (search_web, search_and_fetch, fetch_page_content) are removed. Agent tool configurations are updated to reference the new web tool and optionally include request_human_input handling. Related types, caches, and helper utilities are reorganized and consolidated under a new web helpers directory.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@services/crawler/app/routers/web.py`:
- Around line 34-56: This code fetches arbitrary user-supplied URLs (url_str/hostname)
before any validation in the PDF flow (get_pdf_service -> url_to_pdf), so add
SSRF protection: resolve the request.url hostname to its IP address(es) and reject
requests that resolve to loopback, link-local, private RFC 1918, or IPv6
unique-local addresses, or whose hostname is on a blocklist. Perform this check
immediately after computing hostname and before calling get_pdf_service or
url_to_pdf, return a 4xx error for blocked hosts, and log the blocked
hostname and IPs.
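
The address classification described above can be sketched as follows. This is an illustration only: the router itself is Python, where the standard `ipaddress` module exposes `is_private`, `is_loopback`, and `is_link_local` for the same purpose. It is written in TypeScript here to match the other sketches in this review. The function names are hypothetical, DNS resolution is left out, and the IPv6 handling covers only compressed literals, so a vetted library would be preferable in real code.

```typescript
type Octets = [number, number, number, number];

// Parse a dotted-quad IPv4 literal, or return null if it is not one.
function parseIPv4(ip: string): Octets | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m) return null;
  const octets: Octets = [Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4])];
  return octets.every((n) => n <= 255) ? octets : null;
}

function isBlockedIPv4([a, b]: Octets): boolean {
  return (
    a === 0 ||                            // 0.0.0.0/8
    a === 10 ||                           // 10.0.0.0/8 (RFC 1918)
    a === 127 ||                          // 127.0.0.0/8 loopback
    (a === 169 && b === 254) ||           // 169.254.0.0/16 link-local
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12 (RFC 1918)
    (a === 192 && b === 168)              // 192.168.0.0/16 (RFC 1918)
  );
}

// Returns true when a resolved address must not be fetched.
// Anything that does not look like an IP literal is blocked (fail closed).
function isBlockedAddress(ip: string): boolean {
  const addr = ip.trim().toLowerCase();
  const v4 = parseIPv4(addr);
  if (v4) return isBlockedIPv4(v4);

  if (!addr.includes(":") || !/^[0-9a-f:.]+$/.test(addr)) return true;

  // IPv4-mapped IPv6 such as ::ffff:127.0.0.1; other mapped forms are blocked.
  if (addr.startsWith("::ffff:")) {
    const mapped = parseIPv4(addr.slice("::ffff:".length));
    return mapped ? isBlockedIPv4(mapped) : true;
  }
  if (addr === "::1" || addr === "::") return true; // loopback, unspecified

  const first = parseInt(addr.split(":")[0] || "0", 16);
  return (
    (first & 0xfe00) === 0xfc00 || // fc00::/7 unique-local
    (first & 0xffc0) === 0xfe80    // fe80::/10 link-local
  );
}
```

Every address the hostname resolves to should pass this check, not only the first one, and the fetch should connect to the address that was checked so a second DNS lookup cannot return a different result.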

In `@services/platform/convex/agent_tools/sub_agents/web_assistant_tool.ts`:
- Around line 64-78: The web_assistant_tool handler currently calls
validateToolContext(ctx, 'web_assistant') without requiring a userId, which lets
undefined userId propagate; update the call in the handler (the
validateToolContext invocation inside handler: async (ctx: ToolCtx, args):
Promise<ToolResponse>) to pass the options object { requireUserId: true } so
validation fails fast and returns a clear error when userId is missing before
calling getOrCreateSubThread or invoking the Web Agent action.
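
A stand-in sketch of the fail-fast behaviour this comment asks for. The real validateToolContext and ToolCtx are not shown on this page, so `ToolCtxLike`, `ValidateOptions`, `ValidationResult`, and `validateToolContextSketch` are hypothetical; only the `{ requireUserId: true }` option comes from the comment.

```typescript
// Hypothetical stand-ins; the platform's real ToolCtx has more fields.
interface ToolCtxLike {
  threadId?: string;
  userId?: string;
}

interface ValidateOptions {
  requireUserId?: boolean;
}

type ValidationResult =
  | { ok: true; userId: string | undefined }
  | { ok: false; error: string };

// Reject the call before any sub-thread or agent work when userId is required but absent.
function validateToolContextSketch(
  ctx: ToolCtxLike,
  toolName: string,
  options: ValidateOptions = {},
): ValidationResult {
  if (options.requireUserId && !ctx.userId) {
    return {
      ok: false,
      error: `${toolName}: userId is required but missing from the tool context`,
    };
  }
  return { ok: true, userId: ctx.userId };
}
```

With the option set, a missing userId surfaces as one clear error at the top of the handler instead of an undefined value reaching getOrCreateSubThread.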

In `@services/platform/convex/agent_tools/web/helpers/browser_operate.ts`:
- Around line 28-44: The try block using AbortController/timeoutId for the fetch
can produce generic errors on timeout; update the error handling so that when
fetch throws due to abort you detect the AbortError (from controller.signal or
error.name === 'AbortError') and throw or log a clear, specific timeout/abort
error message (e.g., "Operator service request timed out after 300000ms")
instead of the generic error; make this change around the fetch call and the
existing catch path that handles response errors, referencing AbortController,
timeoutId, controller.signal, and response to locate where to add the explicit
AbortError handling.
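
One way the explicit AbortError handling could look, as a sketch rather than the actual browser_operate.ts code. `toOperatorError` and `fetchWithTimeout` are hypothetical names; `fetch`, `AbortController`, and the `AbortError` name are standard, and the timeout message wording follows the example in the comment.

```typescript
// fetch rejects with an error whose name is "AbortError" when its signal aborts.
function isAbortError(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { name?: unknown }).name === "AbortError"
  );
}

// Turn whatever fetch threw into an error with a specific, actionable message.
function toOperatorError(error: unknown, timeoutMs: number): Error {
  if (isAbortError(error)) {
    return new Error(`Operator service request timed out after ${timeoutMs}ms`);
  }
  return error instanceof Error ? error : new Error(String(error));
}

async function fetchWithTimeout(
  url: string,
  init: { method?: string; headers?: Record<string, string>; body?: string },
  timeoutMs: number,
) {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, { ...init, signal: controller.signal });
  } catch (error) {
    throw toOperatorError(error, timeoutMs);
  } finally {
    clearTimeout(timeoutId); // always clear the timer, on success or failure
  }
}
```

Keeping the mapping in a separate function lets the timeout message be checked without making a network call.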

In `@services/platform/convex/agent_tools/web/helpers/fetch_url_via_pdf.ts`:
- Around line 73-96: The code computes a truncated boolean but doesn't return
it, so update the WebFetchUrlResult type (in types.ts) to include a truncated:
boolean field and then include truncated in the object returned by the function
that builds the fetch result (the block that currently sets operation:
'fetch_url', success: true, url: args.url, title: result.title, content, ...).
Ensure the computed truncated value (from content.length > MAX_CONTENT_LENGTH)
is preserved and returned so callers/LLMs can detect truncation.
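
A sketch of the suggested shape, assuming the result fields listed in the comment. The value of MAX_CONTENT_LENGTH and the helper name `buildFetchUrlResult` are assumptions; the real type in types.ts may carry additional fields.

```typescript
const MAX_CONTENT_LENGTH = 50000; // assumed limit; the real constant's value is not shown

interface WebFetchUrlResult {
  operation: "fetch_url";
  success: boolean;
  url: string;
  title: string;
  content: string;
  truncated: boolean; // new field: tells callers the content was cut off
}

function buildFetchUrlResult(
  url: string,
  title: string,
  rawContent: string,
  maxLength: number = MAX_CONTENT_LENGTH,
): WebFetchUrlResult {
  const truncated = rawContent.length > maxLength;
  const content = truncated ? rawContent.slice(0, maxLength) : rawContent;
  return { operation: "fetch_url", success: true, url, title, content, truncated };
}
```

Returning the flag alongside the content lets the calling agent tell the user that the page was cut short instead of treating a partial page as complete.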

In `@services/platform/convex/agents/crm/agent.ts`:
- Around line 30-36: The multi-match response currently exposes full email
addresses; update the code path that formats/returns multiple CRM matches (e.g.,
the function handling multiple-results in agent.ts such as
getContactMatches/formatMatchList or the branch labeled "**CRITICAL - MULTIPLE
MATCHES:**") to mask email addresses by default (e.g., show local-part partial
or initials and domain as ****) and include non‑PII distinguishing fields
(title, company, last activity) so the user can disambiguate; only return full
email addresses when an explicit authorization flag or an explicit user
confirmation is present (check for an isAuthorized or revealEmail parameter) and
log that full PII was disclosed. Ensure the response asks the user to clarify
which record they mean rather than selecting one automatically.
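
A minimal sketch of masking by default. The `ContactMatch` shape, `maskEmail`, and `formatMatchList` are hypothetical; the CRM agent's actual record format is not shown here, and the `revealEmail` flag follows the parameter name suggested in the comment.

```typescript
// Hypothetical record shape for illustration.
interface ContactMatch {
  name: string;
  email: string;
  title: string;
  company: string;
}

// Keep only the first character of the local part and hide the domain.
function maskEmail(email: string): string {
  const at = email.lastIndexOf("@");
  if (at <= 0) return "****"; // no usable local part: reveal nothing
  return `${email.charAt(0)}***@****`;
}

// List candidates with non-PII distinguishing fields and ask the user to choose.
function formatMatchList(matches: ContactMatch[], revealEmail: boolean = false): string {
  const lines = matches.map((m, i) => {
    const email = revealEmail ? m.email : maskEmail(m.email);
    return `${i + 1}. ${m.name}, ${m.title} at ${m.company} (${email})`;
  });
  return `${lines.join("\n")}\nWhich of these contacts do you mean?`;
}
```

The caller would pass `revealEmail` as true only after an explicit authorization check or user confirmation, and would log that full addresses were disclosed, as the comment requests.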

Comment thread services/crawler/app/routers/web.py
Comment thread services/platform/convex/agent_tools/web/helpers/browser_operate.ts
Comment thread services/platform/convex/agents/crm/agent.ts
…ontext

Clarify in the pre-analyzed content marker and routing agent instructions
that attachments from the current message take priority over any previous
conversation context. This prevents the AI from confusing attached
documents with previously discussed content.