refactor(platform): rename website scan operations, add embedding workpool, and improve streaming#574
Conversation
…workpool Clarify website scan operation names to better reflect their purpose: - register_discovered_urls → register_urls - sync_pending_pages → crawl_and_upsert - generateForPage → embedPage Replace manual embedding retry logic with @convex-dev/workpool (maxParallelism=1, 3 retries with exponential backoff) to serialize embedding generation during bulk scans and prevent rate-limit failures. Also add missing output schemas for register_urls and crawl_and_upsert operations in the workflow engine validation layer.
The embedding dimension setting was only used to select the vector table but never passed to the model, so API calls still returned the default 1536-d vectors. Add a withDimensions wrapper for the AI SDK embedding model and inject EMBEDDING_DIMENSIONS into litellm sync/async patches.
…model selection Parse OPENAI_MODEL, OPENAI_FAST_MODEL, etc. as comma-separated lists across all services (platform, crawler, operator, rag), always selecting the first entry. Replace the `useFastModel` boolean flag with explicit model string resolution through centralized helpers in agent_runtime_config. Add `modelId` field to custom agents so users can select a specific model from the available pool rather than only choosing a preset tier.
…oll during typewriter drain Add useStableStreamText hook to suppress temporary text regression caused by WebSocket reconnection delivering shorter committed text while the sync stream reconnects. Make useAutoScroll's ResizeObserver run continuously so content growth is followed even after streaming ends (typewriter drain), while shrinkage is only followed during active streaming.
5b4774a to
28a0d28
Compare
📝 WalkthroughWalkthroughThis PR systematically refactors model selection across multiple services by introducing standardized utilities for parsing comma-separated model environment variables and replacing direct environment variable reads with centralized helper functions. It updates agent creation factories to use explicit model strings instead of boolean flags, adds a Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Comment |
Greptile SummaryThis PR implements five key refactorings across the platform to improve reliability and clarity: 1. Embedding Workpool (Rate Limit Fix)
2. Embedding Dimension Injection (Bug Fix)
3. Comma-Separated Model Lists
4. Website Scan Operation Renames
5. Streaming Text Regression Fix
All changes include comprehensive test coverage and maintain backward compatibility in behavior. Confidence Score: 5/5
|
| Filename | Overview |
|---|---|
| services/platform/convex/website_page_embeddings/embedding_pool.ts | Introduces workpool (maxParallelism=1) to serialize embedding generation and prevent rate-limit failures during bulk scans |
| services/platform/convex/lib/embedding_config.ts | Adds withDimensions() wrapper to inject configured dimensions into embedding API calls via providerOptions |
| services/platform/lib/shared/utils/model-list.ts | New utility for parsing comma-separated model lists, selecting first entry from environment variables |
| services/platform/convex/lib/agent_runtime_config.ts | Refactors to use model list utilities, removes useFastModel boolean in favor of explicit model resolution |
| services/platform/convex/websites/bulk_upsert_pages.ts | Uses embeddingPool.enqueueAction() instead of scheduler.runAfter() to serialize embedding jobs |
| services/platform/app/hooks/use-auto-scroll.ts | ResizeObserver now runs continuously (not just when enabled), follows growth always but shrinkage only during streaming |
| services/platform/app/features/chat/components/typewriter-text.tsx | Adds useStableStreamText hook to suppress temporary text regression from WebSocket reconnection |
| services/platform/convex/custom_agents/config.ts | Adds resolveModel() function to support custom modelId or fall back to preset-based resolution |
| services/platform/convex/workflow_engine/action_defs/website_pages/website_pages_action.ts | Renames operations: register_discovered_urls → register_urls, sync_pending_pages → crawl_and_upsert |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
Start[Website Scan Triggered] --> FetchMain[Fetch Main Page]
FetchMain --> QueryURLs[Query URLs from Sitemap]
QueryURLs --> RegisterURLs[register_urls: Register/Update URLs]
RegisterURLs --> CheckHash{Content Hash Changed?}
CheckHash -->|New/Changed| AddPending[Mark as pending sync]
CheckHash -->|Unchanged| Skip[Skip]
AddPending --> CrawlUpsert[crawl_and_upsert: Fetch & Upsert Pages]
CrawlUpsert --> EnqueueEmbed[Enqueue to embeddingPool]
EnqueueEmbed --> EmbedPage[embedPage: Generate Embeddings]
EmbedPage --> InjectDims[withDimensions: Inject API dimensions]
InjectDims --> CallAPI[Embedding API Call]
CallAPI -->|Rate Limit| Retry[Retry with Backoff]
CallAPI -->|Success| StoreVectors[Store in Vector Table]
Retry -->|Max 3 attempts| StoreVectors
StoreVectors --> HasMore{More URLs?}
HasMore -->|Yes| RegisterURLs
HasMore -->|No| UpdateStatus[Update Website Status]
style EnqueueEmbed fill:#e1f5ff
style InjectDims fill:#e1f5ff
style Retry fill:#fff4e1
Last reviewed commit: 28a0d28
…h test Add @ai-sdk/provider as an explicit dependency (was only transitive via @ai-sdk/openai) to satisfy Knip. Align cleanup_website test expectations with the actual PAGE_BATCH_SIZE of 2.
There was a problem hiding this comment.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
services/platform/convex/website_page_embeddings/internal_actions.ts (1)
60-66:⚠️ Potential issue | 🔴 CriticalPrevent partial-embedding lock-in under retries.
The skip check only compares
contentHash. If an earlier attempt fails mid-insert, a retry can observe matching hash and skip, leaving an incomplete embedding set for the page.Suggested fix pattern
- // 2. Check content hash to skip if unchanged - const contentHash = computeContentHash(page.content); - const existing = await ctx.runQuery( - internal.website_page_embeddings.internal_queries - .getExistingEmbeddingHash, - { organizationId, pageId, dimension }, - ); - - if (existing?.contentHash === contentHash) { - debugLog('embedPage skip - content unchanged', { - pageId, - contentHash, - }); - return { status: 'skipped', reason: 'unchanged' }; - } - - // 3. Chunk content - const chunks = chunkContent(page.content, page.title); + // 2. Chunk first so unchanged checks can validate completeness, not just hash + const contentHash = computeContentHash(page.content); + const chunks = chunkContent(page.content, page.title); + const existing = await ctx.runQuery( + internal.website_page_embeddings.internal_queries + .getExistingEmbeddingStats, // contentHash + chunkCount + { organizationId, pageId, dimension }, + ); + + if ( + existing?.contentHash === contentHash && + existing.chunkCount === chunks.length + ) { + debugLog('embedPage skip - content unchanged', { + pageId, + contentHash, + }); + return { status: 'skipped', reason: 'unchanged' }; + }Also applies to: 81-88, 104-116
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@services/crawler/app/config.py`:
- Line 74: The line assigning model is terse and hard to read; split it into two
clear steps: first evaluate the preferred source(s) (e.g., assign a variable
like preferred_model = self.openai_fast_model and fallback_env =
os.environ.get("OPENAI_FAST_MODEL")), then call get_first_model(preferred_model)
and if that returns falsy call get_first_model(fallback_env) and assign the
result to model; update the code around the get_first_model call so the intent
is explicit and easier to debug.
- Line 67: Split the long assignment to improve readability by first resolving
the environment value and then calling get_first_model; e.g., extract
os.environ.get("OPENAI_VISION_MODEL") into a named variable (e.g., env_model)
and then set model = get_first_model(self.openai_vision_model) or
get_first_model(env_model), referencing the existing get_first_model function
and the self.openai_vision_model attribute to preserve current precedence and
behavior.
In `@services/operator/app/config.py`:
- Around line 77-89: The model properties can return None or malformed raw
strings when environment values parse to empty/invalid lists; update
get_first_model to return None for empty/invalid parses and then change
llm_model, llm_vision_model and llm_fast_model to validate that result and fail
fast: llm_model should raise a clear ValueError if
get_first_model(self.openai_model) yields None (so startup fails on invalid
OPENAI_MODEL), llm_vision_model should return None only when openai_vision_model
is falsy or when get_first_model(...) returns None, and llm_fast_model should
use get_first_model(self.openai_fast_model) but fall back to self.llm_model only
if that call returns a non-None model; otherwise raise a ValueError indicating
OPENAI_FAST_MODEL is invalid (or if optional, fall back to llm_model only when
safe). Ensure references are to get_first_model, llm_model, llm_vision_model,
llm_fast_model, openai_model, openai_vision_model, and openai_fast_model so the
change is locatable.
In `@services/operator/app/utils/model_list.py`:
- Around line 1-22: This file duplicates the same model-list parsing utilities
across services; extract the shared functions parse_model_list, get_first_model,
and get_first_model_or_raise from model_list.py into a common/shared utility
package (e.g., common.utils.model_list) and update each service to import those
functions instead of keeping local copies: move the implementations to the
shared package, remove the three local model_list.py copies, adjust imports in
services/operator, services/crawler, and services/rag to reference the new
common module, and ensure the shared package is added to each service’s
build/requirements and covered by unit tests.
In `@services/platform/app/hooks/use-auto-scroll.ts`:
- Around line 37-44: Update the comment/docs in use-auto-scroll.ts to reflect
the new invariant: state that the hook's core behavior always follows content
growth when the user is at the bottom, and that the enabled option only controls
whether shrinkage is followed (not global auto-scroll). Edit the
function/comment block for useAutoScroll and the options description to replace
wording that suggests enabled toggles all auto-scrolling with language that
enabled = "follow shrinkage" (or similar), and add a short note clarifying that
growth-following remains active regardless of enabled.
In
`@services/platform/app/routes/dashboard/`$id/custom-agents/$agentId/instructions.tsx:
- Around line 61-66: getDefaultModelId can return '' when presets aren't present
which allows initialModelId to be empty and saves an invalid agent.modelId;
locate getDefaultModelId and the code that builds initialModelId and either (A)
disable the model Select component until modelPresets is loaded (use the route
loader state or a modelPresets truthy check to set disabled on the Select) or
(B) add a guard inside handleSave (check agent.modelId or initialModelId is
non-empty) to abort saving and surface an error/validation message to the user;
update references to getDefaultModelId, initialModelId, the Select control, and
handleSave accordingly so an empty string cannot be persisted.
In `@services/platform/convex/agents/workflow/generate_response.ts`:
- Line 45: The use of getCodingModel() ?? '' is confusing because it relies on
downstream truthiness in createAgentConfig to pick a default; change the
initialization to an explicit fallback or enforce presence: either set model to
getCodingModel() || getDefaultModel() so the default is chosen clearly, or call
getCodingModelOrThrow() if a coding model is mandatory for workflow agents;
update the call site where model is passed into createAgentConfig to use one of
these two alternatives and ensure tests/consumers expect an explicit model or a
thrown error accordingly.
In `@services/platform/convex/predefined_workflows/website_scan.ts`:
- Around line 9-10: The flow comment at the top lists step slug
"crawl_and_upsert_pages" but the actual workflow uses "sync_pages" in
nextSteps/stepSlug; update the comment to match the real slug chain (or rename
the step slug to match the comment) so they are consistent. Locate the step
definitions that use nextSteps/stepSlug (check references to "sync_pages" and
any transition logic) and either change the comment to include "sync_pages" in
place of "crawl_and_upsert_pages" or rename the step slug and its occurrences to
"crawl_and_upsert_pages" so the documented chain and actual step names align.
In
`@services/platform/convex/workflow_engine/helpers/validation/variables/action_schemas.ts`:
- Around line 622-625: The schema for the field urlsToSync is declared as type
'array' but missing an items spec; update the action schema (the object that
contains urlsToSync) to add an items property describing the element type (e.g.,
items: { type: 'string', format: 'uri' } or items: { type: 'string' } as
appropriate) so validation mirrors other array fields like tags; modify the
urlsToSync definition in action_schemas.ts to include the items schema and any
required constraints (minItems, uniqueItems) consistent with surrounding array
definitions.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
services/platform/convex/_generated/api.d.tsis excluded by!**/_generated/**
📒 Files selected for processing (62)
services/crawler/app/config.pyservices/crawler/app/utils/model_list.pyservices/crawler/tests/test_config.pyservices/operator/app/config.pyservices/operator/app/utils/__init__.pyservices/operator/app/utils/model_list.pyservices/operator/tests/conftest.pyservices/operator/tests/test_config.pyservices/platform/app/features/chat/components/__tests__/typewriter-text.test.tsxservices/platform/app/features/chat/components/typewriter-text.tsxservices/platform/app/features/chat/hooks/__tests__/use-stream-buffer.test.tsservices/platform/app/features/custom-agents/components/custom-agents-table.tsxservices/platform/app/features/custom-agents/hooks/use-custom-agents-table-config.tsxservices/platform/app/hooks/__tests__/use-auto-scroll.test.tsservices/platform/app/hooks/use-auto-scroll.tsservices/platform/app/routes/dashboard/$id/custom-agents/$agentId/instructions.tsxservices/platform/convex/agent_tools/files/helpers/analyze_text.tsservices/platform/convex/agent_tools/files/helpers/vision_agent.tsservices/platform/convex/agents/crm/agent.tsservices/platform/convex/agents/document/agent.tsservices/platform/convex/agents/integration/agent.tsservices/platform/convex/agents/web/agent.tsservices/platform/convex/agents/workflow/agent.tsservices/platform/convex/agents/workflow/generate_response.tsservices/platform/convex/conversations/improve_message.tsservices/platform/convex/convex.config.tsservices/platform/convex/custom_agents/config.tsservices/platform/convex/custom_agents/internal_queries.tsservices/platform/convex/custom_agents/mutations.tsservices/platform/convex/custom_agents/queries.tsservices/platform/convex/custom_agents/schema.tsservices/platform/convex/custom_agents/test_chat.test.tsservices/platform/convex/lib/agent_chat/internal_actions.tsservices/platform/convex/lib/agent_chat/types.tsservices/platform/convex/lib/agent_response/generate_response.tsservices/platform/convex/lib/agent_runtime_config.tsservices/platform/convex/lib/create_agent_config.tsservices/platform/convex/lib/embedding_config.test.tsservices/platform/convex/lib/embedding_config.tsservices/platform/convex/lib/summarize_context.tsservices/platform/convex/predefined_workflows/website_scan.tsservices/platform/convex/website_page_embeddings/embedding_pool.tsservices/platform/convex/website_page_embeddings/internal_actions.tsservices/platform/convex/websites/bulk_upsert_pages.tsservices/platform/convex/websites/cleanup_website.tsservices/platform/convex/websites/helpers.tsservices/platform/convex/websites/internal_mutations.tsservices/platform/convex/websites/register_urls.tsservices/platform/convex/workflow_engine/action_defs/website_pages/helpers/types.tsservices/platform/convex/workflow_engine/action_defs/website_pages/website_pages_action.tsservices/platform/convex/workflow_engine/helpers/nodes/llm/utils/validate_and_normalize_config.tsservices/platform/convex/workflow_engine/helpers/validation/variables/action_schemas.tsservices/platform/convex/workflow_engine/workflow_syntax_compact.tsservices/platform/convex/workflows/triggers/actions.tsservices/platform/lib/shared/schemas/custom_agents.tsservices/platform/lib/shared/utils/model-list.test.tsservices/platform/lib/shared/utils/model-list.tsservices/platform/messages/en.jsonservices/rag/app/config.pyservices/rag/app/utils/__init__.pyservices/rag/app/utils/model_list.pyservices/rag/tests/test_config.py
| def get_vision_model(self) -> str: | ||
| """Get Vision model from CRAWLER_OPENAI_VISION_MODEL or OPENAI_VISION_MODEL.""" | ||
| model = self.openai_vision_model or os.environ.get("OPENAI_VISION_MODEL") | ||
| model = get_first_model(self.openai_vision_model) or get_first_model(os.environ.get("OPENAI_VISION_MODEL")) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider splitting long line for readability.
The logic is correct—prioritizing the crawler-prefixed config over the generic environment variable and extracting the first model from comma-separated lists. However, the line is quite long.
♻️ Optional: Split for readability
def get_vision_model(self) -> str:
"""Get Vision model from CRAWLER_OPENAI_VISION_MODEL or OPENAI_VISION_MODEL."""
- model = get_first_model(self.openai_vision_model) or get_first_model(os.environ.get("OPENAI_VISION_MODEL"))
+ model = get_first_model(self.openai_vision_model) or get_first_model(
+ os.environ.get("OPENAI_VISION_MODEL")
+ )
if not model:
raise ValueError("OPENAI_VISION_MODEL must be set in environment.")
return model🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/crawler/app/config.py` at line 67, Split the long assignment to
improve readability by first resolving the environment value and then calling
get_first_model; e.g., extract os.environ.get("OPENAI_VISION_MODEL") into a
named variable (e.g., env_model) and then set model =
get_first_model(self.openai_vision_model) or get_first_model(env_model),
referencing the existing get_first_model function and the
self.openai_vision_model attribute to preserve current precedence and behavior.
| def get_fast_model(self) -> str: | ||
| """Get Fast LLM model from CRAWLER_OPENAI_FAST_MODEL or OPENAI_FAST_MODEL.""" | ||
| model = self.openai_fast_model or os.environ.get("OPENAI_FAST_MODEL") | ||
| model = get_first_model(self.openai_fast_model) or get_first_model(os.environ.get("OPENAI_FAST_MODEL")) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Same readability suggestion applies here.
♻️ Optional: Split for readability
def get_fast_model(self) -> str:
"""Get Fast LLM model from CRAWLER_OPENAI_FAST_MODEL or OPENAI_FAST_MODEL."""
- model = get_first_model(self.openai_fast_model) or get_first_model(os.environ.get("OPENAI_FAST_MODEL"))
+ model = get_first_model(self.openai_fast_model) or get_first_model(
+ os.environ.get("OPENAI_FAST_MODEL")
+ )
if not model:
raise ValueError("OPENAI_FAST_MODEL must be set in environment.")
return model📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| model = get_first_model(self.openai_fast_model) or get_first_model(os.environ.get("OPENAI_FAST_MODEL")) | |
| def get_fast_model(self) -> str: | |
| """Get Fast LLM model from CRAWLER_OPENAI_FAST_MODEL or OPENAI_FAST_MODEL.""" | |
| model = get_first_model(self.openai_fast_model) or get_first_model( | |
| os.environ.get("OPENAI_FAST_MODEL") | |
| ) | |
| if not model: | |
| raise ValueError("OPENAI_FAST_MODEL must be set in environment.") | |
| return model |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/crawler/app/config.py` at line 74, The line assigning model is terse
and hard to read; split it into two clear steps: first evaluate the preferred
source(s) (e.g., assign a variable like preferred_model = self.openai_fast_model
and fallback_env = os.environ.get("OPENAI_FAST_MODEL")), then call
get_first_model(preferred_model) and if that returns falsy call
get_first_model(fallback_env) and assign the result to model; update the code
around the get_first_model call so the intent is explicit and easier to debug.
| def llm_model(self) -> str: | ||
| return self.openai_model | ||
| return get_first_model(self.openai_model) or self.openai_model | ||
|
|
||
| @property | ||
| def llm_vision_model(self) -> str | None: | ||
| """Return vision model if configured, None otherwise.""" | ||
| return self.openai_vision_model if self.openai_vision_model else None | ||
| return get_first_model(self.openai_vision_model) if self.openai_vision_model else None | ||
|
|
||
| @property | ||
| def llm_fast_model(self) -> str: | ||
| """Return fast model for summarization, falls back to coding model.""" | ||
| return self.openai_fast_model if self.openai_fast_model else self.openai_model | ||
| return get_first_model(self.openai_fast_model) if self.openai_fast_model else self.llm_model | ||
|
|
There was a problem hiding this comment.
Harden model-list parsing to avoid invalid/None model values.
llm_fast_model can currently return None when OPENAI_FAST_MODEL is set but parses to an empty list (e.g., whitespace/comma-only). Also, llm_model can fall back to malformed raw strings instead of failing fast.
🔧 Proposed fix
-from app.utils.model_list import get_first_model
+from app.utils.model_list import get_first_model, get_first_model_or_raise
@@
`@property`
def llm_model(self) -> str:
- return get_first_model(self.openai_model) or self.openai_model
+ return get_first_model_or_raise(self.openai_model, "OPENAI_MODEL")
@@
`@property`
def llm_fast_model(self) -> str:
"""Return fast model for summarization, falls back to coding model."""
- return get_first_model(self.openai_fast_model) if self.openai_fast_model else self.llm_model
+ fast_model = get_first_model(self.openai_fast_model) if self.openai_fast_model else None
+ return fast_model or self.llm_modelBased on learnings: Enforce fail-fast validation for required configuration at service startup so invalid config fails early rather than causing runtime issues.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/operator/app/config.py` around lines 77 - 89, The model properties
can return None or malformed raw strings when environment values parse to
empty/invalid lists; update get_first_model to return None for empty/invalid
parses and then change llm_model, llm_vision_model and llm_fast_model to
validate that result and fail fast: llm_model should raise a clear ValueError if
get_first_model(self.openai_model) yields None (so startup fails on invalid
OPENAI_MODEL), llm_vision_model should return None only when openai_vision_model
is falsy or when get_first_model(...) returns None, and llm_fast_model should
use get_first_model(self.openai_fast_model) but fall back to self.llm_model only
if that call returns a non-None model; otherwise raise a ValueError indicating
OPENAI_FAST_MODEL is invalid (or if optional, fall back to llm_model only when
safe). Ensure references are to get_first_model, llm_model, llm_vision_model,
llm_fast_model, openai_model, openai_vision_model, and openai_fast_model so the
change is locatable.
| """Utilities for parsing comma-separated model lists from environment variables.""" | ||
|
|
||
|
|
||
| def parse_model_list(value: str | None) -> list[str]: | ||
| """Parse a comma-separated model list into a list of trimmed, non-empty strings.""" | ||
| if not value: | ||
| return [] | ||
| return [m.strip() for m in value.split(",") if m.strip()] | ||
|
|
||
|
|
||
| def get_first_model(value: str | None) -> str | None: | ||
| """Get the first model from a comma-separated model list.""" | ||
| models = parse_model_list(value) | ||
| return models[0] if models else None | ||
|
|
||
|
|
||
| def get_first_model_or_raise(value: str | None, var_name: str) -> str: | ||
| """Get the first model from a comma-separated model list, or raise if none available.""" | ||
| model = get_first_model(value) | ||
| if not model: | ||
| raise ValueError(f"{var_name} is not set or contains no valid models.") | ||
| return model |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider extracting shared utilities to a common package.
This module is duplicated verbatim across three services:
services/operator/app/utils/model_list.pyservices/crawler/app/utils/model_list.pyservices/rag/app/utils/model_list.py
While duplication across independently deployed services can be acceptable to avoid coupling, if these services share a build system or common package infrastructure, consider extracting this into a shared utility package to reduce maintenance burden.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/operator/app/utils/model_list.py` around lines 1 - 22, This file
duplicates the same model-list parsing utilities across services; extract the
shared functions parse_model_list, get_first_model, and get_first_model_or_raise
from model_list.py into a common/shared utility package (e.g.,
common.utils.model_list) and update each service to import those functions
instead of keeping local copies: move the implementations to the shared package,
remove the three local model_list.py copies, adjust imports in
services/operator, services/crawler, and services/rag to reference the new
common module, and ensure the shared package is added to each service’s
build/requirements and covered by unit tests.
| * useAutoScroll - Position-based auto-scrolling for growing content. | ||
| * | ||
| * Uses a single principle: "Was the user at the bottom before content grew?" | ||
| * - Yes → scroll to the new bottom (follow the stream) | ||
| * - No → don't scroll (user is reading above) | ||
| * Core invariant: "If the user is at the bottom and content grows, follow it." | ||
| * | ||
| * Position is the sole source of truth — no input-event sniffing, no direction | ||
| * tracking. A programmatic scroll guard distinguishes our own scrollTo calls | ||
| * from user scrolls, preventing async scroll events from falsely updating the | ||
| * position state. Works correctly with ALL scroll methods (mouse wheel, | ||
| * trackpad, scrollbar drag, keyboard, programmatic scroll). | ||
| * The ResizeObserver and scroll handler run continuously — position is the sole | ||
| * source of truth, not streaming state. The `enabled` flag only controls whether | ||
| * shrinkage is also followed (useful during streaming when markdown restructuring | ||
| * can temporarily reduce height). |
There was a problem hiding this comment.
Align enabled API documentation with the new behavior.
The new invariant says enabled only controls shrinkage follow, but the options doc still implies it toggles auto-scroll globally. Please make both docs consistent to avoid misuse.
📝 Suggested doc update
interface UseAutoScrollOptions {
/**
- * Whether auto-scroll should be active (e.g., during streaming)
+ * Whether shrinkage should also be followed (typically while streaming).
+ * Growth is followed whenever the user is at bottom.
*/
enabled: boolean;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/app/hooks/use-auto-scroll.ts` around lines 37 - 44, Update
the comment/docs in use-auto-scroll.ts to reflect the new invariant: state that
the hook's core behavior always follows content growth when the user is at the
bottom, and that the enabled option only controls whether shrinkage is followed
(not global auto-scroll). Edit the function/comment block for useAutoScroll and
the options description to replace wording that suggests enabled toggles all
auto-scrolling with language that enabled = "follow shrinkage" (or similar), and
add a short note clarifying that growth-following remains active regardless of
enabled.
| function getDefaultModelId( | ||
| preset: string, | ||
| presets: Record<string, string[]> | undefined, | ||
| ): string { | ||
| return presets?.[preset]?.[0] ?? ''; | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider guarding against empty modelId.
getDefaultModelId returns an empty string when presets is undefined or the preset key doesn't exist. If agent.modelId is also unset and modelPresets hasn't loaded yet, initialModelId will be '', which propagates to the form and could be saved.
The route loader prefetches modelPresets, so in practice this window should be brief. Consider either:
- Disabling the Select until
modelPresetsis loaded, or - Adding a guard in
handleSaveto prevent saving an emptymodelId.
💡 Optional guard in handleSave
const handleSave = useCallback(
async (data: InstructionsFormData) => {
+ if (!data.modelId) return; // Skip save if modelId not yet resolved
+
const modelPreset = modelPresets
? derivePresetFromModelId(data.modelId, modelPresets)
: 'standard';🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/app/routes/dashboard/`$id/custom-agents/$agentId/instructions.tsx
around lines 61 - 66, getDefaultModelId can return '' when presets aren't
present which allows initialModelId to be empty and saves an invalid
agent.modelId; locate getDefaultModelId and the code that builds initialModelId
and either (A) disable the model Select component until modelPresets is loaded
(use the route loader state or a modelPresets truthy check to set disabled on
the Select) or (B) add a guard inside handleSave (check agent.modelId or
initialModelId is non-empty) to abort saving and surface an error/validation
message to the user; update references to getDefaultModelId, initialModelId, the
Select control, and handleSave accordingly so an empty string cannot be
persisted.
| createAgent: (options) => | ||
| createWorkflowAgent({ withTools: true, delegationMode, ...options }), | ||
| model: process.env.OPENAI_CODING_MODEL || '', | ||
| model: getCodingModel() ?? '', |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# First, let's locate and examine generate_response.ts
echo "=== File structure check ==="
fd "generate_response.ts" services/platform/convex/agents/workflow/
echo
echo "=== Content around line 45 in generate_response.ts ==="
head -60 services/platform/convex/agents/workflow/generate_response.ts
echo
echo "=== getCodingModel function definition ==="
rg "getCodingModel" services/platform/convex/ -A 5 -B 2
echo
echo "=== Full generate_response.ts (limited to 150 lines for analysis) ==="
head -150 services/platform/convex/agents/workflow/generate_response.tsRepository: tale-project/tale
Length of output: 8889
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== generateAgentResponse function definition ==="
find services/platform/convex/lib -name "*.ts" -o -name "*.tsx" | xargs grep -l "generateAgentResponse" | head -5
echo
echo "=== Generate Agent Response implementation ==="
fd "generate_response.ts" services/platform/convex/lib/ -x cat {} | head -200
echo
echo "=== How model is used in agent creation and API calls ==="
rg "model:" services/platform/convex/lib/create_agent_config.ts -A 3 -B 3
echo
echo "=== Check if model is validated before API use ==="
rg "function generateAgentResponse" services/platform/convex/lib -A 50 | head -80Repository: tale-project/tale
Length of output: 7055
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Search for where model is used in generateAgentResponse ==="
grep -n "model" services/platform/convex/lib/agent_response/generate_response.ts | head -30
echo
echo "=== Check createAgent call and how model is passed ==="
sed -n '1,250p' services/platform/convex/lib/agent_response/generate_response.ts | grep -A 20 "createAgent"
echo
echo "=== Check agent creation in workflow agent (where model validation happens) ==="
sed -n '50,100p' services/platform/convex/agents/workflow/agent.ts
echo
echo "=== Full agent.ts to see model fallback ==="
head -80 services/platform/convex/agents/workflow/agent.tsRepository: tale-project/tale
Length of output: 5144
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check how other agents handle model fallback ==="
fd "generate_response.ts" services/platform/convex/agents/ -x grep -H "model:" {}
echo
echo "=== Compare agent factory patterns across the codebase ==="
rg "getCodingModel\(\) \?\? ''" services/platform/convex/
echo
echo "=== Check if getCodingModelOrThrow is used elsewhere in similar patterns ==="
rg "getCodingModelOrThrow\(\)" services/platform/convex/agents/ -B 2 -A 2
echo
echo "=== Verify the nullish coalescing behavior in agent.ts ==="
sed -n '75,90p' services/platform/convex/agents/workflow/agent.ts
echo
echo "=== Check how createAgentConfig uses the model parameter ==="
rg "function createAgentConfig\|const createAgentConfig" services/platform/convex/lib/create_agent_config.ts -A 20Repository: tale-project/tale
Length of output: 1210
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check createAgentConfig definition and how it handles model ==="
cat services/platform/convex/lib/create_agent_config.ts
echo
echo "=== Check how Agent constructor uses the model in agent config ==="
rg "class Agent\|new Agent" services/platform/convex -B 2 -A 5 | head -40
echo
echo "=== Verify if empty string would cause issues at SDK level ==="
rg "model.*\|\|.*model\|model.*??.*model" services/platform/convex/lib/create_agent_config.tsRepository: tale-project/tale
Length of output: 6924
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check if getDefaultAgentRuntimeConfig exists and how it's used ==="
rg "getDefaultAgentRuntimeConfig" services/platform/convex/ -B 2 -A 5
echo
echo "=== Check getDefaultModel to understand the fallback ==="
rg "function getDefaultModel\|const getDefaultModel" services/platform/convex/lib/ -A 5
echo
echo "=== Trace complete flow: empty string through the chain ==="
echo "1. generate_response.ts passes: getCodingModel() ?? '' "
echo "2. createWorkflowAgent receives options.model = ''"
echo " - Line: const model = options?.model ?? getCodingModelOrThrow()"
echo " - Result: '' is NOT nullish, so model = '' (does NOT throw)"
echo "3. createAgentConfig receives model = ''"
echo " - truthiness check: if (opts.model) → '' is falsy → uses getDefaultModel()"
echo ""
echo "=== Verify if this pattern is used elsewhere or if there's a preferred pattern ==="
rg "generateWorkflowResponse\|createWorkflowAgent" services/platform/convex/ -B 5 | grep -A 5 "model:"Repository: tale-project/tale
Length of output: 7442
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Get the full getDefaultModel implementation ==="
rg "function getDefaultModel|export.*getDefaultModel" services/platform/convex/lib/agent_runtime_config.ts -A 10
echo
echo "=== Confirm the truthiness check behavior in createAgentConfig ==="
sed -n '95,110p' services/platform/convex/lib/create_agent_config.ts
echo
echo "=== Check other agents to see if they use similar or different patterns ==="
rg "generateAgentResponse" services/platform/convex/agents/ -B 20 | grep -A 10 "model:"
echo
echo "=== Verify the learning about using getDefaultAgentRuntimeConfig ==="
echo "According to PR `#397` learning, agents should use getDefaultAgentRuntimeConfig()"
echo "Let's see if chat_agent uses that pattern for reference"
rg "generateAgentResponse\|generateChatResponse" services/platform/convex/agents/chat/ -A 20 | head -40Repository: tale-project/tale
Length of output: 1198
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Get the actual getModel function in createAgentConfig ==="
sed -n '95,120p' services/platform/convex/lib/create_agent_config.ts
echo
echo "=== Check if getCodingModel vs getCodingModelOrThrow pattern is documented ==="
cat services/platform/convex/lib/agent_runtime_config.ts | head -50
echo
echo "=== Verify that empty string is indeed falsy and will fall back to getDefaultModel ==="
echo "In TypeScript/JavaScript:"
echo " '' (empty string) is falsy"
echo " if ('') → false"
echo " So: if (opts.model) where opts.model = '' → will call getDefaultModel()"
echo ""
echo "=== Check if there's any direct API call that uses model before createAgentConfig ==="
rg "getCodingModel\(\)" services/platform/convex/agents/workflow/ -B 5 -A 10Repository: tale-project/tale
Length of output: 3606
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check how chat agent (if it exists) handles model ==="
fd "generate_response.ts" services/platform/convex/agents/chat/ -x cat {}
echo
echo "=== Check if there are other agents and their model handling patterns ==="
fd "generate_response.ts" services/platform/convex/agents/
echo
echo "=== Verify the complete call chain: model parameter in generateAgentResponse ==="
echo "generateAgentResponse receives config.model"
echo "Let's see where it's actually used in generateAgentResponse"
sed -n '315,330p' services/platform/convex/lib/agent_response/generate_response.ts
echo
echo "=== Check if model is ever used directly before being passed to createAgent ==="
sed -n '315,400p' services/platform/convex/lib/agent_response/generate_response.ts | grep -i "model"Repository: tale-project/tale
Length of output: 977
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Find where createAgent is called within generateAgentResponse ==="
rg "createAgent" services/platform/convex/lib/agent_response/generate_response.ts -B 3 -A 3
echo
echo "=== Get the context around line 400-420 where the agent is created ==="
sed -n '400,420p' services/platform/convex/lib/agent_response/generate_response.ts
echo
echo "=== Search for all places where createAgent is invoked ==="
rg "await.*createAgent\|createAgent\(" services/platform/convex/lib/agent_response/generate_response.ts -B 2 -A 2Repository: tale-project/tale
Length of output: 2295
Reconsider the model initialization pattern for clarity and consistency.
The empty string fallback is actually safe—createAgentConfig treats '' as falsy and falls back to getDefaultModel(). However, the pattern getCodingModel() ?? '' is unintuitive because empty string is not nullish; the truthiness check downstream only works by accident.
Instead, either:
- Use
getCodingModel() || getDefaultModel()for explicit fallback, or - Use
getCodingModelOrThrow()if coding model is mandatory for workflow agents
This aligns with the codebase pattern where agents source runtime config centrally (per your PR #397 guidance) rather than relying on implicit truthiness checks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/agents/workflow/generate_response.ts` at line 45,
The use of getCodingModel() ?? '' is confusing because it relies on downstream
truthiness in createAgentConfig to pick a default; change the initialization to
an explicit fallback or enforce presence: either set model to getCodingModel()
|| getDefaultModel() so the default is chosen clearly, or call
getCodingModelOrThrow() if a coding model is mandatory for workflow agents;
update the call site where model is passed into createAgentConfig to use one of
these two alternatives and ensure tests/consumers expect an explicit model or a
thrown error accordingly.
| * start → fetch_main_page → update_metadata → query_urls → register_urls | ||
| * → sync_pages → check_has_more → (loop or update_status) | ||
| * → crawl_and_upsert_pages → check_has_more → (loop or update_status) |
There was a problem hiding this comment.
Flow documentation no longer matches the actual step slug chain.
At Line 10, the flow says crawl_and_upsert_pages, but the workflow transitions through sync_pages (see nextSteps/stepSlug). Please align the comment with the actual slug names (or rename the slug chain consistently).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/predefined_workflows/website_scan.ts` around lines 9
- 10, The flow comment at the top lists step slug "crawl_and_upsert_pages" but
the actual workflow uses "sync_pages" in nextSteps/stepSlug; update the comment
to match the real slug chain (or rename the step slug to match the comment) so
they are consistent. Locate the step definitions that use nextSteps/stepSlug
(check references to "sync_pages" and any transition logic) and either change
the comment to include "sync_pages" in place of "crawl_and_upsert_pages" or
rename the step slug and its occurrences to "crawl_and_upsert_pages" so the
documented chain and actual step names align.
| urlsToSync: { | ||
| type: 'array', | ||
| description: 'URLs that need content fetching', | ||
| }, |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Missing items specification for array field.
The urlsToSync field is typed as array but lacks an items specification to describe its element type. Other array fields in this file (e.g., line 142-143 for tags) include items definitions.
📝 Suggested fix
urlsToSync: {
type: 'array',
description: 'URLs that need content fetching',
+ items: { type: 'string' },
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| urlsToSync: { | |
| type: 'array', | |
| description: 'URLs that need content fetching', | |
| }, | |
| urlsToSync: { | |
| type: 'array', | |
| description: 'URLs that need content fetching', | |
| items: { type: 'string' }, | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/convex/workflow_engine/helpers/validation/variables/action_schemas.ts`
around lines 622 - 625, The schema for the field urlsToSync is declared as type
'array' but missing an items spec; update the action schema (the object that
contains urlsToSync) to add an items property describing the element type (e.g.,
items: { type: 'string', format: 'uri' } or items: { type: 'string' } as
appropriate) so validation mirrors other array fields like tags; modify the
urlsToSync definition in action_schemas.ts to include the items schema and any
required constraints (minItems, uniqueItems) consistent with surrounding array
definitions.
…kpool, and improve streaming (#574)
Summary
register_discovered_urls→register_urls,sync_pending_pages→crawl_and_upsert,generateForPage→embedPage@convex-dev/workpool(maxParallelism=1, 3 retries with exponential backoff) to serialize embedding generation during bulk scans and prevent rate-limit failuresuseFastModelboolean with explicit model string resolution. AddmodelIdfield to custom agents for user-selectable modelsuseStableStreamTexthook to suppress temporary regression from WebSocket reconnection, and makeuseAutoScrollResizeObserver run continuously through typewriter drainTest plan
useStableStreamText,useAutoScroll, model list parsing, and embedding config🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Improvements