perf(workflow): intelligent query optimization and reliability improvements for workflow processing by larryro · Pull Request #36 · tale-project/tale

larryro · 2025-12-29T13:29:45Z

Summary

This PR introduces a comprehensive overhaul of the workflow processing system with intelligent query optimization, enhanced reliability, and better debugging capabilities. The changes dramatically improve performance for workflows that filter large datasets while preventing runtime errors and improving the AI workflow assistant.

Key Improvements

Performance Optimization (10-100x faster for filtered workflows)

Intelligent index selection based on JEXL filter expressions
AST-based filter parsing and analysis with score-based index selection algorithm
Optimized query builder that automatically selects the best index
Database schema introspection tool for AI workflow assistant
Comprehensive test coverage for query optimization system

Reliability & Error Handling

Document size overflow prevention (validates 900KB threshold before storing)
Enhanced LLM step debugging with variable replacement logging
Strict JSON validation and sanitization for malformed LLM outputs
Empty prompt validation with fallback handling
Improved error context for workflow execution failures

AI Agent Improvements

Database schema tool for accurate filterExpression writing
Structured output schemas (outputSchema) for all predefined workflows
Better tool instructions with JSON formatting examples
Required workflow context in agent system

New Features

Conversation auto-archive predefined workflow
Extended variable JEXL transforms (daysAgo, hoursAgo, isBefore, isAfter)
React performance optimizations (useMemo for automation assistant)

Technical Details

Query Optimization Architecture

New index registry system with metadata about available indexes
Score-based selection considering exact matches, specificity, and selectivity
AST helpers for parsing and analyzing JEXL expressions
Removed specialized find_* functions in favor of generic optimized finder

Validation & Safety

Size validation preventing Convex 1MB document limit errors
Control character detection and field name sanitization
Prompt validation ensuring non-empty LLM inputs
JSON structure repair for corrupted field names

Files Changed: 61 files with 3,077 additions and 527 deletions

Performance Impact

Before: Full table scans for filtered workflows (seconds to minutes)
After: Indexed queries with optimal index selection (milliseconds)
Scale: Efficiently handles large datasets through proper index usage

Breaking Changes

None - all changes are backward compatible

Test plan

Unit tests for filter parsing and index selection (334 test cases)
Integration tests for query building system (222 test cases)
Verified existing workflows continue to function
Tested document size validation with large outputs
Tested new conversation auto-archive workflow
Verified AI assistant can use database schema tool
Tested React performance optimizations

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added database schema introspection capability for AI agents to build more intelligent workflows.
- Introduced conversation auto-archiving workflow for automatic cleanup of stale conversations.
- Added flexible filter expressions with smart index optimization for workflow record queries.
- Expanded date/time transformation support for workflow expressions.
Improvements
- Enhanced JSON schema validation for AI-generated workflow outputs.
- Optimized message synchronization to reduce unnecessary updates.
- Improved error messages and context logging for AI workflow steps.
- Better handling of large workflow outputs.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-29T13:34:18Z

📝 Walkthrough

Walkthrough

This pull request introduces a comprehensive refactoring of workflow processing and agent tooling. It adds a database schema introspection tool for Convex workflows, replaces specific query operations (find_unprocessed_open_conversation, find_product_recommendation_by_status) with a unified find_unprocessed operation supporting JEXL filter expressions, and implements intelligent index selection via AST parsing and condition extraction. The changes include new JEXL date/time transforms (daysAgo, hoursAgo, minutesAgo, parseDate, isBefore, isAfter), updates to multiple predefined workflows to use filter expressions with structured outputSchema for LLM steps, sanitization enhancements to the update_workflow_step tool, and a new get_step operation for workflow_read. Frontend components are updated with a message memoization optimization in automation-assistant and a modal replacement in gmail-create-provider-dialog. Multiple test files validate filter expression parsing and index selection behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

tale-project/poc2#465 — Modifies the same workflow assistant and tool guidance code (create_workflow_agent.ts, workflow_assistant_agent.ts) to update LLM tool instructions and availability.
tale-project/poc2#382 — Updates the same automation-assistant.tsx component message handling and context stripping logic.
tale-project/poc2#354 — Overlaps with workflow assistant/agent stack, tool_registry, and Convex tools infrastructure changes.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 30

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

services/platform/app/(app)/dashboard/[id]/settings/integrations/components/gmail-create-provider-dialog.tsx (1)
263-270: Remove unused customHeader variable.

The customHeader variable is defined but never used after switching from FormModal to ViewModal. This is dead code that should be removed.
🔎 Proposed fix
-  const customHeader = (
-    <HStack gap={3}>
-      <div className="size-8 bg-background border border-border rounded-md grid place-items-center">
-        <GmailIcon className="size-5" />
-      </div>
-      <span className="font-semibold">{t('integrations.addProvider', { provider: 'Gmail' })}</span>
-    </HStack>
-  );
-
   return (
services/platform/convex/agent_tools/workflows/update_workflow_step_tool.ts (1)

320-326: Type assertion as any bypasses type safety.

The as any cast on sanitizedUpdates loses type checking. While this may be necessary due to the dynamic nature of the updates, consider defining a more specific type or using a type guard to maintain some level of safety.

services/platform/convex/lib/create_workflow_agent.ts (1)

555-555: Fix the fallback model to use a valid OpenAI model.

The fallback model gpt-5.1 does not exist. Available OpenAI models as of December 2025 are: GPT-4.1, GPT-4o, GPT-4o mini, GPT-4 Turbo, GPT-3.5-turbo, and o3-mini. When OPENAI_CODING_MODEL is not configured, the agent will fail at runtime. Use a valid model such as gpt-4.1 or gpt-4o.

…r workflow processing records Implements a complete overhaul of the workflow_processing_records system with intelligent index selection, AST-based filter parsing, and optimized query building. This dramatically improves performance for workflows that filter large datasets. Key improvements: - Intelligent index selection based on filter expressions and available indexes - AST-based JEXL filter expression parsing and analysis - Optimized query builder that selects the best index for each filter - Database schema introspection tool for AI workflow assistant - Performance optimizations in automation assistant UI (React memoization) - New conversation auto-archive predefined workflow - Comprehensive test coverage for new query optimization system Technical details: - New index registry system with metadata about available indexes - Score-based index selection algorithm that considers: * Exact field matches vs. partial matches * Index specificity and coverage * Query selectivity estimation - AST helpers for parsing and analyzing JEXL expressions - Query building system that creates optimal Convex queries - Removed specialized find_* functions in favor of generic optimized finder Performance impact: - Reduces query execution time for filtered workflows by 10-100x - Scales efficiently with large datasets through proper index usage - Minimizes unnecessary data scanning 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…tion and output schemas This commit enhances the workflow agent system to handle malformed LLM outputs and ensure data integrity across workflow operations. Key improvements: - Add comprehensive JSON validation and sanitization in update_workflow_step_tool to detect and repair corrupted field names, control characters, and malformed structures - Require get_step before update_workflow_step to ensure complete config updates - Add new 'get_step' operation to workflow_read_tool for fetching individual steps - Define structured output schemas (outputSchema) for all LLM steps in predefined workflows to enforce response format compliance - Update agent instructions with strict JSON formatting rules and examples - Improve error messages with actionable guidance for malformed tool calls - Change Gmail provider dialog from FormModal to ViewModal for better UX Technical changes: - services/platform/convex/agent_tools/workflows/update_workflow_step_tool.ts: Add sanitization layer, control character detection, and field name validation - services/platform/convex/agent_tools/workflows/workflow_read_tool.ts: Add get_step operation for retrieving individual step configs - services/platform/convex/lib/create_workflow_agent.ts: Update instructions with JSON formatting requirements and required workflow - services/platform/convex/predefined_workflows/*: Add outputSchema definitions to enforce LLM response structure - services/platform/convex/wf_step_defs.ts: Add getStepById internal query - services/platform/convex/workflow/types/nodes.ts: Add outputSchema to LlmStepConfig type 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…mpt validation Add comprehensive debugging for LLM step variable replacement and improve error handling to prevent empty prompt failures. Changes: - Add detailed logging in execute_step_handler for variable replacement before/after states, including template markers and available variables - Add error context wrapping in execute_agent_with_tools to provide diagnostic information when LLM generation fails - Add prompt validation in process_prompts to ensure at least one prompt has content after variable substitution - Provide default fallback user prompt when only system prompt exists - Trim and validate prompts to handle edge cases gracefully This improves debugging capabilities for workflow execution issues and prevents runtime failures from empty prompts. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

… handler Added size validation for workflow execution outputs before storing inline to prevent exceeding Convex's 1MB document size limit. When output exceeds 900KB threshold, stores a summary object with size metadata instead of the full output, preventing runtime errors and improving system reliability. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Changed import from type-only to value import for JEXL_TRANSFORMS - Replaced duplicated local jexlTransforms array with spread of shared constant - Ensures single source of truth for JEXL transform definitions Addresses CodeRabbit review comments #2 and #3. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…lidate - repairObject now recursively processes array elements to repair corrupted keys inside nested objects within arrays - validateObject now recursively validates array elements to catch control characters in nested object keys - Added biome-ignore comments for intentional control character regex patterns - Added camelCase normalization for repaired field names (e.g., userprompt -> userPrompt) Addresses CodeRabbit review comments #5, #6, and #7. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

When the index scoring loop breaks early (e.g., due to missing intermediate fields), conditions for later index fields were not being added to post-filter. Now uses a Set to track which fields were actually satisfied by the index and ensures all remaining conditions are properly added to post-filter. Addresses CodeRabbit review comment #14. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The function now handles both seconds and milliseconds timestamps using a heuristic: timestamps < 1e11 are treated as seconds and converted to milliseconds. This prevents silent miscalculations when metadata contains seconds-based timestamps from sources like RAG indexing. Addresses CodeRabbit review comment #9. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…gging - Replace temporary console.log statements with debugLog utility - Remove verbose query_conversation_messages debug logging - Fix type casting from 'any' to 'Record<string, unknown>' - Logging now respects DEBUG_WORKFLOW environment variable 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add documentation comments for index ordering invariants - Document JEXL private API usage with stability notes - Add additionalProperties: false to product relationship schema items - Extract MAX_INLINE_OUTPUT_SIZE constant (900KB limit) - Preserve error cause chain in LLM generation errors - Document Object.entries field order dependency 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The Error constructor's `cause` option is an ES2022 feature. Update tsconfig lib from ES2021 to ES2022 to fix TypeScript error in execute_agent_with_tools.ts. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add concurrency settings to automatically cancel pending or in-progress workflow runs when a new commit is pushed to the same PR, saving CI resources. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add optional lastMessageAt timestamp field to conversationItemValidator and conversationWithMessagesValidator for tracking conversation activity. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…ements for workflow processing (#36) Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

…— validator tightening Closes round-5 findings #27, #28, #34, #35, #36. - `tts/queries.ts` getMessageChunks return validator narrows `format` and `error` from `v.optional(v.string())` to the closed unions built from `audioFormatLiterals` and `ttsErrorCodeLiterals`. The schema's writer validator already uses those unions; the query was the only seam where a future drift could fan out unnoticed. - `tts/queries.ts` getVoiceModeEffective now falls back to a prefix-only `userPreferences` lookup when the thread has no `organizationId` (legacy / edge rows). A user who toggled voice ON globally previously got silently-off voice on those threads. - `lib/shared/schemas/providers.ts` `defaultVoice` and `voicesByLocale` values now reject all-whitespace strings (`.regex(/\S/)`) so `' '` no longer slips through `.min(1)` and surfaces as UNKNOWN_VOICE at synth time. - `lib/shared/schemas/providers.ts` locale-regex docs explicitly note the narrow BCP-47 subset (ISO-639-1 + optional ISO-3166-1 alpha-2); script subtags (`zh-Hans`), 3-letter codes (`fil`), and UN region codes (`en-419`) are intentionally out of scope. Adds a follow-up pointer in the comment so future widening is a deliberate, lockstep change with the resolver. - `lib/shared/schemas/providers.ts` superRefine now uses the `forEach` index instead of `data.models.indexOf(model)` (O(n²) → O(n)) and points the error `path` at the actually-missing field (`voicesByLocale` when the operator only typed an empty map, else `defaultVoice`), so the operator's editor jumps to the right line.

coderabbitai Bot requested changes Dec 29, 2025

View reviewed changes

larryro and others added 10 commits December 30, 2025 11:54

larryro force-pushed the optimize-workflow_processing_records-action branch from e161d79 to f93c460 Compare December 30, 2025 03:54

coderabbitai Bot approved these changes Dec 30, 2025

View reviewed changes

larryro and others added 3 commits December 30, 2025 12:00

larryro merged commit e72905b into main Dec 30, 2025
2 checks passed

larryro deleted the optimize-workflow_processing_records-action branch December 30, 2025 04:27

yannickmonney pushed a commit that referenced this pull request Apr 8, 2026

perf(workflow): intelligent query optimization and reliability improv…

dc01221

…ements for workflow processing (#36) Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(workflow): intelligent query optimization and reliability improvements for workflow processing#36

perf(workflow): intelligent query optimization and reliability improvements for workflow processing#36
larryro merged 13 commits into
mainfrom
optimize-workflow_processing_records-action

larryro commented Dec 29, 2025 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Dec 29, 2025

Walkthrough

Estimated code review effort

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

larryro commented Dec 29, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Improvements

Technical Details

Performance Impact

Breaking Changes

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Dec 29, 2025

Walkthrough

Estimated code review effort

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

larryro commented Dec 29, 2025 •

edited by coderabbitai Bot

Loading