feat: track current step and loop progress in workflow executions#731
Conversation
…ract comparison v4 Add currentStepName field to workflow executions for human-readable progress tracking in the UI. Update the approval card to display step names instead of slugs. Upgrade contract comparison workflow to v4 with executive insights, transition summaries, hedged language for legal implications, and pre-computed clause statistics.
Track loop iteration progress (current/total) during workflow execution and surface it in the approval card UI for real-time feedback on loop steps.
…ipeline Remove precompute_statistics step, passing clause data directly to the overview LLM. Rewrite analyzer and classifier prompts for structured clarity with explicit interpretation rules and clause type normalization. Clarify loop progress label with "iteration" prefix.
… helper Remove the classify_clauses LLM step and allClauses variable, moving change statistics counting into the summarize_transition prompt. Move methodology to static report content. Extract loop progress logic from execute_step_handler into a dedicated get_active_loop_progress helper with unit tests.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 WalkthroughWalkthroughThis PR extends the workflow execution system to track and display step-level progress information in real-time. It introduces Estimated code review effort🎯 3 (Moderate) | ⏱️ ~28 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/workflows/contract-comparison/config.json`:
- Around line 139-151: The prompt in stepSlug "summarize_transition" asks the
LLM to count markdown table rows by Change Type, which is unreliable; instead
have the deterministic analysis step emit explicit counts and make the LLM use
them: update the producer (e.g., the step that populates
steps.analyze_differences.output.data) to include fields like
counts.substantive, counts.editorial, counts.structural (or a preformatted
statistics string), then change the userPrompt to reference those fields (e.g.,
{{steps.analyze_differences.output.counts}} or
{{steps.analyze_differences.output.statsLine}}) and remove the LLM counting
instruction so the model only summarizes; alternatively, if you must keep
counting in the model, add a validation step that compares the LLM's reported
numbers to the deterministic counts and fails the job on mismatch.
In
`@services/platform/app/features/chat/components/workflow-run-approval-card.tsx`:
- Around line 249-259: The UI currently falls back to t('executionRunning') when
executionStatus.currentStepName is absent, which hides the existing
executionStatus.currentStepSlug; update the rendering logic in the
WorkflowRunApprovalCard component to use executionStatus.currentStepName ||
executionStatus.currentStepSlug as the step label (preserving the loopProgress
branch and using the same i18n keys t('executionRunningStepWithProgress') and
t('executionRunningStep')), so that when currentStepName is missing the code
displays currentStepSlug instead of the generic executionRunning text.
- Line 243: The live region is currently applied to the entire Stack
(role="status") causing the ticking timer to re-announce; move role="status"
from the Stack wrapper to only the element that renders the step/status text
(e.g., the status label node), and either remove live attributes from the
elapsed timer or mark the timer element as non-live (aria-hidden="true" or
aria-live="off") so the timer updates on lines ~279-283 do not trigger screen
reader announcements; update the JSX so only status changes are in the live
region and the timer is excluded.
In
`@services/platform/convex/workflow_engine/helpers/engine/execute_step_handler.ts`:
- Around line 74-86: The current call to
ctx.runMutation(internal.wf_executions.internal_mutations.updateExecutionStatus,
...) makes progress bookkeeping a hard prerequisite for running the step—if it
throws, handleExecuteStep returns early and executeStepByType never runs; change
the flow so updateExecutionStatus errors are non-fatal: wrap the call to
getActiveLoopProgress and
ctx.runMutation(internal.wf_executions.internal_mutations.updateExecutionStatus,
...) in a try/catch, log/report the error but do not rethrow, and then proceed
to call executeStepByType; ensure you still pass loopProgress (if available) and
keep the same mutation params so behavior is preserved when it succeeds.
In
`@services/platform/convex/workflow_engine/helpers/step_execution/get_active_loop_progress.ts`:
- Around line 14-19: The helper currently returns progress for any numeric
values, allowing impossible states; update the guard in get_active_loop_progress
(the block using isLoopProgress(state) and isRecord(state)) to validate that
state.currentIndex and state.totalItems are integers and non-negative (use
Number.isInteger), that state.totalItems > 0, and that state.currentIndex <
state.totalItems (otherwise return null); only then compute and return {
current: state.currentIndex + 1, total: state.totalItems } to ensure impossible
progress like currentIndex >= totalItems or negative indices do not reach the
UI.
In `@services/platform/convex/workflows/executions/update_execution_status.ts`:
- Around line 32-38: In update_execution_status, callers who omit
args.loopProgress leave stale progress on the record; change the logic so that
if args.loopProgress is undefined AND the incoming status (args.status or
args.newStatus used in this function) indicates the run is no longer running
(e.g., !== 'running' or === 'completed'/'failed'), you explicitly clear
updates.loopProgress (set it to undefined/null as your DB patch convention) so
the stored loopProgress is removed; keep the existing branch that writes
updates.loopProgress when args.loopProgress is provided.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 868334eb-7352-430f-bd3a-48543464da0e
⛔ Files ignored due to path filters (1)
services/platform/convex/_generated/api.d.tsis excluded by!**/_generated/**
📒 Files selected for processing (14)
examples/workflows/contract-comparison/config.jsonservices/platform/app/features/chat/components/workflow-run-approval-card.tsxservices/platform/convex/wf_executions/internal_mutations.tsservices/platform/convex/wf_executions/queries.tsservices/platform/convex/workflow_engine/helpers/engine/execute_step_handler.tsservices/platform/convex/workflow_engine/helpers/step_execution/get_active_loop_progress.test.tsservices/platform/convex/workflow_engine/helpers/step_execution/get_active_loop_progress.tsservices/platform/convex/workflow_engine/types/workflow.tsservices/platform/convex/workflows/executions/types.tsservices/platform/convex/workflows/executions/update_execution_status.tsservices/platform/convex/workflows/executions/validators.tsservices/platform/convex/workflows/schema.tsservices/platform/lib/utils/type-guards.tsservices/platform/messages/en.json
| { | ||
| "stepSlug": "classify_clauses", | ||
| "name": "Extract Clause Classifications", | ||
| "stepSlug": "summarize_transition", | ||
| "name": "Summarize Version Transition", | ||
| "stepType": "llm", | ||
| "order": 8, | ||
| "config": { | ||
| "name": "Clause Classification Extractor", | ||
| "systemPrompt": "You extract structured clause data from a markdown legal analysis into JSON.\n\nThe analysis contains markdown tables grouped under ### headings. Each ### heading is the clause type (e.g., \"Kaufgegenstand und Kaufpreis\", \"Indemnification\"). Each table has columns: Clause Reference | Change Type | Description | Legal Significance.\n\nFor each table row, create one JSON object:\n- clauseType: the ### heading text above the table\n- classification: the Change Type cell value (must be exactly \"Editorial\", \"Structural\", or \"Substantive\")\n- clauseRef: the Clause Reference cell value, or null if empty\n- versionFrom: use the source file name provided below\n- versionTo: use the target file name provided below\n\nExample — given this markdown:\n\n### Indemnification\n| Clause Reference | Change Type | Description | Legal Significance |\n|---|---|---|---|\n| Section 7.1 | Substantive | Changed liability cap from fixed to percentage | Material change to risk allocation |\n| Section 7.2 | Editorial | Minor wording adjustment | No material legal change |\n\nWith source file \"v1.docx\" and target file \"v2.docx\", extract:\n[{\"clauseType\":\"Indemnification\",\"classification\":\"Substantive\",\"clauseRef\":\"Section 7.1\",\"versionFrom\":\"v1.docx\",\"versionTo\":\"v2.docx\"},{\"clauseType\":\"Indemnification\",\"classification\":\"Editorial\",\"clauseRef\":\"Section 7.2\",\"versionFrom\":\"v1.docx\",\"versionTo\":\"v2.docx\"}]\n\nExtract every single table row. Do not skip any.", | ||
| "userPrompt": "Source file: \"{{files[loop.index].fileName}}\"\nTarget file: \"{{loop.item.fileName}}\"\n\n{{steps.analyze_differences.output.data}}", | ||
| "outputFormat": "json", | ||
| "outputSchema": { | ||
| "type": "object", | ||
| "properties": { | ||
| "clauses": { | ||
| "type": "array", | ||
| "description": "Structured clause classifications extracted from the analysis", | ||
| "items": { | ||
| "type": "object", | ||
| "properties": { | ||
| "clauseType": { | ||
| "type": "string", | ||
| "description": "The identified clause type (e.g., Indemnification, Limitation of Liability)" | ||
| }, | ||
| "classification": { | ||
| "type": "string", | ||
| "enum": ["Editorial", "Structural", "Substantive"], | ||
| "description": "Change significance level (always English)" | ||
| }, | ||
| "clauseRef": { | ||
| "type": ["string", "null"], | ||
| "description": "The clause/section reference if identified (e.g., 'Section 7.1.2', 'Ziff. 3.2')" | ||
| }, | ||
| "versionFrom": { | ||
| "type": "string", | ||
| "description": "Source version file name" | ||
| }, | ||
| "versionTo": { | ||
| "type": "string", | ||
| "description": "Target version file name" | ||
| } | ||
| }, | ||
| "required": ["clauseType", "classification", "versionFrom", "versionTo"] | ||
| } | ||
| } | ||
| }, | ||
| "required": ["clauses"] | ||
| } | ||
| "name": "Transition Summary Analyst", | ||
| "systemPrompt": "You are a transaction analyst. Given a detailed change analysis between two contract versions, produce a concise summary of how the contract evolved in this transition. Focus on:\n- What negotiation positions appear to have shifted\n- How risk allocation may have changed between parties\n- Whether new deal mechanisms were introduced or removed (e.g., earn-out, escrow, locked-box, MAC clauses, non-compete)\n- The overall direction of the negotiation (tightening, loosening, converging)\n\nUse hedged language for interpretive conclusions (\"appears to\", \"may indicate\", \"suggests\"). Write 3-5 sentences maximum. Write in the language specified by the language parameter. If no language parameter is provided or empty, write in the same language as the source documents.\n\nAt the very end, append a single statistics line in this exact format:\nChanges: X Substantive, Y Editorial, Z Structural\n\nCount the rows in each Change Type column from the markdown tables in the input. This line must always be in English regardless of the output language.", | ||
| "userPrompt": "Output language: {{language || 'auto-detect from documents'}}\n\nTransition {{loop.index + 1}} of {{filesToCompare | length}}: \"{{files[loop.index].fileName}}\" → \"{{loop.item.fileName}}\"\n\nBelow is the detailed change analysis for this version transition. Summarize the key transaction-level developments.\n\nCount the table rows grouped under each ### heading and their Change Type column values (Editorial, Structural, or Substantive) to produce the statistics line at the end.\n\n{{steps.analyze_differences.output.data}}", | ||
| "outputFormat": "text" | ||
| }, | ||
| "nextSteps": { "success": "accumulate_results" } | ||
| }, |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
LLM-based row counting may be unreliable.
The prompt instructs the LLM to count table rows by Change Type from markdown tables in the input. LLMs can miscount or inconsistently parse complex markdown structures, leading to inaccurate statistics.
Consider whether the deterministic diff step could emit counts directly, or accept that statistics may occasionally be approximate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/workflows/contract-comparison/config.json` around lines 139 - 151,
The prompt in stepSlug "summarize_transition" asks the LLM to count markdown
table rows by Change Type, which is unreliable; instead have the deterministic
analysis step emit explicit counts and make the LLM use them: update the
producer (e.g., the step that populates steps.analyze_differences.output.data)
to include fields like counts.substantive, counts.editorial, counts.structural
(or a preformatted statistics string), then change the userPrompt to reference
those fields (e.g., {{steps.analyze_differences.output.counts}} or
{{steps.analyze_differences.output.statsLine}}) and remove the LLM counting
instruction so the model only summarizes; alternatively, if you must keep
counting in the model, add a validation step that compares the LLM's reported
numbers to the deterministic counts and fails the job on mismatch.
| {/* Live Execution Status */} | ||
| {status === 'approved' && executionId && ( | ||
| <Stack gap={1} className="mb-3" aria-live="polite"> | ||
| <Stack gap={1} className="mb-3" role="status" aria-live="polite"> |
There was a problem hiding this comment.
Scope the live region to status changes, not the ticking timer.
Line 243 makes the whole block a role="status" live region, but Lines 279-283 mutate every second. That will cause screen readers to keep re-announcing the running state for the entire execution. Keep the live region on the step/status text only, and move the elapsed timer outside it or mark the timer as non-live.
♿ Suggested structure
- <Stack gap={1} className="mb-3" role="status" aria-live="polite">
+ <Stack gap={1} className="mb-3">
{isRunning && (
<>
<HStack gap={1} justify="between" className="text-xs">
- <HStack gap={1} className="text-primary">
+ <HStack gap={1} className="text-primary" role="status" aria-live="polite">
<Loader2 className="size-3 animate-spin" />
…
</HStack>
…
</HStack>
{elapsed && (
- <Text as="div" variant="caption">
+ <Text as="div" variant="caption" aria-hidden="true">
{t('executionElapsed', { duration: elapsed })}
</Text>
)}
</>
)}Also applies to: 279-283
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/app/features/chat/components/workflow-run-approval-card.tsx`
at line 243, The live region is currently applied to the entire Stack
(role="status") causing the ticking timer to re-announce; move role="status"
from the Stack wrapper to only the element that renders the step/status text
(e.g., the status label node), and either remove live attributes from the
elapsed timer or mark the timer element as non-live (aria-hidden="true" or
aria-live="off") so the timer updates on lines ~279-283 do not trigger screen
reader announcements; update the JSX so only status changes are in the live
region and the timer is excluded.
| {executionStatus?.currentStepName | ||
| ? executionStatus.loopProgress | ||
| ? t('executionRunningStepWithProgress', { | ||
| step: executionStatus.currentStepName, | ||
| current: executionStatus.loopProgress.current, | ||
| total: executionStatus.loopProgress.total, | ||
| }) | ||
| : t('executionRunningStep', { | ||
| step: executionStatus.currentStepName, | ||
| }) | ||
| : t('executionRunning')} |
There was a problem hiding this comment.
Keep currentStepSlug as the fallback label.
This now drops straight to executionRunning when currentStepName is missing, even though services/platform/convex/wf_executions/queries.ts still returns currentStepSlug. Older in-flight executions and rolling-deploy stragglers will lose the step label entirely instead of falling back to the existing slug.
🔁 Suggested fallback
- {executionStatus?.currentStepName
- ? executionStatus.loopProgress
+ {(() => {
+ const stepLabel =
+ executionStatus?.currentStepName ??
+ executionStatus?.currentStepSlug;
+ return stepLabel
+ ? executionStatus.loopProgress
? t('executionRunningStepWithProgress', {
- step: executionStatus.currentStepName,
+ step: stepLabel,
current: executionStatus.loopProgress.current,
total: executionStatus.loopProgress.total,
})
: t('executionRunningStep', {
- step: executionStatus.currentStepName,
+ step: stepLabel,
})
- : t('executionRunning')}
+ : t('executionRunning');
+ })()}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/app/features/chat/components/workflow-run-approval-card.tsx`
around lines 249 - 259, The UI currently falls back to t('executionRunning')
when executionStatus.currentStepName is absent, which hides the existing
executionStatus.currentStepSlug; update the rendering logic in the
WorkflowRunApprovalCard component to use executionStatus.currentStepName ||
executionStatus.currentStepSlug as the step label (preserving the loopProgress
branch and using the same i18n keys t('executionRunningStepWithProgress') and
t('executionRunningStep')), so that when currentStepName is missing the code
displays currentStepSlug instead of the generic executionRunning text.
| // 4. Update current step for real-time progress tracking (after variable init so loop context is available) | ||
| const loopProgress = getActiveLoopProgress(fullVariables.loop); | ||
|
|
||
| await ctx.runMutation( | ||
| internal.wf_executions.internal_mutations.updateExecutionStatus, | ||
| { | ||
| executionId: toId<'wfExecutions'>(args.executionId), | ||
| status: 'running', | ||
| currentStepSlug: args.stepSlug, | ||
| currentStepName: args.stepName || args.stepSlug, | ||
| loopProgress, | ||
| }, | ||
| ); |
There was a problem hiding this comment.
Don't make progress bookkeeping a hard prerequisite for running the step.
If this new updateExecutionStatus mutation throws, handleExecuteStep exits before executeStepByType runs. That means a failure in the visibility/tracking path can now stall the workflow itself.
🩹 Suggested fix
- await ctx.runMutation(
- internal.wf_executions.internal_mutations.updateExecutionStatus,
- {
- executionId: toId<'wfExecutions'>(args.executionId),
- status: 'running',
- currentStepSlug: args.stepSlug,
- currentStepName: args.stepName || args.stepSlug,
- loopProgress,
- },
- );
+ try {
+ await ctx.runMutation(
+ internal.wf_executions.internal_mutations.updateExecutionStatus,
+ {
+ executionId: toId<'wfExecutions'>(args.executionId),
+ status: 'running',
+ currentStepSlug: args.stepSlug,
+ currentStepName: args.stepName || args.stepSlug,
+ loopProgress,
+ },
+ );
+ } catch (error) {
+ debugLog('Failed to update execution progress', {
+ executionId: args.executionId,
+ stepSlug: args.stepSlug,
+ error,
+ });
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/convex/workflow_engine/helpers/engine/execute_step_handler.ts`
around lines 74 - 86, The current call to
ctx.runMutation(internal.wf_executions.internal_mutations.updateExecutionStatus,
...) makes progress bookkeeping a hard prerequisite for running the step—if it
throws, handleExecuteStep returns early and executeStepByType never runs; change
the flow so updateExecutionStatus errors are non-fatal: wrap the call to
getActiveLoopProgress and
ctx.runMutation(internal.wf_executions.internal_mutations.updateExecutionStatus,
...) in a try/catch, log/report the error but do not rethrow, and then proceed
to call executeStepByType; ensure you still pass loopProgress (if available) and
keep the same mutation params so behavior is preserved when it succeeds.
| if (!isLoopProgress(state)) return null; | ||
| if (isRecord(state) && state.isComplete === true) return null; | ||
|
|
||
| return { | ||
| current: state.currentIndex + 1, | ||
| total: state.totalItems, |
There was a problem hiding this comment.
Reject impossible active progress states before returning them.
After the structural check, any numeric values are accepted here. Inputs like { currentIndex: 3, totalItems: 2, isComplete: false } or { currentIndex: -1, totalItems: 0 } currently surface impossible progress to the UI. This helper should return null unless the indices are non-negative integers and currentIndex < totalItems.
🧮 Suggested guard
const state = loop.state;
if (!isLoopProgress(state)) return null;
if (isRecord(state) && state.isComplete === true) return null;
+ if (
+ !Number.isInteger(state.currentIndex) ||
+ !Number.isInteger(state.totalItems) ||
+ state.currentIndex < 0 ||
+ state.totalItems <= 0 ||
+ state.currentIndex >= state.totalItems
+ ) {
+ return null;
+ }
return {
current: state.currentIndex + 1,
total: state.totalItems,
};🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/convex/workflow_engine/helpers/step_execution/get_active_loop_progress.ts`
around lines 14 - 19, The helper currently returns progress for any numeric
values, allowing impossible states; update the guard in get_active_loop_progress
(the block using isLoopProgress(state) and isRecord(state)) to validate that
state.currentIndex and state.totalItems are integers and non-negative (use
Number.isInteger), that state.totalItems > 0, and that state.currentIndex <
state.totalItems (otherwise return null); only then compute and return {
current: state.currentIndex + 1, total: state.totalItems } to ensure impossible
progress like currentIndex >= totalItems or negative indices do not reach the
UI.
| if (args.currentStepName !== undefined) { | ||
| updates.currentStepName = args.currentStepName; | ||
| } | ||
|
|
||
| if (args.loopProgress !== undefined) { | ||
| updates.loopProgress = args.loopProgress ?? undefined; | ||
| } |
There was a problem hiding this comment.
Clear stale loopProgress on non-running updates.
This only removes progress when callers explicitly send null. Any later completed/failed status update that omits loopProgress will retain the last iteration snapshot on the execution record, so downstream readers can see stale loop state after the run has moved on.
🩹 Suggested fix
if (args.currentStepName !== undefined) {
updates.currentStepName = args.currentStepName;
}
if (args.loopProgress !== undefined) {
updates.loopProgress = args.loopProgress ?? undefined;
+ } else if (args.status !== 'running') {
+ updates.loopProgress = undefined;
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/workflows/executions/update_execution_status.ts`
around lines 32 - 38, In update_execution_status, callers who omit
args.loopProgress leave stale progress on the record; change the logic so that
if args.loopProgress is undefined AND the incoming status (args.status or
args.newStatus used in this function) indicates the run is no longer running
(e.g., !== 'running' or === 'completed'/'failed'), you explicitly clear
updates.loopProgress (set it to undefined/null as your DB patch convention) so
the stored loopProgress is removed; keep the existing branch that writes
updates.loopProgress when args.loopProgress is provided.
Check isComplete before isLoopProgress narrows the type, so the property access is valid on Record<string, unknown>.
Summary
classify_clausesLLM step, inlining change statistics into the summarize promptget_active_loop_progresshelper with unit testsTest plan
get_active_loop_progressunit tests🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements