Fix flaky Cursor/OpenCode E2E behavior and transcript prep timing #923

Merged
gtrrz-victor merged 10 commits into main from fix/flaky-e2e-tests
Apr 11, 2026

@pfleidi pfleidi commented Apr 10, 2026

Summary

  • Re-resolve Cursor transcript paths before waiting and increase transcript preparation wait time to avoid missing nested session file writes.
  • Reduce Cursor interactive test flakiness by broadening ready-state prompt matching and slightly increasing startup/trust-dialog wait windows.
  • Make OpenCode honor per-prompt timeout overrides and raise TestMultiSessionSequential prompt timeouts to reduce timeout-bound failures.
  • Harden a checkpoint path-normalization test on macOS by resolving temp dir symlinks before repo initialization.

Validation

  • go build ./...
  • go vet ./...
  • go test -c -tags e2e ./e2e/tests
  • mise run test:e2e:canary
  • mise run test:e2e --agent cursor-cli "TestInteractive(AttributionMultiCommitSameSession|AttributionOnAgentCommit|ContentOverlapRevertNewFile|MultiStep|ShadowBranchCleanedAfterAgentCommit)$"
  • mise run test:e2e --agent opencode TestMultiSessionSequential

Note

Low Risk
Low risk: changes are limited to transcript polling behavior and test/e2e timeouts/prompt matching, with minimal impact on core business logic. Main risk is masking genuine failures by increasing waits or broadening readiness regexes.

Overview
Hardens Cursor transcript preparation by extending the wait window and making polling respect ctx deadlines/cancellation using a timer-based sleep.

Stabilizes E2E agent runs by broadening Cursor CLI ready-state prompt matching, increasing startup/trust-dialog wait timeouts, and making OpenCode honor WithPromptTimeout over the E2E_TIMEOUT env var.

Reduces test flakiness across platforms by resolving macOS temp-dir symlinks in a checkpoint path-normalization test, and by raising timeouts in TestMultiSessionSequential via per-prompt overrides.

Reviewed by Cursor Bugbot for commit 957579f.

pfleidi added 2 commits April 10, 2026 13:48
Match Cursor interactive completion prompts with current UI text and let OpenCode honor per-prompt timeout overrides so the affected interactive tests and multi-session flow are less timing-sensitive.

Entire-Checkpoint: 13ad540ecefa
Resolve Cursor transcript path before polling in prepareTranscriptForState.
Previously, PrepareTranscript was called with the stored flat path
(.../id.jsonl) before re-resolution to the correct nested path
(.../id/id.jsonl), wasting the entire timeout polling a nonexistent file.

Increase Cursor PrepareTranscript timeout from 3s to 5s to provide margin
for slow IDE flushes. Increase Cursor E2E startup timeouts from 30s to 45s
to handle slow trust dialog and initialization.

Fix TestWriteTemporary_PathNormalizationAndSkipping by resolving macOS
/var -> /private/var symlink on temp directories so absolute paths match
git's resolved repo root.

Entire-Checkpoint: 1bd0dcfecd83
Copilot AI review requested due to automatic review settings April 10, 2026 21:26
@pfleidi pfleidi requested a review from a team as a code owner April 10, 2026 21:26
Copilot AI left a comment

Pull request overview

This PR reduces E2E flakiness for Cursor and OpenCode agents and improves transcript preparation reliability in the manual-commit strategy, aiming to avoid missed transcript writes and timeout-bound failures.

Changes:

  • Adds per-prompt timeout overrides in TestMultiSessionSequential and makes OpenCode read those overrides.
  • Broadens Cursor CLI “ready” prompt matching and increases startup/trust wait windows.
  • Re-resolves transcript paths before transcript preparation waits to handle Cursor’s flat→nested relocation behavior; hardens a macOS path normalization test via symlink resolution.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| e2e/tests/multi_session_test.go | Passes per-prompt timeout overrides to reduce sequential multi-prompt failures. |
| e2e/agents/opencode.go | Applies per-prompt timeout configuration when running OpenCode prompts. |
| e2e/agents/cursor_cli.go | Expands prompt readiness regex and increases wait windows to reduce Cursor CLI flakiness. |
| cmd/entire/cli/strategy/common.go | Re-resolves transcript paths before preparation waits to avoid polling stale locations. |
| cmd/entire/cli/checkpoint/checkpoint_test.go | Stabilizes macOS path normalization test by resolving temp dir symlinks. |
| cmd/entire/cli/agent/cursor/cursor.go | Increases Cursor transcript preparation wait window. |
Comments suppressed due to low confidence (1)

e2e/agents/opencode.go:105

  • In openCodeAgent.RunPrompt, the per-prompt timeout option (cfg.PromptTimeout) is overridden by the E2E_TIMEOUT env var unconditionally. That means callers (like tests) cannot actually override the timeout when E2E_TIMEOUT is set, which conflicts with the intended “per-prompt override” behavior. Consider making the precedence explicit (e.g., cfg.PromptTimeout > env > default, or at least document that env always wins).
	timeout := a.timeout
	if cfg.PromptTimeout > 0 {
		timeout = cfg.PromptTimeout
	}
	if envTimeout := os.Getenv("E2E_TIMEOUT"); envTimeout != "" {
		if parsed, err := time.ParseDuration(envTimeout); err == nil {
			timeout = parsed
		}
	}

pfleidi commented Apr 10, 2026

Bugbot run


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


computermode previously approved these changes Apr 10, 2026
@computermode computermode left a comment

🙏🏻 🙏🏻 🙏🏻 🙏🏻 🙏🏻 🙏🏻

When the context deadline is shorter than maxWait, the warning log
now reports the actual timeout used instead of the constant maxWait.

Entire-Checkpoint: 268ae794a8c6
@gtrrz-victor gtrrz-victor enabled auto-merge April 11, 2026 07:02
@gtrrz-victor gtrrz-victor merged commit 80b9a40 into main Apr 11, 2026
9 checks passed
@gtrrz-victor gtrrz-victor deleted the fix/flaky-e2e-tests branch April 11, 2026 07:06