test(platform): add Playwright E2E suite with mock LLM and CI workflow#1860
Conversation
Stand up the full-app E2E layer for #179 (the Testing Library half already ships via test:ui): - @playwright/test@1.58.2 (pinned to the transitive playwright version) with its own config at services/platform/playwright.config.ts; specs in services/platform/e2e/, separate from all vitest projects. - Auth setup project formalizes the sign-up-endpoint loophole (the first-user restriction is UI-only) into a per-run owner account + org, persisted as storageState + context.json for the specs. - Hermetic chat determinism: TALE_CONFIG_DIR fixture (one agent, one provider) pointing at a mock OpenAI-compatible SSE server (e2e/mock-llm/server.ts) — no live LLM, no keys, no cost. The provider key rides the secretsEnv mechanism (#1711); sync-convex-env-from-dotenv now passes TALE_PROVIDER_KEY_* and TALE_ALLOW_PRIVATE_PROVIDER_HOSTS through from process.env so the documented platform-process-env knobs reach the local Convex deployment. - Four v1 smoke specs: login (negative + positive), chat send + streamed reply, governance policy save/persist, automation run to completion. Locators resolve labels from messages/en.json; workers:1, CI retries 2, trace on first retry; throwaway accounts keep the login lockout away from the shared owner. - .github/workflows/e2e.yml: PR runs (advisory during burn-in via continue-on-error) + strict nightly + workflow_dispatch; uploads the Playwright report on failure. - Rename the vitest browser-e2e project/script to browser so "e2e" unambiguously means the Playwright suite.
📝 WalkthroughWalkthroughThis PR establishes a comprehensive Playwright-based end-to-end test suite for the platform. It introduces a hermetic mock LLM HTTP server (Bun-based, OpenAI-compatible) that serves deterministic responses to ensure test reproducibility. The suite includes four smoke-test specs covering authentication flows, chat messaging with streamed responses, automation workflow execution, and governance settings persistence. Support infrastructure includes helper utilities for auth (sign-up via API), i18n label resolution, and test context management; a Playwright setup task that creates fresh accounts and organizations; and detailed fixtures for the mock provider, chat agent, and test workflow. The configuration integrates with the monorepo via Turbo, updates CI workflows to run E2E tests on PR changes and nightly schedules, and manages webServer orchestration within Playwright's config to coordinate the mock LLM and dev servers. Documentation outlines the determinism model, local debugging, state hygiene, and idempotency expectations. Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Billing warning: we have not been able to collect payment for this subscription for more than 72 hours. Please update the payment method or pay any pending invoices in Billing to avoid service interruption. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Root-cause fixes for the Playwright suite this PR introduces:
- auth: getByLabel('Password') matched both the input and the
"Show password" toggle (substring match) → strict-mode violation.
Match email/password labels with { exact: true }.
- chat: the fixture agent was named assistant.json, but the composer's
default selection pins DEFAULT_CHAT_AGENT_SLUG ('chat-agent'), so the
turn looked up a missing chat-agent.json and never streamed. Rename the
fixture to chat-agent.json so the default agent resolves to the mock.
- chat: the draft key (chat-draft-<userId>-…) flips once auth settles,
re-seeding the controlled composer and dropping characters typed before
the flip → a truncated user bubble. Clear+type under toPass until the
value sticks so the full message is always sent.
- automation: config-dir workflows ship as uninstalled templates and the
automations list only shows installed ones. Install the seeded `test`
template through the create-automation menu before running it.
…tion Two fixes to e2e/specs/automation.spec.ts: - Scope the completion assertion to the tester side panel (role=complementary, labelled "Test automation"). After PRs #1868/#1870 the canvas now renders a "viewing run" role=status banner whose "Completed" label collides with the tester result's, making a bare getByRole('status') a strict-mode violation (2 elements). The banner lives on the canvas, outside the panel, so the complementary scope is unambiguous. - Make reaching the editor idempotent across retries. The first attempt installs the `test` template, which removes it from the "From template" picker; a retry that still went through the install flow hung on a missing button. Now: if `test` is already installed, open its list row directly; otherwise install via template. Wait for the table to settle (row or empty state) before the branch so a mid-load read can't wrongly pick install.
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
services/platform/scripts/sync-convex-env-from-dotenv.ts (1)
123-126:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse key-presence checks, not falsy checks, for dotenv precedence.
Line 124 and Line 130 currently use
!merged[key]. That treats an explicitly defined empty dotenv value as “missing”, soprocess.envcan repopulate sensitive passthrough keys (for exampleTALE_PROVIDER_KEY_*andTALE_ALLOW_PRIVATE_PROVIDER_HOSTS) against the documented precedence intent.Suggested fix
for (const key of ORCHESTRATOR_MANAGED_KEYS) { - if (!merged[key] && process.env[key]) { + if (!(key in merged) && process.env[key]) { merged[key] = process.env[key]; } } for (const [key, value] of Object.entries(process.env)) { - if (!merged[key] && value && isPassthroughKey(key)) { + if (!(key in merged) && value && isPassthroughKey(key)) { merged[key] = value; } }Also applies to: 129-133
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@services/platform/scripts/sync-convex-env-from-dotenv.ts` around lines 123 - 126, The current checks use falsy evaluation (!merged[key]) which treats intentionally empty dotenv values as missing; change these to explicit presence checks so empty strings are respected. In the loops that iterate ORCHESTRATOR_MANAGED_KEYS and other similar blocks, replace the `!merged[key]` condition with an explicit existence check such as `!(key in merged)` or `merged[key] === undefined` (or `Object.prototype.hasOwnProperty.call(merged, key) === false`) so that keys set to "" in merged are not overridden by process.env; update all occurrences referenced around ORCHESTRATOR_MANAGED_KEYS/merged/process.env in this file accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/e2e.yml:
- Around line 46-47: The Checkout step that uses actions/checkout@... currently
leaves persisted Git credentials enabled; update the "Checkout" step (the step
with name "Checkout" and uses: actions/checkout@de0fac2e... ) to include
persist-credentials: false so the GITHUB_TOKEN is not stored in local git config
for subsequent steps—add the persist-credentials: false key under that step in
the workflow.
In `@services/platform/e2e/fixtures/config/default/providers/e2e-mock.json`:
- Line 4: The fixture hard-codes "baseUrl": "http://127.0.0.1:4141/v1" which
will break when E2E_MOCK_LLM_PORT is overridden; update the provider fixture to
derive its port from the same override instead of a literal 4141 — either (A)
change the fixture generation or loader to read process.env.E2E_MOCK_LLM_PORT
(or the Playwright config value) and build baseUrl dynamically (e.g., using
127.0.0.1:{port}/v1) so the mock server orchestration and provider calls stay in
sync, or (B) if the JSON format cannot be interpolated, remove the port override
from the TS/Playwright configs and enforce a single fixed port everywhere;
ensure you touch the "baseUrl" value and any Playwright/mock server
orchestration references to E2E_MOCK_LLM_PORT so they align.
In `@services/platform/e2e/mock-llm/server.ts`:
- Around line 33-35: Replace the use of unknown in the request/body flow by
defining a narrow union or explicit object type for incoming payloads and update
the type predicate (the function returning "value is ChatCompletionRequest") to
validate fields via precise type guards instead of "typeof value === 'object'".
Locate the predicate function and any serializers/parsers used for chunk
handling (referenced around the predicate at lines ~33-35, other request
handling at ~65-66, and chunk serialization at ~136-143) and add explicit
runtime checks for required properties and shapes, refine parameter types from
unknown to the new union/object type, and propagate those types through the
parsing/serialization code so the whole path is type-safe and free of
unknown/any.
In `@services/platform/e2e/README.md`:
- Line 35: Update the README entry that currently instructs to run "bunx
playwright show-report" so it works from the repo root: either change the
command to include the report path (e.g., run "bunx playwright show-report
services/platform/playwright-report") or explicitly instruct readers to run the
command from the services/platform directory; update the line referencing "bunx
playwright show-report" accordingly to point to
"services/platform/playwright-report".
In `@services/platform/e2e/specs/automation.spec.ts`:
- Line 28: The test uses a hardcoded English workflow name literal in the
locator: replace the direct 'test' string in the page.getByRole cell filter call
with the shared translation/label source or a fixture constant used by the suite
(e.g., import the workflow name from the test fixtures or translation helper and
use that variable instead of 'test'); update both occurrences (the filter at the
shown call and the other occurrence referenced at 53) so the locator reads from
the centralized label/translation constant to preserve i18n and rename safety.
In `@services/platform/playwright.config.ts`:
- Around line 69-79: The webServer entry that starts the mock LLM is always
included and causes failures when E2E_MOCK_LLM=0; change the webServer array
construction to only push/include the mock server object when useMockLlm is true
(the boolean derived from E2E_MOCK_LLM), referencing the existing mockLlmPort
and the webServer array in playwright.config.ts; i.e., gate the mock server
config object behind useMockLlm so real-stack runs omit the mock and avoid
port/startup conflicts.
---
Outside diff comments:
In `@services/platform/scripts/sync-convex-env-from-dotenv.ts`:
- Around line 123-126: The current checks use falsy evaluation (!merged[key])
which treats intentionally empty dotenv values as missing; change these to
explicit presence checks so empty strings are respected. In the loops that
iterate ORCHESTRATOR_MANAGED_KEYS and other similar blocks, replace the
`!merged[key]` condition with an explicit existence check such as `!(key in
merged)` or `merged[key] === undefined` (or
`Object.prototype.hasOwnProperty.call(merged, key) === false`) so that keys set
to "" in merged are not overridden by process.env; update all occurrences
referenced around ORCHESTRATOR_MANAGED_KEYS/merged/process.env in this file
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 797876b2-e8fc-4a8c-a0be-312ec5f88428
⛔ Files ignored due to path filters (1)
bun.lockis excluded by!**/*.lock
📒 Files selected for processing (23)
.github/workflows/e2e.ymlknip.config.tspackage.jsonservices/platform/.gitignoreservices/platform/e2e/README.mdservices/platform/e2e/fixtures/config/default/agents/chat-agent.jsonservices/platform/e2e/fixtures/config/default/providers/e2e-mock.jsonservices/platform/e2e/fixtures/config/default/workflows/test.jsonservices/platform/e2e/helpers/auth.tsservices/platform/e2e/helpers/i18n.tsservices/platform/e2e/helpers/test-context.tsservices/platform/e2e/mock-llm/canned.tsservices/platform/e2e/mock-llm/server.tsservices/platform/e2e/setup/auth.setup.tsservices/platform/e2e/specs/auth.spec.tsservices/platform/e2e/specs/automation.spec.tsservices/platform/e2e/specs/chat.spec.tsservices/platform/e2e/specs/governance-settings.spec.tsservices/platform/package.jsonservices/platform/playwright.config.tsservices/platform/scripts/sync-convex-env-from-dotenv.tsservices/platform/vitest.config.tsturbo.json
The e2e job only reads/tests, so it has no need for the GITHUB_TOKEN to linger in the checkout's git config. Set persist-credentials: false.
…nown The provider fixture's baseUrl (providers/e2e-mock.json) is loaded verbatim into Convex and cannot interpolate env, so an E2E_MOCK_LLM_PORT override moved the mock server and Playwright health-check off the port the provider still pointed at — provider calls would then miss the mock. Drop the override and make 4141 the single source of truth shared by the mock server, the Playwright webServer, and the fixture baseUrl. Also gate the mock-LLM webServer entry behind the existing useMockLlm flag so E2E_MOCK_LLM=0 (real-stack mode) no longer boots the mock and risks a port conflict. Replace the unknown-typed request/chunk handling with a JsonValue union plus narrow type guards (toChatCompletionRequest, ChatCompletionChunk), satisfying the repo's no-unknown rule with identical runtime behaviour.
…w name Point bunx playwright show-report at services/platform/playwright-report (the report dir relative to repo root). Hoist the seeded fixture workflow's name into a single WORKFLOW_NAME constant in the automation spec so the literal isn't repeated across locators.
What / why
Stands up the full-app E2E layer for the platform (Part of #179 — the Testing Library half of the issue already ships via
test:ui). Single PR per the agreed plan:@playwright/test@1.58.2(pinned to the version@vitest/browser-playwrightalready pulls transitively — one browser build, no lockfile fork) with its own config atservices/platform/playwright.config.tsand specs underservices/platform/e2e/— deliberately separate from every vitest project (issue-thread consensus: separate configs, separate CI lanes, trace viewer for debugging).e2e/setup/auth.setup.ts): formalizes the sign-up-endpoint loophole (POST /api/auth/sign-up/emailaccepts new users; the first-user restriction lives only in the sign-up UI) into a per-run owner account + organization, persisted as PlaywrightstorageState(e2e/.auth/owner.json) + org id (e2e/.auth/context.json). Handles both landing paths: fresh-instance auto-createddefaultorg (CI) and the create-organization form (local re-runs).e2e/mock-llm/server.ts, Bun, port 4141) +TALE_CONFIG_DIRfixture (e2e/fixtures/config/default/: one chat agent, one provider whosebaseUrltargets the mock, the trivialtestworkflow). Chat assertions check a canned streamed reply — no live LLM, no keys, no cost. The provider key rides the existingsecretsEnvmechanism (Feature: [Nuvolos, based on release 0.2.67] env-var fallback for provider API keys when SOPS / *.secrets.json is unavailable #1711).webServer[]: entry 1 = mock LLM, entry 2 =bun scripts/dev.ts(untouched).reuseExistingServeroutside CI so devs can point the suite at a running stack (E2E_MOCK_LLM=0documented ine2e/README.md).completed. Locators resolve labels frommessages/en.json(no hardcoded UI strings);workers: 1, CIretries: 2,trace: on-first-retry, throwaway accounts keep the login lockout away from the shared owner; nothing ever wipes state..github/workflows/e2e.yml:pull_request(paths-filtered, advisory during burn-in viacontinue-on-error, skips drafts) + nightlyschedule(strict) +workflow_dispatch; Playwright-browser and Convex-binary caches; uploadsplaywright-report/+test-results/on failure. Not wired intobun run check, the turbotesttask, or branch protection.browser-e2e→browser(test:browser-e2e→test:browser) so "e2e" unambiguously means the Playwright suite.scripts/sync-convex-env-from-dotenv.ts: now passesTALE_PROVIDER_KEY_*(reserved prefix, Feature: [Nuvolos, based on release 0.2.67] env-var fallback for provider API keys when SOPS / *.secrets.json is unavailable #1711) andTALE_ALLOW_PRIVATE_PROVIDER_HOSTSthrough fromprocess.envto the local Convex deployment. Both are documented "platform process env" knobs that previously only worked via dotenv files; the E2E webServer env (and any dev exporting them in the shell) relies on this.Decision-gate recommendations baked in
continue-on-error) and the check is not required; nightly is the strict signal. Promote to a required check after ~2 weeks of stability.test:browser-e2e→test:browserrename done.Deviations from the plan
@playwright/testis declared inservices/platformdevDependencies (where it's used; keeps knip ownership clean) instead of root — same pinned version, same single lockfile entry.secretsEnvenv-var key source: the org scaffold deliberately never copies*.secrets.jsoninto new orgs, so a secrets file would only work for thedefaultorg. The env var works for every org and is the supported mechanism post-feat(platform): env-var source for provider API keys (#1711) #1832.helpers/selectors.tsaria-busy drain helper: the specs use role-based auto-waiting locators instead, which is the more idiomatic (and less flaky) Playwright pattern.main—examples/default/providers/openrouter.secrets.jsonno longer exists (replaced bysecretsEnvin feat(platform): env-var source for provider API keys (#1711) #1832). No action was needed or taken here.Verification performed
bun run checkat the repo root — green (format, lint, typecheck, all tests; 71 900 platform tests pass).bun run knip:check— clean (mock server registered as an entry).bunx playwright test --list— config + project wiring resolves all 6 tests (setup + 5 specs)./health, non-streaming JSON completion, and SSE streaming all correct.bun run --filter @tale/platform test:browser— renamed vitest browser project still passes.git check-ignore:.auth/, scaffolded fixture org dirs ignored;default/template tracked.Manual verification required
bun run --filter @tale/platform test:e2erun could not be executed on the development machine: the local dev stack was live on ports 3000/3210/3211, and running the suite against it would mutate real dev state (and a second stack can't bind the Convex ports). The E2E workflow on this very PR is the intended verification — please check its run (it is advisory, so a red run won't block, but it should be green before merge).E2E_MOCK_LLM=0 bun run --filter @tale/platform test:e2e) to confirm thereuseExistingServer+ live-stack path, pere2e/README.md.Pre-PR checklist
bun run check(format, lint, typecheck, all tests).h-[…]on skeletons — N/A (no UI components touched).services/platform/messages/{en,de,fr}.json— N/A (no keys added/changed; specs only readen.json)./docs/{en,de,fr}/— N/A (internal test infrastructure; no user-visible behavior change. The sync-script passthrough only makes the already-documentedTALE_PROVIDER_KEY_*/TALE_ALLOW_PRIVATE_PROVIDER_HOSTSdeployment knobs effective in localbun run dev).bun run --filter @tale/docs lint/test— N/A (no docs changes).README.md/README.de.md/README.fr.md— N/A.Part of #179.
Summary by CodeRabbit
Release Notes
New Features
Chores