Skip to content

test(platform): add Playwright E2E suite with mock LLM and CI workflow#1860

Merged
yannickmonney merged 6 commits into
mainfrom
feat/platform-playwright-e2e
Jun 11, 2026
Merged

test(platform): add Playwright E2E suite with mock LLM and CI workflow#1860
yannickmonney merged 6 commits into
mainfrom
feat/platform-playwright-e2e

Conversation

@yannickmonney

@yannickmonney yannickmonney commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

What / why

Stands up the full-app E2E layer for the platform (Part of #179 — the Testing Library half of the issue already ships via test:ui). Single PR per the agreed plan:

  • @playwright/test@1.58.2 (pinned to the version @vitest/browser-playwright already pulls transitively — one browser build, no lockfile fork) with its own config at services/platform/playwright.config.ts and specs under services/platform/e2e/ — deliberately separate from every vitest project (issue-thread consensus: separate configs, separate CI lanes, trace viewer for debugging).
  • Auth setup project (e2e/setup/auth.setup.ts): formalizes the sign-up-endpoint loophole (POST /api/auth/sign-up/email accepts new users; the first-user restriction lives only in the sign-up UI) into a per-run owner account + organization, persisted as Playwright storageState (e2e/.auth/owner.json) + org id (e2e/.auth/context.json). Handles both landing paths: fresh-instance auto-created default org (CI) and the create-organization form (local re-runs).
  • Mock OpenAI-compatible SSE server (e2e/mock-llm/server.ts, Bun, port 4141) + TALE_CONFIG_DIR fixture (e2e/fixtures/config/default/: one chat agent, one provider whose baseUrl targets the mock, the trivial test workflow). Chat assertions check a canned streamed reply — no live LLM, no keys, no cost. The provider key rides the existing secretsEnv mechanism (Feature: [Nuvolos, based on release 0.2.67] env-var fallback for provider API keys when SOPS / *.secrets.json is unavailable #1711).
  • Stack orchestration via Playwright webServer[]: entry 1 = mock LLM, entry 2 = bun scripts/dev.ts (untouched). reuseExistingServer outside CI so devs can point the suite at a running stack (E2E_MOCK_LLM=0 documented in e2e/README.md).
  • Four v1 smoke specs: login (wrong-password error + successful login), chat send + streamed reply, governance policy save → reload → persist (voice-output toggle, restored afterwards), automation run to completed. Locators resolve labels from messages/en.json (no hardcoded UI strings); workers: 1, CI retries: 2, trace: on-first-retry, throwaway accounts keep the login lockout away from the shared owner; nothing ever wipes state.
  • .github/workflows/e2e.yml: pull_request (paths-filtered, advisory during burn-in via continue-on-error, skips drafts) + nightly schedule (strict) + workflow_dispatch; Playwright-browser and Convex-binary caches; uploads playwright-report/ + test-results/ on failure. Not wired into bun run check, the turbo test task, or branch protection.
  • Rename: vitest project browser-e2ebrowser (test:browser-e2etest:browser) so "e2e" unambiguously means the Playwright suite.
  • scripts/sync-convex-env-from-dotenv.ts: now passes TALE_PROVIDER_KEY_* (reserved prefix, Feature: [Nuvolos, based on release 0.2.67] env-var fallback for provider API keys when SOPS / *.secrets.json is unavailable #1711) and TALE_ALLOW_PRIVATE_PROVIDER_HOSTS through from process.env to the local Convex deployment. Both are documented "platform process env" knobs that previously only worked via dotenv files; the E2E webServer env (and any dev exporting them in the shell) relies on this.

Decision-gate recommendations baked in

  • v1 flow set as planned: login, chat send+stream, governance save, automation run; document upload+index and SSO redirect deferred to v2.
  • Non-required burn-in first: the PR job is advisory (continue-on-error) and the check is not required; nightly is the strict signal. Promote to a required check after ~2 weeks of stability.
  • Mock LLM for CI (no live-OpenRouter nightly lane for now).
  • test:browser-e2etest:browser rename done.

Deviations from the plan

  • @playwright/test is declared in services/platform devDependencies (where it's used; keeps knip ownership clean) instead of root — same pinned version, same single lockfile entry.
  • The plan's "dummy secrets file" for the fixture provider was replaced by the secretsEnv env-var key source: the org scaffold deliberately never copies *.secrets.json into new orgs, so a secrets file would only work for the default org. The env var works for every org and is the supported mechanism post-feat(platform): env-var source for provider API keys (#1711) #1832.
  • No helpers/selectors.ts aria-busy drain helper: the specs use role-based auto-waiting locators instead, which is the more idiomatic (and less flaky) Playwright pattern.
  • The plan's security finding about a committed OpenRouter key is already resolved on mainexamples/default/providers/openrouter.secrets.json no longer exists (replaced by secretsEnv in feat(platform): env-var source for provider API keys (#1711) #1832). No action was needed or taken here.

Verification performed

  • bun run check at the repo root — green (format, lint, typecheck, all tests; 71 900 platform tests pass).
  • bun run knip:check — clean (mock server registered as an entry).
  • bunx playwright test --list — config + project wiring resolves all 6 tests (setup + 5 specs).
  • Mock LLM server smoke-tested directly: /health, non-streaming JSON completion, and SSE streaming all correct.
  • bun run --filter @tale/platform test:browser — renamed vitest browser project still passes.
  • gitignore rules verified with git check-ignore: .auth/, scaffolded fixture org dirs ignored; default/ template tracked.

Manual verification required

  • A full bun run --filter @tale/platform test:e2e run could not be executed on the development machine: the local dev stack was live on ports 3000/3210/3211, and running the suite against it would mutate real dev state (and a second stack can't bind the Convex ports). The E2E workflow on this very PR is the intended verification — please check its run (it is advisory, so a red run won't block, but it should be green before merge).
  • Locally re-run with a dev stack up (E2E_MOCK_LLM=0 bun run --filter @tale/platform test:e2e) to confirm the reuseExistingServer + live-stack path, per e2e/README.md.

Pre-PR checklist

  • Ran bun run check (format, lint, typecheck, all tests).
  • No hand-rolled skeletons or magic h-[…] on skeletons — N/A (no UI components touched).
  • Updated services/platform/messages/{en,de,fr}.json — N/A (no keys added/changed; specs only read en.json).
  • Updated /docs/{en,de,fr}/ — N/A (internal test infrastructure; no user-visible behavior change. The sync-script passthrough only makes the already-documented TALE_PROVIDER_KEY_* / TALE_ALLOW_PRIVATE_PROVIDER_HOSTS deployment knobs effective in local bun run dev).
  • Ran bun run --filter @tale/docs lint/test — N/A (no docs changes).
  • Updated README.md / README.de.md / README.fr.md — N/A.

Part of #179.

Summary by CodeRabbit

Release Notes

  • New Features

    • Established end-to-end testing infrastructure with automated test coverage for authentication, chat messaging, automation workflows, and governance settings.
    • Added mock LLM server for deterministic and reproducible E2E testing.
    • Configured CI pipeline to run E2E tests on pull requests and scheduled intervals.
  • Chores

    • Added Playwright testing framework and related dependencies.

Stand up the full-app E2E layer for #179 (the Testing Library half already
ships via test:ui):

- @playwright/test@1.58.2 (pinned to the transitive playwright version)
  with its own config at services/platform/playwright.config.ts; specs in
  services/platform/e2e/, separate from all vitest projects.
- Auth setup project formalizes the sign-up-endpoint loophole (the
  first-user restriction is UI-only) into a per-run owner account + org,
  persisted as storageState + context.json for the specs.
- Hermetic chat determinism: TALE_CONFIG_DIR fixture (one agent, one
  provider) pointing at a mock OpenAI-compatible SSE server
  (e2e/mock-llm/server.ts) — no live LLM, no keys, no cost. The provider
  key rides the secretsEnv mechanism (#1711); sync-convex-env-from-dotenv
  now passes TALE_PROVIDER_KEY_* and TALE_ALLOW_PRIVATE_PROVIDER_HOSTS
  through from process.env so the documented platform-process-env knobs
  reach the local Convex deployment.
- Four v1 smoke specs: login (negative + positive), chat send + streamed
  reply, governance policy save/persist, automation run to completion.
  Locators resolve labels from messages/en.json; workers:1, CI retries 2,
  trace on first retry; throwaway accounts keep the login lockout away
  from the shared owner.
- .github/workflows/e2e.yml: PR runs (advisory during burn-in via
  continue-on-error) + strict nightly + workflow_dispatch; uploads the
  Playwright report on failure.
- Rename the vitest browser-e2e project/script to browser so "e2e"
  unambiguously means the Playwright suite.
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR establishes a comprehensive Playwright-based end-to-end test suite for the platform. It introduces a hermetic mock LLM HTTP server (Bun-based, OpenAI-compatible) that serves deterministic responses to ensure test reproducibility. The suite includes four smoke-test specs covering authentication flows, chat messaging with streamed responses, automation workflow execution, and governance settings persistence. Support infrastructure includes helper utilities for auth (sign-up via API), i18n label resolution, and test context management; a Playwright setup task that creates fresh accounts and organizations; and detailed fixtures for the mock provider, chat agent, and test workflow. The configuration integrates with the monorepo via Turbo, updates CI workflows to run E2E tests on PR changes and nightly schedules, and manages webServer orchestration within Playwright's config to coordinate the mock LLM and dev servers. Documentation outlines the determinism model, local debugging, state hygiene, and idempotency expectations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: adding a Playwright E2E suite with mock LLM support and CI workflow integration.
Description check ✅ Passed The description is comprehensive, well-organized, and covers the summary, pre-merge checklist completion, test plan, decision rationale, deviations, verification performed, and manual verification needed.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/platform-playwright-e2e

Warning

Billing warning: we have not been able to collect payment for this subscription for more than 72 hours. Please update the payment method or pay any pending invoices in Billing to avoid service interruption.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Root-cause fixes for the Playwright suite this PR introduces:

- auth: getByLabel('Password') matched both the input and the
  "Show password" toggle (substring match) → strict-mode violation.
  Match email/password labels with { exact: true }.
- chat: the fixture agent was named assistant.json, but the composer's
  default selection pins DEFAULT_CHAT_AGENT_SLUG ('chat-agent'), so the
  turn looked up a missing chat-agent.json and never streamed. Rename the
  fixture to chat-agent.json so the default agent resolves to the mock.
- chat: the draft key (chat-draft-<userId>-…) flips once auth settles,
  re-seeding the controlled composer and dropping characters typed before
  the flip → a truncated user bubble. Clear+type under toPass until the
  value sticks so the full message is always sent.
- automation: config-dir workflows ship as uninstalled templates and the
  automations list only shows installed ones. Install the seeded `test`
  template through the create-automation menu before running it.
…tion

Two fixes to e2e/specs/automation.spec.ts:

- Scope the completion assertion to the tester side panel
  (role=complementary, labelled "Test automation"). After PRs #1868/#1870
  the canvas now renders a "viewing run" role=status banner whose "Completed"
  label collides with the tester result's, making a bare getByRole('status')
  a strict-mode violation (2 elements). The banner lives on the canvas,
  outside the panel, so the complementary scope is unambiguous.

- Make reaching the editor idempotent across retries. The first attempt
  installs the `test` template, which removes it from the "From template"
  picker; a retry that still went through the install flow hung on a missing
  button. Now: if `test` is already installed, open its list row directly;
  otherwise install via template. Wait for the table to settle (row or empty
  state) before the branch so a mid-load read can't wrongly pick install.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
services/platform/scripts/sync-convex-env-from-dotenv.ts (1)

123-126: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use key-presence checks, not falsy checks, for dotenv precedence.

Line 124 and Line 130 currently use !merged[key]. That treats an explicitly defined empty dotenv value as “missing”, so process.env can repopulate sensitive passthrough keys (for example TALE_PROVIDER_KEY_* and TALE_ALLOW_PRIVATE_PROVIDER_HOSTS) against the documented precedence intent.

Suggested fix
   for (const key of ORCHESTRATOR_MANAGED_KEYS) {
-    if (!merged[key] && process.env[key]) {
+    if (!(key in merged) && process.env[key]) {
       merged[key] = process.env[key];
     }
   }

   for (const [key, value] of Object.entries(process.env)) {
-    if (!merged[key] && value && isPassthroughKey(key)) {
+    if (!(key in merged) && value && isPassthroughKey(key)) {
       merged[key] = value;
     }
   }

Also applies to: 129-133

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/scripts/sync-convex-env-from-dotenv.ts` around lines 123 -
126, The current checks use falsy evaluation (!merged[key]) which treats
intentionally empty dotenv values as missing; change these to explicit presence
checks so empty strings are respected. In the loops that iterate
ORCHESTRATOR_MANAGED_KEYS and other similar blocks, replace the `!merged[key]`
condition with an explicit existence check such as `!(key in merged)` or
`merged[key] === undefined` (or `Object.prototype.hasOwnProperty.call(merged,
key) === false`) so that keys set to "" in merged are not overridden by
process.env; update all occurrences referenced around
ORCHESTRATOR_MANAGED_KEYS/merged/process.env in this file accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/e2e.yml:
- Around line 46-47: The Checkout step that uses actions/checkout@... currently
leaves persisted Git credentials enabled; update the "Checkout" step (the step
with name "Checkout" and uses: actions/checkout@de0fac2e... ) to include
persist-credentials: false so the GITHUB_TOKEN is not stored in local git config
for subsequent steps—add the persist-credentials: false key under that step in
the workflow.

In `@services/platform/e2e/fixtures/config/default/providers/e2e-mock.json`:
- Line 4: The fixture hard-codes "baseUrl": "http://127.0.0.1:4141/v1" which
will break when E2E_MOCK_LLM_PORT is overridden; update the provider fixture to
derive its port from the same override instead of a literal 4141 — either (A)
change the fixture generation or loader to read process.env.E2E_MOCK_LLM_PORT
(or the Playwright config value) and build baseUrl dynamically (e.g., using
127.0.0.1:{port}/v1) so the mock server orchestration and provider calls stay in
sync, or (B) if the JSON format cannot be interpolated, remove the port override
from the TS/Playwright configs and enforce a single fixed port everywhere;
ensure you touch the "baseUrl" value and any Playwright/mock server
orchestration references to E2E_MOCK_LLM_PORT so they align.

In `@services/platform/e2e/mock-llm/server.ts`:
- Around line 33-35: Replace the use of unknown in the request/body flow by
defining a narrow union or explicit object type for incoming payloads and update
the type predicate (the function returning "value is ChatCompletionRequest") to
validate fields via precise type guards instead of "typeof value === 'object'".
Locate the predicate function and any serializers/parsers used for chunk
handling (referenced around the predicate at lines ~33-35, other request
handling at ~65-66, and chunk serialization at ~136-143) and add explicit
runtime checks for required properties and shapes, refine parameter types from
unknown to the new union/object type, and propagate those types through the
parsing/serialization code so the whole path is type-safe and free of
unknown/any.

In `@services/platform/e2e/README.md`:
- Line 35: Update the README entry that currently instructs to run "bunx
playwright show-report" so it works from the repo root: either change the
command to include the report path (e.g., run "bunx playwright show-report
services/platform/playwright-report") or explicitly instruct readers to run the
command from the services/platform directory; update the line referencing "bunx
playwright show-report" accordingly to point to
"services/platform/playwright-report".

In `@services/platform/e2e/specs/automation.spec.ts`:
- Line 28: The test uses a hardcoded English workflow name literal in the
locator: replace the direct 'test' string in the page.getByRole cell filter call
with the shared translation/label source or a fixture constant used by the suite
(e.g., import the workflow name from the test fixtures or translation helper and
use that variable instead of 'test'); update both occurrences (the filter at the
shown call and the other occurrence referenced at 53) so the locator reads from
the centralized label/translation constant to preserve i18n and rename safety.

In `@services/platform/playwright.config.ts`:
- Around line 69-79: The webServer entry that starts the mock LLM is always
included and causes failures when E2E_MOCK_LLM=0; change the webServer array
construction to only push/include the mock server object when useMockLlm is true
(the boolean derived from E2E_MOCK_LLM), referencing the existing mockLlmPort
and the webServer array in playwright.config.ts; i.e., gate the mock server
config object behind useMockLlm so real-stack runs omit the mock and avoid
port/startup conflicts.

---

Outside diff comments:
In `@services/platform/scripts/sync-convex-env-from-dotenv.ts`:
- Around line 123-126: The current checks use falsy evaluation (!merged[key])
which treats intentionally empty dotenv values as missing; change these to
explicit presence checks so empty strings are respected. In the loops that
iterate ORCHESTRATOR_MANAGED_KEYS and other similar blocks, replace the
`!merged[key]` condition with an explicit existence check such as `!(key in
merged)` or `merged[key] === undefined` (or
`Object.prototype.hasOwnProperty.call(merged, key) === false`) so that keys set
to "" in merged are not overridden by process.env; update all occurrences
referenced around ORCHESTRATOR_MANAGED_KEYS/merged/process.env in this file
accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 797876b2-e8fc-4a8c-a0be-312ec5f88428

📥 Commits

Reviewing files that changed from the base of the PR and between 067b875 and 587875b.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (23)
  • .github/workflows/e2e.yml
  • knip.config.ts
  • package.json
  • services/platform/.gitignore
  • services/platform/e2e/README.md
  • services/platform/e2e/fixtures/config/default/agents/chat-agent.json
  • services/platform/e2e/fixtures/config/default/providers/e2e-mock.json
  • services/platform/e2e/fixtures/config/default/workflows/test.json
  • services/platform/e2e/helpers/auth.ts
  • services/platform/e2e/helpers/i18n.ts
  • services/platform/e2e/helpers/test-context.ts
  • services/platform/e2e/mock-llm/canned.ts
  • services/platform/e2e/mock-llm/server.ts
  • services/platform/e2e/setup/auth.setup.ts
  • services/platform/e2e/specs/auth.spec.ts
  • services/platform/e2e/specs/automation.spec.ts
  • services/platform/e2e/specs/chat.spec.ts
  • services/platform/e2e/specs/governance-settings.spec.ts
  • services/platform/package.json
  • services/platform/playwright.config.ts
  • services/platform/scripts/sync-convex-env-from-dotenv.ts
  • services/platform/vitest.config.ts
  • turbo.json

Comment thread .github/workflows/e2e.yml
Comment thread services/platform/e2e/mock-llm/server.ts Outdated
Comment thread services/platform/e2e/README.md Outdated
Comment thread services/platform/e2e/specs/automation.spec.ts Outdated
Comment thread services/platform/playwright.config.ts
The e2e job only reads/tests, so it has no need for the GITHUB_TOKEN to
linger in the checkout's git config. Set persist-credentials: false.
…nown

The provider fixture's baseUrl (providers/e2e-mock.json) is loaded
verbatim into Convex and cannot interpolate env, so an E2E_MOCK_LLM_PORT
override moved the mock server and Playwright health-check off the port
the provider still pointed at — provider calls would then miss the mock.
Drop the override and make 4141 the single source of truth shared by the
mock server, the Playwright webServer, and the fixture baseUrl.

Also gate the mock-LLM webServer entry behind the existing useMockLlm
flag so E2E_MOCK_LLM=0 (real-stack mode) no longer boots the mock and
risks a port conflict.

Replace the unknown-typed request/chunk handling with a JsonValue union
plus narrow type guards (toChatCompletionRequest, ChatCompletionChunk),
satisfying the repo's no-unknown rule with identical runtime behaviour.
…w name

Point bunx playwright show-report at services/platform/playwright-report
(the report dir relative to repo root). Hoist the seeded fixture
workflow's name into a single WORKFLOW_NAME constant in the automation
spec so the literal isn't repeated across locators.
@yannickmonney yannickmonney merged commit 5dd150f into main Jun 11, 2026
33 checks passed
@yannickmonney yannickmonney deleted the feat/platform-playwright-e2e branch June 11, 2026 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant