Fix white-screen-on-login (cross-origin OIDC discovery), plus third-party OAuth sign-in and refresh diagnostics #11693

Merged
nbudin merged 5 commits into main from login-troubleshooting-part-2 on Jun 14, 2026

@nbudin nbudin commented Jun 13, 2026

Purpose

Follow-up to the OAuth-callback fix (#11686). This started as "add telemetry so we can finally see the white-screen-on-login," but along the way we reproduced it and found the root cause, so it now actually fixes it — plus a couple of other login problems that surfaced while digging.

The headline: an anonymous visitor hitting a login-required page got a blank screen because the login redirect depends on an OIDC discovery fetch to the issuer host, which is cross-origin (a convention page reaching the root site). When that request is blocked — Brave's shields in production, untrusted self-signed certs in local dev (where it showed up as GET 0 /.well-known/openid-configuration) — initiateAuthentication can't build the redirect URL, and the guard rendered nothing while it waited, with no .catch to surface the failure. Permanent white screen, nothing reported.

The fixes, roughly in order of impact:

  • Stop depending on cross-origin discovery. The SPA only needs the issuer + authorization + end-session endpoints, and token exchange/refresh already go through our own same-origin /oauth_session/*. So we serve those three in /client_configuration (same-origin, already fetched at boot) and build openid-client's Configuration directly instead of calling discovery(). No cross-origin request in the login path anymore.
  • Never render a blank page while redirecting. useLoginRequired now renders a spinner while it redirects, or an error + Retry if initiating auth fails (and reports it), instead of <></>.
  • Capture the failure if it happens anyway. Error reporting is now initialized during boot rather than in an AppRoot effect that never runs when the initial render crashes — which is why this class of failure was invisible before.
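
The first fix above can be sketched roughly as follows: take the endpoints already delivered by the same-origin /client_configuration response and turn them into provider metadata, so that the login redirect URL can be built without any request to the issuer host. This is a minimal illustration, not the code from this PR. The `oidc_issuer` key, the function names, and the parameters passed to the authorization URL are assumptions, and a real client would typically also add PKCE and nonce parameters.

```typescript
// Fields assumed to be served by /client_configuration. The two endpoint keys are
// named in this PR; `oidc_issuer` is a hypothetical key for the issuer URL.
interface ClientConfiguration {
  oidc_issuer: string;
  oidc_authorization_endpoint: string;
  oidc_end_session_endpoint: string;
}

// The subset of OIDC provider metadata the SPA needs (standard metadata field names).
interface ServerMetadata {
  issuer: string;
  authorization_endpoint: string;
  end_session_endpoint: string;
}

// Map the same-origin boot payload to provider metadata. No network request is made.
function serverMetadataFromClientConfiguration(config: ClientConfiguration): ServerMetadata {
  return {
    issuer: config.oidc_issuer,
    authorization_endpoint: config.oidc_authorization_endpoint,
    end_session_endpoint: config.oidc_end_session_endpoint,
  };
}

// Build the login redirect URL by hand (hypothetical helper, authorization code flow).
function buildAuthorizationUrl(
  metadata: ServerMetadata,
  clientId: string,
  redirectUri: string,
  state: string,
): URL {
  const url = new URL(metadata.authorization_endpoint);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("state", state);
  return url;
}
```

In the SPA itself, an object of this shape is what would stand in for the discovery() result when constructing openid-client's Configuration; the exact constructor arguments are not shown in this PR.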

Two more login issues fixed along the way:

  • Signing in to a third-party OAuth app (e.g. Cantrip) failed with a CORS error: the sign-in form submitted via fetch() and followed the post-login redirect chain into the relying app's callback, which is cross-origin and got blocked — burning the one-time auth code in the process. Now the server returns the redirect location as JSON and the browser navigates top-level, which isn't subject to CORS and doesn't depend on the relying app's headers.
  • Convention sites logging people out overnight — I couldn't pin the cause from static analysis (timing points at the nightly cleanup cron, but the refresh-token rotation should survive it), so this instruments /oauth_session/refresh to record why it returns invalid_grant. Diagnostic only; we'll follow up once we have a real event.
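
The third-party sign-in fix can be illustrated with a small sketch of the client side: the form asks for JSON, reads the `location` from a successful response, and hands it to a top-level navigation instead of letting fetch() follow the redirect chain. The function names, the injected `post` and `navigate` callbacks, and the fallback message wiring are assumptions made for illustration; only the `{ location }` and `{ error }` response shapes come from this PR.

```typescript
type SignInOutcome =
  | { kind: "navigate"; location: string }
  | { kind: "error"; message: string };

// Decide what to do with the sign-in response without touching the network.
function interpretSignInResponse(status: number, body: unknown): SignInOutcome {
  if (typeof body === "object" && body !== null) {
    const record = body as Record<string, unknown>;
    if (status >= 200 && status < 300 && typeof record.location === "string") {
      return { kind: "navigate", location: record.location };
    }
    if (typeof record.error === "string") {
      return { kind: "error", message: record.error };
    }
  }
  return { kind: "error", message: "An error occurred. Please try again." };
}

// `post` is a hypothetical wrapper that sends the form with Accept: application/json
// and returns the status plus the parsed JSON body.
async function submitSignIn(
  post: () => Promise<{ status: number; body: unknown }>,
  navigate: (location: string) => void,
): Promise<SignInOutcome> {
  const { status, body } = await post();
  const outcome = interpretSignInResponse(status, body);
  if (outcome.kind === "navigate") {
    // A top-level navigation is not subject to CORS, unlike a followed fetch() redirect.
    navigate(outcome.location);
  }
  return outcome;
}
```

In a browser, `navigate` would be something like `(location) => { window.location.href = location; }`, which matches the approach the sign-in commit message describes.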

Changes

💻 Engineer-facing

  • /client_configuration now returns oidc_authorization_endpoint + oidc_end_session_endpoint (built on the issuer host); the SPA constructs the openid-client Configuration from them, with no discovery() call.
  • useLoginRequired returns the element to render while signed out (spinner / error+retry) instead of a bare boolean; the route guard and inline gates render it. It reports failures and fires the redirect only once per mount.
  • SessionsController#create responds to JSON requests with { location } (200) instead of a 302; DeviseSignInPage navigates top-level. safe_sign_in_location preserves the open-redirect guard. Inline sign-in errors (JSONFailureApp) are unchanged.
  • Error-reporting init hoisted into the boot sequence; ErrorReporting#setCurrentUser attaches the user id once AppRootQuery resolves.
  • OAuthSessionsController#refresh reports cookie_absent / token_not_found / grant_rejected via ErrorReporting (filterable tag oauth_refresh_failure); no token material is logged.
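
The useLoginRequired contract in the list above (something visible while signed out, `false` once signed in, one redirect attempt per mount, failures reported) can be modeled without React as a small state holder. This is a sketch under assumptions: the `LoginGate` class, its view shape, and the injected callbacks are hypothetical, and the real hook would keep this state in React and re-render when it changes.

```typescript
type LoginGateView =
  | { kind: "redirecting" }
  | { kind: "error"; error: unknown; retry: () => void };

class LoginGate {
  private readonly initiateAuthentication: () => Promise<void>;
  private readonly reportError: (error: unknown) => void;
  private started = false;
  private failure: { error: unknown } | undefined = undefined;

  constructor(
    initiateAuthentication: () => Promise<void>,
    reportError: (error: unknown) => void,
  ) {
    this.initiateAuthentication = initiateAuthentication;
    this.reportError = reportError;
  }

  // Returns false once signed in; otherwise always returns something to render.
  view(signedIn: boolean): LoginGateView | false {
    if (signedIn) return false;
    this.start();
    if (this.failure) {
      return { kind: "error", error: this.failure.error, retry: () => this.retry() };
    }
    return { kind: "redirecting" };
  }

  private start(): void {
    if (this.started) return; // fire the redirect only once per gate instance
    this.started = true;
    let pending: Promise<void>;
    try {
      pending = this.initiateAuthentication();
    } catch (error) {
      pending = Promise.reject(error);
    }
    // Surface and report the failure instead of leaving an unhandled rejection.
    pending.catch((error: unknown) => {
      this.failure = { error };
      this.reportError(error);
    });
  }

  private retry(): void {
    this.started = false;
    this.failure = undefined;
    this.start();
  }
}
```

The point of the shape is that there is no state in which a signed-out visitor gets an empty render: the view is either a redirect indicator or an error with a retry action.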

Risks

  • The sign-in create change alters the JSON response shape (was a 302, now { location }); the no-JS navigational path still redirects.
  • Building the OIDC Configuration from server-provided metadata instead of discovery means we rely on those three endpoints being correct in /client_configuration. There's a controller test, but worth a sanity check that login still redirects correctly in each environment.

Testing

tsc --noEmit, eslint, and rubocop all pass. New tests: sessions + oauth_sessions + client_configuration controller tests, and vitest coverage for useLoginRequired (signed-in / redirecting / failure paths) and openid (Configuration built from metadata, no fetch). The white screen was also reproduced locally and confirmed to come from the discovery request failing with status 0 (GET 0 /.well-known/openid-configuration).

Release plan and notes

🚢 — note the refresh instrumentation is diagnostic only; expect a small follow-up PR once an overnight oauth_refresh_failure event comes in to tell us why the refresh fails.

🤖 Generated with Claude Code

nbudin and others added 5 commits June 13, 2026 10:02
Error reporting (Sentry/Rollbar) was only initialized inside AppRoot's
useEffect, which runs after AppRoot commits. A render-time crash during
the initial mount — e.g. the Brave white-screen-on-login — happens before
that effect can run, so the SDKs were never set up and the crash was never
reported. That's a large part of why this class of failure has been so hard
to diagnose.

Move initErrorReporting() into the boot sequence in packs/application.tsx,
right after /client_configuration resolves (the earliest the DSN/token are
available) and before the router is built or any React renders. The current
user id isn't known yet at that point, so add ErrorReporting#setCurrentUser
and have AppRoot call it once AppRootQuery resolves, rather than re-running
init (which would install a duplicate set of global handlers and double-report).
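
The split between one-time initialization at boot and a later user update might look something like the sketch below. It is illustrative only: the `ReportingSdk` interface and the `ErrorReportingSketch` class are hypothetical stand-ins for the Sentry/Rollbar wiring, and the idempotence guard in `init` is an extra safety measure assumed here, not something this commit is stated to include.

```typescript
// Hypothetical minimal surface of an error-reporting SDK.
interface ReportingSdk {
  install(): void; // installs global handlers
  setUser(user: { id: string } | null): void;
}

class ErrorReportingSketch {
  private readonly sdks: ReportingSdk[];
  private initialized = false;

  constructor(sdks: ReportingSdk[]) {
    this.sdks = sdks;
  }

  // Called once during boot, before any React render.
  init(): void {
    if (this.initialized) return; // avoid duplicate global handlers
    for (const sdk of this.sdks) sdk.install();
    this.initialized = true;
  }

  // Called later, once the current user is known; does not reinstall handlers.
  setCurrentUser(userId: string | null): void {
    for (const sdk of this.sdks) {
      sdk.setUser(userId === null ? null : { id: userId });
    }
  }
}
```

With this ordering, a crash during the initial mount is reported without a user id, and the id is attached to anything reported after the user query resolves.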

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…pps)

When signing in to a third-party OAuth relying app, the post-sign-in redirect
chain crosses origin into that app's callback (e.g.
larp-cantrip.herokuapp.com/users/auth/intercode/callback). DeviseSignInPage
submitted the login via fetch() with the default redirect: 'follow', so the
browser walked that chain as subresource (CORS-mode) requests. The cross-origin
hop into the relying app's callback was CORS-blocked — and the one-time
authorization code was burned in the process — leaving the user on the sign-in
page with "An error occurred. Please try again."

Make the post-sign-in redirect a top-level browser navigation instead, which is
not subject to CORS and works regardless of the relying app's headers:

- SessionsController#create responds to JSON requests with { location: ... }
  and 200 instead of a 302. Navigational (no-JS) requests still redirect as
  before. safe_sign_in_location applies the same trusted_origin? guard the
  redirect used, so the JSON path can't become an open redirect.
- DeviseSignInPage sends Accept: application/json, reads the location, and
  navigates to it top-level via window.location.href.

The failure path is unchanged: JSONFailureApp still returns { error: ... } JSON
on bad credentials, so inline error display keeps working. Added tests covering
all four cases.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
We can't yet explain why convention-site sessions drop overnight: the timing
points at the daily CleanupDbService cron, but the refresh-token rotation
machinery should let a well-behaved single browser survive it. Rather than
guess, record *why* /oauth_session/refresh returns invalid_grant so the next
occurrence gives us real data.

OAuthSessionsController#refresh now reports one of three reasons via
ErrorReporting (Sentry/Rollbar, filterable by an `oauth_refresh_failure` tag):

- cookie_absent    — no refresh cookie was sent
- token_not_found  — cookie carried a refresh token but no access token row
                     matches (the signature we'd expect if the nightly cleanup
                     pruned a row the cookie still referenced)
- grant_rejected   — row exists but Doorkeeper refused the grant (already
                     revoked, or refresh-token reuse); logs the row's lifecycle
                     timestamps so we can tell rotation/races from deletion

No token material is logged — only the reason and safe metadata
(resource_owner_id, created/revoked/expires timestamps, previous-refresh-token
presence). Excludes the controller from Metrics/ClassLength (matching the
existing convention for application_controller.rb).
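
The three-way classification above can be sketched as a pure function. The actual controller is Ruby; this TypeScript model only illustrates the decision order and the rule that no token material ends up in the report. The type names, the `findRow` lookup, and the exact metadata keys are assumptions.

```typescript
type RefreshFailureReason = "cookie_absent" | "token_not_found" | "grant_rejected";

// Hypothetical view of an access token row; timestamps are ISO strings.
interface AccessTokenRow {
  resourceOwnerId: number;
  createdAt: string;
  revokedAt: string | null;
  expiresAt: string | null;
  previousRefreshToken: string; // empty string when there is none
}

interface RefreshFailureReport {
  reason: RefreshFailureReason;
  metadata: Record<string, string | number | boolean | null>;
}

// Called only after a refresh attempt has already failed with invalid_grant.
function describeRefreshFailure(
  refreshToken: string | undefined,
  findRow: (token: string) => AccessTokenRow | undefined,
): RefreshFailureReport {
  if (!refreshToken) {
    return { reason: "cookie_absent", metadata: {} };
  }
  const row = findRow(refreshToken);
  if (!row) {
    return { reason: "token_not_found", metadata: {} };
  }
  // Row exists, so the grant itself was refused. Report lifecycle data only,
  // and reduce the previous refresh token to a presence flag.
  return {
    reason: "grant_rejected",
    metadata: {
      resource_owner_id: row.resourceOwnerId,
      created_at: row.createdAt,
      revoked_at: row.revokedAt,
      expires_at: row.expiresAt,
      previous_refresh_token_present: row.previousRefreshToken !== "",
    },
  };
}
```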

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This is the actual white-screen-on-login bug. When an anonymous visitor hit a
login-required page, useLoginRequired rendered nothing (`<></>`) while it kicked
off initiateAuthentication in an effect. But initiating auth is async — it
awaits OIDC discovery before it can build the redirect URL — so the page stayed
blank for the whole round-trip. Worse, the .then had no .catch: if discovery
failed or was blocked (e.g. by Brave's shields, or the `GET 0
/.well-known/openid-configuration` seen in a local repro), the redirect never
happened, nothing surfaced the error, and the visitor was left staring at a
permanent white screen with only a silent unhandled rejection.

useLoginRequired now returns the element to render while not signed in — a
loading indicator while redirecting, or an error with a Retry button if
initiating auth fails (also reported via ErrorReporting) — or `false` once
authenticated. It also guards against firing initiateAuthentication more than
once per mount. The route guard and the inline login gates render that element
instead of a blank fragment.

This pairs with the earlier error-reporting hoist (so the failure is now
captured) and the refresh instrumentation. Doesn't address *why* discovery is
flaky — that's the remaining follow-up.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…in discovery

Initiating login awaited an OIDC discovery fetch to the issuer host
(/.well-known/openid-configuration). From a convention page that's a
cross-origin request to the root site — which gets blocked (Brave shields in
production; untrusted self-signed certs in local dev, where it showed up as
`GET 0 /.well-known/openid-configuration`). When it fails, initiateAuthentication
can't build the redirect URL and login wedges — the root of the white screen.

The SPA only needs three things from the issuer: the issuer URL, the
authorization endpoint (to build the redirect) and the end-session endpoint (for
sign-out); token exchange/refresh already go through our own same-origin
/oauth_session/* endpoints. So serve those in /client_configuration (already
fetched same-origin at boot) and construct openid-client's Configuration
directly, dropping the discovery() call entirely. No more cross-origin
dependency in the login path.

The endpoints are built by joining the issuer URL with the route paths, so a
convention page gets the root-site endpoints regardless of which host served the
request.

Tests: openid.test.ts confirms the authorization URL and end-session endpoint
come out of the metadata-built Configuration (no fetch); a client_configuration
controller test confirms the endpoints are served on the issuer host.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

Code Coverage Report: Only Changed Files listed

| Package | Base Coverage | New Coverage | Difference |
| --- | --- | --- | --- |
| app/controllers/application_controller.rb | 🟠 60.19% | 🟠 66.67% | 🟢 6.48% |
| app/controllers/json_failure_app.rb | 🟠 50% | 🟢 100% | 🟢 50% |
| app/controllers/oauth_sessions_controller.rb | 🔴 35.94% | 🔴 44% | 🟢 8.06% |
| app/controllers/sessions_controller.rb | 🔴 0% | 🟠 62.5% | 🟢 62.5% |
| app/javascript/Authentication/authenticationManager.ts | 🔴 14.89% | 🔴 16.33% | 🟢 1.44% |
| app/javascript/Authentication/openid.ts | 🔴 0% | 🔴 33.33% | 🟢 33.33% |
| app/javascript/Authentication/useLoginRequired.tsx | 🔴 0% | 🟢 95.83% | 🟢 95.83% |
| app/javascript/ErrorReporting.ts | 🔴 5.88% | 🔴 30.91% | 🟢 25.03% |
| lib/devise/strategies/legacy_md5_authenticatable.rb | 🟠 52.17% | 🟠 65.22% | 🟢 13.05% |
| lib/devise/strategies/legacy_sha1_authenticatable.rb | 🔴 43.48% | 🟠 56.52% | 🟢 13.04% |
| test/controllers/csv_exports_controller_test.rb | 🔴 0% | 🟢 100% | 🟢 100% |
| test/controllers/oauth_sessions_controller_test.rb | 🔴 0% | 🟢 100% | 🟢 100% |
| test/controllers/sessions_controller_test.rb | 🔴 0% | 🟢 100% | 🟢 100% |
| Overall Coverage | 🟢 53.53% | 🟢 53.79% | 🟢 0.26% |

Minimum allowed coverage is 0%, this run produced 53.79%

@nbudin nbudin marked this pull request as ready for review June 14, 2026 16:12
@nbudin nbudin changed the title from "More login troubleshooting: crash telemetry, third-party OAuth sign-in, and refresh diagnostics" to "Fix white-screen-on-login (cross-origin OIDC discovery), plus third-party OAuth sign-in and refresh diagnostics" Jun 14, 2026
@nbudin nbudin added the labels bug and patch (Bumps the patch version number on release) Jun 14, 2026
@nbudin nbudin merged commit 1a0dca8 into main Jun 14, 2026
20 checks passed
@nbudin nbudin deleted the login-troubleshooting-part-2 branch June 14, 2026 16:15