
[Payment due @nyomanjyotisa] Fix stuck isAuthenticatingWithShortLivedToken blocking auto-reauth #91633

Merged
mountiny merged 11 commits into Expensify:main from allgandalf:fix-stuck-isAuthenticatingWithShortLivedToken on Jun 3, 2026
Conversation

@allgandalf

@allgandalf allgandalf commented May 25, 2026

Contributor

Explanation of Change

session.isAuthenticatingWithShortLivedToken was a regular persisted Onyx field. When a SignInWithShortLivedAuthToken request was interrupted between optimisticData (which sets the flag to true) and finallyData (which sets it back to false), the optimistic IndexedDB write had already been committed and was never cleared. On every later page load the stuck true value rehydrated, and the abort check in Reauthentication.ts blocked every reauth attempt, so the user could not recover from a 407 without "Clear Cache and Restart".

This PR moves the flag to a new RAM-only Onyx key, RAM_ONLY_IS_AUTHENTICATING_WITH_SHORT_LIVED_TOKEN. An interrupted request can no longer persist a stuck value, and anyone currently stuck recovers on the next page load because the new key starts undefined.
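The difference between a persisted flag and a RAM-only flag can be sketched with a small model. This is only an illustration under stated assumptions: `SketchStore`, `interruptedSignIn`, and `flagAfterReload` are hypothetical names, a plain `Map` stands in for IndexedDB, and none of this is the Onyx implementation.

```typescript
// Hypothetical model, not the Onyx implementation: a key/value store where
// keys listed in `ramOnlyKeys` are never written to the persistent layer.
type Value = boolean | undefined;

class SketchStore {
    private memory = new Map<string, Value>();
    private disk: Map<string, Value>;
    private ramOnlyKeys: Set<string>;

    constructor(disk: Map<string, Value>, ramOnlyKeys: Set<string>) {
        // `disk` stands in for IndexedDB: it survives a "page reload".
        this.disk = disk;
        this.ramOnlyKeys = ramOnlyKeys;
        // On startup, rehydrate memory from whatever was persisted earlier.
        disk.forEach((value, key) => this.memory.set(key, value));
    }

    set(key: string, value: Value): void {
        this.memory.set(key, value);
        // RAM-only keys are kept in memory and skipped when persisting.
        if (!this.ramOnlyKeys.has(key)) {
            this.disk.set(key, value);
        }
    }

    get(key: string): Value {
        return this.memory.get(key);
    }
}

// Simulates a sign-in request that applies its optimistic update (flag = true)
// and is then interrupted before the "finally" update (flag = false) can run.
function interruptedSignIn(store: SketchStore, flagKey: string): void {
    store.set(flagKey, true);
    // The tab is closed here, so store.set(flagKey, false) never happens.
}

// Returns the flag value seen after a reload, given whether the key is RAM-only.
function flagAfterReload(flagKey: string, isRamOnly: boolean): Value {
    const disk = new Map<string, Value>();
    const ramOnlyKeys = new Set<string>(isRamOnly ? [flagKey] : []);

    interruptedSignIn(new SketchStore(disk, ramOnlyKeys), flagKey);

    // A reload discards memory and builds a new store from the same disk.
    return new SketchStore(disk, ramOnlyKeys).get(flagKey);
}

const persistedResult = flagAfterReload('isAuthenticating', false);
const ramOnlyResult = flagAfterReload('isAuthenticating', true);
console.log(persistedResult, ramOnlyResult); // true undefined
```

In this model the persisted flag comes back as true after the reload, while the RAM-only flag comes back as undefined, which mirrors the recovery behavior described above.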

Full RCA (with logs) on the tracking issue: https://github.com/Expensify/Expensify/issues/637360#issuecomment-4466727707

Fixed Issues

$ https://github.com/Expensify/Expensify/issues/637360
PROPOSAL: N/A (internal RCA, see linked comment)

Tests

  1. Sign in to NewDot as any test user
  2. In devtools console run this to simulate the stuck-flag state:
(async () => {
  // Open the OnyxDB IndexedDB database and its key/value object store
  const db = await new Promise((r) => { const o = indexedDB.open('OnyxDB'); o.onsuccess = e => r(e.target.result); });
  const tx = db.transaction('keyvaluepairs', 'readwrite');
  const store = tx.objectStore('keyvaluepairs');
  // Read the persisted session, force the legacy flag to true, and write it back
  const session = await new Promise((r) => { const o = store.get('session'); o.onsuccess = e => r(e.target.result); });
  session.isAuthenticatingWithShortLivedToken = true;
  await new Promise((r) => { const o = store.put(session, 'session'); o.onsuccess = e => r(e.target.result); });
  console.log('poisoned; reload now');
})();
  3. Hard reload the NewDot tab
  4. Wait for the authToken to expire (or sit idle past the TTL)
  5. Send any message in any DM
  6. Expected (with the fix): message sends successfully. Reauth runs silently and a fresh token replaces the stale one
  7. Expected (without the fix on main): 407 + "Your session has expired. Please sign in again." toast
  • Verify that no errors appear in the JS console

Offline tests

N/A. The offline write queue is not affected by this change.

QA Steps

Same as Tests above.

  • Verify that no errors appear in the JS console

PR Author Checklist

  • I linked the correct issue in the ### Fixed Issues section above
  • I wrote clear testing steps that cover the changes made in this PR
    • I added steps for local testing in the Tests section
    • I added steps for the expected offline behavior in the Offline steps section
    • I added steps for Staging and/or Production testing in the QA steps section
    • I added steps to cover failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
    • I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
  • I included screenshots or videos for tests on all platforms
  • I ran the tests on all platforms & verified they passed on:
    • Android: Native
    • Android: mWeb Chrome
    • iOS: Native
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
  • I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
  • I followed proper code patterns (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I followed the guidelines as stated in the Review Guidelines
  • I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
  • I verified that if a function's arguments changed that all usages have also been updated correctly
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG))
  • If new assets were added or existing ones were modified, I verified that:
    • The assets are optimized and compressed (for SVG files, run npm run compress-svg)
    • The assets load correctly across all supported platforms.
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • I added unit tests for any new feature or bug fix in this PR to help automatically prevent regressions in this user flow.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.

Screenshots/Videos

Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari

@allgandalf allgandalf marked this pull request as ready for review June 1, 2026 21:23
@allgandalf allgandalf requested review from a team as code owners June 1, 2026 21:23
@melvin-bot melvin-bot Bot requested review from JmillsExpensify and nyomanjyotisa and removed request for a team June 1, 2026 21:23
@melvin-bot

melvin-bot Bot commented Jun 1, 2026


@nyomanjyotisa Please copy/paste the Reviewer Checklist from here into a new comment on this PR and complete it. If you have the K2 extension, you can simply click: [this button]

@melvin-bot melvin-bot Bot removed request for a team and JmillsExpensify June 1, 2026 21:23
@mountiny

mountiny commented Jun 2, 2026

Contributor

@allgandalf can you sync with the latest main to fix the failing Jest tests?

@mountiny mountiny requested a review from Copilot June 2, 2026 09:06
@mountiny

mountiny commented Jun 2, 2026

Contributor

@nyomanjyotisa can you please prioritize this PR? Thanks

Copilot AI left a comment

Contributor

Pull request overview

Fixes a bug where session.isAuthenticatingWithShortLivedToken could persist as true in IndexedDB if a SignInWithShortLivedAuthToken request was interrupted between optimisticData and finallyData. The stuck value would rehydrate on subsequent loads and cause Reauthentication.ts to permanently abort all reauth attempts, leaving the user unable to recover from a 407 without clearing cache. The fix moves the flag to a new RAM-only Onyx key so an interrupted request cannot persist a stuck value, and previously stuck users recover automatically on next load.

Changes:

  • Added new RAM-only Onyx key RAM_ONLY_IS_AUTHENTICATING_WITH_SHORT_LIVED_TOKEN and registered it in setup/index.ts.
  • Updated Session/index.ts, Reauthentication.ts, and AppState/index.ts to read/write the flag via the new key, and removed isAuthenticatingWithShortLivedToken from the Session Onyx type.
  • Added two unit tests covering both the new RAM-only abort path and the recovery path for legacy persisted stuck flags.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/ONYXKEYS.ts | Defines the new RAM-only key and adds its type mapping. |
| src/setup/index.ts | Registers the new key in ramOnlyKeys. |
| src/types/onyx/Session.ts | Removes the persisted isAuthenticatingWithShortLivedToken field. |
| src/libs/Reauthentication.ts | Reads the flag from the new RAM-only key instead of SESSION. |
| src/libs/AppState/index.ts | Captures the flag for logging via the new RAM-only key. |
| src/libs/actions/Session/index.ts | Writes optimistic/finally updates to the new RAM-only key instead of SESSION. |
| tests/actions/SessionTest.ts | Adds tests for the new abort path and legacy recovery path. |


@nyomanjyotisa

Member

Reviewing now

@nyomanjyotisa

Member

Reviewer Checklist

  • I have verified the author checklist is complete (all boxes are checked off).
  • I verified the correct issue is linked in the ### Fixed Issues section above
  • I verified testing steps are clear and they cover the changes made in this PR
    • I verified the steps for local testing are in the Tests section
    • I verified the steps for Staging and/or Production testing are in the QA steps section
    • I verified the steps cover any possible failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
    • I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
  • I checked that screenshots or videos are included for tests on all platforms
  • I included screenshots or videos for tests on all platforms
  • I verified that the composer does not automatically focus or open the keyboard on mobile unless explicitly intended. This includes checking that returning the app from the background does not unexpectedly open the keyboard.
  • I verified tests pass on all platforms & I tested again on:
    • Android: HybridApp
    • Android: mWeb Chrome
    • iOS: HybridApp
    • iOS: mWeb Safari
    • MacOS: Chrome / Safari
  • If there are any errors in the console that are unrelated to this PR, I either fixed them (preferred) or linked to where I reported them in Slack
  • I verified proper code patterns were followed (see Reviewing the code)
    • I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick).
    • I verified that comments were added to code that is not self explanatory
    • I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
    • I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
    • I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
    • I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
    • I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
    • I verified the JSDocs style guidelines (in STYLE.md) were followed
  • If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
  • I verified that this PR follows the guidelines as stated in the Review Guidelines
  • I verified other components that can be impacted by these changes have been tested, and I retested again (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar have been tested & I retested again)
  • I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
  • I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
  • If a new component is created I verified that:
    • A similar component doesn't exist in the codebase
    • All props are defined accurately and each prop has a /** comment above it */
    • The file is named correctly
    • The component has a clear name that is non-ambiguous and the purpose of the component can be inferred from the name alone
    • The only data being stored in the state is data necessary for rendering and nothing else
    • For Class Components, any internal methods passed to components event handlers are bound to this properly so there are no scoping issues (i.e. for onClick={this.submit} the method this.submit should be bound to this in the constructor)
    • Any internal methods bound to this are necessary to be bound (i.e. avoid this.submit = this.submit.bind(this); if this.submit is never passed to a component event handler like onClick)
    • All JSX used for rendering exists in the render method
    • The component has the minimum amount of code necessary for its purpose, and it is broken down into smaller components in order to separate concerns and functions
  • If any new file was added I verified that:
    • The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
  • If a new CSS style is added I verified that:
    • A similar style doesn't already exist
    • The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG))
  • If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
  • If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
  • If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
  • If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
  • If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
    • I verified that all the inputs inside a form are aligned with each other.
    • I added Design label and/or tagged @Expensify/design so the design team can review the changes.
  • If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
  • For any bug fix or new feature in this PR, I verified that sufficient unit tests are included to prevent regressions in this flow.
  • If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.
  • I have checked off every checkbox in the PR reviewer checklist, including those that don't apply to this PR.

Screenshots/Videos

Android: HybridApp
Android: mWeb Chrome
iOS: HybridApp
iOS: mWeb Safari
MacOS: Chrome / Safari
MacOS-Chrome.mp4

main:

MacOS-Chrome-2.mp4

@nyomanjyotisa nyomanjyotisa left a comment

Member

LGTM! The failing test is unrelated

@melvin-bot melvin-bot Bot changed the title Fix stuck isAuthenticatingWithShortLivedToken blocking auto-reauth [Payment due @nyomanjyotisa] Fix stuck isAuthenticatingWithShortLivedToken blocking auto-reauth Jun 2, 2026
@melvin-bot

melvin-bot Bot commented Jun 2, 2026


🎯 @nyomanjyotisa, thanks for reviewing and testing this PR! 🎉

A payment issue will be created for your review once this PR is deployed to production.
E/E issue linked to the PR - https://www.github.com/Expensify/Expensify/issues/637360.

If payment is not needed (e.g., regression PR review fix etc), react with 👎 to this comment to prevent the payment issue from being created.

@mountiny

mountiny commented Jun 3, 2026

Contributor

Restarted the test

function captureSessionState(): SessionStateInfo {
    // Check multiple authentication states to get complete picture
    const isSessionLoading = !!currentSession?.loading;
    const isAuthenticatingWithShortLivedToken = !!currentSession?.isAuthenticatingWithShortLivedToken;

Contributor

What if we update the app version and the user has isAuthenticatingWithShortLivedToken in the session and not in the RAM-only key? Should we temporarily coalesce both values here to cover this case and then later remove the key from the session?

Contributor Author

i think coalescing them would actually re-trap the stuck users :)

the whole point of moving the flag to RAM-only is that we stop reading session.isAuthenticatingWithShortLivedToken. that persisted field IS the bug. currently-stuck users have it persisted as true in IndexedDB and the legacy code reads it back on every reload, blocking every reauth attempt. by ignoring it in the new code, they get unblocked on the next reload because the RAM-only key starts undefined.

if we OR both values here, a stuck user's session.isAuthenticatingWithShortLivedToken=true would still evaluate to true and reauth would keep aborting, which is exactly what we are trying to escape from.

the second unit test (reauthenticate proceeds even when a legacy session.isAuthenticatingWithShortLivedToken=true is persisted) covers this app-upgrade scenario: legacy stuck session field + undefined RAM-only key -> reauth proceeds normally.

also fwiw, this AppState file is just the diagnostic log captured when ActivityIndicator hangs. it isn't the actual reauth abort path; that lives in Reauthentication.ts. the log reflecting the new RAM-only state is correct (it shows what reauth actually reads, not the dead legacy value).

happy to add a follow-up cleanup migration to delete the leftover session.isAuthenticatingWithShortLivedToken from IndexedDB if you want it gone, but the fix here doesn't depend on it. WDYT?
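
As a rough illustration of the coalescing concern in this reply, the two read strategies can be compared side by side. This is a sketch only: `LegacySession`, `shouldAbortCoalesced`, and `shouldAbortRamOnly` are hypothetical names chosen for the example and are not the code in Reauthentication.ts.

```typescript
// Hypothetical shape for illustration; the real Onyx Session type is not shown in this thread.
type LegacySession = {isAuthenticatingWithShortLivedToken?: boolean};

// Option raised in review: treat either source as "still authenticating".
function shouldAbortCoalesced(ramOnlyFlag: boolean | undefined, session: LegacySession): boolean {
    return !!ramOnlyFlag || !!session.isAuthenticatingWithShortLivedToken;
}

// Approach described in the reply: read only the RAM-only flag and ignore the legacy field.
function shouldAbortRamOnly(ramOnlyFlag: boolean | undefined): boolean {
    return !!ramOnlyFlag;
}

// A user upgrading with a stuck legacy value and a fresh (undefined) RAM-only key.
const stuckSession: LegacySession = {isAuthenticatingWithShortLivedToken: true};
const freshRamOnlyFlag: boolean | undefined = undefined;

const coalescedAborts = shouldAbortCoalesced(freshRamOnlyFlag, stuckSession);
const ramOnlyAborts = shouldAbortRamOnly(freshRamOnlyFlag);
console.log(coalescedAborts, ramOnlyAborts); // true false
```

With the coalesced read, the stuck legacy value still forces an abort. Reading only the RAM-only flag lets reauth proceed for the upgrading user.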

Contributor

Nah I think we can leave it

@mountiny mountiny left a comment

Contributor

Thanks

@mountiny mountiny merged commit dc0672b into Expensify:main Jun 3, 2026
37 of 38 checks passed
@github-actions

github-actions Bot commented Jun 3, 2026

Contributor

🚧 @mountiny has triggered a test Expensify/App build. You can view the workflow run here.

@OSBotify

OSBotify commented Jun 4, 2026

Contributor

🚀 Deployed to staging by https://github.com/mountiny in version: 9.3.99-0 🚀

platform result
🕸 web 🕸 success ✅
🤖 android 🤖 success ✅
🍎 iOS 🍎 success ✅

Bundle Size Analysis (Sentry):

@MelvinBot

Contributor

Help site review — no changes required

I reviewed the changes in this PR against App/docs/articles and no help site updates are required.

Why: This is a purely internal session/authentication bug fix with no user-facing surface:

  • It moves the isAuthenticatingWithShortLivedToken flag from a persisted Onyx key to a new RAM-only key (RAM_ONLY_IS_AUTHENTICATING_WITH_SHORT_LIVED_TOKEN) so an interrupted SignInWithShortLivedAuthToken request can no longer leave a stuck true in IndexedDB blocking future reauth.
  • The only files touched are internal libs, types, Onyx keys, and a test (src/ONYXKEYS.ts, src/libs/AppState/index.ts, src/libs/Reauthentication.ts, src/libs/actions/Session/index.ts, src/setup/index.ts, src/types/onyx/Session.ts, tests/actions/SessionTest.ts).
  • There is no new feature, UI element, setting, tab, button, or workflow change — nothing a help article documents. The behavior change is invisible to end users (silent reauth instead of an expired-session error).

I also searched App/docs/articles for related terms (short-lived token, reauthentication, session expired); the only matches are unrelated third-party integration error articles (Xero, Sage Intacct, QuickBooks), which this PR does not affect.

Since no documentation changes are needed, I did not create a draft help site PR.


@allgandalf, no linked help site PR was created because this change has no customer-facing documentation impact. If you believe a help article should be updated for this behavior, let me know which flow and I'll draft it.

@IuliiaHerets


Hi @allgandalf. Can this be checked internally?
It is quite challenging for the QA team to test and verify.

cc @mountiny @nyomanjyotisa @allgandalf

@OSBotify

OSBotify commented Jun 5, 2026

Contributor

🚀 Deployed to production by https://github.com/lakchote in version: 9.3.99-9 🚀

platform result
🕸 web 🕸 success ✅
🤖 android 🤖 success ✅
🍎 iOS 🍎 success ✅

@melvin-bot

melvin-bot Bot commented Jun 5, 2026


🤖 Payment issue created: #92796


7 participants