
fix(fab): recover from first idle MySQL disconnect in token auth #62919

Merged
potiuk merged 4 commits into apache:main from kbohra:fix/fab-mysql-idle-disconnect-62903 on Mar 10, 2026

kbohra (Contributor) commented Mar 5, 2026

Fix FAB auth token deserialization to recover from transient MySQL idle disconnects in the same request by clearing scoped session state and retrying once; add regression coverage for this recovery path.

This PR closes: #62903

Was generative AI tooling used to co-author this PR?
  • No

What this PR changes

  • On SQLAlchemyError, call session.remove() and retry the lookup in FabAuthManager.deserialize_user() once.
  • Keep the existing behavior for NoResultFound (still raises ValueError).
  • If the retry also fails, remove the session again and re-raise, so that broken transaction/session state is never reused.
  • Add a regression test proving that the first request recovers after a transient disconnect.
  • Update the existing cleanup assertions to account for the retry path.
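The retry-once flow in the list above can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not the provider's actual code: DBError, NoRow, FakeScopedSession and the lookup callable are hypothetical stand-ins for SQLAlchemyError, NoResultFound, the scoped session and the user query. One detail worth mirroring: in SQLAlchemy, NoResultFound is itself a subclass of SQLAlchemyError, so the not-found case has to be caught before the broad error case; otherwise a missing user would be retried instead of mapped to ValueError.

```python
from typing import Callable


class DBError(Exception):
    """Hypothetical stand-in for sqlalchemy.exc.SQLAlchemyError."""


class NoRow(DBError):
    """Hypothetical stand-in for NoResultFound (also a DBError subclass, as in SQLAlchemy)."""


class FakeScopedSession:
    """Hypothetical stand-in for a scoped session; remove() discards the current session."""

    def __init__(self) -> None:
        self.remove_calls = 0

    def remove(self) -> None:
        self.remove_calls += 1


def deserialize_user(session: FakeScopedSession, lookup: Callable[[], dict]) -> dict:
    """Look up a user, retrying once after clearing the scoped session on a DB error."""
    try:
        return lookup()
    except NoRow as e:
        # Caught first because NoRow subclasses DBError: a missing user is not
        # a connection problem, so keep the existing ValueError behavior.
        raise ValueError("user not found") from e
    except DBError:
        # First failure (e.g. a pooled connection the server closed while idle):
        # discard the poisoned session so the retry starts from a fresh one.
        session.remove()
    try:
        return lookup()
    except NoRow as e:
        raise ValueError("user not found") from e
    except DBError:
        # The retry failed too: clean up again and let the error propagate.
        session.remove()
        raise
```

With a lookup that fails once and then succeeds, deserialize_user returns the user and the session is removed exactly once; a lookup that keeps failing causes two removals before the original error propagates.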

Why

When MySQL closes a connection that has sat idle past its server-side timeout, the first request that reuses it hits the dead connection (OperationalError) and returns a 500, while the next request succeeds. Retrying once after cleaning up the scoped session lets that first request recover instead of failing.

Unit Tests

Updated the existing unit tests and validated the change against them.
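A regression test in the spirit of the one described above might look like the sketch below, using unittest.mock. The helper call_with_one_retry and TransientDBError are hypothetical and only stand in for the real deserialization path and disconnect error; the point is the shape of the assertions. A side_effect list makes the first call fail and the second succeed, and the session cleanup count is checked on both the recovery path and the double-failure path.

```python
import unittest
from unittest import mock


class TransientDBError(Exception):
    """Hypothetical stand-in for a disconnect error such as an OperationalError."""


def call_with_one_retry(session, fn):
    """Hypothetical helper under test: on a DB error, remove the session and retry once."""
    try:
        return fn()
    except TransientDBError:
        session.remove()
    try:
        return fn()
    except TransientDBError:
        session.remove()
        raise


class RetryOnceTest(unittest.TestCase):
    def test_first_request_recovers_after_disconnect(self):
        session = mock.Mock()
        # First call fails like a dropped idle connection; the second succeeds.
        fn = mock.Mock(side_effect=[TransientDBError("gone away"), "user"])
        self.assertEqual(call_with_one_retry(session, fn), "user")
        self.assertEqual(fn.call_count, 2)
        session.remove.assert_called_once()

    def test_retry_failure_cleans_up_twice_and_reraises(self):
        session = mock.Mock()
        # An exception instance as side_effect is raised on every call.
        fn = mock.Mock(side_effect=TransientDBError("still down"))
        with self.assertRaises(TransientDBError):
            call_with_one_retry(session, fn)
        self.assertEqual(session.remove.call_count, 2)
```

Saved to a file, the tests can be run with python -m pytest or python -m unittest.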

potiuk modified the milestone: Airflow 3.1.8 Mar 5, 2026

potiuk (Member) commented Mar 5, 2026

Should be rebased, I think, and fixed :(

kbohra force-pushed the fix/fab-mysql-idle-disconnect-62903 branch from 7bdca6c to f12a64d on March 5, 2026 14:56
Review comment thread on providers/fab/tests/unit/fab/auth_manager/test_fab_auth_manager.py (outdated)
kbohra added 2 commits March 5, 2026 18:20
Retry user deserialization once after clearing the poisoned scoped session so the first request after a server-side idle timeout does not return 500. Add regression coverage for transient disconnect recovery and factorize deserialization lookup logic to avoid duplication.
kbohra force-pushed the fix/fab-mysql-idle-disconnect-62903 branch from f12a64d to d99b042 on March 6, 2026 02:26
potiuk merged commit aec3744 into apache:main on Mar 10, 2026
87 checks passed
dominikhei pushed a commit to dominikhei/airflow that referenced this pull request Mar 11, 2026
…che#62919)

* fix(fab): recover from first idle MySQL disconnect in token auth

Retry user deserialization once after clearing the poisoned scoped session so the first request after a server-side idle timeout does not return 500. Add regression coverage for transient disconnect recovery and factorize deserialization lookup logic to avoid duplication.

* test(fab): rename retry mock for clarity
kbohra deleted the fix/fab-mysql-idle-disconnect-62903 branch March 12, 2026 02:16
Pyasma pushed a commit to Pyasma/airflow that referenced this pull request Mar 13, 2026
f-necas added a commit to georchestra/datafeeder that referenced this pull request May 27, 2026

Development

Successfully merging this pull request may close these issues.

apache-airflow-providers-fab 3.4.0: Intermittent OperationalError (4031) when FAB session reuses a MySQL connection dropped due to idle timeout

4 participants