fix: Verify durable cached agent steps match the request before replay by kaxil · Pull Request #68372 · apache/airflow

kaxil · 2026-06-11T01:01:27Z

durable=True caches each model response and tool result under positional keys (model_step_{N}, tool_step_{N}), and on a hit it returned the cached entry without ever looking at the current request. So a retry after the operator tweaked the system prompt (or upgraded the model, or a deploy changed the toolset) replayed responses recorded for the old conversation against the new agent. No error, nothing above DEBUG in the logs, just a wrong answer that looks fine. Changing something before retrying is the normal human workflow, which is what makes this the common path rather than an edge case.

The fix stores a fingerprint with each cache entry and only replays when it matches the current request:

- model steps hash the model identity, the message history (minus the timestamp/run_id/conversation_id fields pydantic-ai regenerates on every attempt), the settings, and the whole ModelRequestParameters, so tool definitions and output mode are covered too
- tool steps hash the tool name, the args, and the model-issued tool_call_id
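
To make the two fingerprints concrete, here is a minimal sketch, not the provider's actual code. The function names, the plain-dict message shape, and the choice of SHA-256 over canonical JSON are all assumptions for illustration; only the list of hashed inputs and the three stripped fields come from the description above.

```python
import hashlib
import json

# Hypothetical helper names; the real provider code may be structured differently.
# These are the fields the description says are regenerated on every attempt.
PER_ATTEMPT_FIELDS = {"timestamp", "run_id", "conversation_id"}


def strip_per_attempt(value):
    """Recursively drop per-attempt fields so retries hash identically."""
    if isinstance(value, dict):
        return {
            key: strip_per_attempt(item)
            for key, item in value.items()
            if key not in PER_ATTEMPT_FIELDS
        }
    if isinstance(value, list):
        return [strip_per_attempt(item) for item in value]
    return value


def digest(payload):
    """Hash a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def model_step_fingerprint(model_id, messages, settings, request_params):
    """Cover model identity, history, settings, and request parameters."""
    return digest(
        {
            "model": model_id,
            "messages": strip_per_attempt(messages),
            "settings": settings,
            "params": request_params,
        }
    )


def tool_step_fingerprint(tool_name, args, tool_call_id):
    """Cover the tool name, its arguments, and the model-issued call id."""
    return digest({"tool": tool_name, "args": args, "tool_call_id": tool_call_id})
```

Under these assumptions, two attempts that differ only in `timestamp` or `run_id` produce the same model fingerprint, while an edited system prompt or a different `tool_call_id` produces a different one.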

On mismatch the step logs a warning and runs live. The tool_call_id part does more work than it looks: ids round-trip through the cache unchanged, but a live model call mints new ones, so once a model step diverges the downstream tool entries stop matching as well. I kept the positional keys from #64199 instead of switching to content-addressed ones; verify-on-hit gets the same invalidation chain without changing the storage layout.
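
The invalidation chain can be simulated in a few lines. This is a toy sketch under stated assumptions: `replay_or_run`, `run_attempt`, the in-memory dict cache, and the tuple stand-ins for fingerprints are all hypothetical, and the "model" is a lambda that mints a fresh tool_call_id on every live call. Only the positional keys and the verify-on-hit behavior come from the description.

```python
import itertools
import logging

log = logging.getLogger("durable_sketch")

# Stand-in for the ids a live model call would mint.
_call_ids = itertools.count(1)


def replay_or_run(cache, key, fingerprint, run_live):
    """Return (value, replayed); replay only when the stored fingerprint matches."""
    entry = cache.get(key)
    if entry is not None and entry["fingerprint"] == fingerprint:
        return entry["value"], True
    if entry is not None:
        log.warning("Durable: cached %s does not match the current request; running live", key)
    value = run_live()
    cache[key] = {"fingerprint": fingerprint, "value": value}
    return value, False


def run_attempt(cache, prompt):
    """One model step followed by one tool step, both under positional keys."""
    call, model_replayed = replay_or_run(
        cache,
        "model_step_0",
        ("model", prompt),  # tuple stands in for a real digest
        # The tool name and args are identical for every prompt on purpose.
        lambda: {"tool": "lookup", "args": {"q": "weather"}, "tool_call_id": f"call_{next(_call_ids)}"},
    )
    tool_fingerprint = ("tool", call["tool"], sorted(call["args"].items()), call["tool_call_id"])
    _, tool_replayed = replay_or_run(cache, "tool_step_0", tool_fingerprint, lambda: "sunny")
    return model_replayed, tool_replayed
```

In this toy, retrying with the same prompt replays both steps. Changing the prompt re-runs the model step, and the tool step re-runs as well even though its name and args are unchanged, because the fresh tool_call_id no longer matches the stored one.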

A few decisions worth flagging for review:

- if a request can't be serialized to JSON, the fingerprint is None and that step replays unverified, i.e. the old behavior. I specifically avoided default=str in the digest: hashing <object at 0x...> reprs would never match across processes, which quietly turns replay off forever while the warning blames the user for changing the agent.
- entries written by older provider versions carry no fingerprint and are treated as a miss. A provider upgrade is itself a deploy landing between attempts, so re-running once is the right call even though it costs tokens.
- verification compares requests, not code. Fixing a tool's implementation between attempts won't invalidate a cached result for an identical call, and neither will repointing llm_conn_id at a different endpoint serving the same model name. Both are documented in the operator guide along with the delete-the-cache-file escape hatch.
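
The first two decisions can be sketched as a small decision function. This is an illustration, not the merged code: `fingerprint_or_none` and `decide` are hypothetical names, the entry is assumed to be a dict with an optional `fingerprint` key, and the order in which the two None cases are checked is my assumption.

```python
import hashlib
import json


def fingerprint_or_none(payload):
    """Digest the payload, or return None when it is not JSON-serializable.

    Deliberately no default=str: reprs such as "<object object at 0x...>"
    embed a memory address and would differ in every process.
    """
    try:
        canonical = json.dumps(payload, sort_keys=True)
    except (TypeError, ValueError):
        return None
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def decide(entry, current_fingerprint):
    """Return 'replay', 'replay-unverified', or 'run-live' for a cache lookup."""
    if entry is None:
        return "run-live"  # nothing cached at this position
    if current_fingerprint is None:
        return "replay-unverified"  # request could not be hashed: old behavior
    if entry.get("fingerprint") is None:
        return "run-live"  # entry written by an older provider version
    if entry["fingerprint"] == current_fingerprint:
        return "replay"
    return "run-live"  # request changed between attempts
```

The third decision needs no code: because only the request is hashed, a changed tool implementation or a repointed llm_conn_id yields the same fingerprint and therefore a replay.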

Verified end to end in breeze with a real DAG: two AgentOperator tasks on pydantic-ai's built-in TestModel, each with a tool that fails on attempt 1 only. On attempt 2 the unchanged task logged "Durable: replayed 2 cached steps (1 model, 1 tool), executed 2 new steps (1 model, 1 tool)", so only the failed step re-ran. The second task templates its prompt on try_number so the request changes per attempt; its retry fired the new warning and replayed 0 model steps. The cache file was gone after the run succeeded.

durable=True cached model responses and tool results under purely positional keys, so a retry replayed cached steps even when the agent changed between attempts (prompt tweak, model upgrade, toolset change, or a deploy landing between retries). The retry silently continued a different conversation with no warning above DEBUG. Each cache entry now stores a fingerprint of the request that produced it (model identity, message history minus per-attempt fields, settings, and the full ModelRequestParameters; tool name, args, and tool_call_id for tool steps). On a hit the fingerprint is compared first: a mismatch logs a warning and re-runs the step live. A divergence invalidates downstream tool steps too, because a fresh model response mints new tool_call_ids. Entries written by older provider versions have no fingerprint and re-run instead of replaying.

boring-cyborg Bot added area:providers kind:documentation provider:common-ai labels Jun 11, 2026

kaxil force-pushed the durable-replay-verification branch 2 times, most recently from d4c16ba to d0036eb, June 11, 2026 18:56

kaxil marked this pull request as ready for review June 12, 2026 23:19

kaxil requested a review from gopidesupavan as a code owner June 12, 2026 23:19

gopidesupavan reviewed Jun 15, 2026

Comment thread providers/common/ai/src/airflow/providers/common/ai/durable/caching_model.py

gopidesupavan reviewed Jun 15, 2026

Comment thread providers/common/ai/src/airflow/providers/common/ai/durable/storage.py Outdated

kaxil force-pushed the durable-replay-verification branch from d0036eb to 020da0f, June 16, 2026 23:55

kaxil force-pushed the durable-replay-verification branch from 020da0f to 3a30867, June 17, 2026 00:34

gopidesupavan approved these changes Jun 17, 2026

kaxil merged commit db26df7 into apache:main Jun 18, 2026
81 checks passed

kaxil deleted the durable-replay-verification branch June 18, 2026 00:05

shahar1 mentioned this pull request Jun 19, 2026

Status of testing Providers that were prepared on June 19, 2026 #68751

Open

75 tasks

kaxil merged 1 commit into apache:main from astronomer:durable-replay-verification
