fix: Verify durable cached agent steps match the request before replay#68372
Merged
Conversation
d4c16ba to
d0036eb
Compare
d0036eb to
020da0f
Compare
durable=True cached model responses and tool results under purely positional keys, so a retry replayed cached steps even when the agent changed between attempts (prompt tweak, model upgrade, toolset change, or a deploy landing between retries). The retry silently continued a different conversation with no warning above DEBUG. Each cache entry now stores a fingerprint of the request that produced it (model identity, message history minus per-attempt fields, settings, and the full ModelRequestParameters; tool name, args, and tool_call_id for tool steps). On a hit the fingerprint is compared first: a mismatch logs a warning and re-runs the step live. A divergence invalidates downstream tool steps too, because a fresh model response mints new tool_call_ids. Entries written by older provider versions have no fingerprint and re-run instead of replaying.
020da0f to
3a30867
Compare
gopidesupavan
approved these changes
Jun 17, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
durable=Truecaches each model response and tool result under positional keys (model_step_{N},tool_step_{N}), and on a hit it returned the cached entry without ever looking at the current request. So a retry after the operator tweaked the system prompt (or upgraded the model, or a deploy changed the toolset) replayed responses recorded for the old conversation against the new agent. No error, nothing above DEBUG in the logs, just a wrong answer that looks fine. Changing something before retrying is the normal human workflow, which is what makes this the common path rather than an edge case.The fix stores a fingerprint with each cache entry and only replays when it matches the current request:
timestamp/run_id/conversation_idfields pydantic-ai regenerates on every attempt), the settings, and the wholeModelRequestParameters, so tool definitions and output mode are covered tootool_call_idOn mismatch the step logs a warning and runs live. The
tool_call_idpart does more work than it looks: ids round-trip through the cache unchanged, but a live model call mints new ones, so once a model step diverges the downstream tool entries stop matching as well. I kept the positional keys from #64199 instead of switching to content-addressed ones; verify-on-hit gets the same invalidation chain without changing the storage layout.A few decisions worth flagging for review:
Noneand that step replays unverified, i.e. the old behavior. I specifically avoideddefault=strin the digest: hashing<object at 0x...>reprs would never match across processes, which quietly turns replay off forever while the warning blames the user for changing the agent.llm_conn_idat a different endpoint serving the same model name. Both documented in the operator guide along with the delete-the-cache-file escape hatch.Verified end to end in breeze with a real DAG: two
AgentOperatortasks on pydantic-ai's built-inTestModel, each with a tool that fails on attempt 1 only. The unchanged task loggedDurable: replayed 2 cached steps (1 model, 1 tool), executed 2 new steps (1 model, 1 tool)on attempt 2, so only the failed step re-ran. The second task templates its prompt ontry_numberso the request changes per attempt; its retry fired the new warning and replayed 0 model steps. The cache file was gone after the run succeeded.