code-mode: make create and observe retry-safe#29397
Closed
cconger wants to merge 1 commit into
Closed
Conversation
c79e006 to
4a76c0a
Compare
4232322 to
5cc94cd
Compare
4a76c0a to
457f6f2
Compare
5cc94cd to
e647c2f
Compare
1152f3f to
5df525b
Compare
e647c2f to
b3ba157
Compare
5df525b to
a6be94f
Compare
b3ba157 to
a9ed5af
Compare
This was referenced Jun 22, 2026
a9ed5af to
96581e9
Compare
a6be94f to
3081aba
Compare
3081aba to
098e9f0
Compare
jif-oai
reviewed
Jun 22, 2026
jif-oai
left a comment
Collaborator
There was a problem hiding this comment.
The deadlock is a major blocker
| if tool_name.namespace.is_none() && tool_name.name.as_str() == WAIT_TOOL_NAME => | ||
| { | ||
| let args: ExecWaitArgs = parse_arguments(&arguments)?; | ||
| let idempotency_key = format!("{}:{call_id}", session.thread_id()); |
Collaborator
There was a problem hiding this comment.
We generate a retry key for wait, but the terminate=true branch drops it
Termination claims the cell before awaiting its outcome, so a lost IPC response makes the retry return already terminating or missing instead of the original terminal output
| }, | ||
| ); | ||
| let runtime = Arc::clone(&self.runtime); | ||
| tokio::spawn(async move { |
Collaborator
There was a problem hiding this comment.
This task keeps the runtime alive even after the service and all replay receivers are gone, so dropping the session no longer cancels the cell until the observation deadline finishes
This is a standard ownership deadlock. This needs a proper teardown or a weaken hold
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
The session protocol crosses a cancelable IPC boundary. A lost create or observe response must be retryable without starting another cell or consuming output that Core never received.
What
idempotency_keyfields to the wireCreateCellRequestandObserveRequesttypes.ObserveOutcomewire value when an observation key is retried.Host/Core wire sequence
sequenceDiagram participant Core participant Host Core->>Host: CreateCellRequest { idempotency_key, ... } Host-->>Core: CellId opt create response is lost Core->>Host: same CreateCellRequest Host-->>Core: same CellId end Core->>Host: ObserveRequest { idempotency_key, cell_id, yield_time_ms } Host->>Host: finish observation independently opt observe response is lost Core->>Host: same ObserveRequest Host-->>Core: same ObserveOutcome endStack boundary
Observation records are keyed independently and retained for the session lifetime in this PR. #29398 replaces arbitrary observation keys with per-cell generations and bounds retention to the latest observation.
Validation
just test -p codex-code-modejust test -p codex-code-mode-protocolStack parent: #29291.