Python: Fix function_approval_response extraction in AG-UI workflow path by moonbox3 · Pull Request #4550 · microsoft/agent-framework

moonbox3 · 2026-03-09T03:35:23Z

Motivation and Context

The _extract_responses_from_messages helper only handled function_result content, silently dropping function_approval_response entries. This meant approval/denial decisions from the user were never forwarded to the agent runtime, causing human-in-the-loop tool-approval workflows to hang indefinitely.

Fixes #4546

Description

The root cause was a guard clause in _extract_responses_from_messages that skipped any content whose type was not function_result. The fix restructures the conditional to also handle function_approval_response content, extracting its approved flag, id, and associated function_call into the responses dict keyed by the approval id. Tests covering approved, denied, mixed, and edge-case scenarios are added to prevent regression.
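A minimal sketch of the restructured conditional follows. It uses hypothetical dataclass stand-ins (FunctionCall, Content, Message) rather than the real agent_framework types, and it assumes function_result entries are keyed by their call_id. Only the approval branch follows the description above: the entry is keyed by the approval id and carries approved, id and the serialized function_call.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins for the framework's content/message types; the real
# agent_framework classes and field names may differ.
@dataclass
class FunctionCall:
    call_id: str
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        return {"call_id": self.call_id, "name": self.name, "arguments": self.arguments}


@dataclass
class Content:
    type: str
    call_id: str | None = None  # set on function_result content
    result: Any = None  # set on function_result content
    id: str | None = None  # set on function_approval_response content
    approved: bool | None = None
    function_call: FunctionCall | None = None


@dataclass
class Message:
    role: str
    contents: list[Content]


def extract_responses_from_messages(messages: list[Message]) -> dict[str, Any]:
    """Sketch of the fixed helper: collect results AND approval responses."""
    responses: dict[str, Any] = {}
    for message in messages:
        for content in message.contents:
            if content.type == "function_result" and content.call_id:
                # Assumption: results are keyed by their call id.
                responses[content.call_id] = content.result
            elif content.type == "function_approval_response" and content.id:
                # Previously dropped by the guard clause; now keyed by the approval id.
                responses[content.id] = {
                    "approved": content.approved,
                    "id": content.id,
                    "function_call": content.function_call.to_dict() if content.function_call else None,
                }
            # Any other content type (e.g. plain text) is ignored.
    return responses


# Usage: one message mixing a tool result, a denied approval and plain text.
call = FunctionCall(call_id="call_1", name="delete_file", arguments={"path": "a.txt"})
msg = Message(
    role="user",
    contents=[
        Content(type="function_result", call_id="call_0", result="ok"),
        Content(type="function_approval_response", id="approval_1", approved=False, function_call=call),
        Content(type="text"),
    ],
)
responses = extract_responses_from_messages([msg])
```

The text item is skipped, and the result and the denial land in the same dict without interfering, which is the mixed-content scenario the later unit test targets.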

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

…4546) _extract_responses_from_messages now handles function_approval_response content in addition to function_result content. Previously, approval responses sent via the messages field were silently dropped because the function only checked for content.type == "function_result". The approval response is keyed by content.id and includes the approved status, id, and serialized function_call, consistent with how _coerce_content identifies approval response payloads.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 97% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach

Automated review by moonbox3's agents

markwallace-microsoft · 2026-03-09T03:37:52Z

Python Test Coverage Report

File	Stmts	Miss	Cover	Missing
packages/ag-ui/agent_framework_ag_ui
_workflow_run.py	507	27	94%	180, 216–219, 247, 252, 280, 290, 301, 306, 309, 322, 362–363, 384, 433, 453, 461, 464, 469, 484, 557, 599–600, 671, 714
TOTAL	22686	2582	88%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
4948	20 💤	0 ❌	0 🔥	1m 24s ⏱️

Copilot

Pull request overview

Fixes AG-UI workflow runs hanging in human-in-the-loop scenarios by ensuring user-sent tool approval decisions are extracted from incoming messages and forwarded into the workflow responses map.

Changes:

Extend _extract_responses_from_messages to also extract function_approval_response content (in addition to function_result).
Add unit tests covering approval extraction (approved/denied/mixed/ignored content).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
python/packages/ag-ui/agent_framework_ag_ui/_workflow_run.py	Extracts `function_approval_response` payloads from incoming messages so workflow runs can resume from message-based approvals.
python/packages/ag-ui/tests/ag_ui/test_workflow_run.py	Adds unit tests for the updated extraction helper, covering approval response handling.


python/packages/ag-ui/agent_framework_ag_ui/_workflow_run.py

python/packages/ag-ui/tests/ag_ui/test_workflow_run.py

…ssage-based approvals

- Update _extract_responses_from_messages docstring to reflect that it now handles function_approval_response content in addition to function_result content.
- Add integration tests for run_workflow_stream across two turns with approval responses provided via messages (function_approvals) rather than resume.interrupts, covering both approved and denied scenarios.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 82%

✓ Correctness

This diff adds a docstring clarification and two new integration tests for approval-via-messages functionality. The tests are well-structured and symmetric (approved vs denied). No blocking correctness issues found. The tests correctly match request IDs, use the right async patterns (no @pytest.mark.asyncio needed per project convention), and verify both the happy path and denial path. One minor concern: the "interrupt" not in resumed_finished assertion may be fragile depending on how the Pydantic model serializes optional fields.

✓ Security Reliability

This diff contains only a docstring clarification in production code and two new integration tests. There are no production logic changes, no new trust boundaries, no deserialization of untrusted input, no secrets, and no resource-leak risks. The tests are well-structured and exercise both the approved and denied approval paths. No security or reliability issues found.

✓ Test Coverage

Two solid integration tests are added covering both approved and denied approval-via-messages paths. The assertions are meaningful: they verify event types, text output content, and absence of leftover interrupts. However, the docstring update to _extract_responses_from_messages describes new function_approval_response handling, yet no unit test is added to the existing TestExtractResponsesFromMessages class to exercise that extraction logic directly. The integration tests cover the end-to-end flow but a targeted unit test would catch regressions in the extraction function more precisely.

✓ Design Approach

The diff adds a docstring clarification and two new integration tests covering the approval-via-messages path (approved and denied). The production code change is documentation-only, and the tests exercise a real workflow end-to-end. There are no fundamental design approach concerns: the tests verify a legitimate second message-delivery channel for approval responses and cover both positive and negative outcomes. The minor duplication between the two test functions is acceptable given they assert different behavioral branches.

Suggestions

The assertion assert "interrupt" not in resumed_finished (lines 1328 and 1399) could be fragile: if the Pydantic model always includes interrupt in model_dump() (e.g., as None or []), the key would still be present and the assertion would fail. Consider assert not resumed_finished.get("interrupt") which handles both absent keys and falsy values (None, []).
Add a unit test to TestExtractResponsesFromMessages that constructs a Message containing function_approval_response content and asserts _extract_responses_from_messages returns the expected {id: content} mapping. This directly validates the extraction logic described in the updated docstring, independent of the full workflow machinery.
Consider adding an edge-case integration test (or unit test) where both a function_result and a function_approval_response are present in the same message, verifying both are extracted into the responses dict without interference.
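The fragility described in the first suggestion can be reproduced without Pydantic. In this sketch a plain dataclass stands in for the event model (the RunFinished name and its fields are hypothetical); dataclasses.asdict(), like Pydantic v2's model_dump() without exclude_none=True, keeps fields whose value is None:

```python
from dataclasses import asdict, dataclass
from typing import Any


# Hypothetical stand-in for the dumped RUN_FINISHED event model.
@dataclass
class RunFinished:
    type: str = "RUN_FINISHED"
    interrupt: Any = None  # optional field, left unset on a clean finish


resumed_finished = asdict(RunFinished())

# The key is present even though nothing was interrupted...
key_present = "interrupt" in resumed_finished
# ...so the key-absence assertion fails, while the value-based one passes.
absence_check_passes = "interrupt" not in resumed_finished
value_check_passes = not resumed_finished.get("interrupt")
```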

Automated review by moonbox3's agents

python/packages/ag-ui/tests/ag_ui/test_workflow_run.py

- Use safer 'not .get("interrupt")' assertion instead of 'not in' to handle Pydantic v2 model_dump() including keys with None values
- Add unit test for mixed function_result and function_approval_response in the same message to TestExtractResponsesFromMessages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 86%

✓ Correctness

The diff adds a straightforward new test for mixed result/approval extraction and weakens two assertions from checking key absence ("interrupt" not in dict) to checking falsiness (not dict.get("interrupt")). The new test looks correct. The assertion changes are likely motivated by Pydantic's model_dump() including "interrupt": None for optional fields, but a bare not check is overly permissive: it would also pass for unexpected falsy values like {}, [], or False, potentially masking real bugs.

✓ Security Reliability

This diff modifies only test code: it adds a new test for mixed result/approval extraction and changes two assertions from assert "interrupt" not in dict to assert not dict.get("interrupt"). No production code is touched. The assertion change subtly weakens the check: assert not dict.get("interrupt") passes when the key is absent (get returns None), but it also passes when the key is present with any falsy value (e.g., {}, 0, False, None, ""), whereas the original "interrupt" not in dict strictly checks key absence. This is unlikely to mask a real bug in practice but is worth noting as a minor reliability concern in tests. No security issues found.

✓ Test Coverage

The diff adds a useful test for mixed result and approval content in the same message, and tweaks two assertions from assert "interrupt" not in resumed_finished to assert not resumed_finished.get("interrupt"). The new test is well-structured with meaningful assertions. The assertion change subtly weakens the check: the original verified the key was absent, while .get() would also pass if the key is present but holds a falsy value (e.g., None, False, {}, [], 0). If the intent is truly to verify the field is absent or null, consider an explicit check. No missing test scenarios were identified.

✓ Design Approach

This diff adds a well-structured test for mixed result/approval extraction and relaxes two assertions from key-absence checks to falsy-value checks. The changes are minor and appropriate. The assertion change from "interrupt" not in dict to not dict.get("interrupt") makes tests less brittle against Pydantic serialization behavior (whether optional None fields are included or excluded in model_dump()), which is a reasonable improvement rather than a design concern.

Suggestions

Consider using assert resumed_finished.get("interrupt") is None instead of assert not resumed_finished.get("interrupt"). The is None check precisely allows both key-absent and key-is-None cases while still catching unexpected falsy values like {} or False.
The assertion change from assert "interrupt" not in resumed_finished to assert not resumed_finished.get("interrupt") is slightly weaker: it now also passes when interrupt is present but falsy (e.g., {}, None, 0). If the intent is to verify the key is truly absent, the original form is more precise. If the intent is to accept both absence and falsy values, the new form is fine—just be aware of the semantic difference.
The assertion change from assert "interrupt" not in resumed_finished to assert not resumed_finished.get("interrupt") is semantically weaker — it now passes when interrupt is present but falsy (e.g., None, {}, 0). If the intent is to accept both absent and None, this is fine; if the field should truly be absent, the original assertion was stricter. Consider documenting the intent or using assert resumed_finished.get("interrupt") is None for clarity.
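To make the semantic difference concrete, this standalone sketch evaluates the three assertion forms discussed above against a missing key, an explicit None, and other falsy values. Plain dicts stand in for the dumped event, and the case labels are illustrative:

```python
from typing import Any, Callable

# The three candidate assertions, as predicates over the dumped payload.
checks: dict[str, Callable[[dict[str, Any]], bool]] = {
    "key absent": lambda d: "interrupt" not in d,
    "falsy": lambda d: not d.get("interrupt"),
    "is None": lambda d: d.get("interrupt") is None,
}

# Payload variants: key missing, explicit None, and other falsy values.
cases: dict[str, dict[str, Any]] = {
    "missing": {},
    "None": {"interrupt": None},
    "empty dict": {"interrupt": {}},
    "empty list": {"interrupt": []},
    "False": {"interrupt": False},
}

# (case, check) -> whether that assertion would pass for that payload.
results = {
    (case, name): check(payload)
    for case, payload in cases.items()
    for name, check in checks.items()
}

for case in cases:
    row = ", ".join(f"{name}={results[(case, name)]}" for name in checks)
    print(f"{case:>10}: {row}")
```

Only the is None form accepts both the missing key and an explicit None while still rejecting {}, [] and False, which is why it is the more precise of the three.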

Automated review by moonbox3's agents

python/packages/ag-ui/tests/ag_ui/test_workflow_run.py

moonbox3 and others added 2 commits March 9, 2026 12:31

Apply pre-commit auto-fixes (2095d79)

Copilot AI review requested due to automatic review settings March 9, 2026 03:35

moonbox3 self-assigned this Mar 9, 2026

markwallace-microsoft added the python label Mar 9, 2026

moonbox3 commented Mar 9, 2026

Copilot started reviewing on behalf of moonbox3 March 9, 2026 03:36

Copilot AI reviewed Mar 9, 2026

python/packages/ag-ui/agent_framework_ag_ui/_workflow_run.py (resolved)

python/packages/ag-ui/tests/ag_ui/test_workflow_run.py (resolved)

moonbox3 commented Mar 9, 2026

python/packages/ag-ui/tests/ag_ui/test_workflow_run.py (3 resolved threads)

moonbox3 commented Mar 9, 2026

Merge branch 'main' into agent/fix-4546-1 (fd312b8)

moonbox3 requested a review from eavanvalkenburg March 9, 2026 11:15

Merge branch 'main' into agent/fix-4546-1 (155f0a4)

eavanvalkenburg approved these changes Mar 9, 2026

moonbox3 enabled auto-merge March 9, 2026 11:25

moonbox3 wants to merge 6 commits into microsoft:main from moonbox3:agent/fix-4546-1


4 participants
