fix(install): wire ADO --update preflight through PAT->bearer fallback (#1212)#1214
Conversation
#1212) Before this change, `apm install --update` raised AuthenticationError on a stale ADO_APM_PAT even when `az login` was active and `apm install` (no flag) worked with the same env. The preflight auth probe in pipeline.py was the only ADO call site that did NOT use the canonical `AuthResolver.execute_with_bearer_fallback` helper -- it open-coded a ls-remote and raised on 401/403 immediately. Changes: - Extract `is_ado_auth_failure_signal()` predicate into utils/github_host.py as the single source of truth for the 5-signal PAT-failure detection set (was duplicated across auth.py, pipeline.py, github_downloader.py, validation.py). - Rewrite `_preflight_auth_check` (pipeline.py) on top of `execute_with_bearer_fallback`. The bearer attempt uses a clean env via `_build_git_env(scheme="bearer")` so a rejected PAT cannot leak into the bearer probe via GIT_TOKEN. - Refactor `list_remote_refs` (github_downloader.py) onto the same canonical helper for symmetry with the clone path. - `AuthResolver.build_error_context` now takes a `bearer_also_failed` flag and prefixes the rendered error with a clear "AAD bearer fallback also failed" line when both creds were tried, so users know both paths were exercised. - Per-host dedup for the stale-PAT diagnostic so it surfaces once per cluster, not per dep. - Install hint expanded to `brew/winget/choco install azure-cli`. Anti-regression: - New `scripts/lint-auth-signals.sh` (bash 3.2 compatible) enforces: Rule A -- only `auth_resolver` may import bearer providers, and Rule B -- every `git ls-remote` site must have an `# auth-delegated` annotation pointing to its delegation strategy. Hooked into ci.yml. - Hermetic E2E test (test_ado_preflight_bearer_fallback_e2e.py) using PATH-injected fake `git` and `az` binaries, wired into both scripts/test-integration.sh and scripts/windows/test-integration.ps1. Tests: - 19 new predicate tests (test_github_host_predicate.py). - 3 new bearer-fallback cases in test_pipeline_auth_preflight.py (stale-PAT->bearer success, both fail, no-PAT-leak in bearer env). - 2 new hermetic E2E tests reproducing the #1212 scenario end-to-end. - All 7823 unit tests pass; new E2E tests pass. Refs: #1212 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Synthesizes recommendations from the local panel review (auth-expert, doc-writer, devx-ux, test-coverage, supply-chain-security): - list_remote_refs error path now propagates bearer_also_failed to build_error_context, matching install preflight UX. - build_error_context Case 4 retry hint now expands install steps with apt/dnf coverage for Linux users; Case 2 no longer renders the contradictory PAT-rejected prefix when no PAT was tried. - lint-auth-signals.sh Rule A now scans for any reference to get_bearer_provider (not just single-line imports), catching multi-line import blocks and module-attribute access. Clone path in github_downloader.py:_execute_transport_plan exempted with documented follow-up rationale (refactor onto execute_with_bearer_fallback is non-trivial in the tier-attempt loop). - Added unit tests for build_error_context(bearer_also_failed=True) prefix in Case 4, defensive Case 2 ignores the kwarg, and emit_stale_pat_diagnostic per-host dedup. - Tightened CHANGELOG entry per doc-writer (drop internal predicate refactor detail; lead with user-visible behavior). All 7826 unit tests pass, both hermetic E2E tests pass, ruff clean, auth-signal lint clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Local apm-review-panel pass: panelist findings addressedRan the full advisory panel locally (8 panelists in parallel). Zero blocking findings; addressed the highest-signal recommended items in Addressed in this push
Filed as follow-ups (out of scope for this hotfix)
Validation
Marking ready for review. |
The bearer_also_failed propagation in list_remote_refs pushed the file to 2407 lines (max 2400). Tightened the comments around _bearer_op SECURITY note and the ado_eligible branch to bring it back to 2400. No behavior change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes an Azure DevOps authentication regression where apm install --update could fail preflight auth on a stale ADO_APM_PAT even when an active az login bearer token would allow the actual install to succeed. It centralizes the ADO auth-failure signal predicate, routes the --update preflight probe (and ref listing) through the canonical PAT-to-bearer fallback helper, adds a repo-level anti-regression lint, and adds unit/integration coverage.
Changes:
- Extracts an
is_ado_auth_failure_signal()helper and replaces duplicated string-signal checks across call sites. - Rewrites
installpreflight ADO probing andlist_remote_refsto useAuthResolver.execute_with_bearer_fallbackand attempts to surface a "bearer also failed" diagnostic signal. - Adds a shell-based lint gate for auth-protocol drift and introduces a hermetic E2E test (plus wiring into integration runners), and updates the changelog.
Show a summary per file
| File | Description |
|---|---|
src/apm_cli/utils/github_host.py |
Adds is_ado_auth_failure_signal() and a centralized signal list. |
src/apm_cli/install/pipeline.py |
Routes --update preflight git ls-remote through bearer fallback and adds verbose tracing. |
src/apm_cli/deps/github_downloader.py |
Refactors list_remote_refs() to use the canonical bearer fallback helper. |
src/apm_cli/install/validation.py |
Reuses the centralized auth-failure predicate and builds a clean bearer env via _build_git_env. |
src/apm_cli/core/auth.py |
Adds bearer_also_failed support in error context and dedups stale-PAT warnings per host. |
src/apm_cli/cache/git_cache.py |
Adds an # auth-delegated: annotation near a ls-remote site for the new lint. |
tests/unit/utils/test_github_host_predicate.py |
Adds unit tests for the centralized predicate. |
tests/unit/install/test_pipeline_auth_preflight.py |
Adds unit tests for preflight bearer fallback behavior and env cleanliness. |
tests/unit/test_list_remote_refs.py |
Adjusts downloader test wiring around execute_with_bearer_fallback. |
tests/unit/test_auth.py |
Adds tests for the new bearer_also_failed messaging and stale-PAT warning dedup. |
tests/integration/test_ado_preflight_bearer_fallback_e2e.py |
Adds a hermetic fake git/az E2E regression test. |
scripts/lint-auth-signals.sh |
New anti-regression lint for bearer-provider boundary and ls-remote annotation. |
.github/workflows/ci.yml |
Hooks the new auth-protocol lint into the Tier-1 lint job. |
scripts/test-integration.sh |
Runs the new hermetic E2E test on non-Windows integration runs. |
scripts/windows/test-integration.ps1 |
Runs the new E2E test in Windows integration runner. |
CHANGELOG.md |
Adds an Unreleased entry documenting the fix. |
Copilot's findings
Comments suppressed due to low confidence (1)
tests/integration/test_ado_preflight_bearer_fallback_e2e.py:79
- Same issue as above for
FAKE_AZ: the triple-quoted string literal starting here is a standalone expression and will be flagged by RuffB018. Either append it toFAKE_AZ(e.g., by including it inside the initial assignment) or remove it so the integration test file passes lint.
FAKE_AZ = r"""#!/usr/bin/env python3
"""
"""Fake az CLI for #1212 -- returns a fixed JWT.
Used by AzureCliBearerProvider.get_bearer_token. The JWT structure must
have three dot-separated base64url segments so ``str.startswith('eyJ')``
checks pass.
"""
- Files reviewed: 16/16 changed files
- Comments generated: 7
Address Copilot review feedback on PR #1214: * execute_with_bearer_fallback now returns BearerFallbackOutcome (NamedTuple) carrying outcome + bearer_attempted flag. Callers in install/pipeline.py and deps/github_downloader.py use the flag so bearer_also_failed only renders 'az cli bearer was also rejected' when the bearer leg actually ran -- not on early returns (az unavailable, JWT acquisition failed). * build_error_context docstring fixed: bearer_also_failed prefix applies to Case 4 (PAT set + az_available), not Case 2. * emit_stale_pat_diagnostic warning no longer hard-codes '(HTTP 401)' -- the predicate also matches 403 and other auth signals. * Windows test-integration.ps1 stops invoking the POSIX-only E2E (which skipif win32) and notes the unit suite covers the contract. No behavior change beyond diagnostic accuracy; all 7826 unit tests pass and lint is clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
APM Review Panel:
|
| Persona | B | R | N | Takeaway |
|---|---|---|---|---|
| Python Architect | 0 | 1 | 3 | Clean ADO auth-fallback fix; NamedTuple return type and SSOT predicate are idiomatic; two nits on type annotation precision. |
| CLI Logging Expert | 0 | 1 | 1 | One recommended: Case 4 body text misleads when bearer_also_failed=True (steps contradict the prefix). Rest of the logging changes are clean improvements. |
| DevX UX Expert | 0 | 1 | 2 | Good UX net-positive: platform-aware az-cli hints and per-host warn dedup. Two minor polish items: 'az cli' casing inconsistency and Case 4 error prefix missing host context. |
| Supply Chain Security | 0 | 1 | 2 | Bearer-env isolation is largely sound; one residual GIT_TOKEN inheritance gap in _build_git_env and a Rule B coverage blind-spot on clone calls worth addressing. |
| OSS Growth Hacker | 0 | 0 | 1 | Solid enterprise ADO retention fix; CHANGELOG is user-facing and clear; no adoption-harming surface touched. |
| Auth Expert | 0 | 1 | 1 | API contract correctly migrated; bearer env isolation sound; _stale_pat_warned_hosts has a TOCTOU dedup race under parallel install but no security impact. |
| Doc Writer | 0 | 1 | 2 | CHANGELOG entry is clean; auth docs already cover the --update preflight guarantee; two minor gaps worth addressing: 401-only wording in the stale-PAT fallback example and missing discoverability pointer for lint-auth-signals.sh. |
| Test Coverage Expert | 0 | 1 | 1 | Critical auth/install surfaces have unit + hermetic integration coverage; one gap: list_remote_refs bearer fallback_used=True branch is never exercised at any tier. |
B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.
Top 5 follow-ups
- [Supply Chain Security] Add
env.pop('GIT_TOKEN', None)in_build_git_envbearer branch and a matching unit test asserting the env is clean when GIT_TOKEN pre-exists. -- Missing-test finding on asecure-by-defaultsurface: the stated isolation guarantee is not enforced when a stale GIT_TOKEN is already in the caller's environment. One-line fix + one test case; highest-signal item from the panel. - [Auth Expert] Guard the
_stale_pat_warned_hostscheck-then-add withself._lockto close the TOCTOU dedup race under parallel install. -- Correctness bug: two threads on the same ADO host can both pass theincheck before either callsadd(), emitting duplicate stale-PAT warnings and defeating the whole point of the dedup set. Three-line fix. - [Test Coverage Expert] Add a
parametrizevariant intest_list_remote_refs.pywhereexecute_with_bearer_fallbackreturnsBearerFallbackOutcome(result, True)and assertemit_stale_pat_diagnosticis called. -- Missing-coverage onsecure-by-default+devxprinciples: thefallback_used=Truebranch (stale-PAT diagnostic emission, correct ref return) has no test at any tier. A future refactor to the outcome shape or diagnostic call will silently break this path. - [CLI Logging Expert] Branch on
bearer_also_failedinside the Case 4 error block and includehost_displayin the prefix f-string (convergent fix with devx-ux finding). -- Whenbearer_also_failed=Truethe current body actively misleads: "Azure CLI MAY also be available" is false and Step 1 ("unset ADO_APM_PAT") is busywork. The devx-ux finding adds that the host name is absent from the prefix, leaving multi-tenant users unable to identify which ADO tenant failed. One edit resolves both. - [Doc Writer] Add a sentence to CONTRIBUTING.md pointing to
scripts/lint-auth-signals.shand explaining the PAT->bearer boundary rule and exemption list process. -- The lint script is currently undiscoverable to contributors adding new ADO call sites. Without this pointer, the next developer to add a rawls-remotecall has no path to learn about the rule, making the anti-regression guard fragile in practice.
Architecture
classDiagram
direction LR
class AuthResolver {
<<Strategy>>
+execute_with_bearer_fallback(dep_ref, primary_op, bearer_op, is_auth_failure) BearerFallbackOutcome
+build_error_context(host, op, *, bearer_also_failed) str
+emit_stale_pat_diagnostic(host_display) None
+_build_git_env(token, scheme, host_kind) dict
-_stale_pat_warned_hosts set
}
class BearerFallbackOutcome {
<<ValueObject>>
+outcome object
+bearer_attempted bool
}
class AzureCliBearerProvider {
<<ConcreteStrategy>>
+is_available() bool
+get_bearer_token() str
+clear_cache() None
}
class GitHubDownloader {
<<Collaborator>>
+list_remote_refs(dep_ref) list
+auth_resolver AuthResolver
}
class InstallPipeline {
<<Collaborator>>
+_preflight_auth_check(ctx, auth_resolver, verbose) None
}
class DependencyReference {
<<ValueObject>>
+host str
+repo_url str
+is_azure_devops() bool
}
class is_ado_auth_failure_signal {
<<Pure>>
+__call__(text) bool
}
note for AuthResolver "Chain of Responsibility: PAT primary_op -> auth-failure? -> bearer_op. Returns BearerFallbackOutcome(outcome, bearer_attempted)"
note for is_ado_auth_failure_signal "SSOT for ADO auth signals: 401 | 403 | authentication failed | unauthorized | could not read username"
AuthResolver ..> BearerFallbackOutcome : returns
AuthResolver ..> AzureCliBearerProvider : delegates bearer
AuthResolver ..> is_ado_auth_failure_signal : uses
GitHubDownloader ..> AuthResolver : calls execute_with_bearer_fallback
InstallPipeline ..> AuthResolver : calls execute_with_bearer_fallback
GitHubDownloader ..> DependencyReference : reads
InstallPipeline ..> DependencyReference : reads
class BearerFallbackOutcome:::touched
class AuthResolver:::touched
class is_ado_auth_failure_signal:::touched
class GitHubDownloader:::touched
class InstallPipeline:::touched
classDef touched fill:#fff3b0,stroke:#d47600
flowchart TD
A(["apm install --update"])
B["_preflight_auth_check\ninstall/pipeline.py"]
C{"dep.is_azure_devops()\n&& scheme==basic\n&& source==ADO_APM_PAT?"}
D["primary_op()\ngit ls-remote PAT URL"]
E{"is_ado_auth_failure_signal\n(stderr)?"}
F["BearerFallbackOutcome\n(result, False)"]
G["AuthResolver.execute_with_bearer_fallback\ncore/auth.py"]
H["AzureCliBearerProvider.is_available()\naz account get-access-token"]
I{"az available\n& JWT acquired?"}
J["bearer_op(bearer)\ngit ls-remote bearer URL\n_build_git_env(scheme='bearer')\nCLEAN env -- no GIT_TOKEN leak"]
K{"bearer outcome\nauth failure?"}
L["emit_stale_pat_diagnostic\n-- dedup per host"]
M["BearerFallbackOutcome\n(bearer_result, True) -- SUCCESS"]
N["BearerFallbackOutcome\n(primary, True) -- BOTH FAILED"]
O["bearer_also_failed=True\nbuild_error_context\nraise AuthenticationError"]
P["continue -- preflight passed"]
Q["non-ADO or no PAT\nprimary_op only"]
A --> B
B --> C
C -- yes --> D
C -- no --> Q
D --> E
E -- no --> F
F --> P
E -- yes --> G
G --> H
H --> I
I -- no --> F
I -- yes --> J
J --> K
K -- no --> L
L --> M
M --> P
K -- yes --> N
N --> O
Q --> P
sequenceDiagram
participant CLI as apm install --update
participant PF as _preflight_auth_check
participant AR as AuthResolver
participant GIT as git ls-remote
participant AZ as az (fake/real)
CLI->>PF: ctx, auth_resolver
PF->>AR: execute_with_bearer_fallback(dep, _primary_op, _bearer_op, _is_auth_failure)
AR->>GIT: ls-remote --heads PAT-URL [GIT_TOKEN=stale-pat]
GIT-->>AR: rc=128 stderr='401 Unauthorized'
AR->>AR: is_ado_auth_failure_signal('401') -> True
AR->>AZ: get-access-token --resource guid
AZ-->>AR: eyJAA...jwt
AR->>GIT: ls-remote --heads bearer-URL [GIT_CONFIG_VALUE_0=Bearer eyJ...]
GIT-->>AR: rc=0 refs/heads/main
AR->>AR: emit_stale_pat_diagnostic(host) -- dedup
AR-->>PF: BearerFallbackOutcome(result_ok, bearer_attempted=True)
PF->>CLI: return (no raise)
Recommendation
Ship this PR. No panelist found a blocking issue; the auth wiring, env isolation structure, SSOT predicate, and lint guard are all sound net-positives. The five follow-ups above are concrete, low-friction, and well-scoped for a fast follow PR -- the GIT_TOKEN strip and TOCTOU lock fix in particular are one-to-three-line changes that should be bundled together. Recommend tracking them in a single follow-up issue referencing #1212 so they do not drift.
Full per-persona findings
Python Architect
-
[nit] BearerFallbackOutcome.outcome typed as
object; should be TypeVar/Generic for type-safe unwrapping atsrc/apm_cli/core/auth.py
NamedTuple fields typed asobjectlose all downstream type inference. A Generic NamedTuple (BearerFallbackOutcome[T]) or a TypeVar bound to the union of expected outcome shapes would let callers skip cast() and catch misuse at lint time rather than runtime.
Suggested:from typing import Generic, TypeVar; T_co = TypeVar('T_co', covariant=True); class BearerFallbackOutcome(NamedTuple, Generic[T_co]): outcome: T_co; bearer_attempted: bool -
[nit] _stale_pat_warned_hosts annotated as bare
set; should beset[str]atsrc/apm_cli/core/auth.py
APM code quality standard requires type hints on all public APIs. The baresetannotation propagates through mypy asset[Unknown], losing host-display string checks.
Suggested:self._stale_pat_warned_hosts: set[str] = set() -
[nit] bearer_also_failed re-evaluates is_ado_auth_failure_signal after _is_auth_failure closure already checked the same text at
src/apm_cli/install/pipeline.py
Minor redundancy; does not affect correctness but could cause confusion when the predicate is extended.
Suggested:bearer_also_failed = fallback_result.bearer_attempted and _is_auth_failure(result) -
[recommended] Design patterns record (Architecture Reviewer note)
Patterns used: Dataclass-as-value-object (BearerFallbackOutcome NamedTuple), Extract-when-shared (is_ado_auth_failure_signal to utils/github_host.py consolidates 4+ open-coded copies), Strategy (execute_with_bearer_fallback returns typed outcome envelope; callers own outcome shape, resolver owns fallback protocol). The two exempt call sites in validation.py and github_downloader.py are correctly tracked as follow-ups.
CLI Logging Expert
-
[recommended] Case 4 error body is stale / contradicts its own bearer_also_failed prefix at
src/apm_cli/core/auth.py
When bearer_also_failed=True the prefix correctly says "ADO_APM_PAT was rejected; az cli bearer was also rejected." but the paragraph that follows still says "Azure CLI credentials MAY also be available" (they were tried and rejected -- "may" is false) and Step 1 says "Unset the PAT to test Azure CLI auth only" -- the bearer path has already been tested, so Step 1 gives the user busywork. The only actionable step when both legs fail is Step 2 (az login).
Suggested: Branch on bearer_also_failed inside the Case 4 block. When True: suppress Step 1, say "Both ADO_APM_PAT and Azure CLI bearer were tried and rejected." and keep only Step 2 (az login) and Step 3 (retry). -
[nit] Verbose trace token 'accepted' is not in STATUS_SYMBOLS vocabulary at
src/apm_cli/install/pipeline.py
pipeline.py emits_trace(f'Preflight: {host_display} -- accepted')on success. The established vocabulary uses 'ok', 'passed', or a [+] symbol prefix.
Suggested: Change to:_trace(f'Preflight: {host_display} -- ok')
DevX UX Expert
-
[recommended] Case 4 error prefix drops the host name, leaving users guessing which ADO tenant failed at
src/apm_cli/core/auth.py
The new stale-PAT warning correctly includesfor {host_display}so users know which host to fix. But the Case 4 prepended line is a static string with no host context. If a user has deps on two ADO tenants and one fails both legs, the error message gives no hint about which host to look at.
Suggested: Change the static prefix to an f-string:f' ADO_APM_PAT was rejected for {host_display}; Azure CLI bearer was also rejected.\n\n' -
[nit] 'az cli bearer' is lowercase in new messages; rest of codebase uses 'Azure CLI' at
src/apm_cli/core/auth.py
The new warning string uses 'az cli' lowercase while every other user-facing string in the file uses 'Azure CLI'. Inconsistent product name casing can confuse first-time users who Google the error.
Suggested: Replace 'az cli bearer' with 'Azure CLI bearer' in both the emit_stale_pat_diagnostic message and the Case 4 prefix string. -
[nit] Step 1 recovery hint 'unset ADO_APM_PAT' is Unix-only after PR adds Windows-aware az-cli install hints at
src/apm_cli/core/auth.py
The PR correctly improved the az-cli install hints to show brew/winget/apt/dnf per platform. But step 1 of the same error block still shows 'unset ADO_APM_PAT' which is Bash-only. Windows users on PowerShell need 'Remove-Item Env:ADO_APM_PAT'.
Suggested: Add a parenthetical:unset ADO_APM_PAT (PowerShell: Remove-Item Env:ADO_APM_PAT)
Supply Chain Security
-
[recommended] _build_git_env(scheme='bearer') does not strip a pre-existing GIT_TOKEN from the inherited environment at
src/apm_cli/core/auth.py:731
The function starts withenv = os.environ.copy()and correctly avoids setting GIT_TOKEN in the bearer branch, but never removes a pre-existing GIT_TOKEN that may already be present in the caller's environment. If a staleGIT_TOKEN=<PAT>is in the shell environment the value propagates into every bearer-mode subprocess call, contradicting the stated security guarantee.
Suggested: Addenv.pop('GIT_TOKEN', None)immediately after the bearer branch guard so the clean-env invariant holds unconditionally.
Proof (missing):tests/test_auth.py-- proves: Bearer env is clean of GIT_TOKEN even when the parent process environment contains one -
[nit] Lint Rule B covers ls-remote only; raw git clone calls in src/ are not caught at
scripts/lint-auth-signals.sh
A new raw git clone invocation added outside core/auth.py would bypass the Rule B check entirely. The exempted github_downloader.py clone path is the current risk surface, but the lint gap means a future contributor could introduce an unauthenticated clone call without any CI signal.
Suggested: Add a parallel Rule C block scanning for subprocess git clone calls requiring the same auth-delegated annotation or execute_with_bearer_fallback proximity. -
[nit] Lint exemption list carries no issue reference or expiry mechanism at
scripts/lint-auth-signals.sh
The two exempted files are described as 'refactor tracked as follow-up to [BUG]apm install -g --updatefails with a DevOps auth error whereapm install -gsucceeds #1212' but the lint script contains no machine-checkable link to a tracking issue. Without one the exemptions are invisible to future maintainers and may persist indefinitely.
Suggested: Add a comment line per exemption with a GitHub issue URL, e.g.:# TODO(#NNNN): remove once validation.py is migrated to execute_with_bearer_fallback
OSS Growth Hacker
- [nit] CHANGELOG entry omits the cross-platform error-message improvement mentioned in the PR body at
CHANGELOG.md
The PR body calls out that platform-specific install commands (brew/winget/apt/dnf) now appear on auth failure instead of a Windows-only URL -- a first-run friction win for Linux/macOS users. That story is absent from the CHANGELOG, losing a quick win for release notes and social copy.
Suggested: Append a second bullet:- apm install auth-failure hint now shows the platform-appropriate install command (brew / winget / apt / dnf) instead of a Windows-only URL. (#1212)
Auth Expert
-
[recommended] _stale_pat_warned_hosts check-then-add is not protected by self._lock, creating a TOCTOU dedup race under parallel install at
src/apm_cli/core/auth.py:758
self._lockguardsself._cachebut notself._stale_pat_warned_hosts.emit_stale_pat_diagnosticperforms an unguardedif X in set: return; set.add(X)pattern. Under ThreadPoolExecutor parallel install, two threads for different ADO deps on the same host can both pass theincheck before either callsadd(), emitting duplicate stale-PAT warnings. Not a security issue, but the dedup is the whole point of the set.
Suggested: Guard the check-then-add withself._lock:with self._lock: if host_display in self._stale_pat_warned_hosts: return; self._stale_pat_warned_hosts.add(host_display) -
[nit] 403 in _ADO_AUTH_FAILURE_SIGNALS will match any response body containing the string '403' at
src/apm_cli/utils/github_host.py:281
is_ado_auth_failure_signaldoes a substring match on lowered text. '403' as a bare substring will match ADO error codes, commit hashes, or repo names that happen to contain '403'. Callers gate on additional context (host is ADO, token was presented), limiting blast radius, but the signal itself is noisier than a targeted 'http 403' match.
Suggested: Consider'http 403'or' 403 '(space-padded) to reduce false-positive substring matches on arbitrary ADO error output.
Doc Writer
-
[nit] Stale-PAT fallback docs say 'HTTP 401' but the fix covers 401/403 at
docs/src/content/docs/getting-started/authentication.md:182
authentication.md reads 'rejected (HTTP 401)' and the example output block shows '(HTTP 401)'. The CHANGELOG for [BUG]apm install -g --updatefails with a DevOps auth error whereapm install -gsucceeds #1212 states the preflight now handles 401/403. A user who hits a 403 (insufficient scope, not expired) will not recognise the description in the docs as matching their situation.
Suggested: Change 'rejected (HTTP 401)' to 'rejected (HTTP 401/403)' and update the example output block accordingly. -
[recommended] scripts/lint-auth-signals.sh is undiscoverable from docs and CONTRIBUTING at
scripts/lint-auth-signals.sh
The script enforces the PAT->bearer boundary as an anti-regression CI check. Its exemption list is managed inside the script itself. Developers adding new ADO call sites have no obvious path to learn about the rule or the exemption list unless they happen to read the CI workflow diff.
Suggested: Add a sentence to CONTRIBUTING.md (or to the 'Stale-PAT fallback' section in authentication.md) pointing to scripts/lint-auth-signals.sh and explaining that new ADO call sites must either route through execute_with_bearer_fallback or be added to the exemption list with a tracked follow-up. -
[nit] CHANGELOG entry uses 'stale' inconsistently with the 401/403 trigger at
CHANGELOG.md:22
The CHANGELOG says 'falls back from a stale ADO_APM_PAT'. A PAT rejected with 403 is not necessarily stale -- it may have insufficient scope. The word 'stale' implies expiry, not rejection in general.
Suggested: Change 'a stale ADO_APM_PAT' to 'a rejected ADO_APM_PAT (401 or 403)' for consistency with the auth docs.
Test Coverage Expert
-
[recommended] list_remote_refs bearer fallback_used=True branch has no test at any tier at
tests/unit/test_list_remote_refs.py
test_list_remote_refs.py wires execute_with_bearer_fallback with a side_effect that always returns BearerFallbackOutcome(primary_op(), False) -- fallback_used is hardcoded False. The branch in the caller that handles a successful bearer retry (fallback_used=True) is never exercised. If the fallback_used=True path drifts (e.g. stale-PAT warning emission, emit_stale_pat_diagnostic call, or .outcome attribute access changes), no test will catch it.
Suggested: Add a parametrize variant where the mock execute_with_bearer_fallback side_effect returns BearerFallbackOutcome(result, True) and assert that emit_stale_pat_diagnostic is called and the return value is correct.
Proof (missing at integration-with-fixtures):tests/unit/test_list_remote_refs.py::test_list_remote_refs_ado_bearer_fallback_used-- proves: When git ls-remote fails with a stale ADO PAT and bearer succeeds, list_remote_refs returns the correct refs and emits the stale-PAT diagnostic [secure-by-default,devx] -
[nit] Hermetic E2E drives _preflight_auth_check directly, not the full apm install --update CLI argv path at
tests/integration/test_ado_preflight_bearer_fallback_e2e.py
test_ado_preflight_bearer_fallback_e2e.py calls _preflight_auth_check directly. The e2e floor for CLI command surface requires a real CLI invocation. Gap is minor because preflight is well unit-tested and the hermetic test is real-subprocess.
Suggested: Consider adding one parametrized case to test_ado_bearer_e2e.py that invokes apm via subprocess with ADO_APM_PAT set to a stale value and az faked to succeed, asserting exit code 0.
This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.
Generated by PR Review Panel for issue #1214 · ● 4.8M · ◷
… dedup Two PR #1214 panel follow-ups: - Supply Chain Security: `_build_git_env` bearer branch now explicitly drops any pre-existing `GIT_TOKEN` from the env copied from `os.environ`. Without this, a stale GIT_TOKEN set by a prior shell or CI step would survive the `os.environ.copy()` and silently defeat the bearer-isolation guarantee (the JWT is meant to flow only via `GIT_CONFIG_VALUE_0`). New unit test `TestBuildGitEnvBearerIsolation` pins the clean-env behaviour. - Auth Expert: `emit_stale_pat_diagnostic` check-then-add of `_stale_pat_warned_hosts` is now wrapped in `self._lock`, closing the TOCTOU race where two threads in a parallel install on the same ADO host could both pass the membership check before either added. Without the lock the dedup set defeated its own purpose. New concurrency test `test_concurrent_same_host_emits_once` (64 threads on a single host -> one warning) provides the regression trap. CHANGELOG: extended the existing #1212 line under [Unreleased] to call out both follow-ups; no new entry to keep one-line-per-PR discipline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ostic (#1219) - Clarify that the stale-PAT->bearer fallback applies to 'apm install --update' preflight (previously undocumented; behavior fixed in #1214) - Add the 'AAD bearer fallback also failed' diagnostic so users know both paths were attempted and understand which credential to refresh Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ostic (#1219) - Clarify that the stale-PAT->bearer fallback applies to 'apm install --update' preflight (previously undocumented; behavior fixed in #1214) - Add the 'AAD bearer fallback also failed' diagnostic so users know both paths were attempted and understand which credential to refresh Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…crosoft#1189) * fix: normalise flat-format dependencies for transitive resolution Flat list dependencies (dependencies: [...]) were silently dropped during parsing because from_apm_yml() only handled the dict form (dependencies: {apm: [...]}). The isinstance(..., dict) guard caused the parser to skip flat lists entirely, leaving dependencies as None and preventing transitive resolution. Normalise flat lists to {apm: [...]} at parse time with a DeprecationWarning guiding authors to the structured format. Apply the same fix for devDependencies. Add defensive normalisation in _apm_yml_writer.py to prevent AttributeError when reading raw YAML with flat-format dependencies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor: reject flat-format dependencies with clear error instead of normalising Replace the DeprecationWarning + silent normalisation with a hard ValueError that tells authors exactly how to fix their apm.yml. The flat list format was never documented or supported -- silently accepting it is worse than failing loudly with actionable guidance. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: reject all non-dict dependency types, not just lists Address Copilot review feedback: the previous fix only rejected lists but silently ignored strings, ints, and other scalars -- recreating the original silent-drop failure mode. Now any non-dict value raises ValueError with the actual type name. Also hardens _apm_yml_writer.py to handle dependencies: null and other scalar types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: target-agnostic local-bundle install (microsoft#1207) (microsoft#1217) * fix: target-agnostic local-bundle install (microsoft#1207) apm pack no longer hardcodes pack.target: copilot. Bundles now ship in the Anthropic plugin layout as a transport convention only -- the consumer target is resolved by apm install from project context and primitives are routed per the resolved target's layout. Defects fixed (per microsoft#1207 RCA): - D1: pack records pack.target = "all" by default (or the auto-detected target). The flag remains for back-compat but is no longer the authoritative binding. - D2.a: plugin.json is skipped on install regardless of casing in the bundle manifest. .mcp.json is also skipped from the verbatim deploy loop -- MCP wiring is surfaced as a follow-up notice. - D2.b: instructions/*.md for compile-only targets (opencode, codex, gemini) are now staged under apm_modules/<slug>/.apm/instructions/ so apm compile picks them up. Slug is validated through validate_path_segments and the destination is checked with ensure_path_within to block traversal escapes. - D3: the local-bundle install path sets summary_rendered = True before returning so the outer finally-block no longer prints a misleading "Install interrupted" line on success. check_target_mismatch returns None when the bundle records "all", so target-agnostic bundles never warn against any consumer layout. Tests: - tests/unit/install/test_install_local_bundle_issue1207.py (13 cases) - tests/integration/test_install_local_bundle_e2e.py TestInstallLocalBundleIssue1207: 7-row pack -> install matrix over copilot, claude, cursor, opencode, codex, gemini plus a multi-target consumer asserting both native and staged deployment paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * chore: stamp CHANGELOG with PR microsoft#1217 * Address panel review findings for microsoft#1207 - Case-insensitive .mcp.json skip in deploy loop and fallback walk (services.py) so case-folding filesystems cannot smuggle a renamed MCP file past the skip. - Tighten staged_instructions disjunct in local_bundle_handler.py (drop dead first disjunct that always triggered for compile-only targets) and case-insensitive bundle_mcp detection. - Compile hint now names the resolved target and primitive count. - "Share with: apm install <bundle>" success line in apm pack output. - Unit test class TestMcpJsonNeverDeployed covers case variants. - e2e compile-hint assertion uses whitespace-collapsed output to survive logger line-wrap. - Docs sweep: * pack-distribute.md: rewrote "Targeting mental model" as target-agnostic; removed --target examples; simplified cross- target -> plugin layout normalization. * cli-commands.md: marked apm pack --target deprecated; updated bundle install section; removed target-filter table. * lockfile-spec.md: pack.target is deprecated/optional informational metadata; bundle_files documented. * enterprise/security.md: added apm_modules/ allowed prefix with slug-validation note; new "Local bundle install trust model" subsection. * apm-usage skill (commands.md, workflow.md): pack.target is informational; OpenCode added to compile-needed list; new "Local bundle install" subsection. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Iterate on panel review: harden slug validation + cleanup Address apm-review-panel findings on PR microsoft#1217: - sec-1: Enforce documented [A-Za-z0-9._-] slug whitelist in install/services.py before validate_path_segments, with explicit rejection of leading/trailing dots, '..', forward slashes, null bytes, and whitespace. Aligns code with security.md trust model. - sec-3: Add parameterized adversarial slug test cases (forward slash, null byte, leading/trailing dot, '@', whitespace). - py-arch-2: Hoist plugin.json / .mcp.json case-insensitive metadata skip above the per-target deploy loop so multi-target installs do not inflate the skipped counter once per target. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address PR microsoft#1217 review: ASCII slug, MCP wiring, doc audit - ASCII-only slug whitelist - Preserve nested instruction subdirs in stage_root - Pack success line mentions embedded apm.lock.yaml - Strengthened plugin.json leak assertion (whole-tree walk) - URL trailing-dot fix in security/marketplace error messages - Auto-wire bundle .mcp.json through MCPIntegrator (multi-target) so non-Claude harnesses get servers in their native MCP config - New TestBundleMcpWiring unit suite (parse + wire helpers) - Doc audit: pack-distribute, lockfile-spec, plugins, cli-commands, first-package, security, CHANGELOG Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * tests: add e2e coverage for bundle .mcp.json -> MCPIntegrator boundary Four e2e cases covering the wiring path that was previously only covered at the unit level: - bundle .mcp.json reaches MCPIntegrator.install with the bundle's servers and the resolved-target CSV passed via explicit_target - bundle without .mcp.json does not invoke the integrator (no spurious 'No MCP dependencies found' warnings on every install) - integrator failure does not break the install (file deploys are not undone by an MCP wiring hiccup) - dry-run never fires the integrator (zero side effects on the consumer's MCP config) Per-target file writes (Claude project .mcp.json, .vscode/mcp.json, .cursor/mcp.json) are owned by MCPIntegrator's own suite; testing them here would require installed runtime binaries on the CI host. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: fix broken anchor #authoring-workflow -> #plugin-authoring Deploy Docs CI fails link validation; the actual heading is 'Plugin authoring' at the top of guides/plugins.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Meppiel <copilot-rework@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Daniel Meppiel <[email protected]> * docs(auth): document ADO --update bearer fallback and dual-fail diagnostic (microsoft#1219) - Clarify that the stale-PAT->bearer fallback applies to 'apm install --update' preflight (previously undocumented; behavior fixed in microsoft#1214) - Add the 'AAD bearer fallback also failed' diagnostic so users know both paths were attempted and understand which credential to refresh Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: surface malformed transitive manifests and add integration tests Address reviewer feedback: - Resolver catch site in _try_load_dependency_package now only catches FileNotFoundError (expected for missing manifests); ValueError propagates to the BFS result loop where it is logged at WARNING level, making malformed transitive deps user-visible. - build_dependency_tree root-parse catch site now logs a warning instead of silently returning an empty tree. - Writer error message includes structured-format hint matching the model-layer quality. - Integration test (CliRunner): flat-list and string deps both exit non-zero with actionable error including 'expected a mapping' and structured-format example. - Resolver test: transitive dep with flat-list deps surfaces WARNING and resolution continues for valid siblings. - CHANGELOG entry under [Unreleased] Fixed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Sergio Sisternes <sergio.sisternes@epam.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Daniel Meppiel <51440732+danielmeppiel@users.noreply.github.com> Co-authored-by: Daniel Meppiel <copilot-rework@github.com>
#1212) (#1214) * fix(install): wire ADO --update preflight through PAT->bearer fallback (#1212) Before this change, `apm install --update` raised AuthenticationError on a stale ADO_APM_PAT even when `az login` was active and `apm install` (no flag) worked with the same env. The preflight auth probe in pipeline.py was the only ADO call site that did NOT use the canonical `AuthResolver.execute_with_bearer_fallback` helper -- it open-coded a ls-remote and raised on 401/403 immediately. Changes: - Extract `is_ado_auth_failure_signal()` predicate into utils/github_host.py as the single source of truth for the 5-signal PAT-failure detection set (was duplicated across auth.py, pipeline.py, github_downloader.py, validation.py). - Rewrite `_preflight_auth_check` (pipeline.py) on top of `execute_with_bearer_fallback`. The bearer attempt uses a clean env via `_build_git_env(scheme="bearer")` so a rejected PAT cannot leak into the bearer probe via GIT_TOKEN. - Refactor `list_remote_refs` (github_downloader.py) onto the same canonical helper for symmetry with the clone path. - `AuthResolver.build_error_context` now takes a `bearer_also_failed` flag and prefixes the rendered error with a clear "AAD bearer fallback also failed" line when both creds were tried, so users know both paths were exercised. - Per-host dedup for the stale-PAT diagnostic so it surfaces once per cluster, not per dep. - Install hint expanded to `brew/winget/choco install azure-cli`. Anti-regression: - New `scripts/lint-auth-signals.sh` (bash 3.2 compatible) enforces: Rule A -- only `auth_resolver` may import bearer providers, and Rule B -- every `git ls-remote` site must have an `# auth-delegated` annotation pointing to its delegation strategy. Hooked into ci.yml. - Hermetic E2E test (test_ado_preflight_bearer_fallback_e2e.py) using PATH-injected fake `git` and `az` binaries, wired into both scripts/test-integration.sh and scripts/windows/test-integration.ps1. Tests: - 19 new predicate tests (test_github_host_predicate.py). - 3 new bearer-fallback cases in test_pipeline_auth_preflight.py (stale-PAT->bearer success, both fail, no-PAT-leak in bearer env). - 2 new hermetic E2E tests reproducing the #1212 scenario end-to-end. - All 7823 unit tests pass; new E2E tests pass. Refs: #1212 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(install): address apm-review-panel findings on #1212 Synthesizes recommendations from the local panel review (auth-expert, doc-writer, devx-ux, test-coverage, supply-chain-security): - list_remote_refs error path now propagates bearer_also_failed to build_error_context, matching install preflight UX. - build_error_context Case 4 retry hint now expands install steps with apt/dnf coverage for Linux users; Case 2 no longer renders the contradictory PAT-rejected prefix when no PAT was tried. - lint-auth-signals.sh Rule A now scans for any reference to get_bearer_provider (not just single-line imports), catching multi-line import blocks and module-attribute access. Clone path in github_downloader.py:_execute_transport_plan exempted with documented follow-up rationale (refactor onto execute_with_bearer_fallback is non-trivial in the tier-attempt loop). - Added unit tests for build_error_context(bearer_also_failed=True) prefix in Case 4, defensive Case 2 ignores the kwarg, and emit_stale_pat_diagnostic per-host dedup. - Tightened CHANGELOG entry per doc-writer (drop internal predicate refactor detail; lead with user-visible behavior). All 7826 unit tests pass, both hermetic E2E tests pass, ruff clean, auth-signal lint clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(install): trim github_downloader.py to file-length guardrail The bearer_also_failed propagation in list_remote_refs pushed the file to 2407 lines (max 2400). Tightened the comments around _bearer_op SECURITY note and the ado_eligible branch to bring it back to 2400. No behavior change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(auth): tighten bearer-fallback diagnostics from PR #1214 review Address Copilot review feedback on PR #1214: * execute_with_bearer_fallback now returns BearerFallbackOutcome (NamedTuple) carrying outcome + bearer_attempted flag. Callers in install/pipeline.py and deps/github_downloader.py use the flag so bearer_also_failed only renders 'az cli bearer was also rejected' when the bearer leg actually ran -- not on early returns (az unavailable, JWT acquisition failed). * build_error_context docstring fixed: bearer_also_failed prefix applies to Case 4 (PAT set + az_available), not Case 2. * emit_stale_pat_diagnostic warning no longer hard-codes '(HTTP 401)' -- the predicate also matches 403 and other auth signals. * Windows test-integration.ps1 stops invoking the POSIX-only E2E (which skipif win32) and notes the unit suite covers the contract. No behavior change beyond diagnostic accuracy; all 7826 unit tests pass and lint is clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix(auth): pop stale GIT_TOKEN in bearer env and lock-guard stale-PAT dedup Two PR #1214 panel follow-ups: - Supply Chain Security: `_build_git_env` bearer branch now explicitly drops any pre-existing `GIT_TOKEN` from the env copied from `os.environ`. Without this, a stale GIT_TOKEN set by a prior shell or CI step would survive the `os.environ.copy()` and silently defeat the bearer-isolation guarantee (the JWT is meant to flow only via `GIT_CONFIG_VALUE_0`). New unit test `TestBuildGitEnvBearerIsolation` pins the clean-env behaviour. - Auth Expert: `emit_stale_pat_diagnostic` check-then-add of `_stale_pat_warned_hosts` is now wrapped in `self._lock`, closing the TOCTOU race where two threads in a parallel install on the same ADO host could both pass the membership check before either added. Without the lock the dedup set defeated its own purpose. New concurrency test `test_concurrent_same_host_emits_once` (64 threads on a single host -> one warning) provides the regression trap. CHANGELOG: extended the existing #1212 line under [Unreleased] to call out both follow-ups; no new entry to keep one-line-per-PR discipline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Meppiel <copilot-rework@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: danielmeppiel <danielmeppiel@users.noreply.github.com>
…ostic (#1219) - Clarify that the stale-PAT->bearer fallback applies to 'apm install --update' preflight (previously undocumented; behavior fixed in #1214) - Add the 'AAD bearer fallback also failed' diagnostic so users know both paths were attempted and understand which credential to refresh Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
) * fix: normalise flat-format dependencies for transitive resolution Flat list dependencies (dependencies: [...]) were silently dropped during parsing because from_apm_yml() only handled the dict form (dependencies: {apm: [...]}). The isinstance(..., dict) guard caused the parser to skip flat lists entirely, leaving dependencies as None and preventing transitive resolution. Normalise flat lists to {apm: [...]} at parse time with a DeprecationWarning guiding authors to the structured format. Apply the same fix for devDependencies. Add defensive normalisation in _apm_yml_writer.py to prevent AttributeError when reading raw YAML with flat-format dependencies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * refactor: reject flat-format dependencies with clear error instead of normalising Replace the DeprecationWarning + silent normalisation with a hard ValueError that tells authors exactly how to fix their apm.yml. The flat list format was never documented or supported -- silently accepting it is worse than failing loudly with actionable guidance. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: reject all non-dict dependency types, not just lists Address Copilot review feedback: the previous fix only rejected lists but silently ignored strings, ints, and other scalars -- recreating the original silent-drop failure mode. Now any non-dict value raises ValueError with the actual type name. Also hardens _apm_yml_writer.py to handle dependencies: null and other scalar types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: target-agnostic local-bundle install (#1207) (#1217) * fix: target-agnostic local-bundle install (#1207) apm pack no longer hardcodes pack.target: copilot. Bundles now ship in the Anthropic plugin layout as a transport convention only -- the consumer target is resolved by apm install from project context and primitives are routed per the resolved target's layout. Defects fixed (per #1207 RCA): - D1: pack records pack.target = "all" by default (or the auto-detected target). The flag remains for back-compat but is no longer the authoritative binding. - D2.a: plugin.json is skipped on install regardless of casing in the bundle manifest. .mcp.json is also skipped from the verbatim deploy loop -- MCP wiring is surfaced as a follow-up notice. - D2.b: instructions/*.md for compile-only targets (opencode, codex, gemini) are now staged under apm_modules/<slug>/.apm/instructions/ so apm compile picks them up. Slug is validated through validate_path_segments and the destination is checked with ensure_path_within to block traversal escapes. - D3: the local-bundle install path sets summary_rendered = True before returning so the outer finally-block no longer prints a misleading "Install interrupted" line on success. check_target_mismatch returns None when the bundle records "all", so target-agnostic bundles never warn against any consumer layout. Tests: - tests/unit/install/test_install_local_bundle_issue1207.py (13 cases) - tests/integration/test_install_local_bundle_e2e.py TestInstallLocalBundleIssue1207: 7-row pack -> install matrix over copilot, claude, cursor, opencode, codex, gemini plus a multi-target consumer asserting both native and staged deployment paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * chore: stamp CHANGELOG with PR #1217 * Address panel review findings for #1207 - Case-insensitive .mcp.json skip in deploy loop and fallback walk (services.py) so case-folding filesystems cannot smuggle a renamed MCP file past the skip. - Tighten staged_instructions disjunct in local_bundle_handler.py (drop dead first disjunct that always triggered for compile-only targets) and case-insensitive bundle_mcp detection. - Compile hint now names the resolved target and primitive count. - "Share with: apm install <bundle>" success line in apm pack output. - Unit test class TestMcpJsonNeverDeployed covers case variants. - e2e compile-hint assertion uses whitespace-collapsed output to survive logger line-wrap. - Docs sweep: * pack-distribute.md: rewrote "Targeting mental model" as target-agnostic; removed --target examples; simplified cross- target -> plugin layout normalization. * cli-commands.md: marked apm pack --target deprecated; updated bundle install section; removed target-filter table. * lockfile-spec.md: pack.target is deprecated/optional informational metadata; bundle_files documented. * enterprise/security.md: added apm_modules/ allowed prefix with slug-validation note; new "Local bundle install trust model" subsection. * apm-usage skill (commands.md, workflow.md): pack.target is informational; OpenCode added to compile-needed list; new "Local bundle install" subsection. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Iterate on panel review: harden slug validation + cleanup Address apm-review-panel findings on PR #1217: - sec-1: Enforce documented [A-Za-z0-9._-] slug whitelist in install/services.py before validate_path_segments, with explicit rejection of leading/trailing dots, '..', forward slashes, null bytes, and whitespace. Aligns code with security.md trust model. - sec-3: Add parameterized adversarial slug test cases (forward slash, null byte, leading/trailing dot, '@', whitespace). - py-arch-2: Hoist plugin.json / .mcp.json case-insensitive metadata skip above the per-target deploy loop so multi-target installs do not inflate the skipped counter once per target. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address PR #1217 review: ASCII slug, MCP wiring, doc audit - ASCII-only slug whitelist - Preserve nested instruction subdirs in stage_root - Pack success line mentions embedded apm.lock.yaml - Strengthened plugin.json leak assertion (whole-tree walk) - URL trailing-dot fix in security/marketplace error messages - Auto-wire bundle .mcp.json through MCPIntegrator (multi-target) so non-Claude harnesses get servers in their native MCP config - New TestBundleMcpWiring unit suite (parse + wire helpers) - Doc audit: pack-distribute, lockfile-spec, plugins, cli-commands, first-package, security, CHANGELOG Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * tests: add e2e coverage for bundle .mcp.json -> MCPIntegrator boundary Four e2e cases covering the wiring path that was previously only covered at the unit level: - bundle .mcp.json reaches MCPIntegrator.install with the bundle's servers and the resolved-target CSV passed via explicit_target - bundle without .mcp.json does not invoke the integrator (no spurious 'No MCP dependencies found' warnings on every install) - integrator failure does not break the install (file deploys are not undone by an MCP wiring hiccup) - dry-run never fires the integrator (zero side effects on the consumer's MCP config) Per-target file writes (Claude project .mcp.json, .vscode/mcp.json, .cursor/mcp.json) are owned by MCPIntegrator's own suite; testing them here would require installed runtime binaries on the CI host. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: fix broken anchor #authoring-workflow -> #plugin-authoring Deploy Docs CI fails link validation; the actual heading is 'Plugin authoring' at the top of guides/plugins.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Daniel Meppiel <copilot-rework@github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Daniel Meppiel <[email protected]> * docs(auth): document ADO --update bearer fallback and dual-fail diagnostic (#1219) - Clarify that the stale-PAT->bearer fallback applies to 'apm install --update' preflight (previously undocumented; behavior fixed in #1214) - Add the 'AAD bearer fallback also failed' diagnostic so users know both paths were attempted and understand which credential to refresh Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: surface malformed transitive manifests and add integration tests Address reviewer feedback: - Resolver catch site in _try_load_dependency_package now only catches FileNotFoundError (expected for missing manifests); ValueError propagates to the BFS result loop where it is logged at WARNING level, making malformed transitive deps user-visible. - build_dependency_tree root-parse catch site now logs a warning instead of silently returning an empty tree. - Writer error message includes structured-format hint matching the model-layer quality. - Integration test (CliRunner): flat-list and string deps both exit non-zero with actionable error including 'expected a mapping' and structured-format example. - Resolver test: transitive dep with flat-list deps surfaces WARNING and resolution continues for valid siblings. - CHANGELOG entry under [Unreleased] Fixed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Sergio Sisternes <sergio.sisternes@epam.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Daniel Meppiel <51440732+danielmeppiel@users.noreply.github.com> Co-authored-by: Daniel Meppiel <copilot-rework@github.com>
Fixes #1212.
TL;DR
apm install --updateraisedAuthenticationErroragainstdev.azure.comon a staleADO_APM_PATeven whenaz loginwas active andapm install(no flag) succeeded with the same env. The--updatepreflight probe was the only ADO call site that did not use the canonical PAT->AAD-bearer fallback. This PR wires it throughAuthResolver.execute_with_bearer_fallback, extracts the duplicated 5-signal failure predicate, plugs three PAT-leak holes in bearer-env composition, and adds an anti-regression lint + hermetic E2E test so this drift class cannot silently re-emerge.Problem (WHY)
The PAT->AAD-bearer protocol is the contract that lets ADO users keep working when their cached PAT goes stale and
az loginis fresh. It exists inAuthResolver.execute_with_bearer_fallback(auth.py:771-829). Every ADO call site is supposed to delegate through it._preflight_auth_checkininstall/pipeline.pywas added later for--updateperformance (one ls-remote per host cluster instead of one per dep). It open-coded its own ls-remote, ran a 5-signal substring check on stderr, and raised on 401/403 immediately. No bearer fallback. Result: same env, same creds,apm installworks,apm install --updatedoes not.The deeper issue is drift. The 5-signal set was duplicated in 4 files (auth.py, pipeline.py, github_downloader.py, validation.py). The PAT->bearer protocol was open-coded in 3 of those. There was no lint preventing a 5th copy.
Approach (WHAT)
is_ado_auth_failure_signal(text) -> boolintoutils/github_host.py. Single source of truth._preflight_auth_checkon top ofexecute_with_bearer_fallback. Bearer attempt uses a CLEAN env via_build_git_env(scheme="bearer")so a rejected PAT cannot leak viaGIT_TOKEN.list_remote_refs(github_downloader.py) onto the same canonical helper for symmetry with clone.build_error_contextwithbearer_also_failed: boolso when both PAT and bearer fail, the user sees an explicit "AAD bearer fallback also failed" prefix instead of a misleading "your PAT is bad" message.scripts/lint-auth-signals.sh, hooked into ci.yml):auth_resolvermay import bearer providersgit ls-remotesite must carry an# auth-delegated:annotationgit(401 on PAT URLs, 200 on bearer) and fakeaz(returns a synthetic JWT). Wired into bothscripts/test-integration.shandscripts/windows/test-integration.ps1.Implementation (HOW)
sequenceDiagram participant CLI as apm install --update participant Pre as _preflight_auth_check participant Auth as AuthResolver participant Git as git ls-remote CLI->>Pre: ctx, resolver Pre->>Auth: execute_with_bearer_fallback primary, bearer Auth->>Git: ls-remote with PAT Git-->>Auth: 401 Unauthorized Auth->>Auth: is_ado_auth_failure_signal stderr Auth->>Git: ls-remote with bearer clean env Git-->>Auth: 200 OK Auth-->>Pre: success Pre-->>CLI: continueTrade-offs
validation.pystill has its own (legacy) PAT-bearer fallback. It is exempt from Rule A via the lint script's allowlist and tracked for migration. Migrating it now would balloon this PR; the lint blocks a new copy from appearing.pipeline.py:_run_ls_remoteandgit_cache.pyare annotated# auth-delegated: caller-supplies-credsbecause they are sub-primitives invoked by sites that already delegate. Annotation is documentation, not a runtime guard -- by design.AuthResolveragainst a fakeazbinary. That couples the test to the AzureCliBearerProvider contract, but the alternative (mocking the provider) would not exercise the path users actually hit.Validation evidence
uv run --extra dev pytest tests/unit/ tests/integration/test_ado_preflight_bearer_fallback_e2e.py -q-> 7823 passeduv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/-> silentbash scripts/lint-auth-signals.sh-> cleanHow to test
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com