Skip to content

fix(install): wire ADO --update preflight through PAT->bearer fallback (#1212)#1214

Merged
danielmeppiel merged 7 commits into
mainfrom
fix/ado-update-bearer-fallback-1212
May 8, 2026
Merged

fix(install): wire ADO --update preflight through PAT->bearer fallback (#1212)#1214
danielmeppiel merged 7 commits into
mainfrom
fix/ado-update-bearer-fallback-1212

Conversation

@danielmeppiel

Copy link
Copy Markdown
Collaborator

Fixes #1212.

TL;DR

apm install --update raised AuthenticationError against dev.azure.com on a stale ADO_APM_PAT even when az login was active and apm install (no flag) succeeded with the same env. The --update preflight probe was the only ADO call site that did not use the canonical PAT->AAD-bearer fallback. This PR wires it through AuthResolver.execute_with_bearer_fallback, extracts the duplicated 5-signal failure predicate, plugs three PAT-leak holes in bearer-env composition, and adds an anti-regression lint + hermetic E2E test so this drift class cannot silently re-emerge.

Problem (WHY)

The PAT->AAD-bearer protocol is the contract that lets ADO users keep working when their cached PAT goes stale and az login is fresh. It exists in AuthResolver.execute_with_bearer_fallback (auth.py:771-829). Every ADO call site is supposed to delegate through it.

_preflight_auth_check in install/pipeline.py was added later for --update performance (one ls-remote per host cluster instead of one per dep). It open-coded its own ls-remote, ran a 5-signal substring check on stderr, and raised on 401/403 immediately. No bearer fallback. Result: same env, same creds, apm install works, apm install --update does not.

The deeper issue is drift. The 5-signal set was duplicated in 4 files (auth.py, pipeline.py, github_downloader.py, validation.py). The PAT->bearer protocol was open-coded in 3 of those. There was no lint preventing a 5th copy.

Approach (WHAT)

  1. Extract is_ado_auth_failure_signal(text) -> bool into utils/github_host.py. Single source of truth.
  2. Rewrite _preflight_auth_check on top of execute_with_bearer_fallback. Bearer attempt uses a CLEAN env via _build_git_env(scheme="bearer") so a rejected PAT cannot leak via GIT_TOKEN.
  3. Refactor list_remote_refs (github_downloader.py) onto the same canonical helper for symmetry with clone.
  4. Annotate build_error_context with bearer_also_failed: bool so when both PAT and bearer fail, the user sees an explicit "AAD bearer fallback also failed" prefix instead of a misleading "your PAT is bad" message.
  5. Per-host dedup for the stale-PAT diagnostic so it surfaces once per (host, org) cluster rather than once per dep.
  6. Anti-regression lint (scripts/lint-auth-signals.sh, hooked into ci.yml):
    • Rule A: only auth_resolver may import bearer providers
    • Rule B: every git ls-remote site must carry an # auth-delegated: annotation
  7. Hermetic E2E with PATH-injected fake git (401 on PAT URLs, 200 on bearer) and fake az (returns a synthetic JWT). Wired into both scripts/test-integration.sh and scripts/windows/test-integration.ps1.

Implementation (HOW)

sequenceDiagram
    participant CLI as apm install --update
    participant Pre as _preflight_auth_check
    participant Auth as AuthResolver
    participant Git as git ls-remote
    CLI->>Pre: ctx, resolver
    Pre->>Auth: execute_with_bearer_fallback primary, bearer
    Auth->>Git: ls-remote with PAT
    Git-->>Auth: 401 Unauthorized
    Auth->>Auth: is_ado_auth_failure_signal stderr
    Auth->>Git: ls-remote with bearer clean env
    Git-->>Auth: 200 OK
    Auth-->>Pre: success
    Pre-->>CLI: continue
Loading

Trade-offs

  • validation.py still has its own (legacy) PAT-bearer fallback. It is exempt from Rule A via the lint script's allowlist and tracked for migration. Migrating it now would balloon this PR; the lint blocks a new copy from appearing.
  • pipeline.py:_run_ls_remote and git_cache.py are annotated # auth-delegated: caller-supplies-creds because they are sub-primitives invoked by sites that already delegate. Annotation is documentation, not a runtime guard -- by design.
  • E2E uses a real AuthResolver against a fake az binary. That couples the test to the AzureCliBearerProvider contract, but the alternative (mocking the provider) would not exercise the path users actually hit.

Validation evidence

  • uv run --extra dev pytest tests/unit/ tests/integration/test_ado_preflight_bearer_fallback_e2e.py -q -> 7823 passed
  • New tests: 19 predicate cases, 3 preflight bearer-fallback cases, 2 hermetic E2E cases
  • uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/ -> silent
  • bash scripts/lint-auth-signals.sh -> clean

How to test

# Hermetic regression (no creds needed):
uv run --extra dev pytest tests/integration/test_ado_preflight_bearer_fallback_e2e.py -v

# End-to-end repro of #1212 (requires a real ADO repo + stale PAT + az login):
ADO_APM_PAT=stale-pat-value apm install -g --update
# Should now succeed via bearer fallback instead of raising AuthenticationError.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Daniel Meppiel and others added 2 commits May 8, 2026 22:01
#1212)

Before this change, `apm install --update` raised AuthenticationError on a
stale ADO_APM_PAT even when `az login` was active and `apm install` (no flag)
worked with the same env. The preflight auth probe in pipeline.py was the
only ADO call site that did NOT use the canonical
`AuthResolver.execute_with_bearer_fallback` helper -- it open-coded a
ls-remote and raised on 401/403 immediately.

Changes:

- Extract `is_ado_auth_failure_signal()` predicate into utils/github_host.py
  as the single source of truth for the 5-signal PAT-failure detection set
  (was duplicated across auth.py, pipeline.py, github_downloader.py,
  validation.py).
- Rewrite `_preflight_auth_check` (pipeline.py) on top of
  `execute_with_bearer_fallback`. The bearer attempt uses a clean env via
  `_build_git_env(scheme="bearer")` so a rejected PAT cannot leak into the
  bearer probe via GIT_TOKEN.
- Refactor `list_remote_refs` (github_downloader.py) onto the same canonical
  helper for symmetry with the clone path.
- `AuthResolver.build_error_context` now takes a `bearer_also_failed` flag
  and prefixes the rendered error with a clear "AAD bearer fallback also
  failed" line when both creds were tried, so users know both paths were
  exercised.
- Per-host dedup for the stale-PAT diagnostic so it surfaces once per
  cluster, not per dep.
- Install hint expanded to `brew/winget/choco install azure-cli`.

Anti-regression:

- New `scripts/lint-auth-signals.sh` (bash 3.2 compatible) enforces:
  Rule A -- only `auth_resolver` may import bearer providers, and
  Rule B -- every `git ls-remote` site must have an `# auth-delegated`
  annotation pointing to its delegation strategy. Hooked into ci.yml.
- Hermetic E2E test (test_ado_preflight_bearer_fallback_e2e.py) using
  PATH-injected fake `git` and `az` binaries, wired into both
  scripts/test-integration.sh and scripts/windows/test-integration.ps1.

Tests:

- 19 new predicate tests (test_github_host_predicate.py).
- 3 new bearer-fallback cases in test_pipeline_auth_preflight.py
  (stale-PAT->bearer success, both fail, no-PAT-leak in bearer env).
- 2 new hermetic E2E tests reproducing the #1212 scenario end-to-end.
- All 7823 unit tests pass; new E2E tests pass.

Refs: #1212

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Synthesizes recommendations from the local panel review (auth-expert,
doc-writer, devx-ux, test-coverage, supply-chain-security):

- list_remote_refs error path now propagates bearer_also_failed to
  build_error_context, matching install preflight UX.
- build_error_context Case 4 retry hint now expands install steps
  with apt/dnf coverage for Linux users; Case 2 no longer renders the
  contradictory PAT-rejected prefix when no PAT was tried.
- lint-auth-signals.sh Rule A now scans for any reference to
  get_bearer_provider (not just single-line imports), catching
  multi-line import blocks and module-attribute access. Clone path
  in github_downloader.py:_execute_transport_plan exempted with
  documented follow-up rationale (refactor onto
  execute_with_bearer_fallback is non-trivial in the tier-attempt
  loop).
- Added unit tests for build_error_context(bearer_also_failed=True)
  prefix in Case 4, defensive Case 2 ignores the kwarg, and
  emit_stale_pat_diagnostic per-host dedup.
- Tightened CHANGELOG entry per doc-writer (drop internal predicate
  refactor detail; lead with user-visible behavior).

All 7826 unit tests pass, both hermetic E2E tests pass, ruff clean,
auth-signal lint clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel

Copy link
Copy Markdown
Collaborator Author

Local apm-review-panel pass: panelist findings addressed

Ran the full advisory panel locally (8 panelists in parallel). Zero blocking findings; addressed the highest-signal recommended items in fdf6ce4d:

Addressed in this push

  • auth-expert -- list_remote_refs error path now propagates bearer_also_failed to build_error_context, matching install preflight UX.
  • auth-expert + supply-chain-security -- Strengthened lint-auth-signals.sh Rule A to catch any reference to get_bearer_provider (not just single-line imports), so multi-line imports and module-attribute access can no longer slip past.
  • devx-ux -- Case 1 install hint now covers apt/dnf for Linux users alongside brew/winget.
  • auth-expert + cli-logging -- Case 2 no longer renders the contradictory ADO_APM_PAT was rejected prefix when no PAT was tried (structurally unreachable, defensively guarded).
  • doc-writer + oss-growth-hacker -- Tightened CHANGELOG entry to lead with user-visible behavior; dropped the internal predicate-refactor detail.
  • test-coverage-expert -- Added unit tests for build_error_context(bearer_also_failed=True) Case 4 wording, Case 2 defensive guard, and emit_stale_pat_diagnostic per-host dedup.

Filed as follow-ups (out of scope for this hotfix)

  • auth-expert -- _execute_transport_plan clone path still open-codes the PAT->bearer protocol (exempted in lint with documented rationale; the tier-attempt loop wraps a stateful clone_action so the refactor is non-trivial).
  • python-architect -- validation.py direct get_bearer_provider import (same exemption).
  • doc-writer + oss-growth-hacker -- Contributor auth-architecture.md page.

Validation

  • 7826 unit tests pass
  • 2 hermetic E2E tests (tests/integration/test_ado_preflight_bearer_fallback_e2e.py) pass
  • ruff check + ruff format --check clean
  • lint-auth-signals.sh clean (with strengthened Rule A)
  • Wired into scripts/test-integration.sh and scripts/windows/test-integration.ps1

Marking ready for review.

@danielmeppiel danielmeppiel marked this pull request as ready for review May 8, 2026 20:15
Copilot AI review requested due to automatic review settings May 8, 2026 20:15
The bearer_also_failed propagation in list_remote_refs pushed the file
to 2407 lines (max 2400). Tightened the comments around _bearer_op
SECURITY note and the ado_eligible branch to bring it back to 2400.

No behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an Azure DevOps authentication regression where apm install --update could fail preflight auth on a stale ADO_APM_PAT even when an active az login bearer token would allow the actual install to succeed. It centralizes the ADO auth-failure signal predicate, routes the --update preflight probe (and ref listing) through the canonical PAT-to-bearer fallback helper, adds a repo-level anti-regression lint, and adds unit/integration coverage.

Changes:

  • Extracts an is_ado_auth_failure_signal() helper and replaces duplicated string-signal checks across call sites.
  • Rewrites install preflight ADO probing and list_remote_refs to use AuthResolver.execute_with_bearer_fallback and attempts to surface a "bearer also failed" diagnostic signal.
  • Adds a shell-based lint gate for auth-protocol drift and introduces a hermetic E2E test (plus wiring into integration runners), and updates the changelog.
Show a summary per file
File Description
src/apm_cli/utils/github_host.py Adds is_ado_auth_failure_signal() and a centralized signal list.
src/apm_cli/install/pipeline.py Routes --update preflight git ls-remote through bearer fallback and adds verbose tracing.
src/apm_cli/deps/github_downloader.py Refactors list_remote_refs() to use the canonical bearer fallback helper.
src/apm_cli/install/validation.py Reuses the centralized auth-failure predicate and builds a clean bearer env via _build_git_env.
src/apm_cli/core/auth.py Adds bearer_also_failed support in error context and dedups stale-PAT warnings per host.
src/apm_cli/cache/git_cache.py Adds an # auth-delegated: annotation near a ls-remote site for the new lint.
tests/unit/utils/test_github_host_predicate.py Adds unit tests for the centralized predicate.
tests/unit/install/test_pipeline_auth_preflight.py Adds unit tests for preflight bearer fallback behavior and env cleanliness.
tests/unit/test_list_remote_refs.py Adjusts downloader test wiring around execute_with_bearer_fallback.
tests/unit/test_auth.py Adds tests for the new bearer_also_failed messaging and stale-PAT warning dedup.
tests/integration/test_ado_preflight_bearer_fallback_e2e.py Adds a hermetic fake git/az E2E regression test.
scripts/lint-auth-signals.sh New anti-regression lint for bearer-provider boundary and ls-remote annotation.
.github/workflows/ci.yml Hooks the new auth-protocol lint into the Tier-1 lint job.
scripts/test-integration.sh Runs the new hermetic E2E test on non-Windows integration runs.
scripts/windows/test-integration.ps1 Runs the new E2E test in Windows integration runner.
CHANGELOG.md Adds an Unreleased entry documenting the fix.

Copilot's findings

Comments suppressed due to low confidence (1)

tests/integration/test_ado_preflight_bearer_fallback_e2e.py:79

  • Same issue as above for FAKE_AZ: the triple-quoted string literal starting here is a standalone expression and will be flagged by Ruff B018. Either append it to FAKE_AZ (e.g., by including it inside the initial assignment) or remove it so the integration test file passes lint.
FAKE_AZ = r"""#!/usr/bin/env python3
"""
"""Fake az CLI for #1212 -- returns a fixed JWT.

Used by AzureCliBearerProvider.get_bearer_token. The JWT structure must
have three dot-separated base64url segments so ``str.startswith('eyJ')``
checks pass.
"""
  • Files reviewed: 16/16 changed files
  • Comments generated: 7

Comment thread src/apm_cli/core/auth.py
Comment thread src/apm_cli/install/pipeline.py
Comment thread src/apm_cli/deps/github_downloader.py
Comment thread tests/integration/test_ado_preflight_bearer_fallback_e2e.py
Comment thread scripts/windows/test-integration.ps1 Outdated
Comment thread src/apm_cli/core/auth.py Outdated
Comment thread src/apm_cli/core/auth.py Outdated
Daniel Meppiel and others added 2 commits May 8, 2026 22:28
Address Copilot review feedback on PR #1214:

* execute_with_bearer_fallback now returns BearerFallbackOutcome
  (NamedTuple) carrying outcome + bearer_attempted flag. Callers in
  install/pipeline.py and deps/github_downloader.py use the flag so
  bearer_also_failed only renders 'az cli bearer was also rejected'
  when the bearer leg actually ran -- not on early returns (az
  unavailable, JWT acquisition failed).

* build_error_context docstring fixed: bearer_also_failed prefix
  applies to Case 4 (PAT set + az_available), not Case 2.

* emit_stale_pat_diagnostic warning no longer hard-codes '(HTTP 401)'
  -- the predicate also matches 403 and other auth signals.

* Windows test-integration.ps1 stops invoking the POSIX-only E2E
  (which skipif win32) and notes the unit suite covers the contract.

No behavior change beyond diagnostic accuracy; all 7826 unit tests
pass and lint is clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel added the panel-review Trigger the apm-review-panel gh-aw workflow label May 8, 2026
@github-actions

github-actions Bot commented May 8, 2026

Copy link
Copy Markdown

APM Review Panel: ship_with_followups

Wires ADO --update preflight through PAT->bearer fallback, plugs GIT_TOKEN env leaks, and adds an anti-regression lint guard -- a correctness + security net-positive with five well-scoped follow-ups.

cc @danielmeppiel @sergio-sisternes-epam -- a fresh advisory pass is ready for your review.

All eight panelists returned advisory-tier (recommended or nit) findings only; no panelist flagged a blocking issue. The panel converges clearly: the core auth wiring, NamedTuple value-object shape, SSOT predicate, and per-host dedup design are all sound. The two highest-signal recommended items are the GIT_TOKEN strip gap and the TOCTOU dedup race -- both are correctness defects that undercut guarantees the PR explicitly sets out to provide, and both are one-to-three-line fixes. The supply-chain-security finding on _build_git_env carries an outcome: missing evidence block on a secure-by-default surface; per panel weighting rules that ranks above any opinion-only recommended finding. The auth-expert TOCTOU finding on _stale_pat_warned_hosts is a correctness bug (dedup can silently fail under parallel install) but carries no security impact; it is the second-highest item. The test-coverage gap on the list_remote_refs fallback_used=True branch is genuinely missing coverage on a secure-by-default + devx surface and ranks third.

The cli-logging-expert and devx-ux-expert findings converge on the same Case 4 error body defect from different angles: cli-logging flags that the text contradicts the bearer_also_failed=True state (Step 1 is busywork), devx flags that the host name is absent from the prefix. These two findings should be addressed together in a single edit. The doc-writer finding on lint-script discoverability is low-friction and high-value for contributor retention: a one-sentence pointer in CONTRIBUTING.md prevents the next ADO call site author from unknowingly violating the PAT->bearer boundary rule.

The python-architect pattern-record finding and the oss-growth-hacker CHANGELOG nit are genuinely low signal relative to the above; neither warrants delaying merge. The test-coverage-expert E2E finding is marked outcome: unknown (sandbox could not execute the test); per panel weighting rules it carries no weight beyond opinion-only and is not escalated above the missing-coverage item.

Dissent. No inter-panelist disagreement on severity classification. The test-coverage-expert returned outcome: unknown on the E2E test (sandbox could not execute); this is noted but does not affect the ranking of the missing fallback_used=True unit-test finding, which carries outcome: missing on a secure-by-default surface and is correctly weighted above opinion-only recommended items.

Aligned with: Secure by default (bearer-env isolation is structurally improved; the residual GIT_TOKEN strip gap is the one fix to land before or immediately after merge), Governed by policy (the anti-regression lint script makes the PAT->bearer boundary machine-checkable; a CONTRIBUTING.md pointer converts it from an undiscoverable internal tool into an enforced contributor contract), Multi-harness / multi-host (per-host dedup for stale-PAT diagnostics and the SSOT predicate both strengthen multi-host correctness; the TOCTOU race on the dedup set is the one remaining gap), Pragmatic as npm (platform-aware az-cli install hints on auth failure close a first-run friction gap for Linux and macOS users that the pre-existing Windows-only URL messaging created).

Growth signal. The stale-PAT silent-drop scenario on apm install --update is a high-churn enterprise retention risk; fixing it with a clear CHANGELOG entry and a dedicated "Enterprise reliability" beat in the next release post is worth the explicit callout. The anti-regression lint script is itself a credibility signal for enterprise evaluators: "we guard the auth boundary so this class of bug cannot silently re-emerge." The cross-platform error-message improvement (brew/winget/apt/dnf) is a first-run UX win for Linux/macOS users that should appear in release notes alongside the auth fix.

Panel summary

Persona B R N Takeaway
Python Architect 0 1 3 Clean ADO auth-fallback fix; NamedTuple return type and SSOT predicate are idiomatic; two nits on type annotation precision.
CLI Logging Expert 0 1 1 One recommended: Case 4 body text misleads when bearer_also_failed=True (steps contradict the prefix). Rest of the logging changes are clean improvements.
DevX UX Expert 0 1 2 Good UX net-positive: platform-aware az-cli hints and per-host warn dedup. Two minor polish items: 'az cli' casing inconsistency and Case 4 error prefix missing host context.
Supply Chain Security 0 1 2 Bearer-env isolation is largely sound; one residual GIT_TOKEN inheritance gap in _build_git_env and a Rule B coverage blind-spot on clone calls worth addressing.
OSS Growth Hacker 0 0 1 Solid enterprise ADO retention fix; CHANGELOG is user-facing and clear; no adoption-harming surface touched.
Auth Expert 0 1 1 API contract correctly migrated; bearer env isolation sound; _stale_pat_warned_hosts has a TOCTOU dedup race under parallel install but no security impact.
Doc Writer 0 1 2 CHANGELOG entry is clean; auth docs already cover the --update preflight guarantee; two minor gaps worth addressing: 401-only wording in the stale-PAT fallback example and missing discoverability pointer for lint-auth-signals.sh.
Test Coverage Expert 0 1 1 Critical auth/install surfaces have unit + hermetic integration coverage; one gap: list_remote_refs bearer fallback_used=True branch is never exercised at any tier.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [Supply Chain Security] Add env.pop('GIT_TOKEN', None) in _build_git_env bearer branch and a matching unit test asserting the env is clean when GIT_TOKEN pre-exists. -- Missing-test finding on a secure-by-default surface: the stated isolation guarantee is not enforced when a stale GIT_TOKEN is already in the caller's environment. One-line fix + one test case; highest-signal item from the panel.
  2. [Auth Expert] Guard the _stale_pat_warned_hosts check-then-add with self._lock to close the TOCTOU dedup race under parallel install. -- Correctness bug: two threads on the same ADO host can both pass the in check before either calls add(), emitting duplicate stale-PAT warnings and defeating the whole point of the dedup set. Three-line fix.
  3. [Test Coverage Expert] Add a parametrize variant in test_list_remote_refs.py where execute_with_bearer_fallback returns BearerFallbackOutcome(result, True) and assert emit_stale_pat_diagnostic is called. -- Missing-coverage on secure-by-default + devx principles: the fallback_used=True branch (stale-PAT diagnostic emission, correct ref return) has no test at any tier. A future refactor to the outcome shape or diagnostic call will silently break this path.
  4. [CLI Logging Expert] Branch on bearer_also_failed inside the Case 4 error block and include host_display in the prefix f-string (convergent fix with devx-ux finding). -- When bearer_also_failed=True the current body actively misleads: "Azure CLI MAY also be available" is false and Step 1 ("unset ADO_APM_PAT") is busywork. The devx-ux finding adds that the host name is absent from the prefix, leaving multi-tenant users unable to identify which ADO tenant failed. One edit resolves both.
  5. [Doc Writer] Add a sentence to CONTRIBUTING.md pointing to scripts/lint-auth-signals.sh and explaining the PAT->bearer boundary rule and exemption list process. -- The lint script is currently undiscoverable to contributors adding new ADO call sites. Without this pointer, the next developer to add a raw ls-remote call has no path to learn about the rule, making the anti-regression guard fragile in practice.

Architecture

classDiagram
    direction LR
    class AuthResolver {
        <<Strategy>>
        +execute_with_bearer_fallback(dep_ref, primary_op, bearer_op, is_auth_failure) BearerFallbackOutcome
        +build_error_context(host, op, *, bearer_also_failed) str
        +emit_stale_pat_diagnostic(host_display) None
        +_build_git_env(token, scheme, host_kind) dict
        -_stale_pat_warned_hosts set
    }
    class BearerFallbackOutcome {
        <<ValueObject>>
        +outcome object
        +bearer_attempted bool
    }
    class AzureCliBearerProvider {
        <<ConcreteStrategy>>
        +is_available() bool
        +get_bearer_token() str
        +clear_cache() None
    }
    class GitHubDownloader {
        <<Collaborator>>
        +list_remote_refs(dep_ref) list
        +auth_resolver AuthResolver
    }
    class InstallPipeline {
        <<Collaborator>>
        +_preflight_auth_check(ctx, auth_resolver, verbose) None
    }
    class DependencyReference {
        <<ValueObject>>
        +host str
        +repo_url str
        +is_azure_devops() bool
    }
    class is_ado_auth_failure_signal {
        <<Pure>>
        +__call__(text) bool
    }
    note for AuthResolver "Chain of Responsibility: PAT primary_op -> auth-failure? -> bearer_op. Returns BearerFallbackOutcome(outcome, bearer_attempted)"
    note for is_ado_auth_failure_signal "SSOT for ADO auth signals: 401 | 403 | authentication failed | unauthorized | could not read username"
    AuthResolver ..> BearerFallbackOutcome : returns
    AuthResolver ..> AzureCliBearerProvider : delegates bearer
    AuthResolver ..> is_ado_auth_failure_signal : uses
    GitHubDownloader ..> AuthResolver : calls execute_with_bearer_fallback
    InstallPipeline ..> AuthResolver : calls execute_with_bearer_fallback
    GitHubDownloader ..> DependencyReference : reads
    InstallPipeline ..> DependencyReference : reads
    class BearerFallbackOutcome:::touched
    class AuthResolver:::touched
    class is_ado_auth_failure_signal:::touched
    class GitHubDownloader:::touched
    class InstallPipeline:::touched
    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A(["apm install --update"])
    B["_preflight_auth_check\ninstall/pipeline.py"]
    C{"dep.is_azure_devops()\n&& scheme==basic\n&& source==ADO_APM_PAT?"}
    D["primary_op()\ngit ls-remote PAT URL"]
    E{"is_ado_auth_failure_signal\n(stderr)?"}
    F["BearerFallbackOutcome\n(result, False)"]
    G["AuthResolver.execute_with_bearer_fallback\ncore/auth.py"]
    H["AzureCliBearerProvider.is_available()\naz account get-access-token"]
    I{"az available\n& JWT acquired?"}
    J["bearer_op(bearer)\ngit ls-remote bearer URL\n_build_git_env(scheme='bearer')\nCLEAN env -- no GIT_TOKEN leak"]
    K{"bearer outcome\nauth failure?"}
    L["emit_stale_pat_diagnostic\n-- dedup per host"]
    M["BearerFallbackOutcome\n(bearer_result, True) -- SUCCESS"]
    N["BearerFallbackOutcome\n(primary, True) -- BOTH FAILED"]
    O["bearer_also_failed=True\nbuild_error_context\nraise AuthenticationError"]
    P["continue -- preflight passed"]
    Q["non-ADO or no PAT\nprimary_op only"]

    A --> B
    B --> C
    C -- yes --> D
    C -- no --> Q
    D --> E
    E -- no --> F
    F --> P
    E -- yes --> G
    G --> H
    H --> I
    I -- no --> F
    I -- yes --> J
    J --> K
    K -- no --> L
    L --> M
    M --> P
    K -- yes --> N
    N --> O
    Q --> P
Loading
sequenceDiagram
    participant CLI as apm install --update
    participant PF as _preflight_auth_check
    participant AR as AuthResolver
    participant GIT as git ls-remote
    participant AZ as az (fake/real)

    CLI->>PF: ctx, auth_resolver
    PF->>AR: execute_with_bearer_fallback(dep, _primary_op, _bearer_op, _is_auth_failure)
    AR->>GIT: ls-remote --heads PAT-URL [GIT_TOKEN=stale-pat]
    GIT-->>AR: rc=128 stderr='401 Unauthorized'
    AR->>AR: is_ado_auth_failure_signal('401') -> True
    AR->>AZ: get-access-token --resource guid
    AZ-->>AR: eyJAA...jwt
    AR->>GIT: ls-remote --heads bearer-URL [GIT_CONFIG_VALUE_0=Bearer eyJ...]
    GIT-->>AR: rc=0 refs/heads/main
    AR->>AR: emit_stale_pat_diagnostic(host) -- dedup
    AR-->>PF: BearerFallbackOutcome(result_ok, bearer_attempted=True)
    PF->>CLI: return (no raise)
Loading

Recommendation

Ship this PR. No panelist found a blocking issue; the auth wiring, env isolation structure, SSOT predicate, and lint guard are all sound net-positives. The five follow-ups above are concrete, low-friction, and well-scoped for a fast follow PR -- the GIT_TOKEN strip and TOCTOU lock fix in particular are one-to-three-line changes that should be bundled together. Recommend tracking them in a single follow-up issue referencing #1212 so they do not drift.


Full per-persona findings

Python Architect

  • [nit] BearerFallbackOutcome.outcome typed as object; should be TypeVar/Generic for type-safe unwrapping at src/apm_cli/core/auth.py
    NamedTuple fields typed as object lose all downstream type inference. A Generic NamedTuple (BearerFallbackOutcome[T]) or a TypeVar bound to the union of expected outcome shapes would let callers skip cast() and catch misuse at lint time rather than runtime.
    Suggested: from typing import Generic, TypeVar; T_co = TypeVar('T_co', covariant=True); class BearerFallbackOutcome(NamedTuple, Generic[T_co]): outcome: T_co; bearer_attempted: bool

  • [nit] _stale_pat_warned_hosts annotated as bare set; should be set[str] at src/apm_cli/core/auth.py
    APM code quality standard requires type hints on all public APIs. The bare set annotation propagates through mypy as set[Unknown], losing host-display string checks.
    Suggested: self._stale_pat_warned_hosts: set[str] = set()

  • [nit] bearer_also_failed re-evaluates is_ado_auth_failure_signal after _is_auth_failure closure already checked the same text at src/apm_cli/install/pipeline.py
    Minor redundancy; does not affect correctness but could cause confusion when the predicate is extended.
    Suggested: bearer_also_failed = fallback_result.bearer_attempted and _is_auth_failure(result)

  • [recommended] Design patterns record (Architecture Reviewer note)
    Patterns used: Dataclass-as-value-object (BearerFallbackOutcome NamedTuple), Extract-when-shared (is_ado_auth_failure_signal to utils/github_host.py consolidates 4+ open-coded copies), Strategy (execute_with_bearer_fallback returns typed outcome envelope; callers own outcome shape, resolver owns fallback protocol). The two exempt call sites in validation.py and github_downloader.py are correctly tracked as follow-ups.

CLI Logging Expert

  • [recommended] Case 4 error body is stale / contradicts its own bearer_also_failed prefix at src/apm_cli/core/auth.py
    When bearer_also_failed=True the prefix correctly says "ADO_APM_PAT was rejected; az cli bearer was also rejected." but the paragraph that follows still says "Azure CLI credentials MAY also be available" (they were tried and rejected -- "may" is false) and Step 1 says "Unset the PAT to test Azure CLI auth only" -- the bearer path has already been tested, so Step 1 gives the user busywork. The only actionable step when both legs fail is Step 2 (az login).
    Suggested: Branch on bearer_also_failed inside the Case 4 block. When True: suppress Step 1, say "Both ADO_APM_PAT and Azure CLI bearer were tried and rejected." and keep only Step 2 (az login) and Step 3 (retry).

  • [nit] Verbose trace token 'accepted' is not in STATUS_SYMBOLS vocabulary at src/apm_cli/install/pipeline.py
    pipeline.py emits _trace(f'Preflight: {host_display} -- accepted') on success. The established vocabulary uses 'ok', 'passed', or a [+] symbol prefix.
    Suggested: Change to: _trace(f'Preflight: {host_display} -- ok')

DevX UX Expert

  • [recommended] Case 4 error prefix drops the host name, leaving users guessing which ADO tenant failed at src/apm_cli/core/auth.py
    The new stale-PAT warning correctly includes for {host_display} so users know which host to fix. But the Case 4 prepended line is a static string with no host context. If a user has deps on two ADO tenants and one fails both legs, the error message gives no hint about which host to look at.
    Suggested: Change the static prefix to an f-string: f' ADO_APM_PAT was rejected for {host_display}; Azure CLI bearer was also rejected.\n\n'

  • [nit] 'az cli bearer' is lowercase in new messages; rest of codebase uses 'Azure CLI' at src/apm_cli/core/auth.py
    The new warning string uses 'az cli' lowercase while every other user-facing string in the file uses 'Azure CLI'. Inconsistent product name casing can confuse first-time users who Google the error.
    Suggested: Replace 'az cli bearer' with 'Azure CLI bearer' in both the emit_stale_pat_diagnostic message and the Case 4 prefix string.

  • [nit] Step 1 recovery hint 'unset ADO_APM_PAT' is Unix-only after PR adds Windows-aware az-cli install hints at src/apm_cli/core/auth.py
    The PR correctly improved the az-cli install hints to show brew/winget/apt/dnf per platform. But step 1 of the same error block still shows 'unset ADO_APM_PAT' which is Bash-only. Windows users on PowerShell need 'Remove-Item Env:ADO_APM_PAT'.
    Suggested: Add a parenthetical: unset ADO_APM_PAT (PowerShell: Remove-Item Env:ADO_APM_PAT)

Supply Chain Security

  • [recommended] _build_git_env(scheme='bearer') does not strip a pre-existing GIT_TOKEN from the inherited environment at src/apm_cli/core/auth.py:731
    The function starts with env = os.environ.copy() and correctly avoids setting GIT_TOKEN in the bearer branch, but never removes a pre-existing GIT_TOKEN that may already be present in the caller's environment. If a stale GIT_TOKEN=<PAT> is in the shell environment the value propagates into every bearer-mode subprocess call, contradicting the stated security guarantee.
    Suggested: Add env.pop('GIT_TOKEN', None) immediately after the bearer branch guard so the clean-env invariant holds unconditionally.
    Proof (missing): tests/test_auth.py -- proves: Bearer env is clean of GIT_TOKEN even when the parent process environment contains one

  • [nit] Lint Rule B covers ls-remote only; raw git clone calls in src/ are not caught at scripts/lint-auth-signals.sh
    A new raw git clone invocation added outside core/auth.py would bypass the Rule B check entirely. The exempted github_downloader.py clone path is the current risk surface, but the lint gap means a future contributor could introduce an unauthenticated clone call without any CI signal.
    Suggested: Add a parallel Rule C block scanning for subprocess git clone calls requiring the same auth-delegated annotation or execute_with_bearer_fallback proximity.

  • [nit] Lint exemption list carries no issue reference or expiry mechanism at scripts/lint-auth-signals.sh
    The two exempted files are described as 'refactor tracked as follow-up to [BUG] apm install -g --update fails with a DevOps auth error where apm install -g succeeds #1212' but the lint script contains no machine-checkable link to a tracking issue. Without one the exemptions are invisible to future maintainers and may persist indefinitely.
    Suggested: Add a comment line per exemption with a GitHub issue URL, e.g.: # TODO(#NNNN): remove once validation.py is migrated to execute_with_bearer_fallback

OSS Growth Hacker

  • [nit] CHANGELOG entry omits the cross-platform error-message improvement mentioned in the PR body at CHANGELOG.md
    The PR body calls out that platform-specific install commands (brew/winget/apt/dnf) now appear on auth failure instead of a Windows-only URL -- a first-run friction win for Linux/macOS users. That story is absent from the CHANGELOG, losing a quick win for release notes and social copy.
    Suggested: Append a second bullet: - apm install auth-failure hint now shows the platform-appropriate install command (brew / winget / apt / dnf) instead of a Windows-only URL. (#1212)

Auth Expert

  • [recommended] _stale_pat_warned_hosts check-then-add is not protected by self._lock, creating a TOCTOU dedup race under parallel install at src/apm_cli/core/auth.py:758
    self._lock guards self._cache but not self._stale_pat_warned_hosts. emit_stale_pat_diagnostic performs an unguarded if X in set: return; set.add(X) pattern. Under ThreadPoolExecutor parallel install, two threads for different ADO deps on the same host can both pass the in check before either calls add(), emitting duplicate stale-PAT warnings. Not a security issue, but the dedup is the whole point of the set.
    Suggested: Guard the check-then-add with self._lock: with self._lock: if host_display in self._stale_pat_warned_hosts: return; self._stale_pat_warned_hosts.add(host_display)

  • [nit] 403 in _ADO_AUTH_FAILURE_SIGNALS will match any response body containing the string '403' at src/apm_cli/utils/github_host.py:281
    is_ado_auth_failure_signal does a substring match on lowered text. '403' as a bare substring will match ADO error codes, commit hashes, or repo names that happen to contain '403'. Callers gate on additional context (host is ADO, token was presented), limiting blast radius, but the signal itself is noisier than a targeted 'http 403' match.
    Suggested: Consider 'http 403' or ' 403 ' (space-padded) to reduce false-positive substring matches on arbitrary ADO error output.

Doc Writer

  • [nit] Stale-PAT fallback docs say 'HTTP 401' but the fix covers 401/403 at docs/src/content/docs/getting-started/authentication.md:182
    authentication.md reads 'rejected (HTTP 401)' and the example output block shows '(HTTP 401)'. The CHANGELOG for [BUG] apm install -g --update fails with a DevOps auth error where apm install -g succeeds #1212 states the preflight now handles 401/403. A user who hits a 403 (insufficient scope, not expired) will not recognise the description in the docs as matching their situation.
    Suggested: Change 'rejected (HTTP 401)' to 'rejected (HTTP 401/403)' and update the example output block accordingly.

  • [recommended] scripts/lint-auth-signals.sh is undiscoverable from docs and CONTRIBUTING at scripts/lint-auth-signals.sh
    The script enforces the PAT->bearer boundary as an anti-regression CI check. Its exemption list is managed inside the script itself. Developers adding new ADO call sites have no obvious path to learn about the rule or the exemption list unless they happen to read the CI workflow diff.
    Suggested: Add a sentence to CONTRIBUTING.md (or to the 'Stale-PAT fallback' section in authentication.md) pointing to scripts/lint-auth-signals.sh and explaining that new ADO call sites must either route through execute_with_bearer_fallback or be added to the exemption list with a tracked follow-up.

  • [nit] CHANGELOG entry uses 'stale' inconsistently with the 401/403 trigger at CHANGELOG.md:22
    The CHANGELOG says 'falls back from a stale ADO_APM_PAT'. A PAT rejected with 403 is not necessarily stale -- it may have insufficient scope. The word 'stale' implies expiry, not rejection in general.
    Suggested: Change 'a stale ADO_APM_PAT' to 'a rejected ADO_APM_PAT (401 or 403)' for consistency with the auth docs.

Test Coverage Expert

  • [recommended] list_remote_refs bearer fallback_used=True branch has no test at any tier at tests/unit/test_list_remote_refs.py
    test_list_remote_refs.py wires execute_with_bearer_fallback with a side_effect that always returns BearerFallbackOutcome(primary_op(), False) -- fallback_used is hardcoded False. The branch in the caller that handles a successful bearer retry (fallback_used=True) is never exercised. If the fallback_used=True path drifts (e.g. stale-PAT warning emission, emit_stale_pat_diagnostic call, or .outcome attribute access changes), no test will catch it.
    Suggested: Add a parametrize variant where the mock execute_with_bearer_fallback side_effect returns BearerFallbackOutcome(result, True) and assert that emit_stale_pat_diagnostic is called and the return value is correct.
    Proof (missing at integration-with-fixtures): tests/unit/test_list_remote_refs.py::test_list_remote_refs_ado_bearer_fallback_used -- proves: When git ls-remote fails with a stale ADO PAT and bearer succeeds, list_remote_refs returns the correct refs and emits the stale-PAT diagnostic [secure-by-default,devx]

  • [nit] Hermetic E2E drives _preflight_auth_check directly, not the full apm install --update CLI argv path at tests/integration/test_ado_preflight_bearer_fallback_e2e.py
    test_ado_preflight_bearer_fallback_e2e.py calls _preflight_auth_check directly. The e2e floor for CLI command surface requires a real CLI invocation. Gap is minor because preflight is well unit-tested and the hermetic test is real-subprocess.
    Suggested: Consider adding one parametrized case to test_ado_bearer_e2e.py that invokes apm via subprocess with ADO_APM_PAT set to a stale value and az faked to succeed, asserting exit code 0.

This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.

Generated by PR Review Panel for issue #1214 · ● 4.8M ·

@github-actions github-actions Bot removed the panel-review Trigger the apm-review-panel gh-aw workflow label May 8, 2026
danielmeppiel and others added 2 commits May 8, 2026 22:52
… dedup

Two PR #1214 panel follow-ups:

- Supply Chain Security: `_build_git_env` bearer branch now explicitly
  drops any pre-existing `GIT_TOKEN` from the env copied from
  `os.environ`. Without this, a stale GIT_TOKEN set by a prior shell or
  CI step would survive the `os.environ.copy()` and silently defeat the
  bearer-isolation guarantee (the JWT is meant to flow only via
  `GIT_CONFIG_VALUE_0`). New unit test
  `TestBuildGitEnvBearerIsolation` pins the clean-env behaviour.

- Auth Expert: `emit_stale_pat_diagnostic` check-then-add of
  `_stale_pat_warned_hosts` is now wrapped in `self._lock`, closing the
  TOCTOU race where two threads in a parallel install on the same ADO
  host could both pass the membership check before either added.
  Without the lock the dedup set defeated its own purpose. New
  concurrency test `test_concurrent_same_host_emits_once` (64 threads
  on a single host -> one warning) provides the regression trap.

CHANGELOG: extended the existing #1212 line under [Unreleased] to call
out both follow-ups; no new entry to keep one-line-per-PR discipline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit c9d7c3a into main May 8, 2026
9 checks passed
@danielmeppiel danielmeppiel deleted the fix/ado-update-bearer-fallback-1212 branch May 8, 2026 20:56
danielmeppiel added a commit that referenced this pull request May 9, 2026
…ostic (#1219)

- Clarify that the stale-PAT->bearer fallback applies to 'apm install --update'
  preflight (previously undocumented; behavior fixed in #1214)
- Add the 'AAD bearer fallback also failed' diagnostic so users know both
  paths were attempted and understand which credential to refresh

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sergio-sisternes-epam pushed a commit that referenced this pull request May 9, 2026
…ostic (#1219)

- Clarify that the stale-PAT->bearer fallback applies to 'apm install --update'
  preflight (previously undocumented; behavior fixed in #1214)
- Add the 'AAD bearer fallback also failed' diagnostic so users know both
  paths were attempted and understand which credential to refresh

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
danielmeppiel added a commit to slava-kudzinau/apm that referenced this pull request May 10, 2026
…crosoft#1189)

* fix: normalise flat-format dependencies for transitive resolution

Flat list dependencies (dependencies: [...]) were silently dropped
during parsing because from_apm_yml() only handled the dict form
(dependencies: {apm: [...]}).  The isinstance(..., dict) guard
caused the parser to skip flat lists entirely, leaving dependencies
as None and preventing transitive resolution.

Normalise flat lists to {apm: [...]} at parse time with a
DeprecationWarning guiding authors to the structured format.  Apply
the same fix for devDependencies.  Add defensive normalisation in
_apm_yml_writer.py to prevent AttributeError when reading raw YAML
with flat-format dependencies.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: reject flat-format dependencies with clear error instead of normalising

Replace the DeprecationWarning + silent normalisation with a hard
ValueError that tells authors exactly how to fix their apm.yml.

The flat list format was never documented or supported -- silently
accepting it is worse than failing loudly with actionable guidance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: reject all non-dict dependency types, not just lists

Address Copilot review feedback: the previous fix only rejected lists
but silently ignored strings, ints, and other scalars -- recreating
the original silent-drop failure mode. Now any non-dict value raises
ValueError with the actual type name.

Also hardens _apm_yml_writer.py to handle dependencies: null and
other scalar types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: target-agnostic local-bundle install (microsoft#1207) (microsoft#1217)

* fix: target-agnostic local-bundle install (microsoft#1207)

apm pack no longer hardcodes pack.target: copilot. Bundles now ship in
the Anthropic plugin layout as a transport convention only -- the
consumer target is resolved by apm install from project context and
primitives are routed per the resolved target's layout.

Defects fixed (per microsoft#1207 RCA):

- D1: pack records pack.target = "all" by default (or the auto-detected
  target). The flag remains for back-compat but is no longer the
  authoritative binding.
- D2.a: plugin.json is skipped on install regardless of casing in
  the bundle manifest. .mcp.json is also skipped from the verbatim
  deploy loop -- MCP wiring is surfaced as a follow-up notice.
- D2.b: instructions/*.md for compile-only targets (opencode, codex,
  gemini) are now staged under apm_modules/<slug>/.apm/instructions/
  so apm compile picks them up. Slug is validated through
  validate_path_segments and the destination is checked with
  ensure_path_within to block traversal escapes.
- D3: the local-bundle install path sets summary_rendered = True
  before returning so the outer finally-block no longer prints a
  misleading "Install interrupted" line on success.

check_target_mismatch returns None when the bundle records "all", so
target-agnostic bundles never warn against any consumer layout.

Tests:
- tests/unit/install/test_install_local_bundle_issue1207.py (13 cases)
- tests/integration/test_install_local_bundle_e2e.py
  TestInstallLocalBundleIssue1207: 7-row pack -> install matrix over
  copilot, claude, cursor, opencode, codex, gemini plus a multi-target
  consumer asserting both native and staged deployment paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: stamp CHANGELOG with PR microsoft#1217

* Address panel review findings for microsoft#1207

- Case-insensitive .mcp.json skip in deploy loop and fallback walk
  (services.py) so case-folding filesystems cannot smuggle a renamed
  MCP file past the skip.
- Tighten staged_instructions disjunct in local_bundle_handler.py
  (drop dead first disjunct that always triggered for compile-only
  targets) and case-insensitive bundle_mcp detection.
- Compile hint now names the resolved target and primitive count.
- "Share with: apm install <bundle>" success line in apm pack output.
- Unit test class TestMcpJsonNeverDeployed covers case variants.
- e2e compile-hint assertion uses whitespace-collapsed output to
  survive logger line-wrap.
- Docs sweep:
  * pack-distribute.md: rewrote "Targeting mental model" as
    target-agnostic; removed --target examples; simplified cross-
    target -> plugin layout normalization.
  * cli-commands.md: marked apm pack --target deprecated; updated
    bundle install section; removed target-filter table.
  * lockfile-spec.md: pack.target is deprecated/optional informational
    metadata; bundle_files documented.
  * enterprise/security.md: added apm_modules/ allowed prefix with
    slug-validation note; new "Local bundle install trust model"
    subsection.
  * apm-usage skill (commands.md, workflow.md): pack.target is
    informational; OpenCode added to compile-needed list; new
    "Local bundle install" subsection.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Iterate on panel review: harden slug validation + cleanup

Address apm-review-panel findings on PR microsoft#1217:

- sec-1: Enforce documented [A-Za-z0-9._-] slug whitelist in
  install/services.py before validate_path_segments, with explicit
  rejection of leading/trailing dots, '..', forward slashes, null bytes,
  and whitespace.  Aligns code with security.md trust model.

- sec-3: Add parameterized adversarial slug test cases
  (forward slash, null byte, leading/trailing dot, '@', whitespace).

- py-arch-2: Hoist plugin.json / .mcp.json case-insensitive metadata
  skip above the per-target deploy loop so multi-target installs do
  not inflate the skipped counter once per target.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR microsoft#1217 review: ASCII slug, MCP wiring, doc audit

- ASCII-only slug whitelist
- Preserve nested instruction subdirs in stage_root
- Pack success line mentions embedded apm.lock.yaml
- Strengthened plugin.json leak assertion (whole-tree walk)
- URL trailing-dot fix in security/marketplace error messages
- Auto-wire bundle .mcp.json through MCPIntegrator (multi-target)
  so non-Claude harnesses get servers in their native MCP config
- New TestBundleMcpWiring unit suite (parse + wire helpers)
- Doc audit: pack-distribute, lockfile-spec, plugins,
  cli-commands, first-package, security, CHANGELOG

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* tests: add e2e coverage for bundle .mcp.json -> MCPIntegrator boundary

Four e2e cases covering the wiring path that was previously only
covered at the unit level:
- bundle .mcp.json reaches MCPIntegrator.install with the bundle's
  servers and the resolved-target CSV passed via explicit_target
- bundle without .mcp.json does not invoke the integrator (no
  spurious 'No MCP dependencies found' warnings on every install)
- integrator failure does not break the install (file deploys are
  not undone by an MCP wiring hiccup)
- dry-run never fires the integrator (zero side effects on the
  consumer's MCP config)

Per-target file writes (Claude project .mcp.json, .vscode/mcp.json,
.cursor/mcp.json) are owned by MCPIntegrator's own suite; testing
them here would require installed runtime binaries on the CI host.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: fix broken anchor #authoring-workflow -> #plugin-authoring

Deploy Docs CI fails link validation; the actual heading is
'Plugin authoring' at the top of guides/plugins.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Meppiel <copilot-rework@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Daniel Meppiel <[email protected]>

* docs(auth): document ADO --update bearer fallback and dual-fail diagnostic (microsoft#1219)

- Clarify that the stale-PAT->bearer fallback applies to 'apm install --update'
  preflight (previously undocumented; behavior fixed in microsoft#1214)
- Add the 'AAD bearer fallback also failed' diagnostic so users know both
  paths were attempted and understand which credential to refresh

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: surface malformed transitive manifests and add integration tests

Address reviewer feedback:

- Resolver catch site in _try_load_dependency_package now only catches
  FileNotFoundError (expected for missing manifests); ValueError
  propagates to the BFS result loop where it is logged at WARNING
  level, making malformed transitive deps user-visible.
- build_dependency_tree root-parse catch site now logs a warning
  instead of silently returning an empty tree.
- Writer error message includes structured-format hint matching the
  model-layer quality.
- Integration test (CliRunner): flat-list and string deps both exit
  non-zero with actionable error including 'expected a mapping' and
  structured-format example.
- Resolver test: transitive dep with flat-list deps surfaces WARNING
  and resolution continues for valid siblings.
- CHANGELOG entry under [Unreleased] Fixed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Sergio Sisternes <sergio.sisternes@epam.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Daniel Meppiel <51440732+danielmeppiel@users.noreply.github.com>
Co-authored-by: Daniel Meppiel <copilot-rework@github.com>
sergio-sisternes-epam pushed a commit that referenced this pull request May 19, 2026
#1212) (#1214)

* fix(install): wire ADO --update preflight through PAT->bearer fallback (#1212)

Before this change, `apm install --update` raised AuthenticationError on a
stale ADO_APM_PAT even when `az login` was active and `apm install` (no flag)
worked with the same env. The preflight auth probe in pipeline.py was the
only ADO call site that did NOT use the canonical
`AuthResolver.execute_with_bearer_fallback` helper -- it open-coded a
ls-remote and raised on 401/403 immediately.

Changes:

- Extract `is_ado_auth_failure_signal()` predicate into utils/github_host.py
  as the single source of truth for the 5-signal PAT-failure detection set
  (was duplicated across auth.py, pipeline.py, github_downloader.py,
  validation.py).
- Rewrite `_preflight_auth_check` (pipeline.py) on top of
  `execute_with_bearer_fallback`. The bearer attempt uses a clean env via
  `_build_git_env(scheme="bearer")` so a rejected PAT cannot leak into the
  bearer probe via GIT_TOKEN.
- Refactor `list_remote_refs` (github_downloader.py) onto the same canonical
  helper for symmetry with the clone path.
- `AuthResolver.build_error_context` now takes a `bearer_also_failed` flag
  and prefixes the rendered error with a clear "AAD bearer fallback also
  failed" line when both creds were tried, so users know both paths were
  exercised.
- Per-host dedup for the stale-PAT diagnostic so it surfaces once per
  cluster, not per dep.
- Install hint expanded to `brew/winget/choco install azure-cli`.

Anti-regression:

- New `scripts/lint-auth-signals.sh` (bash 3.2 compatible) enforces:
  Rule A -- only `auth_resolver` may import bearer providers, and
  Rule B -- every `git ls-remote` site must have an `# auth-delegated`
  annotation pointing to its delegation strategy. Hooked into ci.yml.
- Hermetic E2E test (test_ado_preflight_bearer_fallback_e2e.py) using
  PATH-injected fake `git` and `az` binaries, wired into both
  scripts/test-integration.sh and scripts/windows/test-integration.ps1.

Tests:

- 19 new predicate tests (test_github_host_predicate.py).
- 3 new bearer-fallback cases in test_pipeline_auth_preflight.py
  (stale-PAT->bearer success, both fail, no-PAT-leak in bearer env).
- 2 new hermetic E2E tests reproducing the #1212 scenario end-to-end.
- All 7823 unit tests pass; new E2E tests pass.

Refs: #1212

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(install): address apm-review-panel findings on #1212

Synthesizes recommendations from the local panel review (auth-expert,
doc-writer, devx-ux, test-coverage, supply-chain-security):

- list_remote_refs error path now propagates bearer_also_failed to
  build_error_context, matching install preflight UX.
- build_error_context Case 4 retry hint now expands install steps
  with apt/dnf coverage for Linux users; Case 2 no longer renders the
  contradictory PAT-rejected prefix when no PAT was tried.
- lint-auth-signals.sh Rule A now scans for any reference to
  get_bearer_provider (not just single-line imports), catching
  multi-line import blocks and module-attribute access. Clone path
  in github_downloader.py:_execute_transport_plan exempted with
  documented follow-up rationale (refactor onto
  execute_with_bearer_fallback is non-trivial in the tier-attempt
  loop).
- Added unit tests for build_error_context(bearer_also_failed=True)
  prefix in Case 4, defensive Case 2 ignores the kwarg, and
  emit_stale_pat_diagnostic per-host dedup.
- Tightened CHANGELOG entry per doc-writer (drop internal predicate
  refactor detail; lead with user-visible behavior).

All 7826 unit tests pass, both hermetic E2E tests pass, ruff clean,
auth-signal lint clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(install): trim github_downloader.py to file-length guardrail

The bearer_also_failed propagation in list_remote_refs pushed the file
to 2407 lines (max 2400). Tightened the comments around _bearer_op
SECURITY note and the ado_eligible branch to bring it back to 2400.

No behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(auth): tighten bearer-fallback diagnostics from PR #1214 review

Address Copilot review feedback on PR #1214:

* execute_with_bearer_fallback now returns BearerFallbackOutcome
  (NamedTuple) carrying outcome + bearer_attempted flag. Callers in
  install/pipeline.py and deps/github_downloader.py use the flag so
  bearer_also_failed only renders 'az cli bearer was also rejected'
  when the bearer leg actually ran -- not on early returns (az
  unavailable, JWT acquisition failed).

* build_error_context docstring fixed: bearer_also_failed prefix
  applies to Case 4 (PAT set + az_available), not Case 2.

* emit_stale_pat_diagnostic warning no longer hard-codes '(HTTP 401)'
  -- the predicate also matches 403 and other auth signals.

* Windows test-integration.ps1 stops invoking the POSIX-only E2E
  (which skipif win32) and notes the unit suite covers the contract.

No behavior change beyond diagnostic accuracy; all 7826 unit tests
pass and lint is clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(auth): pop stale GIT_TOKEN in bearer env and lock-guard stale-PAT dedup

Two PR #1214 panel follow-ups:

- Supply Chain Security: `_build_git_env` bearer branch now explicitly
  drops any pre-existing `GIT_TOKEN` from the env copied from
  `os.environ`. Without this, a stale GIT_TOKEN set by a prior shell or
  CI step would survive the `os.environ.copy()` and silently defeat the
  bearer-isolation guarantee (the JWT is meant to flow only via
  `GIT_CONFIG_VALUE_0`). New unit test
  `TestBuildGitEnvBearerIsolation` pins the clean-env behaviour.

- Auth Expert: `emit_stale_pat_diagnostic` check-then-add of
  `_stale_pat_warned_hosts` is now wrapped in `self._lock`, closing the
  TOCTOU race where two threads in a parallel install on the same ADO
  host could both pass the membership check before either added.
  Without the lock the dedup set defeated its own purpose. New
  concurrency test `test_concurrent_same_host_emits_once` (64 threads
  on a single host -> one warning) provides the regression trap.

CHANGELOG: extended the existing #1212 line under [Unreleased] to call
out both follow-ups; no new entry to keep one-line-per-PR discipline.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Meppiel <copilot-rework@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: danielmeppiel <danielmeppiel@users.noreply.github.com>
sergio-sisternes-epam pushed a commit that referenced this pull request May 19, 2026
…ostic (#1219)

- Clarify that the stale-PAT->bearer fallback applies to 'apm install --update'
  preflight (previously undocumented; behavior fixed in #1214)
- Add the 'AAD bearer fallback also failed' diagnostic so users know both
  paths were attempted and understand which credential to refresh

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sergio-sisternes-epam added a commit that referenced this pull request May 19, 2026
)

* fix: normalise flat-format dependencies for transitive resolution

Flat list dependencies (dependencies: [...]) were silently dropped
during parsing because from_apm_yml() only handled the dict form
(dependencies: {apm: [...]}).  The isinstance(..., dict) guard
caused the parser to skip flat lists entirely, leaving dependencies
as None and preventing transitive resolution.

Normalise flat lists to {apm: [...]} at parse time with a
DeprecationWarning guiding authors to the structured format.  Apply
the same fix for devDependencies.  Add defensive normalisation in
_apm_yml_writer.py to prevent AttributeError when reading raw YAML
with flat-format dependencies.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor: reject flat-format dependencies with clear error instead of normalising

Replace the DeprecationWarning + silent normalisation with a hard
ValueError that tells authors exactly how to fix their apm.yml.

The flat list format was never documented or supported -- silently
accepting it is worse than failing loudly with actionable guidance.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: reject all non-dict dependency types, not just lists

Address Copilot review feedback: the previous fix only rejected lists
but silently ignored strings, ints, and other scalars -- recreating
the original silent-drop failure mode. Now any non-dict value raises
ValueError with the actual type name.

Also hardens _apm_yml_writer.py to handle dependencies: null and
other scalar types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: target-agnostic local-bundle install (#1207) (#1217)

* fix: target-agnostic local-bundle install (#1207)

apm pack no longer hardcodes pack.target: copilot. Bundles now ship in
the Anthropic plugin layout as a transport convention only -- the
consumer target is resolved by apm install from project context and
primitives are routed per the resolved target's layout.

Defects fixed (per #1207 RCA):

- D1: pack records pack.target = "all" by default (or the auto-detected
  target). The flag remains for back-compat but is no longer the
  authoritative binding.
- D2.a: plugin.json is skipped on install regardless of casing in
  the bundle manifest. .mcp.json is also skipped from the verbatim
  deploy loop -- MCP wiring is surfaced as a follow-up notice.
- D2.b: instructions/*.md for compile-only targets (opencode, codex,
  gemini) are now staged under apm_modules/<slug>/.apm/instructions/
  so apm compile picks them up. Slug is validated through
  validate_path_segments and the destination is checked with
  ensure_path_within to block traversal escapes.
- D3: the local-bundle install path sets summary_rendered = True
  before returning so the outer finally-block no longer prints a
  misleading "Install interrupted" line on success.

check_target_mismatch returns None when the bundle records "all", so
target-agnostic bundles never warn against any consumer layout.

Tests:
- tests/unit/install/test_install_local_bundle_issue1207.py (13 cases)
- tests/integration/test_install_local_bundle_e2e.py
  TestInstallLocalBundleIssue1207: 7-row pack -> install matrix over
  copilot, claude, cursor, opencode, codex, gemini plus a multi-target
  consumer asserting both native and staged deployment paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore: stamp CHANGELOG with PR #1217

* Address panel review findings for #1207

- Case-insensitive .mcp.json skip in deploy loop and fallback walk
  (services.py) so case-folding filesystems cannot smuggle a renamed
  MCP file past the skip.
- Tighten staged_instructions disjunct in local_bundle_handler.py
  (drop dead first disjunct that always triggered for compile-only
  targets) and case-insensitive bundle_mcp detection.
- Compile hint now names the resolved target and primitive count.
- "Share with: apm install <bundle>" success line in apm pack output.
- Unit test class TestMcpJsonNeverDeployed covers case variants.
- e2e compile-hint assertion uses whitespace-collapsed output to
  survive logger line-wrap.
- Docs sweep:
  * pack-distribute.md: rewrote "Targeting mental model" as
    target-agnostic; removed --target examples; simplified cross-
    target -> plugin layout normalization.
  * cli-commands.md: marked apm pack --target deprecated; updated
    bundle install section; removed target-filter table.
  * lockfile-spec.md: pack.target is deprecated/optional informational
    metadata; bundle_files documented.
  * enterprise/security.md: added apm_modules/ allowed prefix with
    slug-validation note; new "Local bundle install trust model"
    subsection.
  * apm-usage skill (commands.md, workflow.md): pack.target is
    informational; OpenCode added to compile-needed list; new
    "Local bundle install" subsection.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Iterate on panel review: harden slug validation + cleanup

Address apm-review-panel findings on PR #1217:

- sec-1: Enforce documented [A-Za-z0-9._-] slug whitelist in
  install/services.py before validate_path_segments, with explicit
  rejection of leading/trailing dots, '..', forward slashes, null bytes,
  and whitespace.  Aligns code with security.md trust model.

- sec-3: Add parameterized adversarial slug test cases
  (forward slash, null byte, leading/trailing dot, '@', whitespace).

- py-arch-2: Hoist plugin.json / .mcp.json case-insensitive metadata
  skip above the per-target deploy loop so multi-target installs do
  not inflate the skipped counter once per target.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR #1217 review: ASCII slug, MCP wiring, doc audit

- ASCII-only slug whitelist
- Preserve nested instruction subdirs in stage_root
- Pack success line mentions embedded apm.lock.yaml
- Strengthened plugin.json leak assertion (whole-tree walk)
- URL trailing-dot fix in security/marketplace error messages
- Auto-wire bundle .mcp.json through MCPIntegrator (multi-target)
  so non-Claude harnesses get servers in their native MCP config
- New TestBundleMcpWiring unit suite (parse + wire helpers)
- Doc audit: pack-distribute, lockfile-spec, plugins,
  cli-commands, first-package, security, CHANGELOG

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* tests: add e2e coverage for bundle .mcp.json -> MCPIntegrator boundary

Four e2e cases covering the wiring path that was previously only
covered at the unit level:
- bundle .mcp.json reaches MCPIntegrator.install with the bundle's
  servers and the resolved-target CSV passed via explicit_target
- bundle without .mcp.json does not invoke the integrator (no
  spurious 'No MCP dependencies found' warnings on every install)
- integrator failure does not break the install (file deploys are
  not undone by an MCP wiring hiccup)
- dry-run never fires the integrator (zero side effects on the
  consumer's MCP config)

Per-target file writes (Claude project .mcp.json, .vscode/mcp.json,
.cursor/mcp.json) are owned by MCPIntegrator's own suite; testing
them here would require installed runtime binaries on the CI host.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs: fix broken anchor #authoring-workflow -> #plugin-authoring

Deploy Docs CI fails link validation; the actual heading is
'Plugin authoring' at the top of guides/plugins.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Daniel Meppiel <copilot-rework@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Daniel Meppiel <[email protected]>

* docs(auth): document ADO --update bearer fallback and dual-fail diagnostic (#1219)

- Clarify that the stale-PAT->bearer fallback applies to 'apm install --update'
  preflight (previously undocumented; behavior fixed in #1214)
- Add the 'AAD bearer fallback also failed' diagnostic so users know both
  paths were attempted and understand which credential to refresh

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: surface malformed transitive manifests and add integration tests

Address reviewer feedback:

- Resolver catch site in _try_load_dependency_package now only catches
  FileNotFoundError (expected for missing manifests); ValueError
  propagates to the BFS result loop where it is logged at WARNING
  level, making malformed transitive deps user-visible.
- build_dependency_tree root-parse catch site now logs a warning
  instead of silently returning an empty tree.
- Writer error message includes structured-format hint matching the
  model-layer quality.
- Integration test (CliRunner): flat-list and string deps both exit
  non-zero with actionable error including 'expected a mapping' and
  structured-format example.
- Resolver test: transitive dep with flat-list deps surfaces WARNING
  and resolution continues for valid siblings.
- CHANGELOG entry under [Unreleased] Fixed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Sergio Sisternes <sergio.sisternes@epam.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Daniel Meppiel <51440732+danielmeppiel@users.noreply.github.com>
Co-authored-by: Daniel Meppiel <copilot-rework@github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] apm install -g --update fails with a DevOps auth error where apm install -g succeeds

2 participants