Skip to content

feat(deps): fetch path-scoped files over git transport instead of host API#1740

Merged
danielmeppiel merged 11 commits into
mainfrom
danielmeppiel/1014-gitlab-git-transport
Jun 11, 2026
Merged

feat(deps): fetch path-scoped files over git transport instead of host API#1740
danielmeppiel merged 11 commits into
mainfrom
danielmeppiel/1014-gitlab-git-transport

Conversation

@danielmeppiel

@danielmeppiel danielmeppiel commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

TL;DR

Stacked on #1732 (#1571) -- must merge after it. This PR fixes self-hosted GitLab instances where path:-specified files fail with HTTP 410 (REST API disabled) by routing the fetch through git sparse/partial checkout instead of the host REST API. GITLAB_APM_PAT is kept as a thin fallback for API-only environments.

Credit: @cossons for the root-cause analysis and ratified git-transport-first design. @cossons -- please validate against your self-hosted GitLab 410 repro before we merge.

Closes #1014.


Problem (WHY)

Self-hosted GitLab instances often disable the REST API or do not support the same auth flow as GitHub. When apm install encounters a dep like:

- git: git@gitlab.corp.example.com:platform/standards.git
  path: agents/api-specialist.agent.md
  ref: main

APM fetches the path:-specified file via the GitLab REST v4 API (/projects/.../repository/files/.../raw). On a self-hosted instance with the API disabled, this returns HTTP 410 and the install fails. A PAT token cannot fix a 410 -- the API is gone.


Approach (WHAT)

git-transport-first (Option C-lite, per board decision): when a dep's source is a git/SSH repo classified as GitLab, extract the path:-specified file through the SAME git fetch already in use for clones, using sparse/partial checkout:

  1. git init + git remote add origin <auth-url>
  2. git sparse-checkout init --no-cone + git sparse-checkout set --no-cone <file-path...> (exact file-level sparse paths)
  3. git fetch --filter=blob:none --depth=1 origin <ref> -- downloads trees + commits only; blobs are lazy
  4. git checkout FETCH_HEAD -- materializes only the accumulated sparse file paths
  5. ensure_path_within(target, work_dir) -- rejects symlink/traversal escapes

If git transport fails, the existing GitLab REST v4 API is tried as a thin fallback (mirroring the ADO_APM_PAT pattern). Path containment is enforced unconditionally.


Implementation (HOW)

New module: src/apm_cli/deps/git_file_transport.py

  • GitSparseFileTransport reusable class for one repo/ref sparse checkout
  • fetch_file_via_git_sparse(...) compatibility wrapper for single-file callers/tests
  • File-level sparse paths (--no-cone) for root and nested files, always paired with --filter=blob:none
  • GIT_TERMINAL_PROMPT=0 on git subprocesses to avoid interactive auth hangs
  • Token-bearing HTTP(S) URLs in git stderr are redacted before any RuntimeError is raised
  • validate_path_segments() before git work and ensure_path_within() after checkout

Modified: src/apm_cli/deps/download_strategies.py

  • DownloadDelegate.download_gitlab_file() tries git transport first and reuses one cached transport per GitLab host/repo/ref/port for multiple path: files
  • Cache access and transport fetches are lock-protected for the parallel resolver
  • Transport-level git failures discard the cached checkout and fall back to REST; PathTraversalError hard-fails with no REST fallback
  • Uses the public DownloadDelegate.build_repo_url seam instead of injecting the downloader's private _build_repo_url

Tests: tests/test_gitlab_git_transport.py plus focused delegate/parser coverage

  • Path traversal and encoded traversal fail before git subprocesses
  • Git fetch uses --filter=blob:none and file-level sparse paths, including root files
  • Symlink escapes are rejected by ensure_path_within
  • Git stderr redacts token-bearing HTTP(S) URLs; git env sets GIT_TERMINAL_PROMPT=0; refs starting with - are rejected before git
  • GitLab git transport bypasses REST on success; REST fallback remains available on non-security git failures
  • PathTraversalError from the git transport is not swallowed into REST fallback
  • Same repo/ref path: files reuse one cached sparse checkout, including a parallel resolver regression trap
  • Embedded-subpath explicit git URL guard points users at path: without rendering-layer status symbols

Docs: consumer/authentication.md, consumer/manage-dependencies.md, apm-usage/authentication.md, apm-usage/dependencies.md, CHANGELOG.md.


Trade-offs

Decision Rationale
git-transport-first, not API-first A 410 means the API is gone; a token cannot revive it. Git always works when clone works.
Same-run checkout reuse Multiple path: files from the same GitLab repo/ref share one sparse checkout, avoiding N fetches while keeping cleanup bounded to the downloader lifecycle.
Thin REST API fallback kept Some environments may have network restrictions on git but allow HTTPS API. GITLAB_APM_PAT stays useful.
File-level non-cone sparse paths Root-level and nested files both stay sparse; no whole-tree materialization is needed.

Validation evidence

# Focused GitLab transport / delegate / parser tests
uv run --extra dev pytest tests/integration/test_download_strategies_phase3w5.py tests/integration/test_download_strategies_selection.py tests/unit/deps/test_download_strategies_phase3.py tests/unit/deps/test_download_strategies_selection.py tests/test_gitlab_git_transport.py tests/unit/test_generic_git_urls.py -q
# 568 passed in 6.74s

# Mutation-break gates for new regression traps
# - removing GIT_TERMINAL_PROMPT=0 fails test_git_env_disables_terminal_prompts
# - removing stderr redaction fails test_git_failure_redacts_token_bearing_stderr
# - disabling checkout reuse fails test_same_repo_ref_reuses_git_file_transport
# - allowing refs starting with `-` fails test_ref_starting_with_dash_rejected_before_git
# - omitting the real subprocess path would miss test_real_sparse_checkout_materializes_file

# Lint: all green
uv run --extra dev ruff check src/ tests/
uv run --extra dev ruff format --check src/ tests/
uv run --extra dev python -m pylint --disable=all --enable=R0801 --min-similarity-lines=10 --fail-on=R0801 src/apm_cli/
bash scripts/lint-auth-signals.sh
# Local lint chain exit 0; CI checks green (14 successful, 1 skipped).

How to test

  1. Set up a self-hosted GitLab instance (or simulate one) with the REST API disabled (returns 410).
  2. Create an apm.yml with a dep like:
    dependencies:
      apm:
        - git: git@gitlab.corp.example.com:platform/standards.git
          path: agents/api-specialist.agent.md
          ref: main
  3. Ensure SSH access to the repo is working (git clone git@... succeeds).
  4. Set GITLAB_HOST=gitlab.corp.example.com so APM classifies the host.
  5. Run apm install -- the file should be fetched without any 410 error.

Board scope + fences

  • git-transport-first: default for all GitLab sources. (implemented)
  • Thin GITLAB_PAT fallback mirroring ADO_APM_PAT. (implemented -- REST API fallback in download_gitlab_file)
  • ensure_path_within() applied to extracted file. (implemented -- rejects symlinks)
  • validate_path_segments() applied to path string. (implemented -- rejects ../ before git)
  • No full clone: depth=1 + blob:none + file-level sparse paths. (implemented)
  • Reuse existing clone auth env (git_env from downloader) and same-run sparse checkout per repo/ref. (implemented)
  • Stacked on [BUG] or [QUESTION] Local dependencies in shared repository #1571 (shared remote-path + URL-scheme surface). (implemented)
  • PAT-only as primary fix: rejected. (git-first is the default)
  • Naive full clone: rejected. (depth=1 + filter=blob:none used)
  • Skipping containment: rejected. (ensure_path_within unconditional)
  • Interactive credential hang: rejected. (GIT_TERMINAL_PROMPT=0 set on git subprocesses)
  • Token echo from git stderr: rejected. (token-bearing HTTP(S) URLs redacted before RuntimeError)
  • Git option injection through refs: rejected. (refs starting with - fail before subprocess)

DX rider: friendly error for subpath embedded in git URL (Refs #872)

What was added

A parse-time guard (DependencyReference._check_no_embedded_subpath) that detects when a user accidentally embeds an APM primitive subpath inside an explicit git URL form (SCP git@host:path, ssh://, or https://), e.g.:

- git: git@github.com:org/repo/skills/hello-world.git   # <-- footgun

Without this guard, git would later fail with a cryptic fatal: 'org/repo/skills/hello-world' does not appear to be a git repository message. APM now fires before any git subprocess and emits:

A subpath cannot be embedded in a git URL. Use the `path:` key instead:
    `git: <repo-url>` + `path: <primitive>/<name>`
    (or the shorthand `org/repo/<primitive>/<name>`).
    See https://microsoft.github.io/apm/consumer/manage-dependencies/

What was NOT built

The monorepo-skill-install capability already ships via the path: key and the bare virtual-path shorthand:

# Supported -- path: key in object form
- git: git@github.com:org/repo.git
  path: skills/hello-world

# Supported -- bare shorthand
- org/repo/skills/hello-world

No new install feature was added. Issue #872 remains open as the tracker for the deferred skills: [...] subset-filter sugar (board deferred that part).

Heuristic / false-positive check

The guard checks for a known APM primitive directory name (skills, agents, prompts, instructions, chatmodes, collections, contexts, memory) as a non-last interior path segment of an explicit git URL. Shapes that must NOT trigger -- and were verified:

Shape Result
git@gitlab.com:group/subgroup/repo.git (GitLab subgroup) Passes -- no primitive segment
git@github.com:org/repo.git (plain 2-segment) Passes -- too few segments
ssh://git@host/owner/repo.git (plain ssh://) Passes -- too few segments
https://gitlab.com/acme/coding-standards.git (plain https://) Passes -- no primitive segment
org/repo/skills/hello-world (bare shorthand, SUPPORTED) Passes -- not an explicit git URL form

Tests added

Focused tests in tests/unit/test_generic_git_urls.py::TestEmbeddedSubpathInGitUrl covering: SCP error, ssh:// error, https:// error, message mentions path:, agents primitive, parse_from_dict path, and all the no-regression cases above.

Refs #872

@danielmeppiel

Copy link
Copy Markdown
Collaborator Author

APM Review Panel: ship_with_followups

Routes GitLab path: fetches over git sparse/partial checkout instead of the REST API -- fixing the self-hosted GitLab 410 -- and adds a parse-time friendly error for subpaths embedded in git URLs (#872).

cc @danielmeppiel @sergio-sisternes-epam -- a fresh advisory pass is ready for your review.

This is a clean, well-fenced delivery of the board-ratified brief. The git-transport-first design (git init + cone sparse-checkout + --filter=blob:none --depth=1 + git checkout FETCH_HEAD) is the correct shape for the 410 problem: a token cannot revive a disabled API, but git works whenever the clone works. Containment is unconditional on the git path -- validate_path_segments() fires before any subprocess and ensure_path_within() rejects symlink escapes after checkout, both with dedicated tests. The thin GITLAB_APM_PAT fallback mirrors the established ADO_APM_PAT pattern and keeps git-first as the default. The #872 rider lands as a parse-time guard with a friendly, actionable message and a regex-asserted regression test (not a URL substring), exactly per the contract.

The panel converges with no blocking findings. The highest-signal item is architectural/security defense-in-depth: the except Exception that triggers the REST fallback in download_gitlab_file also swallows the git path's PathTraversalError, downgrading a validation failure to a "transport unavailable, try REST" decision. It is not an exploitable bypass today -- the REST fallback URL-encodes the path via quote(file_path, safe=""), so traversal collapses into a single literal path component the GitLab API treats inertly -- but a containment guard should not be silently demoted. A close second is a DevX false-positive edge in the #872 heuristic. Both are cheap follow-ups, neither gates merge.

Informational context (not a finding): this PR is stacked on #1571/#1732 (base is not main), so full CI shards are intentionally pending the post-merge auto-retarget; the shown subset is green.

Aligned with: secure-by-default (containment unconditional on the git path; no new credential or origin introduced), multi-harness/multi-host (self-hosted GitLab 410 resolved while git-first stays the default), pragmatic-as-npm (reuses the existing clone transport and a thin token fallback rather than new machinery), oss-community-driven (credits @cossons for the root-cause analysis and requests reporter validation before merge).

Growth signal. The self-hosted GitLab 410 fix is a concrete enterprise unblocker worth amplifying: capture it as a release-note highlight and reply on the originating issue/discussion to convert @cossons's repro into an adoption story for self-hosted-GitLab shops.

Panel summary

Persona B R N Takeaway
Python Architect 0 2 1 Clean module seam; narrow the broad except so containment errors propagate.
CLI Logging Expert 0 1 1 git->REST fallback is silent; surface it for 410 debugging.
DevX UX Expert 0 1 1 #872 error is excellent; heuristic can false-positive on primitive-named subgroups.
Supply Chain Security 0 1 0 Containment solid and auth URL not leaked; don't swallow PathTraversalError.
OSS Growth Hacker 0 0 1 Strong contributor credit + clear CHANGELOG; amplify the 410 fix.
Test Coverage Expert 0 1 0 Sparse-fetch + #872 regressions verified present; add two coupled guards.
Auth Expert 0 0 1 git-first default preserved; token precedence and git_env reuse clean.
Doc Writer 0 0 1 All four doc surfaces + CHANGELOG updated; trim trailing blank lines.
Performance Expert 0 1 0 blob:none + depth=1 + cone correct; watch O(files) for same-repo multi-fetch.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [Supply Chain Security] Narrow the except Exception in download_gitlab_file so PathTraversalError / validation errors propagate instead of silently falling through to the REST transport. -- A containment guard should never be reinterpreted as "git transport unavailable"; keep defense-in-depth even though the REST path encodes the path today.
  2. [DevX UX Expert] Tighten the Support installing individual skills/agents from monorepo packages (including private repo support) #872 heuristic: a GitLab subgroup literally named after a primitive (e.g. git@gitlab.com:group/skills/repo.git) trips _check_no_embedded_subpath. -- Would wrongly fail a legitimate dep; consider also requiring the trailing segment to look like a primitive, or document the limitation.
  3. [Test Coverage Expert] Add two regression tests: PathTraversalError must not be swallowed into the REST fallback, and the subgroup-named-primitive shape must parse cleanly. -- Locks follow-ups 1 and 2 against silent regression.
  4. [CLI Logging Expert] Emit an info/verbose note when the git transport fails and REST serves the file. -- The git failure is currently _debug-only, hiding which transport answered and complicating 410 triage.
  5. [Performance Expert] Track batching multiple same-repo path: fetches into a single sparse cone if multi-file deps become common. -- Per-file fetch is O(files) fresh git inits today; the board accepted no-caching for the single-file case, so this is a watch item, not a change.

Architecture

classDiagram
    class DownloadDelegate {
        +download_gitlab_file(dep_ref, file_path, ref) bytes
    }
    class git_file_transport {
        +fetch_file_via_git_sparse(dep_ref, file_path, ref) bytes
    }
    class DependencyReference {
        +parse(dependency_str)
        -_check_no_embedded_subpath(url)
        -_APM_PRIMITIVE_DIRS
    }
    class path_security {
        +validate_path_segments(path)
        +ensure_path_within(target, root)
    }
    DownloadDelegate ..> git_file_transport : git-first
    DownloadDelegate ..> path_security : (REST path encodes)
    git_file_transport ..> path_security : containment
    DependencyReference ..> path_security : segment validation
Loading
flowchart TD
    A[download_gitlab_file] --> B{git transport}
    B -->|fetch_file_via_git_sparse| C[validate_path_segments before git]
    C --> D[git init + sparse-checkout set parent]
    D --> E[git fetch --filter=blob:none --depth=1]
    E --> F[git checkout FETCH_HEAD]
    F --> G[ensure_path_within target work_dir]
    G --> H[return file bytes]
    B -->|Exception caught| I[REST v4 fallback with GITLAB_APM_PAT]
    I --> J[return bytes or RuntimeError]
Loading
sequenceDiagram
    participant D as DownloadDelegate
    participant G as git_file_transport
    participant S as path_security
    D->>G: fetch_file_via_git_sparse(dep_ref, file_path, ref)
    G->>S: validate_path_segments(file_path)
    G->>G: git init / remote add / sparse-checkout set
    G->>G: git fetch --filter=blob:none --depth=1
    G->>G: git checkout FETCH_HEAD
    G->>S: ensure_path_within(target, work_dir)
    G-->>D: file bytes
Loading

Recommendation

Ship after a light pass on the follow-ups. The PR fulfills the board brief end to end with strong test coverage and correct, unconditional containment. Recommended next action: narrow the fallback except so PathTraversalError propagates (follow-up 1) and add the two coupled regression tests (follow-up 3); the DevX false-positive (follow-up 2) and the logging/perf items can land in the same pass or as fast follow. None of these block merge -- the maintainer performs the protected-branch merge once #1571/#1732 lands and CI retargets to main.


Full per-persona findings

Python Architect

  • [recommended] Broad except Exception as git_err in download_gitlab_file swallows the git path's validation errors and falls through to REST at src/apm_cli/deps/download_strategies.py
    The fallback intent is "git transport unavailable -> try API", but PathTraversalError from validate_path_segments() and containment failures are also caught, conflating a security-validation outcome with a transport-availability outcome.
    Suggested: Catch a narrower set (subprocess/RuntimeError/OSError) or re-raise PathTraversalError before entering the REST branch.
  • [recommended] fetch_file_via_git_sparse reaches self._host._build_repo_url (a private member) through the injected build_repo_url_fn at src/apm_cli/deps/git_file_transport.py
    The callable-injection cleanly avoids a circular import, which is the right call; the residual smell is binding to a private method. A small public seam on the host would make the contract explicit.
    Suggested: Expose a thin public build_repo_url(...) wrapper and pass that.
  • [nit] work_dir.mkdir(exist_ok=True) under a freshly created mkdtemp prefix never needs exist_ok at src/apm_cli/deps/git_file_transport.py
    Harmless, but exist_ok=True hides an unexpected collision in the unique temp dir.

CLI Logging Expert

  • [recommended] The git->REST fallback is observable only at _debug at src/apm_cli/deps/download_strategies.py
    When git transport fails and REST succeeds, the user sees the normal success message with no indication that a fallback occurred -- which is precisely the signal a self-hosted-GitLab operator needs when diagnosing transport selection.
    Suggested: Emit a [i]/verbose line such as "git transport unavailable; using GitLab REST API fallback" before the REST attempt.
  • [nit] Success messages diverge between transports ("Downloaded file via git: ..." vs "Downloaded file: ...") at src/apm_cli/deps/download_strategies.py
    Reasonable and within ASCII conventions; consider including the ref in the git-path message for parity.

DevX UX Expert

  • [recommended] The Support installing individual skills/agents from monorepo packages (including private repo support) #872 heuristic can false-positive on a GitLab subgroup literally named after a primitive at src/apm_cli/models/dependency/reference.py
    _check_no_embedded_subpath flags any non-last segment matching an APM primitive name; a legitimate git@gitlab.com:group/skills/repo.git (subgroup named "skills") has interior segment "skills" and would be wrongly failed.
    Suggested: Also require the trailing segment(s) to resemble a primitive target (e.g. <primitive>/<name>), or document the constraint in the error/docs.
  • [nit] Friendly error is well-formed at src/apm_cli/models/dependency/reference.py
    Uses the [x] status symbol, names both the path: key and the bare shorthand, and echoes the offending input via {raw!r}. This is the model the repo wants for actionable parse errors.

Supply Chain Security

  • [recommended] Containment guard demoted by the fallback's broad except at src/apm_cli/deps/download_strategies.py
    Positive: containment is unconditional on the git path (validate_path_segments pre-git, ensure_path_within post-checkout, both tested), and the error string uses cmd[:3] so the auth-embedded clone URL is never logged. The gap is that the same broad except that enables REST fallback also swallows PathTraversalError. Not exploitable today (the REST URL quote()-encodes the path into one inert component), but a validation failure should not be silently reinterpreted as a transport miss.
    Suggested: Re-raise PathTraversalError (and other containment errors) before falling back.

OSS Growth Hacker

  • [nit] Strong adoption hygiene at CHANGELOG.md
    Clear "so what" CHANGELOG entry, explicit @cossons credit for root-cause + design, and a pre-merge reporter-validation ask. Worth converting the self-hosted-GitLab 410 fix into a release highlight and a reply on the originating thread.

Test Coverage Expert

  • [recommended] Two coupled regression tests are missing
    Verified present and solid (inspected the diff): the 12-test tests/test_gitlab_git_transport.py sparse-fetch/containment/fallback suite and the tests/unit/test_generic_git_urls.py::TestEmbeddedSubpathInGitUrl Support installing individual skills/agents from monorepo packages (including private repo support) #872 suite (which correctly asserts on the parsed message via regex, not a URL substring, and includes no-false-positive cases). Gaps: (1) no test that download_gitlab_file does NOT swallow a PathTraversalError into the REST fallback; (2) no no-regression test for a GitLab subgroup named after a primitive.
    Suggested: Add both; they pin the supply-chain and DevX follow-ups.

Auth Expert

  • [nit] Token model is clean at src/apm_cli/deps/download_strategies.py
    git-first remains the default; the REST fallback resolves via the existing auth_resolver with GITLAB_APM_PAT -> GITLAB_TOKEN -> git credential fill precedence; git_env merges os.environ with self._host.git_env so no new credential or origin is introduced. Worth a quick confirm that resolver order matches the GITLAB_TOKEN "lower-precedence" claim now documented in the skill resource.

Doc Writer

  • [nit] Documentation coverage is complete at docs/src/content/docs/consumer/authentication.md
    Both consumer docs and both packages/apm-guide/.apm/skills/apm-usage/ resources (authentication + dependencies) are updated alongside the CHANGELOG, satisfying the docs-sync instruction. Minor: trim the double trailing blank lines added in apm-usage/authentication.md.

Performance Expert

  • [recommended] Per-file fetch is correct but unbatched at src/apm_cli/deps/git_file_transport.py
    blob:none + depth=1 + cone sparse-checkout is the right hot-path shape and is genuinely not a full clone (root files correctly skip cone). The watch item: each path: file from the same repo spawns a fresh git init + fetch into a new temp dir, so a multi-file same-repo dep is O(files) inits. The board accepted no caching for the single-file case; revisit only if multi-file deps become common.
    Suggested: If needed later, set multiple sparse paths in one cone and checkout once.

This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.

Base automatically changed from danielmeppiel/1571-local-deps-shared-repo to main June 11, 2026 07:05
Copilot AI review requested due to automatic review settings June 11, 2026 08:11
danielmeppiel added a commit that referenced this pull request Jun 11, 2026
Two panel follow-ups on the #1014 GitLab git-transport surface.

Supply-chain: download_gitlab_file caught a bare `except Exception`
around the git-transport attempt, so a PathTraversalError raised by
the path-validation / symlink-containment guards in
fetch_file_via_git_sparse was silently swallowed and the request fell
through to the REST transport. A rejected traversal must not gain a
second transport to probe. PathTraversalError now propagates unchanged;
only genuine transport failures fall back to REST.

CLI logging: the git->REST fallback transition was `_debug`-only, so
triagers had no signal which transport answered a 410. The fallback now
emits a verbose `[i]` note carrying the git failure reason, and the REST
success notes attribute the GitLab REST API explicitly.

Adds a regression test asserting a PathTraversalError in the git path is
not swallowed into the REST fallback (mutation-break verified). Updates
existing verbose-callback tests that exercised the unmocked-git fallback
path to the new transport-attributed messages.

addresses CEO follow-up (Supply Chain) and (CLI Logging) on PR #1740

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel force-pushed the danielmeppiel/1014-gitlab-git-transport branch from f6cb096 to a03ac65 Compare June 11, 2026 08:11
danielmeppiel added a commit that referenced this pull request Jun 11, 2026
The #872 guard `_check_no_embedded_subpath` flagged any non-last path
segment matching an APM primitive directory name, which produced two
false positives on legitimate URLs:

- A GitLab subgroup literally named after a primitive, e.g.
  `git@gitlab.com:group/skills/repo.git`, where `skills` is a subgroup
  at index 1 and `repo` is the real repository.
- Azure DevOps virtual-path URLs, e.g.
  `https://dev.azure.com/org/proj/_git/repo/instructions/x`, where the
  primitive segment is the supported ADO virtual path after `_git/repo`.

The embedded-subpath shape is `org/repo` + `<primitive>/<name>`, so a
primitive segment now only fires when preceded by a complete org/repo
prefix (index >= 2), and ADO URLs (identified by the `_git` marker) are
skipped entirely. The canonical malformed form `org/repo/skills/<name>`
that #872 targets still hard-fails; no GitHub or GitLab repo path uses
`_git`, so real detection is unaffected. A residual deep-subgroup
ambiguity (`group/sub/skills/repo`) is documented in code.

Adds a regression test for the primitive-named subgroup (mutation-break
verified); the ADO false positive is covered by the existing
TestParseAdoUrls suite.

addresses CEO follow-up (DevX) on PR #1740; Refs #872

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel

Copy link
Copy Markdown
Collaborator Author

Shepherd-driver terminal advisory

CEO stance carried from the prior panel pass: ship_with_followups. The
recommended in-scope follow-ups are now folded; the one out-of-scope item
is deferred with a boundary note. Advisory only -- no label, approval, or
merge changes.

Conflict resolved against main

PR was CONFLICTING / DIRTY after it retargeted to main once #1571
(#1732) merged. Rebased onto origin/main preserving BOTH intents:

CHANGELOG keeps both entries. Final head a03ac65f.

Folded in this run

  • (panel) Supply Chain: download_gitlab_file no longer swallows
    PathTraversalError into the REST fallback -- a traversal/symlink-escape
    rejection now hard-fails instead of getting a second transport. Resolved
    in a7a5d444.
  • (panel) CLI Logging: the git->REST fallback was _debug-only; it now
    emits a verbose [i] note with the git failure reason, and the REST
    success notes attribute the GitLab REST API (410 triage). Resolved in
    a7a5d444.
  • (panel) DevX: scoped the Support installing individual skills/agents from monorepo packages (including private repo support) #872 _check_no_embedded_subpath heuristic --
    a primitive segment only fires when preceded by a complete org/repo
    prefix (index >= 2), and ADO _git URLs are skipped. Fixes the
    git@gitlab.com:group/skills/repo.git false positive AND a pre-existing
    ADO virtual-path false positive (dev.azure.com/org/proj/_git/repo/instructions/x)
    the full suite surfaced. Canonical org/repo/skills/<name> still
    hard-fails. Resolved in a03ac65f.
  • (panel) Test Coverage: two regression tests added (below), each
    mutation-break verified. Resolved in a7a5d444 / a03ac65f.

Regression-trap evidence (mutation-break gate)

  • test_path_traversal_in_git_not_swallowed_into_rest_fallback -- deleted
    the except PathTraversalError: raise guard; test FAILED as expected;
    guard restored.
  • test_gitlab_subgroup_named_after_primitive_no_false_positive -- reverted
    the idx >= 2 narrowing; test FAILED as expected; guard restored.

Copilot signals reviewed

  • None: copilot-pull-request-reviewer[bot] posted no review or inline
    comments on this PR (drained).

Deferred (out-of-scope follow-up)

Consumer-install confirmation

A path: dep on a non-default git host (SaaS or self-hosted GitLab) now
resolves the real host via git transport (sparse/partial checkout) rather
than defaulting to the github.com REST path; REST remains a thin
GITLAB_APM_PAT fallback. Verified by the integration cases in
tests/test_gitlab_git_transport.py (incl. the self-hosted 410 path).

Lint contract

ruff check src/ tests/ and ruff format --check src/ tests/ both silent;
pylint R0801 10.00/10; auth-signal lint clean.

CI

All required checks green on a03ac65f (run 27333260694), 0 CI recovery
iterations. deploy skips (non-default branch, expected).

Mergeability status

PR head SHA CEO stance iters folds defers Copilot rounds CI mergeable mergeStateStatus notes
#1740 a03ac65f ship_with_followups 1 4 1 1 green MERGEABLE BLOCKED awaiting required review

Conflict resolved, follow-ups folded, CI green. Ready for maintainer
review (BLOCKED reflects the required-review gate; no auto-merge).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates APM’s GitLab path: file download behavior to be git-transport-first (sparse/partial checkout) instead of GitLab’s REST API, fixing self-hosted GitLab instances where the REST API returns HTTP 410. It also adds a parse-time guard that rejects embedded APM primitive subpaths inside explicit git URLs and points users to the path: key.

Changes:

  • Add fetch_file_via_git_sparse() to fetch single path: files using git fetch --filter=blob:none --depth=1 plus sparse checkout (when applicable).
  • Update GitLab file download strategy to try git transport first, then fall back to GitLab REST v4 API (PAT/token-based) with improved transport attribution in verbose output.
  • Add tests + documentation + changelog entries for the new GitLab behavior and the friendly “embedded subpath in git URL” error.
Show a summary per file
File Description
src/apm_cli/deps/git_file_transport.py New git sparse/partial single-file fetch implementation with containment checks.
src/apm_cli/deps/download_strategies.py Routes GitLab file download through git transport first, REST API fallback second; updates verbose messages.
src/apm_cli/models/dependency/reference.py Adds _check_no_embedded_subpath() guard to reject embedded primitive subpaths in explicit git URL forms.
tests/test_gitlab_git_transport.py New tests covering git-transport-first behavior, traversal rejection, symlink escape, and REST fallback.
tests/unit/test_generic_git_urls.py Updates existing URL-with-path rejection tests and adds new embedded-subpath guard coverage.
tests/unit/deps/test_download_strategies_selection.py Adjusts verbose-callback assertions for GitLab REST attribution.
tests/unit/deps/test_download_strategies_phase3.py Same as above for phase3 suite.
tests/integration/test_download_strategies_selection.py Updates assertions for REST API attribution message in verbose callback.
tests/integration/test_download_strategies_phase3w5.py Same as above for phase3w5 integration suite.
docs/src/content/docs/consumer/manage-dependencies.md Documents GitLab git-transport-first path: fetching with REST fallback.
docs/src/content/docs/consumer/authentication.md Documents GitLab behavior and GITLAB_APM_PAT fallback guidance.
packages/apm-guide/.apm/skills/apm-usage/dependencies.md Updates usage skill docs for GitLab path: transport behavior.
packages/apm-guide/.apm/skills/apm-usage/authentication.md Updates usage skill docs for GitLab auth and fallback behavior.
CHANGELOG.md Adds a user-visible fix entry for GitLab 410 + git-transport-first fetching.

Copilot's findings

  • Files reviewed: 14/14 changed files
  • Comments generated: 6

Comment thread src/apm_cli/deps/download_strategies.py Outdated
Comment on lines +671 to +677
except Exception as git_err:
_debug(f"git transport failed for {host}/{dep_ref.repo_url}/{file_path}: {git_err}")
if verbose_callback:
verbose_callback(
f"[i] git transport unavailable for {host}/{dep_ref.repo_url}/{file_path}; "
f"falling back to GitLab REST API ({git_err})"
)
Comment on lines +1063 to +1069
d.download_gitlab_file(self._dep(), "apm.yml", verbose_callback=callback)
callback.assert_called_once()
# Git transport is unmocked here and fails, so the callback fires for
# the fallback-transition note AND the REST-API success note. At least
# one call must attribute the REST API as the transport that answered
# (transport-attribution for 410 triage).
assert callback.called
assert any("REST API" in str(c.args[0]) for c in callback.call_args_list)
Comment on lines +1219 to +1225
d.download_gitlab_file(self._dep(), "apm.yml", verbose_callback=callback)
callback.assert_called_once()
# Git transport is unmocked here and fails, so the callback fires for
# the fallback-transition note AND the REST-API success note. At least
# one call must attribute the REST API as the transport that answered
# (transport-attribution for 410 triage).
assert callback.called
assert any("REST API" in str(c.args[0]) for c in callback.call_args_list)
Comment on lines +539 to +547
result = delegate.download_gitlab_file(dep, "README.md", verbose_callback=callback)

assert result == b"ok"
callback.assert_called_once_with("Downloaded file: gitlab.example.com/group/repo/README.md")
# Git transport is unmocked and fails, so a fallback-transition note
# fires before the REST-API success note. The success note attributes
# the GitLab REST API as the transport that answered (410 triage).
callback.assert_any_call(
"[i] Downloaded file via GitLab REST API: gitlab.example.com/group/repo/README.md"
)
Comment on lines +490 to +498
result = delegate.download_gitlab_file(dep, "README.md", verbose_callback=callback)

assert result == b"ok"
callback.assert_called_once_with("Downloaded file: gitlab.example.com/group/repo/README.md")
# Git transport is unmocked and fails, so a fallback-transition note
# fires before the REST-API success note. The success note attributes
# the GitLab REST API as the transport that answered (410 triage).
callback.assert_any_call(
"[i] Downloaded file via GitLab REST API: gitlab.example.com/group/repo/README.md"
)
Comment on lines +170 to +173
# Create a symlink inside the work dir pointing outside.
(work_dir / "agents").mkdir()
(work_dir / "agents" / "evil.agent.md").symlink_to(outside)

@danielmeppiel

Copy link
Copy Markdown
Collaborator Author

APM Review Panel: ship_with_followups

Git-transport-first for GitLab path: files eliminates REST 410 failures on self-hosted instances while hardening path containment with validate/ensure double-gate.

cc @sergio-sisternes-epam -- a fresh advisory pass is ready for your review.

All nine panelists converge on ship. The security model is sound: validate_path_segments fires before git touches disk, ensure_path_within fires after checkout, and PathTraversalError hard-fails without falling through to REST fallback -- confirmed by passing test evidence (tests/test_gitlab_git_transport.py::test_path_traversal_in_git_not_swallowed_into_rest_fallback). The only tension is how strongly to weight stderr-in-RuntimeError credential leakage. Supply-chain and auth both flag it; I weight it recommended because test evidence confirms stderr is embedded in the raised RuntimeError. Phase-A sanitized the current caller, but a future direct caller of git_file_transport could surface credentials. A one-line redact is cheap insurance on a secure-by-default surface.

Architecturally, the private-method injection concern is valid but low urgency: the module boundary is already clear via function-pointer injection, and tightening the public delegate seam can happen without changing the user contract. Performance batching and locked-SHA threading are correctly deferred -- they are next-iteration wins, not reasons to slow this GitLab 410 fix.

Dissent. Supply-chain and auth both flag GIT_TERMINAL_PROMPT=0; I weight it recommended because a 120s hang before fallback is user-visible and the fix is one env-var line.

Aligned with: secure by default: double-gate containment with hard-fail on traversal and no silent REST fallback bypass; multi-host support: self-hosted GitLab instances with REST API disabled now work via git transport; pragmatic as npm: invisible happy path with no new flags or config; portable by manifest: embedded-subpath parse-time guard catches malformed git URLs early

Growth signal. Strong launch-beat candidate: APM now works on self-hosted GitLab out of the box using existing SSH keys with no PAT required for the primary path. The embedded-subpath guard is a clean before/after DX story.

Panel summary

Persona B R N Takeaway
Python Architect 0 1 1 Clean stateless free-function module with proper containment; minor coupling nit on self._host._build_repo_url private-method access.
Cli Logging Expert 0 0 2 Verbose callbacks mix [i] prefixes inconsistently; git_file_transport has no verbose instrumentation for agent-mode diagnostics.
Devx Ux Expert 0 0 2 Git-transport-first is invisible on happy path; embedded-subpath guard gives actionable error with correct fix syntax. Ship.
Supply Chain Security Expert 0 1 1 Containment model is sound: validate_path_segments before git, ensure_path_within after checkout, PathTraversalError blocks fallback probe. Two recommended hardening items.
Oss Growth Hacker 0 0 2 Good story-shaped feature for GitLab users; CHANGELOG entry is release-ready; docs update the auth page correctly.
Auth Expert 0 1 1 Auth delegation is correct (build_repo_url_fn + git_env reuse); PathTraversalError hard-fail prevents fallback bypass; minor stderr credential leak risk in error path.
Doc Writer 0 1 2 Docs are accurate and git-transport-first framing is correct; consolidate duplicated GitLab path: transport prose into one source-of-truth page.
Test Coverage Expert 0 0 1 All critical surfaces covered: git transport, PathTraversalError non-swallow, embedded-subpath parse guard, credential-leak guard in fallback diagnostics. Ship.
Performance Expert 0 2 1 Git-transport-first is the correct perf direction; per-file init+fetch is acceptable for current path-file volumes; same-repo batching is the clear next-step win.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [Supply Chain Security Expert] Sanitize stderr before embedding in RuntimeError to prevent credential echo from older git versions -- Test evidence confirms stderr is included in the raised error. Phase-A sanitized the current caller, but the module is reusable and a future direct caller could surface auth URLs.
  2. [Auth Expert] Set GIT_TERMINAL_PROMPT=0 in git subprocess env to prevent interactive credential hangs -- Without it, failed SSH auth can hang 120s before REST fallback fires. Single env-var fix eliminates a real user-visible timeout on misconfigured hosts.
  3. [Doc Writer] Consolidate GitLab transport/fallback prose into authentication.md as single source of truth -- Duplicated fallback details across authentication.md and manage-dependencies.md can drift on the next auth change. Trim manage-dependencies to mechanism plus link.
  4. [Python Architect] Expose build_repo_url as a public DownloadDelegate method instead of injecting a private-method reference -- Functional now, but private-method cross-boundary access can confuse contributors tracing the call graph.
  5. [Performance Expert] Batch path: files sharing (url, ref) into one sparse clone (deferred by design) -- N fetches for N files from the same repo is the clear next perf win once the transport is proven stable in production.

Architecture

classDiagram
    direction LR
    class GitHubPackageDownloader {
      <<Facade>>
      +_strategies: DownloadDelegate
      +_build_repo_url(repo_ref, ...) str
      +git_env: dict
    }
    class DownloadDelegate {
      <<Delegate>>
      +_host: GitHubPackageDownloader
      +download_gitlab_file(dep_ref, file_path, ref) bytes
      +build_repo_url(repo_ref, ...) str
    }
    class GitFileTransport {
      <<Module>>
      +fetch_file_via_git_sparse(dep_ref, file_path, ref, build_repo_url_fn, git_env) bytes
    }
    class DependencyReference {
      <<ValueObject>>
      +host: str
      +repo_url: str
      +_check_no_embedded_subpath(url) None
    }
    class PathSecurity {
      <<Module>>
      +validate_path_segments(path, context)
      +ensure_path_within(path, base_dir)
    }
    GitHubPackageDownloader *-- DownloadDelegate : owns
    DownloadDelegate ..> GitFileTransport : calls primary
    GitFileTransport ..> PathSecurity : validates
    GitFileTransport ..> DependencyReference : reads
    DownloadDelegate ..> PathSecurity : re-raises traversal
    DependencyReference ..> DependencyReference : checks embedded subpaths
Loading
flowchart TD
    A[DownloadDelegate.download_gitlab_file] --> B{fetch_file_via_git_sparse}
    B --> C[validate_path_segments before git]
    C -->|clean| D[temp work tree]
    D --> E[git init and remote add]
    E --> F{file in subdirectory}
    F -->|yes| G[sparse-checkout cone parent]
    F -->|no| H[skip cone]
    G --> I[git fetch blob:none depth 1]
    H --> I
    I --> J[git checkout FETCH_HEAD]
    J --> K[ensure_path_within target]
    K -->|contained| L[read bytes]
    K -->|escape| X[raise PathTraversalError, no fallback]
    B -->|other failure| R[GitLab REST fallback]
    R --> S[return REST bytes]
Loading
sequenceDiagram
    participant User as apm install
    participant DD as DownloadDelegate
    participant GFT as git_file_transport
    participant Git as git subprocess
    participant REST as GitLab REST v4
    User->>DD: download_gitlab_file(dep, path, ref)
    DD->>GFT: fetch_file_via_git_sparse(dep, path, ref)
    GFT->>Git: sparse partial checkout
    alt git succeeds
        GFT-->>DD: bytes
        DD-->>User: file bytes
    else git fails (not traversal)
        GFT-->>DD: RuntimeError
        DD->>REST: raw file request
        REST-->>DD: bytes
        DD-->>User: file bytes
    else traversal or symlink escape
        GFT-->>DD: PathTraversalError
        DD-->>User: propagate error
    end
Loading

Recommendation

Merge now. CI is green, all critical security surfaces are test-covered, and the feature unblocks a real class of self-hosted GitLab users. Track stderr sanitization and GIT_TERMINAL_PROMPT as fast follow-up issues before the next minor release.


Full per-persona findings

Python Architect

  • [recommended] fetch_file_via_git_sparse accesses self._host._build_repo_url via cross-boundary injection at src/apm_cli/deps/download_strategies.py:659
    DownloadDelegate passes self._host._build_repo_url as build_repo_url_fn. Prefer a public delegate method or strategies method to avoid coupling the new module to downloader internals.
    Suggested: Expose a build_repo_url on DownloadDelegate or pass the strategies method directly.
  • [nit] _APM_PRIMITIVE_DIRS type annotation is redundant at src/apm_cli/models/dependency/reference.py:201
    The bare frozenset annotation could be frozenset[str] for consistency, or omitted.
    Suggested: Use _APM_PRIMITIVE_DIRS: frozenset[str] = frozenset({...}).

Cli Logging Expert

  • [nit] Inconsistent [i] prefix between git-success and REST-fallback verbose_callback messages at src/apm_cli/deps/download_strategies.py:663
    Git success emits no prefix while fallback and REST success prepend [i]. Existing downloader callbacks usually leave symbol rendering to the caller.
    Suggested: Standardize verbose_callback payloads in a follow-up.
  • [nit] git_file_transport.py emits no verbose/debug output for git command plan at src/apm_cli/deps/git_file_transport.py
    The module runs several git subprocesses but gives no plan/timing breadcrumbs for APM_DEBUG users. The fallback catch gives coarse signal, so this is polish.
    Suggested: Add safe _debug output summarizing sparse cone/ref/host without auth URLs.

Devx Ux Expert

  • [nit] Embedded-subpath error message bakes [x] into a ValueError string at src/apm_cli/models/dependency/reference.py:697
    [x] is a rendering-layer convention. If another caller wraps the ValueError, users could see duplicate symbols. Current paths render directly, so this is not urgent.
  • [nit] Git success verbose callback lacks the [i] prefix used by adjacent GitLab fallback messages at src/apm_cli/deps/download_strategies.py:663
    Inconsistent prefixing in a single method can make verbose output look uneven.
    Suggested: Standardize callback payloads at the caller layer or within this method.

Supply Chain Security Expert

  • [recommended] git stderr in RuntimeError may echo token-bearing remote URL at src/apm_cli/deps/git_file_transport.py:123
    build_repo_url_fn can return auth-embedded URLs. The current caller swallows the error after Phase-A sanitization, but a future direct caller could surface stderr.
    Suggested: Sanitize stderr before embedding, or omit stderr from the raised message.
  • [nit] Set GIT_TERMINAL_PROMPT=0 in subprocess env to prevent credential-helper hangs at src/apm_cli/deps/git_file_transport.py:81
    If git cannot authenticate silently, it may prompt and then time out. This is robustness, not a secret leak.
    Suggested: Set git_env = {**git_env, 'GIT_TERMINAL_PROMPT': '0'} before subprocess.run.

Oss Growth Hacker

  • [nit] Embedded-subpath error message could link to docs for higher conversion at src/apm_cli/models/dependency/reference.py
    The Support installing individual skills/agents from monorepo packages (including private repo support) #872 error is actionable, but a docs URL would turn confusion into docs discovery and reduce repeat issues.
    Suggested: Append a short docs URL to the error string.
  • [nit] CHANGELOG entry leads with implementation rather than user benefit at CHANGELOG.md
    The entry is accurate but could be stronger for release notes if it opens with the self-hosted GitLab benefit.
    Suggested: Consider swapping benefit and mechanism at release cut.

Auth Expert

  • [recommended] git_file_transport.py does not set GIT_TERMINAL_PROMPT=0, risking a hang on interactive credential prompt at src/apm_cli/deps/git_file_transport.py:81
    Non-interactive git operations should suppress terminal prompts. If SSH auth fails and no credential helper resolves, git may wait until the 120s timeout before REST fallback.
    Suggested: Set GIT_TERMINAL_PROMPT=0 in git_env.
  • [nit] result.stderr in RuntimeError may contain auth URL echoed by older git at src/apm_cli/deps/git_file_transport.py:122
    Modern git usually redacts, but APM supports git 2.25+ and older behavior may echo credentials. Phase-A sanitized the caller path, but direct future callers should be protected.
    Suggested: Redact token-bearing URL patterns or cap/suppress stderr.

Doc Writer

  • [recommended] GitLab path: transport and GITLAB_APM_PAT fallback are restated in both authentication and manage-dependencies docs at docs/src/content/docs/consumer/manage-dependencies.md:116
    Credential/fallback behavior belongs primarily in authentication.md; manage-dependencies.md can state the fetch mechanism and link out. Duplicating fallback details risks future drift.
    Suggested: Trim manage-dependencies to fetch-mechanism and containment, then link to authentication.md.
  • [nit] Extra blank lines before the GHE Cloud heading in the packaged guide at packages/apm-guide/.apm/skills/apm-usage/authentication.md
    Cosmetic whitespace; one blank line between sections is the house style.
    Suggested: Collapse to one blank line before the heading.
  • [nit] Phrase 'is the fix for' reads like release-note voice in user docs at docs/src/content/docs/consumer/manage-dependencies.md:119
    Docs should state the capability in present-tense reader language rather than narrating the PR.
    Suggested: Use capability voice.

Test Coverage Expert

  • [nit] Unit-level TestDownloadGitlabFile lacks its own PathTraversalError non-swallow assertion at tests/unit/deps/test_download_strategies_selection.py
    The behavior is covered through the GitHubPackageDownloader facade. A direct unit assertion would make the signal faster during refactors, but current integration-tier coverage is adequate.
    Suggested: Add a direct DownloadDelegate test for PathTraversalError propagation and no REST fallback.

Performance Expert

  • [recommended] Each path: file invokes a fresh sparse clone; same-repo batching is the next perf win at src/apm_cli/deps/git_file_transport.py:83
    Multiple path: files from the same host/repo/ref currently pay N fetches. The PR scope explicitly defers batching, which is the right boundary, but it should be next.
    Suggested: Follow-up: batch path: files sharing (url, ref) into one sparse clone.
  • [recommended] fetch_file_via_git_sparse does not consume a locked SHA when one is already known at src/apm_cli/deps/git_file_transport.py:108
    Passing branch names works, but the install pipeline may already know the SHA. Threading that through would reduce future ref-resolution work, especially once batching exists.
    Suggested: Accept optional locked_sha and fetch that when available.
  • [nit] TemporaryDirectory context manager would simplify cleanup at src/apm_cli/deps/git_file_transport.py:83
    Using tempfile.TemporaryDirectory is shorter and avoids ignore_errors hiding cleanup issues.
    Suggested: Replace mkdtemp/finally/rmtree with tempfile.TemporaryDirectory.

This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.

@danielmeppiel

Copy link
Copy Markdown
Collaborator Author

APM Review Panel: ship_now

GitLab path: files now fetch over sparse/partial git transport by default, with secure path containment, token-safe diagnostics, bounded same-run checkout reuse, and green CI.

cc @sergio-sisternes-epam -- a fresh advisory pass is ready for your review.

The panel's prior follow-ups were folded into this PR: stderr redaction now covers HTTP(S), git prompts are suppressed, option-like refs hard-fail before subprocess execution, sparse paths use --, same-repo/ref files reuse a locked cached transport, docs and CHANGELOG are aligned, and the PR body reflects the final design. Focused tests cover the real git fixture path plus delegate fallback/security routing, and CI is green.

Aligned with: secure by default: traversal and ref-injection attempts fail closed; multi-host support: self-hosted GitLab works without REST; pragmatic as npm: if git clone works, apm install works; portable by manifest: path: remains the explicit file contract.

Panel summary

Persona B R N Takeaway
Python Architect 0 0 0 Transport boundary, cache lifecycle, and concurrency model are coherent.
Cli Logging Expert 0 0 0 User-facing messages are actionable; debug-only transport detail is bounded.
Devx Ux Expert 0 0 0 Happy path is invisible; malformed git URL errors point at path:.
Supply Chain Security Expert 0 0 0 Path containment, token redaction, ref hard-fail, and REST fallback boundaries are defended.
Oss Growth Hacker 0 0 0 Release story is clear: self-hosted GitLab path installs now just work.
Auth Expert 0 0 0 Auth env reuse, prompt suppression, and PAT fallback are correct.
Doc Writer 0 0 0 Docs, packaged guide, CHANGELOG, and PR body match the implementation.
Test Coverage Expert 0 0 0 Critical paths have unit, delegate, concurrency, and real-git fixture coverage.
Performance Expert 0 0 0 blob:none, file-level sparse paths, and same-run reuse are appropriate for this scope.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Recommendation

Ship now. All in-scope follow-ups from the prior passes have been folded, the branch is clean, local lint/tests pass, and PR checks report 14 successful / 1 skipped / 0 pending.


Full per-persona findings

No findings remain after folded follow-ups.

This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.

danielmeppiel and others added 9 commits June 11, 2026 15:46
…t API

For GitLab-sourced deps (SaaS or self-hosted), path: file fetches now go
through git sparse/partial checkout (--filter=blob:none + cone sparse-checkout)
rather than the host REST API. This is the primary path and fixes self-hosted
GitLab instances that return HTTP 410 (REST API disabled), because git
transport reuses the same SSH keys and git credential helpers already in use
for clones.

Design (ratified by board decision):
- git-transport-first: fetch_file_via_git_sparse() in the new
  deps/git_file_transport.py module performs git init + remote add +
  sparse-checkout init --cone + fetch --filter=blob:none --depth=1 + checkout
- GITLAB_APM_PAT fallback: if git transport raises, DownloadDelegate.download_gitlab_file()
  falls back to the existing GitLab REST v4 API (mirrors ADO_APM_PAT pattern)
- Path containment: ensure_path_within() is applied to the materialized file
  path after checkout, rejecting symlink/traversal escapes from cloned repos
- validate_path_segments() rejects .. traversal sequences before any git work
- Reuses the existing clone auth environment (git_env from the downloader);
  no second origin or cached clone

TDD coverage (tests/test_gitlab_git_transport.py):
- Acceptance: git-sourced dep with path: fetches file via git (no REST API call)
- Self-hosted GitLab API-410 no longer fails
- ensure_path_within() rejects symlink-escaping path: after checkout
- validate_path_segments() rejects .. traversal before git work
- GITLAB_PAT fallback path exercised when git transport raises
- Root-level files skip cone setup (no sparse-checkout init)
- git failure raises RuntimeError with descriptive message
- File missing after checkout raises RuntimeError

Docs: updated consumer/authentication.md, consumer/manage-dependencies.md,
apm-usage/authentication.md, apm-usage/dependencies.md, and CHANGELOG.md.

Stacked on #1732 (#1571). Closes #1014.

Credit: @cossons for root-cause analysis of the self-hosted GitLab 410 repro
and the ratified design (git-transport-first + thin PAT fallback).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a parse-time guard that detects when a user embeds an APM primitive
subpath inside an explicit git URL form (SCP git@host:path, ssh://, or
https://), e.g.:

  git: git@github.com:org/repo/skills/hello-world.git

Such URLs cause git to fail later with a cryptic 'not a valid repository
name' error. The guard detects any known APM primitive directory name
(skills, agents, prompts, instructions, chatmodes, collections, contexts,
memory) appearing as a non-last interior path segment of an explicit git
URL, and raises a friendly actionable error pointing at the supported
`path:` key instead:

  [x] A subpath cannot be embedded in a git URL. Use the `path:` key
  instead: `git: <repo-url>` + `path: <primitive>/<name>`
  (or the shorthand `org/repo/<primitive>/<name>`).

The monorepo-skill-install capability already ships via the `path:` key
and the bare shorthand (e.g. `org/repo/skills/hello-world`). No new
install feature is added.

False-positive check:
- GitLab subgroups (git@gitlab.com:group/subgroup/repo.git): safe, no
  primitive segment in interior
- Azure DevOps org/project/repo forms: safe, ADO paths never use
  primitive names
- Plain valid git URLs (2-segment path): safe, length guard returns early
- Bare shorthand with subpath (org/repo/skills/foo): safe, not an
  explicit git URL form

The guard is implemented as DependencyReference._check_no_embedded_subpath
and called from DependencyReference.parse() after the shorthand-alias and
fqdn-conflict guards, before virtual-package detection.

Refs #872
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two panel follow-ups on the #1014 GitLab git-transport surface.

Supply-chain: download_gitlab_file caught a bare `except Exception`
around the git-transport attempt, so a PathTraversalError raised by
the path-validation / symlink-containment guards in
fetch_file_via_git_sparse was silently swallowed and the request fell
through to the REST transport. A rejected traversal must not gain a
second transport to probe. PathTraversalError now propagates unchanged;
only genuine transport failures fall back to REST.

CLI logging: the git->REST fallback transition was `_debug`-only, so
triagers had no signal which transport answered a 410. The fallback now
emits a verbose `[i]` note carrying the git failure reason, and the REST
success notes attribute the GitLab REST API explicitly.

Adds a regression test asserting a PathTraversalError in the git path is
not swallowed into the REST fallback (mutation-break verified). Updates
existing verbose-callback tests that exercised the unmocked-git fallback
path to the new transport-attributed messages.

addresses CEO follow-up (Supply Chain) and (CLI Logging) on PR #1740

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The #872 guard `_check_no_embedded_subpath` flagged any non-last path
segment matching an APM primitive directory name, which produced two
false positives on legitimate URLs:

- A GitLab subgroup literally named after a primitive, e.g.
  `git@gitlab.com:group/skills/repo.git`, where `skills` is a subgroup
  at index 1 and `repo` is the real repository.
- Azure DevOps virtual-path URLs, e.g.
  `https://dev.azure.com/org/proj/_git/repo/instructions/x`, where the
  primitive segment is the supported ADO virtual path after `_git/repo`.

The embedded-subpath shape is `org/repo` + `<primitive>/<name>`, so a
primitive segment now only fires when preceded by a complete org/repo
prefix (index >= 2), and ADO URLs (identified by the `_git` marker) are
skipped entirely. The canonical malformed form `org/repo/skills/<name>`
that #872 targets still hard-fails; no GitHub or GitLab repo path uses
`_git`, so real detection is unaffected. A residual deep-subgroup
ambiguity (`group/sub/skills/repo`) is documented in code.

Adds a regression test for the primitive-named subgroup (mutation-break
verified); the ADO false positive is covered by the existing
TestParseAdoUrls suite.

addresses CEO follow-up (DevX) on PR #1740; Refs #872

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Avoid echoing raw git transport exceptions during the GitLab REST fallback and keep fallback tests hermetic by mocking the git transport failure path. Skip the symlink containment test when the platform cannot create symlinks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel force-pushed the danielmeppiel/1014-gitlab-git-transport branch from e87567a to 96cfb00 Compare June 11, 2026 13:51
main tightened the file-length guardrail from 2450 to 2100 lines (Stage 1);
after merging main, reference.py was 2191 lines. Move the pure, stateless
identity helpers (build_dependency_unique_key, build_canonical_dependency_string,
_path_segment_pattern, the semver-range guards + segment-pattern constants) into
a sibling models/dependency/identity.py and re-export them from reference.py so
deps/lockfile.py and install/phases/resolve.py import sites stay unchanged.
Behavior-preserving; reference.py now 2099 lines (under guardrail).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel

Copy link
Copy Markdown
Collaborator Author

APM Review Panel: ship_now

Git-transport-first GitLab path: fetch eliminates host-API dependency, unlocking self-hosted GitLab without a PAT for the common case.

cc @sergio-sisternes-epam -- a fresh advisory pass is ready for your review.

All nine panelists converge on shipability. Zero blocking-severity findings. The architecture is clean, security surfaces are guarded and tested, auth reuses the existing credential pipeline without introducing a second origin, and the performance shape is correct: blob:none, depth=1, and sparse-checkout reuse.

The most substantive recommendation is future polish around missing-file fallback noise: the current behavior still resolves to success or a clear terminal error, but a typed not-found error could avoid an extra REST round trip later. Test coverage is strong at the transport and delegate layers; a full apm install e2e would be additional belt-and-suspenders coverage, not a ship concern.

Strategically, this PR directly reinforces APM's "if git clone works, apm install works" promise. The identity.py extraction keeps reference.py under the file-length guardrail after rebase, and the #872 DX guard prevents a common user mistake before git runs.

Aligned with: Secure by default: path validation before git, containment after checkout, credential redaction, and no shell=True. Multi-host support: GitLab path deps now use git transport while REST remains secondary. Pragmatic as npm: the zero-config happy path matches users' git mental model. OSS community-driven: closes #1014, preserves #872 scope, and credits @cossons.

Growth signal. Self-hosted GitLab is an enterprise wedge: the no-token-needed git-transport story differentiates APM from tools that require host API credentials. The explicit @cossons contributor credit also strengthens the contributor funnel.

Panel summary

Persona B R N Takeaway
Python Architect 0 2 1 Clean transport extraction and correct layering; identity helper split is sound. Architecture is shippable with minor maintainability notes.
CLI Logging Expert 0 0 1 CLI/logging hygiene is solid: token redaction is tested, prompts are suppressed, #872 error is actionable, and output remains ASCII-safe.
DevX UX Expert 0 2 1 Strong package-manager UX: the happy path matches "if git clone works, apm install works" and the #872 guard gives an actionable fix before git runs.
Supply Chain Security Expert 0 0 1 Security posture is sound: pre-git validation, post-checkout containment, no shell=True, ref option guard, credential redaction, and no fallback on path-security failures.
OSS Growth Hacker 0 1 1 Adoption-positive PR: self-hosted GitLab becomes a no-token-needed story, contributor credit is explicit, and docs/changelog are reader-facing.
Auth Expert 0 0 2 Auth design is correct: git transport reuses build_repo_url/AuthResolver credentials, suppresses prompts, redacts tokens, and keeps REST PAT fallback secondary.
Doc Writer 0 1 1 Docs accurately reflect git-transport-first GitLab path: fetch with PAT as REST fallback; minor non-bloat trim available. Ship.
Test Coverage Expert 0 0 1 Critical security/containment surfaces are well-covered with mutation-break-credible traps; only optional gap is a full CLI e2e for GitLab git-transport path deps.
Performance Expert 0 3 0 The performance shape is right: blob:none, depth=1, file-level sparse paths, and same-run checkout reuse avoid full clone and extra API round trips.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 3 follow-ups

  1. [DevX UX Expert] Introduce a typed GitFileTransportFileNotFound to avoid double-error noise on fallback -- prevents an extra REST round trip and gives users a single, clear failure message when a file genuinely does not exist in the ref.
  2. [Test Coverage Expert] Add an e2e apm install fixture covering GitLab path: dep materialization -- belt-and-suspenders coverage for the full install path; the transport and delegate layers are already integration-tested.
  3. [DevX UX Expert] Surface git-to-REST fallback reason at --verbose level, not only APM_DEBUG -- users troubleshooting slow installs with --verbose deserve to see why REST was tried without the debug env var.

Architecture

classDiagram
    direction LR
    class DownloadDelegate {
      <<Strategy>>
      +download_gitlab_file()
      -_download_gitlab_file_via_git()
      -_git_file_transports dict
    }
    class GitSparseFileTransport {
      <<ReusableResource>>
      +fetch_file(file_path) bytes
      +close()
      -_ensure_checkout()
      -_run(cmd)
    }
    class DependencyReference {
      <<ValueObject>>
      +host
      +repo_url
      +virtual_path
    }
    class IdentityHelpers {
      <<PureModule>>
      +build_dependency_unique_key()
      +build_canonical_dependency_string()
    }
    class PathSecurity {
      <<Guard>>
      +validate_path_segments()
      +ensure_path_within()
    }
    DownloadDelegate *-- GitSparseFileTransport : caches per repo/ref
    DownloadDelegate ..> DependencyReference : reads
    GitSparseFileTransport ..> PathSecurity : guards
    DependencyReference ..> IdentityHelpers : delegates identity
Loading
flowchart TD
    A[apm install resolves GitLab path dep] --> B[DownloadDelegate.download_gitlab_file]
    B --> C[GitSparseFileTransport cached per host/repo/ref/port]
    C --> D[validate_path_segments before git]
    D --> E[git init + sparse-checkout --no-cone]
    E --> F[git fetch --filter=blob:none --depth=1]
    F --> G[git checkout FETCH_HEAD]
    G --> H[ensure_path_within then read bytes]
    H --> I[return file content]
    C --> J[transport failure]
    J --> K[secondary GitLab REST fallback]
    D --> L[path/security failure: no fallback]
Loading

Recommendation

All panelists agree, no blocking-severity findings exist, security and containment are tested, and the PR directly advances APM's core positioning. Merge at maintainer discretion. The three follow-ups above are post-merge quality improvements worth tracking in issues but do not hold this ship.


Full per-persona findings

Python Architect

  • [recommended] Document close() lock ordering invariant for future maintainers at src/apm_cli/deps/git_file_transport.py:105
    close() waits for active fetches via _state, then cleans up under _lock. Current ordering is safe, but a short comment would keep future changes from inverting lock order.
  • [recommended] Clarify intended public API of identity.py helper extraction at src/apm_cli/models/dependency/identity.py:1
    reference.py imports underscore-prefixed helper symbols from identity.py for back-compat. This is intentional extraction, but an __all__ or comment would clarify which names are meant to remain re-exported.
  • [nit] Finalizer cleanup path could use a one-line explanatory comment at src/apm_cli/deps/download_strategies.py:54
    weakref.finalize is correct but subtle; a comment would help readers understand the dict capture.

CLI Logging Expert

  • [nit] Verbose success wording differs from REST success wording at src/apm_cli/deps/download_strategies.py:727
    Git path success says "Fetched file via git transport" while older paths use nearby download/fetch phrasing. This is harmless but could be aligned for log grepping.

DevX UX Expert

  • [recommended] Missing file after sparse checkout may fall through to REST fallback and produce double-error noise at src/apm_cli/deps/git_file_transport.py:131
    A file absent in the git ref currently raises RuntimeError, which download_gitlab_file catches as fallback-eligible. This can cost an extra REST call and blur the primary error.
    Suggested: Consider a dedicated GitFileTransportFileNotFound subclass and delegate handling if this becomes noisy.
  • [recommended] Verbose fallback path is debug-only rather than verbose-visible at src/apm_cli/deps/download_strategies.py:737
    Fallback from git transport to REST is logged through _debug / APM_DEBUG. Users running with --verbose but not APM_DEBUG will not see why REST was tried.
    Suggested: Optionally call verbose_callback before REST fallback.
  • [nit] Timeout error could include the 120s cap at src/apm_cli/deps/git_file_transport.py:193
    A timeout message that includes the cap is more actionable for CI users.

Supply Chain Security Expert

  • [nit] Credential redaction focuses on HTTP(S) URL credentials at src/apm_cli/deps/git_file_transport.py:58
    The current regex matches the PR's token-bearing HTTPS stderr risk. A broader future pass could redact common token literals outside URLs, but no current leak was found.

OSS Growth Hacker

  • [recommended] CHANGELOG Added line could lead more with user benefit at CHANGELOG.md:12
    The embedded-subpath guard entry is already clear; slightly more user-outcome wording would make release notes more quotable.
  • [nit] Docs could visually highlight the zero-config GitLab promise at docs/src/content/docs/consumer/authentication.md:50
    A callout around "If git clone works, apm install works" would help skimmers, but the text is already present and accurate.

Auth Expert

  • [nit] Redaction regex could accept slash characters in credential segment at src/apm_cli/deps/git_file_transport.py:61
    GitLab tokens do not need this for the primary scope, but [^@\s]+@ would be a little more defensive than [^/@\s]+@.
  • [nit] GIT_ASKPASS='echo' could use a comment at src/apm_cli/deps/git_file_transport.py:82
    This portable prompt-suppression choice is correct but non-obvious to future maintainers.

Doc Writer

  • [recommended] consumer/authentication.md repeats "if git clone works, apm install works" in the same section at docs/src/content/docs/consumer/authentication.md:50
    The bold lead line already states the promise; the paragraph repeats it. Trimming the repeat would keep the docs lean without losing meaning.
    Suggested: Drop the trailing repeat clause and keep the bold lead.
  • [nit] Support installing individual skills/agents from monorepo packages (including private repo support) #872 target doc page does not name the embedded-subpath anti-pattern at docs/src/content/docs/consumer/manage-dependencies.md:114
    The parser error is self-contained and the page documents the path: remedy, but a one-line note would improve discoverability.

Test Coverage Expert

  • [nit] No full apm install CLI e2e covers GitLab path: dep materialization at tests/integration/test_install_gitlab_git_transport.py
    Transport and delegate integration coverage is strong enough for this PR; an e2e install test would be additional belt-and-suspenders coverage.
    Suggested: Consider a future e2e fixture invoking apm install against a local git repo classified as GitLab.
    Proof (missing at): tests/integration/test_install_gitlab_git_transport.py::test_apm_install_gitlab_path_dep_materializes_file -- proves: Full apm install with a GitLab path dependency produces the installed file on disk [vendor-neutral,devx]
    assert (install_dir / 'expected_file.md').exists()

Performance Expert

  • [recommended] Incremental sparse expansion re-checkouts after late-arriving paths at src/apm_cli/deps/git_file_transport.py:172
    If paths arrive after the first checkout, the transport updates sparse paths and re-checkouts. Same-run prefetch covers the common parallel case, so this is acceptable for expected small path counts.
  • [recommended] Transport lock serializes same-repo path fetches by design at src/apm_cli/deps/git_file_transport.py:126
    Serializing sparse-checkout mutation is necessary for correctness. This caps parallelism within one repo/ref but avoids working-tree races.
  • [recommended] Transport construction occurs under the delegate dictionary lock at src/apm_cli/deps/download_strategies.py:673
    The current constructor is cheap and network-free, so contention risk is low. A future per-key future pattern is only needed if construction becomes expensive.

This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.

@danielmeppiel danielmeppiel merged commit 12d288f into main Jun 11, 2026
12 checks passed
@danielmeppiel danielmeppiel deleted the danielmeppiel/1014-gitlab-git-transport branch June 11, 2026 15:38
sergio-sisternes-epam pushed a commit that referenced this pull request Jun 13, 2026
Sync the Stage 2 complexity/file-length refactor branch with main's 22
feature commits (Hermes #1726, Kiro IDE #1741, multi-host dep identity
#1735, same-repo remote path deps #1732, git_file_transport #1740,
revision pins #1738, marketplace sourceBase/source parity/inherit
description #1736/#1739/#1755, pack --archive .zip #1720, mcp optional
registry inputs #1734, and the v0.19.0/v0.20.0 releases).

Conflict resolution preserved both sides: main's new features ported
through the branch's extracted sibling modules, branch's tightened ruff
thresholds (max-statements=120, max-branches=40, max-complexity=35,
max-returns=8, max-args=12) and 800-line file limit retained.

All 7 CI-mirror lint gates pass; full unit suite green (17099 passed).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] fetch path from gitlab repository using git clone instead of API

2 participants