Skip to content

fix(lifecycle): make script executor + lifecycle tests pass on Windows#1947

Merged
danielmeppiel merged 4 commits into
mainfrom
danielmeppiel-fix-windows-ci-tests
Jun 28, 2026
Merged

fix(lifecycle): make script executor + lifecycle tests pass on Windows#1947
danielmeppiel merged 4 commits into
mainfrom
danielmeppiel-fix-windows-ci-tests

Conversation

@danielmeppiel

@danielmeppiel danielmeppiel commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator

TL;DR

The lifecycle / script-executor test suite fails only on Windows CI (latest main run is red on the Build & Test (windows-latest) job). This PR fixes two genuine cross-platform source bugs plus three Windows-only test-portability bugs.

Root causes

Source bugs (real Windows defects, not just test issues):

  1. scripts.log always empty on Windows. _append_to_script_log opens the log, then os.fstats it and rejects the fd if _st.st_mode & 0o077 (a POSIX "world/group readable" tamper check). On Windows, os.fstat reports 0o666/0o444-style modes whose 0o077 bits are always set, so the code unlinked + O_EXCL-recreated a fresh file and then returned without writing — leaving an empty scripts.log. This broke every test that asserts on logged content (event=..., status=..., type=http, etc.). Fixed by gating the permission-bit check behind os.name == "posix".
  2. _resolve_cwd containment used a hardcoded /. The str(resolved).startswith(str(root_resolved) + "/") check never matches Windows \-separated paths, so valid relative cwds were clamped to the project root. Fixed with os.sep.

Test-portability bugs (Windows-only):

  1. test_lifecycle_trust_gate._write_apm_yml built apm.yml by hand; Windows backslash paths (C:\Users\...) broke both YAML double-quoted escapes (\U...) and the embedded Python string literal. Now emits via yaml.safe_dump + a raw Python string.
  2. test_script_executors had hardcoded POSIX path-literal assertions (/my/project/scripts, /custom/apm/logs/..., /absolute/path) — replaced with platform-agnostic Path/os.path equivalents.
  3. test_lifecycle_executor_paths used sleep 5 (not a Windows shell builtin) for the timeout path — now uses a portable python -c "time.sleep(5)".
  4. test_lifecycle_scripts ..._prefers_bash_on_unix didn't pin the platform, so on Windows effective_command correctly preferred command — now pins platform.system to Linux.

Validation

  • tests/unit/core/test_script_executors.py, test_lifecycle_scripts.py, tests/unit/install/test_lifecycle_trust_gate.py, tests/unit/test_lifecycle_executor_paths.py: 136 passed on macOS.
  • Full local lint gate green: ruff check, ruff format --check, pylint R0801 (10.00/10), and scripts/lint-auth-signals.sh.
  • Windows matrix verified by dispatching build-release.yml against this branch.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

apm-spec-waiver: restores PR #1137 cache-pin (.apm-pin) self-heal regressed by #1926; APM-internal install audit hardening, not OpenAPM v0.1 normative surface

Two cross-platform source bugs and three test-portability bugs caused
the lifecycle/script-executor suite to fail only on Windows CI:

- _append_to_script_log dropped every write on Windows: the POSIX
  world-readable tamper check (st_mode & 0o077) is always truthy for
  Windows os.fstat modes, so the log self-healed to a fresh O_EXCL file
  and returned without writing, leaving scripts.log empty. Gate the
  permission-bit check behind a POSIX check.
- _resolve_cwd containment used a hardcoded '/' separator, clamping
  valid relative cwds to the project root on Windows. Use os.sep.
- test_lifecycle_trust_gate: emit apm.yml via yaml.safe_dump and a raw
  Python string so Windows backslash paths survive YAML + Python parsing.
- test_script_executors: replace hardcoded POSIX path-literal assertions
  with platform-agnostic Path/os.path equivalents.
- test_lifecycle_executor_paths: use a portable python sleep instead of
  the non-Windows 'sleep' builtin for the timeout path.
- test_lifecycle_scripts: pin platform.system to Linux for the unix
  effective_command test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 28, 2026 20:10

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Windows-only CI failures in the lifecycle/script executor suite by fixing cross-platform path/permission handling in the runtime code and making several unit tests platform-agnostic.

Changes:

  • POSIX-gate the scripts.log permission-bit tamper check so Windows writes no longer self-heal-and-drop.
  • Make lifecycle cwd containment checks Windows-aware, and update multiple tests to avoid POSIX-only assumptions (paths, sleep, platform selection).
  • Emit apm.yml for trust-gate testing via yaml.safe_dump to avoid Windows backslash escaping pitfalls.
Show a summary per file
File Description
src/apm_cli/core/script_executors.py Fix Windows behavior for scripts.log tamper check; adjust cwd containment separator handling.
tests/unit/test_lifecycle_executor_paths.py Use a portable long-running command for timeout tests (python sleep).
tests/unit/install/test_lifecycle_trust_gate.py Generate YAML via yaml.safe_dump and use raw Python strings to survive Windows backslashes.
tests/unit/core/test_script_executors.py Make path assertions platform-agnostic for _resolve_cwd and scripts.log path tests.
tests/unit/core/test_lifecycle_scripts.py Pin platform behavior for bash-vs-command selection tests.

Review details

  • Files reviewed: 5/5 changed files
  • Comments generated: 1
  • Review effort level: Low

resolved = (root / script.cwd).resolve()
root_resolved = root.resolve()
if not str(resolved).startswith(str(root_resolved) + "/") and resolved != root_resolved:
if not str(resolved).startswith(str(root_resolved) + os.sep) and resolved != root_resolved:
danielmeppiel and others added 3 commits June 28, 2026 22:45
Integration test test_apm_install_writes_cache_pin_marker_for_each_remote_dep
regressed after PR #1926 added orphan lockfile pruning: a remote dep recorded
in the lockfile but absent from the manifest is now treated as an orphan and
pruned from the rebuilt lockfile, so its `.apm-pin` marker was never written --
even though the dep's cached payload survives on disk under apm_modules/.

That broke the supply-chain contract from PR #1137: every cached remote dep
must carry a `.apm-pin` marker so a later `apm audit` drift replay can
fail-closed on a stale cache. The regression went unnoticed because the
integration matrix only runs on dispatch/tag/schedule, not on PRs.

Restore self-heal at the product level: build_and_save now syncs markers from
the pre-existing on-disk lockfile FIRST, before orphan-pruning rewrites it.
Best-effort and idempotent; unpinned-dep warnings stay suppressed here since
the primary post-build sync already surfaces them.

- cache_pin.py: add keyword-only `warn_unpinned: bool = True` to
  sync_markers_for_lockfile so the secondary pass stays silent.
- phases/lockfile.py: add _sync_cache_pin_markers_from_existing() driven by
  ctx.existing_lockfile, invoked at the top of build_and_save.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The preceding commit restores the PR #1137 cache-pin (.apm-pin) marker
self-heal contract that PR #1926's orphan-lockfile-pruning silently
regressed. This touches src/apm_cli/install/ (a Mode B critical path)
but introduces NO net-new OpenAPM v0.1 normative behaviour: the
.apm-pin cache marker is an APM-implementation-internal supply-chain /
audit-hardening feature, not part of the OpenAPM interop spec (it has
no spec anchor, no requirements-manifest row, and no Appendix C entry,
because other implementations are not required to honour it). Adding a
req-XXX for it would wrongly pollute the interop spec with
implementation-specific surface, so the waiver -- not a new spec
requirement -- is the correct classification.

apm-spec-waiver: restores PR #1137 cache-pin (.apm-pin) self-heal regressed by #1926; APM-internal install audit hardening, not OpenAPM v0.1 normative surface

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit 1c9923b into main Jun 28, 2026
18 checks passed
@danielmeppiel danielmeppiel deleted the danielmeppiel-fix-windows-ci-tests branch June 28, 2026 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants