Problem
The awf-squid proxy container intermittently fails its Docker Compose healthcheck during workflow startup, causing the entire AWF stack to exit with code 1. This has recurred across multiple unrelated workflows after the prior tracker (#34920) was closed.
Context
Upstream report: github/gh-aw#35485
Failure signature (consistent across all occurrences):
Container awf-squid Error
dependency failed to start: container awf-squid is unhealthy
[ERROR] Failed to start containers: Error: Command failed with exit code 1: docker compose up -d --pull never
awf-api-proxy reports Healthy; only awf-squid fails. Approximately 39/50 runs in a 6-hour window were green, indicating a timing-sensitive flake rather than a fully broken image.
Root Cause
Most likely the Squid healthcheck probe fires before Squid finishes parsing its base64-decoded config and binding port 3128 on first boot. The current start_period and retries values in src/docker-manager.ts (the generated docker-compose.yml) may be too tight for slower GitHub Actions runners.
Proposed Solution
- In
src/docker-manager.ts, increase healthcheck.start_period for awf-squid (e.g. from current value to 30s) and increase retries to at least 5.
- In
src/cli.ts, when docker compose up exits non-zero due to an unhealthy dependency, capture docker logs awf-squid and docker inspect awf-squid --format='{{json .State.Health}}' and surface them in the error output to aid future diagnosis.
- Consider adding a single retry of
docker compose up (after docker compose down -v) for healthcheck failures to distinguish transient first-boot races from persistent breakage.
Generated by Firewall Issue Dispatcher · sonnet46 2.5M · ◷
Problem
The
awf-squidproxy container intermittently fails its Docker Compose healthcheck during workflow startup, causing the entire AWF stack to exit with code 1. This has recurred across multiple unrelated workflows after the prior tracker (#34920) was closed.Context
Upstream report: github/gh-aw#35485
Failure signature (consistent across all occurrences):
awf-api-proxyreportsHealthy; onlyawf-squidfails. Approximately 39/50 runs in a 6-hour window were green, indicating a timing-sensitive flake rather than a fully broken image.Root Cause
Most likely the Squid healthcheck probe fires before Squid finishes parsing its base64-decoded config and binding port 3128 on first boot. The current
start_periodandretriesvalues insrc/docker-manager.ts(the generateddocker-compose.yml) may be too tight for slower GitHub Actions runners.Proposed Solution
src/docker-manager.ts, increasehealthcheck.start_periodforawf-squid(e.g. from current value to30s) and increaseretriesto at least 5.src/cli.ts, whendocker compose upexits non-zero due to an unhealthy dependency, capturedocker logs awf-squidanddocker inspect awf-squid --format='{{json .State.Health}}'and surface them in the error output to aid future diagnosis.docker compose up(afterdocker compose down -v) for healthcheck failures to distinguish transient first-boot races from persistent breakage.