
[awf] squid: healthcheck timing flakiness causes intermittent container startup failure #3985

Description

@lpcox

Problem

The awf-squid proxy container intermittently fails its Docker Compose healthcheck during workflow startup, causing the entire AWF stack to exit with code 1. This has recurred across multiple unrelated workflows after the prior tracker (#34920) was closed.

Context

Upstream report: github/gh-aw#35485

Failure signature (consistent across all occurrences):

Container awf-squid  Error
dependency failed to start: container awf-squid is unhealthy
[ERROR] Failed to start containers: Error: Command failed with exit code 1: docker compose up -d --pull never

awf-api-proxy reports Healthy; only awf-squid fails. Approximately 39 of 50 runs (78%) in a 6-hour window were green, which points to a timing-sensitive flake rather than a fully broken image.

Root Cause

Most likely, the Squid healthcheck probe fires before Squid has finished parsing its base64-decoded config and binding port 3128 on first boot. The current start_period and retries values set in src/docker-manager.ts (which generates docker-compose.yml) may be too tight for slower GitHub Actions runners.
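To make "too tight" concrete, here is a minimal sketch of the timing budget a healthcheck gives a container. Docker does not count probe failures during start_period toward retries, so a container that never becomes ready is marked unhealthy roughly start_period + retries × interval after start (probe timeouts and scheduling shift this somewhat). The type and function names are hypothetical and not taken from src/docker-manager.ts. The "assumed current" values and the 5s interval are placeholders, since this issue does not state the real ones; the 30s and 5 retries are the figures suggested in this issue.

```typescript
// Hypothetical shape; the real types in src/docker-manager.ts may differ.
interface HealthcheckTiming {
  startPeriodSec: number; // probe failures inside this window are not counted
  intervalSec: number;    // delay between probes (assumed value below)
  retries: number;        // consecutive counted failures before "unhealthy"
}

// Rough time until a container that never becomes ready is marked unhealthy.
// An approximation: it ignores probe timeouts and scheduling jitter.
function approxSecondsUntilUnhealthy(hc: HealthcheckTiming): number {
  return hc.startPeriodSec + hc.retries * hc.intervalSec;
}

// Render the timing fields using the key names Compose uses for healthchecks.
function toComposeFields(hc: HealthcheckTiming): Record<string, string | number> {
  return {
    start_period: `${hc.startPeriodSec}s`,
    interval: `${hc.intervalSec}s`,
    retries: hc.retries,
  };
}

// Placeholder "before" values (NOT the real current settings) vs. the proposal.
const assumedCurrent: HealthcheckTiming = { startPeriodSec: 10, intervalSec: 5, retries: 3 };
const proposed: HealthcheckTiming = { startPeriodSec: 30, intervalSec: 5, retries: 5 };

console.log(approxSecondsUntilUnhealthy(assumedCurrent)); // 25
console.log(approxSecondsUntilUnhealthy(proposed));       // 55
console.log(toComposeFields(proposed));
```

Under these assumed numbers, Squid would get about 25 seconds to bind port 3128 before the stack is torn down, and about 55 seconds with the proposed values.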

Proposed Solution

  1. In src/docker-manager.ts, increase healthcheck.start_period for awf-squid (e.g. from current value to 30s) and increase retries to at least 5.
  2. In src/cli.ts, when docker compose up exits non-zero due to an unhealthy dependency, capture docker logs awf-squid and docker inspect awf-squid --format='{{json .State.Health}}' and surface them in the error output to aid future diagnosis.
  3. Consider adding a single retry of docker compose up (after docker compose down -v) for healthcheck failures to distinguish transient first-boot races from persistent breakage.
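Steps 2 and 3 above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/cli.ts: every type and function name here is hypothetical, and the command runner is injected (and synchronous) so the logic can be exercised without Docker. A real runner might wrap whatever process library the CLI already uses.

```typescript
// Hypothetical types; src/cli.ts may structure this differently.
type RunResult = { exitCode: number; stdout: string; stderr: string };
type Runner = (cmd: string, args: string[]) => RunResult;

// Matches the failure signature quoted in this issue,
// e.g. "container awf-squid is unhealthy".
const UNHEALTHY_PATTERN = /container\s+(\S+)\s+is unhealthy/;

function unhealthyContainer(stderr: string): string | null {
  const match = UNHEALTHY_PATTERN.exec(stderr);
  return match ? match[1] : null;
}

// Step 2: capture container logs and health state for the error output.
function collectDiagnostics(run: Runner, name: string): string {
  const logs = run("docker", ["logs", name]);
  const health = run("docker", ["inspect", "--format", "{{json .State.Health}}", name]);
  return [
    `--- docker logs ${name} ---`,
    logs.stdout,
    logs.stderr,
    `--- health state of ${name} ---`,
    health.stdout,
  ].join("\n");
}

// Step 3: retry once, and only when the failure is an unhealthy dependency.
function composeUpWithRetry(
  run: Runner,
  maxAttempts = 2,
): { ok: boolean; attempts: number; diagnostics: string[] } {
  const diagnostics: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const up = run("docker", ["compose", "up", "-d", "--pull", "never"]);
    if (up.exitCode === 0) return { ok: true, attempts: attempt, diagnostics };

    const name = unhealthyContainer(up.stderr);
    // Any other failure is treated as persistent and is not retried.
    if (name === null) return { ok: false, attempts: attempt, diagnostics };

    diagnostics.push(collectDiagnostics(run, name));
    // Tear the stack down before the next attempt.
    if (attempt < maxAttempts) run("docker", ["compose", "down", "-v"]);
  }
  return { ok: false, attempts: maxAttempts, diagnostics };
}
```

Diagnostics are collected before `docker compose down -v` runs, since tearing the stack down removes the container whose logs and health state are being inspected. If the second attempt also fails, the caller has two sets of diagnostics to compare, which helps separate a first-boot race from persistent breakage.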

Generated by Firewall Issue Dispatcher · sonnet46 2.5M
