Skip to content

fix(cli): defer release creation until images are pullable#1371

Merged
larryro merged 2 commits into
mainfrom
ci/issue-1349-release-race
Apr 11, 2026
Merged

fix(cli): defer release creation until images are pullable#1371
larryro merged 2 commits into
mainfrom
ci/issue-1349-release-race

Conversation

@yannickmonney

@yannickmonney yannickmonney commented Apr 11, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Reorder release pipeline so the GitHub release is only created after all multi-arch manifests are pushed, eliminating the 15-20 minute window where a release is visible but container images cannot be pulled
  • Add user-friendly error message in pullImage() when an image is not found in the registry, suggesting the user wait if it is a recent release
  • Filter unpublished versions (those without a manifest) from the CLI version selector so users cannot select a version that is still being built

Changes

  • .github/workflows/release.yml: create-release now depends on [prepare, manifest] instead of [prepare]; trigger-cli now depends on [prepare, manifest] instead of [prepare, create-release]
  • tools/cli/src/lib/docker/pull-image.ts: detect not-found/manifest-unknown errors and show a timing-aware hint
  • tools/cli/src/lib/registry/get-available-versions.ts: verify manifest existence via HEAD requests before listing versions
  • Added tests for both CLI changes (14 tests total)

Testing

  • 14 unit tests pass (pull-image.test.ts, get-available-versions.test.ts)
  • Lint passes
  • Typecheck passes (pre-existing errors in fetch-reference.ts unrelated)
  • Pipeline logic verified on next release
  • No regressions in CI config

Related issues

Closes #1349

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved Docker image pull error handling with clearer error messaging for "image not found" scenarios, helping users troubleshoot pull failures.
    • Enhanced version discovery to verify manifest availability, ensuring only published versions are displayed in available versions lists.
  • Tests

    • Added comprehensive test coverage for Docker image operations and registry version lookups.

@yannickmonney

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@coderabbitai

coderabbitai Bot commented Apr 11, 2026

Copy link
Copy Markdown
Contributor
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Apr 11, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This pull request adds manifest-existence verification to Docker image pulling and registry version retrieval. It modifies the GitHub Actions release workflow to adjust job dependencies, updates the pullImage function to classify "not found" errors for better diagnostic messaging, and updates getAvailableVersions to filter out semantic version tags whose Docker manifests are unavailable. Two new test suites are added to validate the pullImage and getAvailableVersions functions with various scenarios including network failures, missing manifests, and tag filtering.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(cli): defer release creation until images are pullable' accurately summarizes the main change: reordering the release pipeline to prevent releasing before images are available.
Linked Issues check ✅ Passed The PR successfully addresses issue #1349 by reordering the release pipeline to wait for manifests before creating releases, adding user-friendly error messages for not-found errors, and filtering unpublished versions.
Out of Scope Changes check ✅ Passed All changes are directly related to addressing the release timing issue: workflow reordering, error message handling, manifest verification, and comprehensive test coverage for the new functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ci/issue-1349-release-race

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/release.yml:
- Around line 248-250: The workflow "trigger-cli" triggers the dispatched
"cli.yml" which runs a step invoking "gh release upload" (with --clobber) but
has no retry or pre-check, so if the "create-release" run hasn't completed yet
the upload can fail with "release not found"; update the CI by adding either
retry logic around the upload step in "cli.yml" (retry the "gh release upload"
command with a few backoff attempts) or add an explicit existence check before
upload (e.g., run "gh release view <tag>" and wait/retry until it succeeds) so
the "gh release upload" step used by cli.yml will only run after the release is
confirmed to exist.

In `@tools/cli/src/lib/docker/pull-image.ts`:
- Around line 14-20: In the not-found branch inside pull-image.ts (the block
that checks isNotFoundError(result.stderr)), preserve the original registry
stderr by including it alongside the user-friendly hint when calling
logger.error; update the logger.error call to log both the timing hint and
result.stderr (either as a single concatenated message or two consecutive
logger.error calls) so the friendly guidance is shown but the raw stderr is
retained for troubleshooting.

In `@tools/cli/src/lib/registry/get-available-versions.test.ts`:
- Around line 189-213: The test "respects limit parameter" only checks length;
update or add a test for getAvailableVersions that simulates missing manifests
for the newest semantic tags by adjusting the fetchMock behavior (return
404/absent HEAD responses for specific tag digests) so older published tags must
backfill the result, then assert the exact deterministic list of returned tags
(e.g., expected ordered tag names) rather than only length; refer to
getAvailableVersions, mockTagsResponse and mockManifestHead in the fetchMock
setup and assert equality of result.versions to the expected array.

In `@tools/cli/src/lib/registry/get-available-versions.ts`:
- Around line 158-167: The current logic limits probing to
Math.min(sortedSemanticTags.length, limit + 1) then filters to build
publishedVersions, which can skip older valid tags if many recent tags lack
manifests; change to a lazy/iterative probe: iterate over sortedSemanticTags
until publishedVersions.size reaches limit or tags exhausted, calling
getManifestDigest(image, tag, token) per tag and adding tag to publishedVersions
only when hasManifest is true, instead of pre-slicing into manifestChecks;
update references in this block (sortedSemanticTags, limit, getManifestDigest,
manifestChecks, publishedVersions) to use the iterative approach so you stop
probing as soon as you have enough published versions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8829b137-5adc-4d2b-b4fa-349dd1ddbaba

📥 Commits

Reviewing files that changed from the base of the PR and between 6feddbe and 86e85c1.

📒 Files selected for processing (5)
  • .github/workflows/release.yml
  • tools/cli/src/lib/docker/pull-image.test.ts
  • tools/cli/src/lib/docker/pull-image.ts
  • tools/cli/src/lib/registry/get-available-versions.test.ts
  • tools/cli/src/lib/registry/get-available-versions.ts

Comment on lines 248 to +250
trigger-cli:
name: Trigger CLI build
needs: [prepare, create-release]
needs: [prepare, manifest]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how cli.yml handles release asset uploads

# Find the CLI workflow file
fd -t f 'cli.yml' .github/workflows --exec cat {}

# Look for release upload logic in CLI workflow or related scripts
rg -n -C3 'gh release upload|release.*upload|upload.*release' .github/

Repository: tale-project/tale

Length of output: 4993


🏁 Script executed:

# Examine the release.yml file, focusing on create-release and trigger-cli jobs
fd -t f 'release.yml' .github/workflows --exec sed -n '1,350p' {}

Repository: tale-project/tale

Length of output: 10870


🌐 Web query:

gh release upload behavior when release doesn't exist --clobber flag

💡 Result:

The gh release upload command requires an existing GitHub release identified by the provided <tag>. If the release does not exist for that tag, the command fails with a "release not found" error. The --clobber flag enables overwriting (deleting then re-uploading) existing assets with the same name on an existing release. It does not create a new release or affect behavior when the release is missing. If the upload fails after deletion (e.g., interruption), original assets are lost, as clarified in recent docs and updates (as of 2026). To upload assets: 1. Create the release first: gh release create <tag> 2. Then upload: gh release upload <tag> <files> [--clobber] Sources confirm gh release upload fetches the release by tag before uploading assets, failing if not found (e.g., similar to gh release download and gh release view behaviors documented and discussed in issues). Official manual states it "Upload asset files to a GitHub Release" implying prerequisite existence.

Citations:


Verify CLI workflow handles missing release gracefully.

Both trigger-cli and create-release depend only on [prepare, manifest], allowing them to run in parallel. The cli.yml workflow dispatched by trigger-cli eventually calls gh release upload, which requires the release to exist—the --clobber flag does not create a release if missing.

However, gh release upload will fail with "release not found" if the release doesn't exist. cli.yml contains no retry logic or pre-upload release existence checks.

In practice, this is unlikely to fail because create-release (a single gh release create command) completes in seconds, while cli.yml's release job runs only after multi-platform builds complete (taking many minutes). The release will almost certainly exist before the upload is attempted.

Consider adding either:

  1. Retry logic in cli.yml's release upload step, or
  2. An explicit check that the release exists before attempting upload
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/release.yml around lines 248 - 250, The workflow
"trigger-cli" triggers the dispatched "cli.yml" which runs a step invoking "gh
release upload" (with --clobber) but has no retry or pre-check, so if the
"create-release" run hasn't completed yet the upload can fail with "release not
found"; update the CI by adding either retry logic around the upload step in
"cli.yml" (retry the "gh release upload" command with a few backoff attempts) or
add an explicit existence check before upload (e.g., run "gh release view <tag>"
and wait/retry until it succeeds) so the "gh release upload" step used by
cli.yml will only run after the release is confirmed to exist.

Comment on lines +14 to +20
if (isNotFoundError(result.stderr)) {
logger.error(
'The image does not exist in the registry. If this is a recent release, the images may still be building. Wait a few minutes and try again.',
);
} else {
logger.error(result.stderr);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Preserve original registry stderr alongside the timing hint

The not-found branch currently replaces raw stderr with only the friendly hint. That can hide actionable diagnostics when the tag/repo is actually wrong. Log both (hint + stderr) to keep troubleshooting fidelity.

Suggested patch
       if (isNotFoundError(result.stderr)) {
         logger.error(
           'The image does not exist in the registry. If this is a recent release, the images may still be building. Wait a few minutes and try again.',
         );
+        if (result.stderr) {
+          logger.error(result.stderr);
+        }
       } else {
         logger.error(result.stderr);
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/cli/src/lib/docker/pull-image.ts` around lines 14 - 20, In the
not-found branch inside pull-image.ts (the block that checks
isNotFoundError(result.stderr)), preserve the original registry stderr by
including it alongside the user-friendly hint when calling logger.error; update
the logger.error call to log both the timing hint and result.stderr (either as a
single concatenated message or two consecutive logger.error calls) so the
friendly guidance is shown but the raw stderr is retained for troubleshooting.

Comment on lines +189 to +213
test('respects limit parameter', async () => {
fetchMock.mockImplementation((url: string, options?: RequestInit) => {
const urlStr = url.toString();

if (urlStr.includes('/token')) {
return Promise.resolve(mockTokenResponse());
}

if (urlStr.includes('/tags/list')) {
return Promise.resolve(
mockTagsResponse(['0.3.0', '0.2.16', '0.2.15', '0.2.14', '0.2.13']),
);
}

if (options?.method === 'HEAD') {
return Promise.resolve(mockManifestHead('sha256:valid'));
}

return Promise.resolve(new Response(null, { status: 404 }));
});

const result = await getAvailableVersions('ghcr.io/tale-project/tale', 3);

expect(result.versions.length).toBeLessThanOrEqual(3);
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a backfill edge-case test for limit with missing newest manifests

This test only asserts length <= limit. It doesn’t catch the case where multiple newest semantic tags are missing manifests and older published tags should still fill the limit. Please add a deterministic assertion for expected returned tags in that scenario.

As per coding guidelines: "Tests should cover happy paths, edge cases, and error conditions".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/cli/src/lib/registry/get-available-versions.test.ts` around lines 189 -
213, The test "respects limit parameter" only checks length; update or add a
test for getAvailableVersions that simulates missing manifests for the newest
semantic tags by adjusting the fetchMock behavior (return 404/absent HEAD
responses for specific tag digests) so older published tags must backfill the
result, then assert the exact deterministic list of returned tags (e.g.,
expected ordered tag names) rather than only length; refer to
getAvailableVersions, mockTagsResponse and mockManifestHead in the fetchMock
setup and assert equality of result.versions to the expected array.

Comment on lines +158 to +167
const candidateCount = Math.min(sortedSemanticTags.length, limit + 1);
const manifestChecks = await Promise.all(
sortedSemanticTags.slice(0, candidateCount).map(async (tag) => ({
tag,
hasManifest: !!(await getManifestDigest(image, tag, token)),
})),
);
const publishedVersions = new Set(
manifestChecks.filter((c) => c.hasManifest).map((c) => c.tag),
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Manifest probe window can hide valid older versions

On Line 158, probing only limit + 1 tags and then filtering on Line 185 means any semantic tag beyond that window is always excluded, even if it has a valid manifest. If several newest tags are still unpublished, the CLI can return too few versions despite older pullable tags existing.

Suggested fix (lazy manifest check while iterating)
-    const candidateCount = Math.min(sortedSemanticTags.length, limit + 1);
-    const manifestChecks = await Promise.all(
-      sortedSemanticTags.slice(0, candidateCount).map(async (tag) => ({
-        tag,
-        hasManifest: !!(await getManifestDigest(image, tag, token)),
-      })),
-    );
-    const publishedVersions = new Set(
-      manifestChecks.filter((c) => c.hasManifest).map((c) => c.tag),
-    );
+    const publishedVersions = new Set<string>();
+    const unpublishedVersions = new Set<string>();
@@
-      if (!publishedVersions.has(sortedSemanticTags[i])) continue;
-      versionInfos.push({ tag: sortedSemanticTags[i], aliases: [] });
+      const tag = sortedSemanticTags[i];
+      if (!publishedVersions.has(tag) && !unpublishedVersions.has(tag)) {
+        const hasManifest = !!(await getManifestDigest(image, tag, token));
+        (hasManifest ? publishedVersions : unpublishedVersions).add(tag);
+      }
+      if (!publishedVersions.has(tag)) continue;
+      versionInfos.push({ tag, aliases: [] });

Also applies to: 185-185

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/cli/src/lib/registry/get-available-versions.ts` around lines 158 - 167,
The current logic limits probing to Math.min(sortedSemanticTags.length, limit +
1) then filters to build publishedVersions, which can skip older valid tags if
many recent tags lack manifests; change to a lazy/iterative probe: iterate over
sortedSemanticTags until publishedVersions.size reaches limit or tags exhausted,
calling getManifestDigest(image, tag, token) per tag and adding tag to
publishedVersions only when hasManifest is true, instead of pre-slicing into
manifestChecks; update references in this block (sortedSemanticTags, limit,
getManifestDigest, manifestChecks, publishedVersions) to use the iterative
approach so you stop probing as soon as you have enough published versions.

Reorder release pipeline so the GitHub release is created after all
multi-arch manifests are pushed, eliminating the 15-20 minute window
where a release is visible but container images cannot be pulled.

Also improve the CLI: add a user-friendly error when an image pull
fails due to a not-found/manifest-unknown error, and filter unpublished
versions from the interactive version selector.

Closes #1349
- Add retry/wait loop in cli.yml before uploading to release, using
  exponential backoff to handle the race where the release may not
  exist yet when the CLI workflow runs
- Log raw stderr alongside the friendly hint in pull-image for better
  diagnostics when an image is not found
- Switch manifest probing to lazy iteration so older valid tags are
  returned when recent tags have missing manifests
- Add test covering the backfill edge case where newest tags lack
  manifests
@larryro larryro force-pushed the ci/issue-1349-release-race branch from 678bd7b to 777638f Compare April 11, 2026 06:46
@larryro larryro merged commit 3c4fe57 into main Apr 11, 2026
9 checks passed
@larryro larryro deleted the ci/issue-1349-release-race branch April 11, 2026 06:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error when install application 0.2.16

2 participants