feat(cli): add Bun-based deployment CLI tool #304
📝 Walkthrough
This PR introduces the Tale CLI, a comprehensive deployment management tool written in TypeScript using Bun, replacing the previous bash-based deployment script and Docker Compose overlay files. The CLI provides commands for deploying, rolling back, viewing status, managing logs, and resetting deployments. It generates Docker Compose configurations dynamically, manages deployment state, handles health checks, orchestrates blue-green deployments, and integrates with GitHub Container Registry for image and version management. A new GitHub Actions workflow automates cross-platform CLI binary builds. The old scripts/deploy.sh and compose overlay files are removed, and compose.yml is updated to reference the CLI for production deployments.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
The change introduces extensive new functionality across multiple domains: CLI command wiring, Docker orchestration, composition generation, configuration management, state persistence, health checking, registry integration, and action handlers. While many individual modules follow similar patterns, each domain presents distinct logic requiring separate reasoning. The removal of 1000+ lines of deployment logic and addition of equivalent functionality in modular TypeScript form increases review scope. Particular attention is needed for deploy/rollback orchestration, health check loops, state management consistency, and Docker Compose generation correctness.
Actionable comments posted: 22
🤖 Fix all issues with AI agents
In @.github/workflows/cli.yml:
- Around line 10-15: The workflow's push trigger only includes branch pushes, so
the "Upload to latest release" step (lines ~96-111) never runs on tag pushes.
Update the workflow's on.push configuration to also include tag pushes (e.g.,
add a tags pattern such as 'v*', or '**' to match every tag; on.push.tags globs
match the tag name, not the full 'refs/tags/...' ref) so tag-created runs
trigger the job. Alternatively, split the tag-upload logic into a separate
workflow triggered by on.push.tags. Modify the on.push block (not the step) to
add the tags key, or create a dedicated tag-triggered workflow.
In @.github/workflows/release.yml:
- Around line 27-30: The workflow currently matches pre-release tags via the
patterns 'v*.*.*-*' and '[0-9]*.*.*-*', causing pre-releases to run and update
:latest; to fix this either remove those two patterns from the tags list so only
stable tags ('v*.*.*' and '[0-9]*.*.*') trigger the workflow, or keep the
patterns but add a guard in the publish step that only pushes the :latest tag
when the git tag does not contain a hyphen (e.g., check the tag string for '-'
and skip publishing :latest if present).
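The hyphen guard described above can be sketched as follows. This is written in TypeScript purely for illustration; in the workflow itself it would more likely be a step condition or a shell test. The function name `shouldPublishLatest` is hypothetical, and the sketch assumes that any tag containing a hyphen is a pre-release.

```typescript
// Hypothetical helper: decide whether a git tag may update the :latest image tag.
// Assumption: a hyphen anywhere in the tag (e.g. "v1.2.0-rc.1") marks a pre-release.
function shouldPublishLatest(gitTag: string): boolean {
  return !gitTag.includes("-");
}

// Usage: stable tags pass the guard, pre-release tags do not.
console.log("v1.2.0 ->", shouldPublishLatest("v1.2.0"));
console.log("1.2.0-beta.1 ->", shouldPublishLatest("1.2.0-beta.1"));
```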
In `@compose.yml`:
- Around line 1-11: Update the compose.yml header comment to list port 6380
alongside the other development-only ports (5432, 6379, 8001-8003) so the
warning includes the graph-db browser UI; locate the header block at the top of
compose.yml and add ", 6380 (graph-db browser UI)" to the sentence that
enumerates dev-only exposed ports.
In `@tools/cli/.gitignore`:
- Around line 1-2: Add common OS/editor/dependency artifacts to the repository
ignore list by extending the existing node_modules/ and dist/ entries with
patterns such as .DS_Store, Thumbs.db, .env, .env.local, .vscode/, .idea/,
*.log, npm-debug.log, yarn-error.log, package-lock.json (if desired), and
coverage/ (or other build/test artifacts); update the .gitignore contents to
include these lines so developer machines don’t accidentally commit editor or OS
files.
- Around line 1-2: Add patterns to the existing .gitignore so environment files
are not committed: update the file that currently lists node_modules/ and dist/
to also include .env and .env.local (and optionally .env.* or .env.local.*) to
prevent accidental leakage of secrets created by the CLI's interactive
environment setup or configuration features.
In `@tools/cli/README.md`:
- Around line 150-156: The "Deployment Flow" section ends abruptly after the
numbered steps—update the "Deployment Flow" under the "### Deployment Flow"
heading to either add the missing final step(s) (e.g., step 6 "Remove old
containers" followed by a step confirming completion or a closing note like
"Deployment complete: verify services and monitoring") or add a brief concluding
sentence clarifying the flow is finished; modify the listed sequence (steps 1–5
as shown) and append the concluding step/text so the section reads complete and
intentional.
In `@tools/cli/src/commands/config/index.ts`:
- Around line 43-60: The action handler for the "set-dir" command accepts any
path without validation; update the .action callback to validate deployDir
before calling setConfig: resolve tilde as you already do, then use
fs.promises.access or fs.existsSync to check that deployDir exists and is a
directory and is readable/writable, and if not either log a clear warning via
logger.warn or throw/logger.error and exit; keep the rest of the flow
(getConfig, setConfig, CURRENT_CONFIG_VERSION, logger.success) unchanged and
ensure the checks reference the deployDir variable in the .action async
function.
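One possible shape for the directory check requested above, as a minimal sketch. The helper name `validateDeployDir` is hypothetical and not part of the PR; it uses the synchronous `node:fs` calls for brevity, whereas the real handler may prefer `fs.promises.access`. It assumes tilde expansion has already happened.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { tmpdir } from "node:os";

// Hypothetical helper: returns null when the directory is usable, otherwise a
// human-readable reason the caller could pass to logger.error before exiting.
function validateDeployDir(deployDir: string): string | null {
  try {
    if (!statSync(deployDir).isDirectory()) {
      return `${deployDir} exists but is not a directory`;
    }
    // Throws if the current user cannot read or write the directory.
    accessSync(deployDir, constants.R_OK | constants.W_OK);
    return null;
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    return `${deployDir} is not accessible: ${reason}`;
  }
}

// Usage: the system temp directory should normally pass the check.
console.log(validateDeployDir(tmpdir()) ?? "ok");
```

Returning a message instead of throwing leaves the choice between a warning and a hard failure to the command handler.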
In `@tools/cli/src/commands/deploy/index.ts`:
- Around line 15-80: DEFAULT_HOST_ALIAS is evaluated before loadEnv() runs so a
HOST set in .env is ignored; update createDeployCommand to stop using
DEFAULT_HOST_ALIAS as the option's compile-time default and instead compute
hostAlias after loadEnv() (e.g., const hostAlias = options.host ??
process.env.HOST ?? DEFAULT_HOST_ALIAS) and pass that hostAlias into
deploy(...). Change the .option("--host <hostname>", ...) to not hardcode
DEFAULT_HOST_ALIAS as the default (or remove the third arg) and ensure the
computed hostAlias replaces options.host in the deploy call.
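The resolution order suggested above can be sketched as a small pure function. The helper name `resolveHostAlias` and the fallback value `"tale.local"` are hypothetical; the actual value of DEFAULT_HOST_ALIAS is not shown in this review.

```typescript
// Hypothetical fallback; the real DEFAULT_HOST_ALIAS value is not shown in this review.
const DEFAULT_HOST_ALIAS = "tale.local";

// Resolve the host alias at call time, i.e. after loadEnv() has populated `env`.
// Precedence: explicit --host option, then HOST from the environment, then the default.
function resolveHostAlias(
  optionHost: string | undefined,
  env: Record<string, string | undefined>,
): string {
  return optionHost ?? env["HOST"] ?? DEFAULT_HOST_ALIAS;
}

// Usage: with no --host flag, a HOST value loaded from .env wins over the default.
console.log(resolveHostAlias(undefined, { HOST: "deploy.example.com" }));
```

Inside the action handler this would be called as `resolveHostAlias(options.host, process.env)` once the environment has been loaded, which only works if the `--host` option no longer declares a default of its own.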
In `@tools/cli/src/commands/reset.ts`:
- Around line 15-35: The reset command can act on default env values if .env is
missing because loadEnv falls back to defaults; update createResetCommand so it
ensures the deployment env is explicitly set up before calling loadEnv: after
await ensureConfig({ explicitDir: options.dir }) verify or create the project
.env (reuse or add a helper like ensureEnvSetup/requireEnvExists) and fail with
a clear error or interactive prompt if the env file is absent, then call
loadEnv(deployDir) and proceed to reset; modify createResetCommand (and
ensureConfig or add ensureEnvSetup helper) to enforce this check so reset never
runs against defaults.
In `@tools/cli/src/lib/actions/deploy.ts`:
- Around line 176-221: Before performing the in-place update (the inPlaceUpdate
branch that uses currentColor, generateColorCompose, dockerCompose and
waitForHealthy), capture and save the current deployed version by invoking
setPreviousVersion with the appropriate identifier so rollbacks target this
pre-update state; add a call to setPreviousVersion(...) immediately after
verifying currentColor (before ensureInfrastructure and any dockerCompose/up)
and pass the current deployment identity (e.g., project name and color or the
value returned by your current-version lookup) so the existing version is
recorded for precise rollback.
In `@tools/cli/src/lib/actions/logs.ts`:
- Around line 93-96: The current post-run check throws on any non-zero exit code
(using proc.exited and exitCode), which misreports normal Ctrl+C termination in
follow mode (exit code 130); update the logic in the logs action to treat SIGINT
termination as expected: after awaiting proc.exited, if exitCode === 0 return
normally, else if options.follow (or the follow flag variable used in this
module) and exitCode === 130 (128 + SIGINT) do not throw (consider logging a
benign message or simply return); otherwise keep throwing the Error(`docker logs
exited with code ${exitCode}`) to preserve existing error behavior. Ensure you
reference proc and exitCode in your change and use the same options/follow
identifier already present in the function.
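A minimal sketch of the exit-code handling described above. The helper names are hypothetical; the only behaviour taken from the review is that exit code 130 in follow mode is treated as a normal Ctrl+C and every other non-zero code still throws.

```typescript
// 128 + 2 (SIGINT): the conventional exit code after Ctrl+C.
const SIGINT_EXIT_CODE = 130;

// Hypothetical helper: true when the exit code needs no error report.
function isExpectedLogsExit(exitCode: number, follow: boolean): boolean {
  if (exitCode === 0) return true;
  return follow && exitCode === SIGINT_EXIT_CODE;
}

// Hypothetical wrapper preserving the original error message for real failures.
function assertLogsExit(exitCode: number, follow: boolean): void {
  if (!isExpectedLogsExit(exitCode, follow)) {
    throw new Error(`docker logs exited with code ${exitCode}`);
  }
}

// Usage: Ctrl+C while following logs is not reported as an error.
assertLogsExit(130, true);
```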
In `@tools/cli/src/lib/actions/rollback.ts`:
- Around line 68-79: The current parallel image pull using Promise.all over
ROTATABLE_SERVICES with pullImage loses per-image progress and precise failure
attribution; change this to a sequential loop that iterates ROTATABLE_SERVICES,
constructs image = `${env.GHCR_REGISTRY}/tale-${service}:${rollbackVersion}`,
logs each pull with logger.step (or similar), awaits pullImage(image) for each,
and immediately throws a descriptive Error(`Failed to pull image: ${image}`) if
pullImage returns a failure so the CLI shows clear progress and the exact image
that failed.
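The sequential loop suggested above might look like the sketch below. It assumes `pullImage` resolves to a boolean success flag, which the review implies but does not state, and it takes the pull function and logger as parameters so the example is self-contained. The function name `pullImagesSequentially` is hypothetical.

```typescript
// Assumed signature: resolves to true on success, false on failure.
type PullImage = (image: string) => Promise<boolean>;

// Hypothetical helper: pull one image at a time so progress and failures are attributable.
async function pullImagesSequentially(
  services: readonly string[],
  registry: string,
  version: string,
  pullImage: PullImage,
  logStep: (message: string) => void = console.log,
): Promise<void> {
  for (const service of services) {
    const image = `${registry}/tale-${service}:${version}`;
    logStep(`Pulling ${image}`);
    // Stop at the first failure and name the exact image that failed.
    if (!(await pullImage(image))) {
      throw new Error(`Failed to pull image: ${image}`);
    }
  }
}
```

The trade-off is wall-clock time: pulls no longer overlap, so a rollback touching several large images takes longer than with `Promise.all`.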
In `@tools/cli/src/lib/compose/services/create-graph-db-service.ts`:
- Around line 11-16: The healthcheck block in create-graph-db-service.ts for the
graph DB service (healthcheck object created in createGraphDBService / the Graph
DB service definition) is missing start_period; add a start_period (e.g., "60s")
to the healthcheck alongside test, interval, timeout, and retries so the
container has time to initialize before health checks begin failing, matching
the pattern used by the platform service and other services (adjust duration as
appropriate for FalkorDB startup).
In `@tools/cli/src/lib/config/ensure-config.ts`:
- Around line 22-27: When options.explicitDir is provided, ensureConfig
currently returns the path without creating or validating it; update
ensureConfig to validate and create the directory (mirror first-run behavior)
before returning it by using fs.promises.mkdir(pathToUse, { recursive: true })
or an equivalent existence check+mkdir, handle and surface any filesystem
errors, and keep returning the same explicitDir value; refer to the ensureConfig
function and the options.explicitDir branch to locate where to add the
mkdir/validation logic.
In `@tools/cli/src/lib/config/ensure-env.ts`:
- Around line 8-13: The checkDbVolumeExists function uses a POSIX-only stderr
redirection string (`2>/dev/null`) in the execSync call which breaks on Windows;
remove the redirection (or replace it with a platform-neutral suppression) and
rely on the existing try/catch to handle errors so the volume check works on
Windows too—update the execSync invocation in checkDbVolumeExists to call
`docker volume inspect ${projectName}_db-data` without `2>/dev/null`.
In `@tools/cli/src/lib/docker/check-http-health.ts`:
- Around line 8-19: The per-request AbortSignal timeout in checkHttpHealth
(AbortSignal.timeout(5000)) should be made configurable to avoid false
negatives: add a requestTimeout (in ms or seconds) to the options used by
checkHttpHealth (and exposed via the options interface in wait-for-healthy.ts),
default it sensibly (e.g., 5000ms) if not provided, and replace the hardcoded
5000 with the new options.requestTimeout when creating the fetch signal; ensure
caller code that constructs the options (e.g., functions/classes that call
checkHttpHealth) passes the new field or relies on the default.
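A sketch of the configurable per-request timeout described above. The `requestTimeout` field name comes from the review; the `HealthCheckOptions` and `FetchLike` types, the injected `fetchImpl` parameter, and the assumption that any 2xx response counts as healthy are illustrative and not taken from the PR.

```typescript
interface HealthCheckOptions {
  // Per-request timeout in milliseconds (field name as proposed in the review).
  requestTimeout?: number;
}

// Minimal structural type so a stub can replace the global fetch in tests.
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<{ ok: boolean }>;

const DEFAULT_REQUEST_TIMEOUT_MS = 5000;

async function checkHttpHealth(
  url: string,
  options: HealthCheckOptions = {},
  fetchImpl: FetchLike = fetch,
): Promise<boolean> {
  const timeoutMs = options.requestTimeout ?? DEFAULT_REQUEST_TIMEOUT_MS;
  try {
    const response = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    // Assumption: a 2xx status means the service is healthy.
    return response.ok;
  } catch {
    // Network errors and timeouts both count as "not healthy yet".
    return false;
  }
}
```

Callers that omit `requestTimeout` keep the current 5000 ms behaviour, so the change is backwards compatible.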
In `@tools/cli/src/lib/docker/exec.ts`:
- Around line 10-39: Add an optional timeout to exec by extending the options
parameter (e.g., options: { cwd?: string; silent?: boolean; timeoutMs?: number
}) and use it to race the process completion (proc.exited) against a timer; if
the timeout wins, call proc.kill() (or proc.kill("SIGTERM")/proc.kill("SIGKILL")
as needed), await proc.exited to settle, and return an ExecResult with success:
false, an exitCode indicating timeout (e.g., -1), and stderr containing a clear
timeout message. Ensure you reference the existing symbols proc, proc.exited,
and proc.kill() and preserve existing behavior when timeoutMs is undefined or 0
(no timeout).
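The race between process completion and a timer can be sketched as below. This covers only the timeout mechanics, not the full ExecResult with its stderr message. `KillableProcess` is a hypothetical structural stand-in for the spawned process (anything exposing `exited` and `kill()`), and `waitWithTimeout` and `TimedExit` are likewise illustrative names.

```typescript
// Hypothetical minimal view of a spawned process.
interface KillableProcess {
  exited: Promise<number>;
  kill(): void;
}

interface TimedExit {
  exitCode: number;
  timedOut: boolean;
}

const TIMEOUT_EXIT_CODE = -1;

async function waitWithTimeout(proc: KillableProcess, timeoutMs?: number): Promise<TimedExit> {
  // Undefined or 0 keeps the existing behaviour: wait indefinitely.
  if (!timeoutMs) {
    return { exitCode: await proc.exited, timedOut: false };
  }
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    const winner = await Promise.race([proc.exited, timeout]);
    if (winner === "timeout") {
      proc.kill();
      await proc.exited; // let the process settle before reporting
      return { exitCode: TIMEOUT_EXIT_CODE, timedOut: true };
    }
    return { exitCode: winner, timedOut: false };
  } finally {
    // Always clear the timer so it cannot keep the event loop alive.
    clearTimeout(timer);
  }
}
```

A caller would map `timedOut: true` to `success: false` and put a message such as "command timed out after N ms" into stderr.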
In `@tools/cli/src/lib/docker/remove-container.ts`:
- Around line 4-14: The removeContainer function currently treats a docker "No
such container" error as failure; update removeContainer to treat that specific
stderr message as success for idempotent cleanup: after calling docker("rm",
"-f", containerName) in removeContainer, if result.success is false check
result.stderr (trimmed) and if it includes "No such container" return true (and
do not emit a warn); only log a warning and return false for other errors. Keep
using the same function name removeContainer and the existing logger and docker
call symbols.
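The idempotent classification described above can be sketched as a pure function over the command result. `DockerResult` is an assumed shape (only `success` and `stderr` are mentioned in the review), and `isContainerRemoved` is a hypothetical name.

```typescript
// Assumed result shape; only these two fields are referenced in the review.
interface DockerResult {
  success: boolean;
  stderr: string;
}

// Hypothetical helper: a missing container counts as already removed.
function isContainerRemoved(result: DockerResult): boolean {
  if (result.success) return true;
  return result.stderr.trim().includes("No such container");
}

// Usage: removing a container that is already gone is treated as success.
console.log(isContainerRemoved({ success: false, stderr: "Error: No such container: tale-api-blue\n" }));
```

Matching on the stderr text is brittle if Docker ever changes the message, so this is best treated as a convenience for cleanup paths rather than a guarantee.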
In `@tools/cli/src/lib/registry/select-version.ts`:
- Around line 63-70: The regex that detects semantic aliases only matches digits
and will miss tags like "v1.0.0"; in the block that checks version.tag ===
"latest" (using version.aliases.find and assigning semanticAlias to value),
update the test from /^\d+\.\d+\.\d+/ to allow an optional "v" prefix (e.g.
/^v?\d+\.\d+\.\d+/) so tags like "v1.0.0" are recognized as semantic aliases;
ensure you adjust only the regex used in the aliases.find call and leave the
rest of the logic (semanticAlias -> value assignment) intact.
- Around line 49-55: The duplicate dynamic import of the confirm prompt should
be removed and the already-imported confirm reused: drop the await
import("@inquirer/prompts") and replace the usage of confirmDone with the
previously imported confirm (the top-level confirm symbol) so the code calls
confirm({ message: "Press Enter after you have logged in...", default: true })
and returns that result; ensure you only reference the existing confirm
identifier rather than creating confirmDone.
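For the semantic-alias regex in the first select-version.ts comment above, a quick sketch shows the effect of the optional "v" prefix. The helper name `findSemanticAlias` is hypothetical; the regex is the one proposed in the review.

```typescript
// Proposed pattern: optional "v" prefix followed by MAJOR.MINOR.PATCH.
const SEMVER_ALIAS = /^v?\d+\.\d+\.\d+/;

// Hypothetical helper mirroring the aliases.find(...) call in the review.
function findSemanticAlias(aliases: readonly string[]): string | undefined {
  return aliases.find((alias) => SEMVER_ALIAS.test(alias));
}

// Usage: both prefixed and unprefixed version tags are recognized.
console.log(findSemanticAlias(["latest", "v1.0.0"]));
console.log(findSemanticAlias(["latest", "1.2.3"]));
```

The pattern is anchored only at the start, so suffixed tags such as "1.2.3-rc.1" also match, the same as with the original regex.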
In `@tools/cli/src/lib/state/acquire-lock.ts`:
- Around line 34-38: The catch block swallowing all errors from unlink(lockPath)
hides stale-lock removal failures; change it to catch the error (e) and emit a
debug-level log including the lockPath and error details (e.g., logger.debug or
console.debug) so failures are visible while preserving current behavior (do not
rethrow); reference unlink and lockPath and note that writeFile with the 'wx'
flag still provides atomicity.
In `@tools/cli/src/utils/confirm.ts`:
- Line 12: Trim the user's input before comparing in the confirm prompt: update
the comparison that currently uses answer.toLowerCase() (the resolve(...) call)
to first trim the answer (e.g., answer.trim().toLowerCase()) so responses like "
y" or "y " are accepted; modify the resolve expression in the confirm utility
(the resolve(answer.toLowerCase() === "y" || answer.toLowerCase() === "yes")) to
use trimmed input instead.
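The trimmed comparison can be sketched as a standalone predicate. The name `isAffirmative` is hypothetical; in the confirm utility the same expression would sit inside the `resolve(...)` call.

```typescript
// Hypothetical helper: normalize once, then compare against the accepted answers.
function isAffirmative(answer: string): boolean {
  const normalized = answer.trim().toLowerCase();
  return normalized === "y" || normalized === "yes";
}

// Usage: surrounding whitespace and letter case no longer matter.
console.log(isAffirmative(" y"), isAffirmative("YES "), isAffirmative("no"));
```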
Create a new deployment CLI that generates Docker Compose configs inline with security-hardened settings. Only ports 80/443 are exposed in production.
Features:
- Blue-green deployment with zero downtime
- Rollback to previous version
- Status monitoring
- Cleanup of inactive containers
- Reset functionality
The CLI compiles to a single binary for easy server deployment. Refs #294

Build the tale-deploy binary for Linux x64 and upload to GitHub Releases.
Triggers:
- Manual dispatch with optional release tag
- Push to main when tools/deploy/ changes
Refs #294

Add clear warning that compose.yml exposes ports for development only. Production deployments should use the tale-deploy CLI. Refs #294

Remove files replaced by the new tale-deploy CLI:
- compose.blue.yml
- compose.green.yml
- scripts/deploy.sh
Refs #294

- Fix network aliases format (use object instead of array)
- Fix external volume declaration (remove driver when external)
- Write compose file to deploy dir for correct env_file resolution
- Add ensureVolumes and ensureNetwork to create resources before deploy
Refs #294

…lify proxy
- Add --dry-run flag to deploy and reset commands for previewing changes
- Add --services flag to deploy specific services without full blue-green switch
- Add in-place update mode when deploying individual services
- Add logs command for viewing service container logs
- Add --version flag to rollback for targeting a specific version
- Rename --update-stateful/--include-stateful to -a/--all
- Simplify Caddyfile to route via single platform DNS alias
- Add service type guards and ALL_SERVICES constant

Split monolithic docker/client.ts and docker/health.ts into focused single-responsibility modules (container, exec, network, volume, pull-image, wait-for-healthy, check-http-health, image-exists). Rename command exports from xxxCommand to plain names. Add mac and windows build targets.
…-purpose files Break down monolithic modules (state/lock.ts, state/deployment.ts, compose/services/*, utils/env.ts) into focused single-function files following the same pattern applied to docker modules. Update all command imports accordingly.
Build Linux x64, macOS ARM64, and Windows x64 binaries in CI instead of only Linux. Each platform binary is uploaded as a separate artifact and attached to releases.
Now that CI builds for all platforms, replace string template paths with path.join to handle platform-specific path separators correctly.

…bcommand
Restructure the deployment CLI to be a general-purpose Tale CLI tool:
- Rename package from @tale/deploy to @tale/cli
- Move tools/deploy/ to tools/cli/
- Change binary output from tale-deploy-* to tale
- Nest deploy commands under `tale deploy` subcommand:
  - tale deploy <version> - Deploy a new version
  - tale deploy rollback - Rollback to previous version
  - tale deploy status - Show deployment status
  - tale deploy cleanup - Remove inactive containers
  - tale deploy reset - Reset all containers
  - tale deploy logs - View service logs
- Reorganize source structure:
  - src/commands/deploy/ - Deploy command group
  - src/lib/compose/ - Docker Compose generation
  - src/lib/docker/ - Docker operations
  - src/lib/state/ - State management
- Update CI workflow to build tale binary with platform suffixes
- Add tools/cli to npm workspaces
This structure enables future command groups (tale db, tale config, etc.)

Prevent accidental overwriting of published release artifacts by only allowing uploads when the push is from a tag ref.
Changed tale-deploy to tale to match the actual CLI binary name.
Switch all build-*.yml workflows from ubuntu-latest to self-hosted runner to reduce GitHub Actions costs and improve build performance with persistent Docker cache.
Required by i18next-icu but was not explicitly installed, causing vite build to fail.
Summary
@tale/cli) built with Bun for managing deployments
Key Features
- deploy - Blue-green deployments with health checks and automatic rollback
- rollback - Revert to previous deployment version
- status - View current deployment status and container health
- logs - Stream logs from deployed services
- config - Manage deployment configuration
- cleanup - Remove old containers and images
- reset - Reset deployment state
Test plan
- npm run build --workspace=@tale/cli
🤖 Generated with Claude Code