feat(cli): add Tale CLI with blue-green deployment support #301
larryro wants to merge 35 commits into
Create a new deployment CLI that generates Docker Compose configs inline with security-hardened settings. Only ports 80/443 are exposed in production.
Features:
- Blue-green deployment with zero downtime
- Rollback to previous version
- Status monitoring
- Cleanup of inactive containers
- Reset functionality
The CLI compiles to a single binary for easy server deployment.
Refs #294

Build the tale-deploy binary for Linux x64 and upload to GitHub Releases.
Triggers:
- Manual dispatch with optional release tag
- Push to main when tools/deploy/ changes
Refs #294

Add clear warning that compose.yml exposes ports for development only. Production deployments should use the tale-deploy CLI. Refs #294

Remove files replaced by the new tale-deploy CLI:
- compose.blue.yml
- compose.green.yml
- scripts/deploy.sh
Refs #294

- Fix network aliases format (use object instead of array)
- Fix external volume declaration (remove driver when external)
- Write compose file to deploy dir for correct env_file resolution
- Add ensureVolumes and ensureNetwork to create resources before deploy
Refs #294

…lify proxy
- Add --dry-run flag to deploy and reset commands for previewing changes
- Add --services flag to deploy specific services without full blue-green switch
- Add in-place update mode when deploying individual services
- Add logs command for viewing service container logs
- Add --version flag to rollback for targeting a specific version
- Rename --update-stateful/--include-stateful to -a/--all
- Simplify Caddyfile to route via single platform DNS alias
- Add service type guards and ALL_SERVICES constant

Split monolithic docker/client.ts and docker/health.ts into focused single-responsibility modules (container, exec, network, volume, pull-image, wait-for-healthy, check-http-health, image-exists). Rename command exports from xxxCommand to plain names. Add mac and windows build targets.
…-purpose files
Break down monolithic modules (state/lock.ts, state/deployment.ts, compose/services/*, utils/env.ts) into focused single-function files following the same pattern applied to docker modules. Update all command imports accordingly.
Build Linux x64, macOS ARM64, and Windows x64 binaries in CI instead of only Linux. Each platform binary is uploaded as a separate artifact and attached to releases.
Now that CI builds for all platforms, replace string template paths with path.join to handle platform-specific path separators correctly.
…bcommand
Restructure the deployment CLI to be a general-purpose Tale CLI tool:
- Rename package from @tale/deploy to @tale/cli
- Move tools/deploy/ to tools/cli/
- Change binary output from tale-deploy-* to tale
- Nest deploy commands under `tale deploy` subcommand:
  - tale deploy <version> - Deploy a new version
  - tale deploy rollback - Rollback to previous version
  - tale deploy status - Show deployment status
  - tale deploy cleanup - Remove inactive containers
  - tale deploy reset - Reset all containers
  - tale deploy logs - View service logs
- Reorganize source structure:
  - src/commands/deploy/ - Deploy command group
  - src/lib/compose/ - Docker Compose generation
  - src/lib/docker/ - Docker operations
  - src/lib/state/ - State management
- Update CI workflow to build tale binary with platform suffixes
- Add tools/cli to npm workspaces
This structure enables future command groups (tale db, tale config, etc.)

📝 Walkthrough
This pull request replaces the shell-based blue-green deployment orchestration with a comprehensive TypeScript CLI tool. It removes the legacy
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Actionable comments posted: 27
🤖 Fix all issues with AI agents
In @.github/workflows/cli.yml:
- Around line 96-108: The workflow step "Upload to latest release" currently
runs on every push and uses gh release upload with --clobber; change it to only
run for tagged pushes (e.g., update the step condition to check github.ref
startsWith 'refs/tags/' or require a provided release_tag input) and stop using
--clobber so published assets aren’t overwritten (update the LATEST_TAG logic to
use the tag from github.ref or the provided release_tag and call gh release
upload without --clobber, failing the job on upload errors instead of silently
echoing).
In `@compose.yml`:
- Around line 7-8: Replace the incorrect CLI reference "tale-deploy deploy
<version>" with the correct binary invocation "tale deploy <version>" wherever
it appears in the compose.yml (including the other occurrence noted in the
comment), ensuring all documentation strings and examples use "tale deploy
<version>" consistently.
In `@tools/cli/README.md`:
- Around line 79-86: The markdown tables in tools/cli/README.md have
inconsistent pipe spacing in their separator rows; update each table's separator
line to use a consistent format (e.g., "| --- | --- |") and normalize spacing
around pipes for the rows that list options like `-a, --all`, `-s, --services
<list>`, `--dry-run`, `-d, --dir <path>`, and `--host <hostname>` so
markdownlint MD060 passes; apply the same normalization to the other table
blocks referenced (lines near the other ranges) to ensure all tables use
identical separator formatting and alignment.
In `@tools/cli/src/commands/deploy/cleanup.ts`:
- Around line 32-47: The loop only removes containers when
isContainerRunning(...) is true; change it so containers for each
ROTATABLE_SERVICES inactiveColor are removed regardless of running state:
compute containerName as before, optionally attempt stopContainer(containerName)
if isContainerRunning returns true (or always attempt but ignore errors), then
always call removeContainer(containerName) (since it uses docker rm -f) and
handle warnings if removeContainer returns false; update references to
isContainerRunning, stopContainer, removeContainer, ROTATABLE_SERVICES,
containerName, and cleaned to ensure cleaned increments when a container was
actually removed.
- Around line 36-46: The cleaned counter is incremented unconditionally even
when stopContainer or removeContainer fails; update the cleanup logic around the
running check so that cleaned++ only runs when the cleanup actually succeeded
(e.g., require both stopped and removed be true, or at minimum removed be true)
— modify the block referencing running, stopContainer(containerName),
removeContainer(containerName), and cleaned so you move the increment inside an
if that checks the boolean results and leave the warning logs as-is when either
operation fails.
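
The two cleanup.ts points above (remove inactive containers regardless of running state, and count only successful removals) might look roughly like the sketch below. The `ContainerOps` interface and `cleanupInactive` function are hypothetical stand-ins for the project's `isContainerRunning`, `stopContainer`, and `removeContainer` helpers, whose real signatures are not shown in this review.

```typescript
// Hypothetical abstraction over the docker helpers named in the review.
interface ContainerOps {
  isRunning(name: string): Promise<boolean>;
  stop(name: string): Promise<boolean>;
  remove(name: string): Promise<boolean>;
}

// Returns how many containers were actually removed.
async function cleanupInactive(
  containerNames: string[],
  ops: ContainerOps,
): Promise<number> {
  let cleaned = 0;
  for (const name of containerNames) {
    // Stop only if running; a failed stop is a warning, not a hard error.
    if (await ops.isRunning(name)) {
      const stopped = await ops.stop(name);
      if (!stopped) console.warn(`Failed to stop ${name}`);
    }
    // Always attempt removal, so stopped/exited containers are cleaned too.
    const removed = await ops.remove(name);
    if (removed) {
      cleaned++; // count only real removals
    } else {
      console.warn(`Failed to remove ${name}`);
    }
  }
  return cleaned;
}
```

Injecting the operations keeps the counting logic testable without a Docker daemon.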
In `@tools/cli/src/commands/deploy/deploy.ts`:
- Around line 151-163: Extract the duplicated volume/network setup into a helper
(e.g., ensureInfrastructure) and replace the two inline blocks in deploy (the
in-place path and the blue-green path) to call it; the helper should take
projectName, dryRun, and prefix, declare the requiredVolumes array
(["platform-convex-data","caddy-data","rag-data"]), log dry-run actions via
logger when dryRun is true, and otherwise call ensureVolumes(projectName,
requiredVolumes) and ensureNetwork(projectName, "internal") and throw the same
errors if those calls return falsy, so ensureVolumes, ensureNetwork,
requiredVolumes, dryRun, and prefix are referenced consistently.
- Around line 94-101: The current sequential loop that calls pullImage for each
entry in imagesToPull should be changed to run pulls in parallel: replace the
for-await loop inside deploy.ts (the imagesToPull handling) with a Promise.all
over imagesToPull.map(image => pullImage(image)) and then check results to throw
a single aggregated Error if any pull failed; alternatively, to avoid unbounded
concurrency use a concurrency limiter (e.g., p-limit) to wrap pullImage calls
before Promise.all. Ensure you reference the original pullImage function and
imagesToPull variable and preserve existing error behavior by including which
images failed in the thrown Error.
- Around line 293-300: When stopping/removing old color containers during drain
(the loop over rotatableToUpdate that calls stopContainer and removeContainer
for containerName built from env.PROJECT_NAME, service, currentColor), wrap each
stopContainer/removeContainer call in its own try/catch so failures for one
service are logged via logger.warn or logger.error (include containerName and
error) but do not rethrow, allowing the loop to continue cleaning up the
remaining services; ensure you still call both stopContainer and removeContainer
attempts even if stopContainer fails, and keep the existing logger.step message
for context.
In `@tools/cli/src/commands/deploy/index.ts`:
- Around line 144-151: The current parsing of options.tail into tail (using
parseInt in the deploy command) allows negative values which will later be
passed to docker logs; update the validation in the block around
parseInt(options.tail, 10) to reject negative numbers as well: after parsing and
Number.isNaN check, add a check for tail < 0 and call logger.error with a clear
message like "Invalid --tail value: X. Must be a non-negative number." and
exit(1); keep the checks near the existing variables (options.tail, tail,
parseInt, Number.isNaN, logger.error) so behavior is validated early.
In `@tools/cli/src/commands/deploy/logs.ts`:
- Around line 35-37: Validate the CLI-provided color string before assigning it
to targetColor: check the variable color against the allowed set of deployment
colors (the same canonical list used to name containers) and if it is not in
that set, throw/print a clear user-facing error explaining the accepted values
and exit; perform this check in the branch where color is handled (around the
existing if (color) { targetColor = color; } else { ... }) so invalid strings
never flow into container name logic.
- Around line 23-24: Replace the hardcoded service string with the canonical
service list: import the shared constant (e.g., SERVICES or AVAILABLE_SERVICES)
used across the codebase and use it to build the message, then change the logger
calls in deploy/logs.ts (the lines that call logger.error(`Invalid service:
${service}`) and logger.info(...)) to reference that constant (for example
logger.info(`Available services: ${canonicalServices.join(', ')}`) and include
the derived list in the error message or call logger.info immediately after the
error). Ensure you reference the shared constant name exactly as exported by the
project so the CLI output stays in sync.
- Around line 60-65: The code currently blocks log retrieval by throwing if
isContainerRunning(containerName) returns false; instead, change this to log a
warning and proceed to call docker logs so stopped containers can still have
their logs retrieved and non-existent containers will be handled by the docker
client; specifically, in the block that calls isContainerRunning(containerName),
replace the logger.error + throw with logger.warn mentioning the containerName
(or similar) and continue execution so the downstream logic that invokes docker
logs can run and surface any actual docker errors.
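
For the color validation point in logs.ts, a type guard along these lines would keep invalid strings out of the container-name logic. This is a sketch under the assumption that the only deployment colors are `blue` and `green`; `DEPLOYMENT_COLORS`, `isDeploymentColor`, and `resolveColor` are illustrative names, not the project's exports.

```typescript
// Assumption: blue-green deployment implies exactly these two colors.
const DEPLOYMENT_COLORS = ["blue", "green"] as const;
type DeploymentColor = (typeof DEPLOYMENT_COLORS)[number];

function isDeploymentColor(value: string): value is DeploymentColor {
  return (DEPLOYMENT_COLORS as readonly string[]).includes(value);
}

// Use the CLI-provided color when given, otherwise fall back to the
// currently active color. Invalid input fails with a clear message.
function resolveColor(
  color: string | undefined,
  activeColor: DeploymentColor,
): DeploymentColor {
  if (!color) return activeColor;
  if (!isDeploymentColor(color)) {
    throw new Error(
      `Invalid color: ${color}. Expected one of: ${DEPLOYMENT_COLORS.join(", ")}`,
    );
  }
  return color;
}
```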
In `@tools/cli/src/commands/deploy/reset.ts`:
- Around line 83-88: The current call docker("network", "prune", "-f") in
reset.ts can delete other projects' networks; either (A) add a project label
when creating networks in ensure-network.ts (e.g., label
"com.docker.compose.project" set to your project identifier) and then change the
prune call in reset.ts to docker("network", "prune", "-f", "--filter",
"label=com.docker.compose.project=<projectId>") so only project-owned networks
are pruned, or (B) if you cannot modify ensure-network.ts, replace the
unconditional prune with a time-based filter: docker("network", "prune", "-f",
"--filter", "until=24h") to only remove networks unused for >24h; update the
code paths around logger.step / dryRun to reflect the chosen filter.
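
Option (A) above pairs a label at network creation with a matching filter at prune time. The sketch below only builds the argument lists that would be passed to the project's `docker(...)` wrapper; `networkCreateArgs` and `networkPruneArgs` are hypothetical helpers, and the network name is left to the caller because the naming scheme is not shown here.

```typescript
// Label key taken from the review comment above.
const PROJECT_LABEL = "com.docker.compose.project";

// Arguments for `docker network create` with a project label attached.
function networkCreateArgs(projectId: string, networkName: string): string[] {
  return ["network", "create", "--label", `${PROJECT_LABEL}=${projectId}`, networkName];
}

// Arguments for `docker network prune` restricted to that project's label.
function networkPruneArgs(projectId: string): string[] {
  return ["network", "prune", "-f", "--filter", `label=${PROJECT_LABEL}=${projectId}`];
}
```

Sharing one label constant between both helpers avoids the create and prune sides drifting apart.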
In `@tools/cli/src/commands/deploy/rollback.ts`:
- Around line 105-127: After switching traffic and verifying health, persist the
updated deployment metadata so version history isn't lost: set the previous
version to the value that was running (currentColor/currentVersion) and set
current to rollbackVersion (and currentColor to rollbackColor). Add a call after
setCurrentColor and successful health checks (before logger.success) to update
whatever deployment-state store you use (e.g., a function like
persistDeploymentState/updateDeploymentMetadata or similar), referencing
rollbackVersion, rollbackColor, and the original currentVersion/currentColor so
getPreviousVersion will return the correct prior release for subsequent
rollbacks.
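
The state update requested for rollback.ts can be thought of as a pure swap of current and previous versions. The `DeploymentState` shape and `stateAfterRollback` name below are assumptions for illustration; the project's actual state store is not shown in this review.

```typescript
type Color = "blue" | "green";

// Hypothetical shape of the persisted deployment metadata.
interface DeploymentState {
  currentVersion: string;
  currentColor: Color;
  previousVersion?: string;
}

// After a successful rollback, the version that was running becomes the
// "previous" one, so a later rollback can return to it.
function stateAfterRollback(
  state: DeploymentState,
  rollbackVersion: string,
  rollbackColor: Color,
): DeploymentState {
  return {
    currentVersion: rollbackVersion,
    currentColor: rollbackColor,
    previousVersion: state.currentVersion,
  };
}
```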
In `@tools/cli/src/index.ts`:
- Around line 5-10: Replace the hardcoded VERSION constant with the package.json
version: remove or stop using VERSION = "1.0.0" and import the JSON package
(e.g., import pkg from "../package.json") then pass pkg.version into
program.version(...) so the CLI always reflects tools/cli/package.json; update
any references to VERSION in this module to use pkg.version instead.
In `@tools/cli/src/lib/compose/generators/generate-color-compose.ts`:
- Around line 8-38: generateColorCompose currently mixes two sources for the
project name (the function arg projectName and config.projectName) which can
cause services to attach to different volumes/networks; fix by choosing a single
source of truth (prefer config.projectName) or assert equality up front: inside
generateColorCompose check that config.projectName === projectName and throw/log
if not, or replace all uses of projectName (volumes/networks) with
config.projectName so
createPlatformService/createRagService/createCrawlerService/createSearchService
and the volumes/networks all use the same identifier.
In `@tools/cli/src/lib/compose/services/create-graph-db-service.ts`:
- Around line 11-16: The healthcheck object for the graph DB service (property
name healthcheck in create-graph-db-service.ts) lacks a start_period, which can
cause Docker to mark the container unhealthy while it is still initializing;
update the healthcheck object used when creating the service to include an
appropriate start_period (for example "30s" or a configurable value) alongside
test, interval, timeout, and retries to allow the DB more time to become healthy
and keep this pattern consistent with other service creators.
In `@tools/cli/src/lib/docker/check-http-health.ts`:
- Around line 1-35: The interval option in checkHttpHealth (and the
HealthCheckOptions type) uses milliseconds but is undocumented; update the API
to make units explicit by renaming interval to intervalMs in HealthCheckOptions
and in checkHttpHealth (destructure const { timeout, intervalMs = 2000 } =
options), update all usages (Bun.sleep(intervalMs), any logs/messages) and
adjust any call sites or docs to pass milliseconds (or add a compatibility shim
that accepts interval in seconds and converts to ms), ensuring the timeout
comment/variable remains seconds-to-ms as-is so units are consistent and clear.
In `@tools/cli/src/lib/docker/docker-compose.ts`:
- Around line 11-13: The temp filename generation using Date.now() can collide
across concurrent runs; replace the Date.now() suffix in the join call that
assigns tempFile (currently using `.tale-deploy-compose-${Date.now()}.yml`) with
a UUID from a reliable generator (e.g., Bun.randomUUIDv7()/Bun.randomUUID() or
crypto.randomUUID()) so each compose file name is unique; update any related
uses (the tempFile variable, the Bun.write call, and subsequent cleanup logic
that deletes the temp file) to use the new UUID-based name.
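
A sketch of the UUID-based temp file name, using `randomUUID` from `node:crypto` (one of the generators the comment lists). `tempComposePath` is a hypothetical helper; the file prefix is the one quoted above.

```typescript
import { randomUUID } from "node:crypto";
import { join } from "node:path";

// Build a temp compose path that stays unique across concurrent runs.
function tempComposePath(deployDir: string): string {
  return join(deployDir, `.tale-deploy-compose-${randomUUID()}.yml`);
}
```

The same returned path should be reused for the write and for the later cleanup, so the name is generated exactly once per run.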
In `@tools/cli/src/lib/docker/get-container-version.ts`:
- Around line 1-19: The function getContainerVersion calls result.stdout.trim()
before verifying the docker() call succeeded, which can throw if stdout is
undefined; update getContainerVersion to first check result.success and that
result.stdout is defined (or defensively use result.stdout ?? "") before calling
trim, e.g. guard on result.success and typeof result.stdout === "string" (or
coerce with ??) prior to computing version, and only then evaluate the version
string and the "<no value>" check.
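
The guard ordering described above could be sketched as follows. The `DockerResult` shape is an assumption about what the project's `docker()` wrapper returns, and `versionFromInspect` is a hypothetical name for the parsing step inside `getContainerVersion`.

```typescript
// Assumed result shape of the docker() wrapper.
interface DockerResult {
  success: boolean;
  stdout?: string;
  stderr?: string;
}

// Check success and stdout before trimming, then filter out the
// "<no value>" placeholder mentioned in the review.
function versionFromInspect(result: DockerResult): string | null {
  if (!result.success || typeof result.stdout !== "string") return null;
  const version = result.stdout.trim();
  if (version === "" || version === "<no value>") return null;
  return version;
}
```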
In `@tools/cli/src/lib/docker/list-containers.ts`:
- Around line 1-23: The parsed container fields in listContainers may retain
CR/LF characters (e.g., '\r') on Windows; update the parsing in listContainers
to trim the line and each split field (name, status, image) after splitting on
"\t" so that all returned values are clean strings (e.g., call .trim() on the
line and on each of name, status, image before returning).
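
A sketch of the trimmed parsing, assuming tab-separated `name`, `status`, `image` fields per line as the comment describes. `parseContainerLines` and `ContainerInfo` are illustrative names.

```typescript
interface ContainerInfo {
  name: string;
  status: string;
  image: string;
}

// Split docker output into lines, trim each line and each field so that
// stray "\r" characters from CRLF output never reach the caller.
function parseContainerLines(stdout: string): ContainerInfo[] {
  return stdout
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [name = "", status = "", image = ""] = line
        .split("\t")
        .map((field) => field.trim());
      return { name, status, image };
    });
}
```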
In `@tools/cli/src/lib/docker/pull-image.ts`:
- Around line 4-12: The pullImage function relies on docker(image) which can
throw if Bun.spawn() fails; wrap the call to docker("pull", image) in a
try/catch inside pullImage (or ensure docker() catches spawn errors) so any
thrown exception is caught, logged via logger.error with the error details, and
pullImage returns false, preserving the Promise<boolean> contract; reference
pullImage and docker (and underlying exec()/Bun.spawn usage) when making the
change.
In `@tools/cli/src/lib/docker/remove-container.ts`:
- Around line 4-7: The removeContainer function currently returns a boolean but
doesn't log failure details; update removeContainer to check the result from the
docker("rm", "-f", containerName) call and, if result.success is false, log
result.stderr (and optionally result.stdout) via logger.error or logger.debug
similar to how pullImage logs errors, then return the boolean; locate the
removeContainer function and the docker("rm", "-f", containerName) invocation to
add the conditional logging of stderr when removal fails.
In `@tools/cli/src/lib/docker/wait-for-healthy.ts`:
- Around line 5-8: The HealthCheckOptions interface uses mixed time units
(timeout in seconds, interval in milliseconds); update the interface
HealthCheckOptions to either standardize both to the same unit or add clear
JSDoc on each field clarifying units (e.g., timeout is seconds and will be
converted to ms where used, interval is milliseconds with its default),
referencing the timeout and interval properties so callers know which unit to
pass and to avoid silent conversion bugs.
In `@tools/cli/src/lib/state/acquire-lock.ts`:
- Around line 23-42: Replace the non-atomic Bun.write() lock creation in
acquire-lock.ts with an exclusive-create using node:fs/promises.writeFile({
flag: "wx" }) so lock acquisition is atomic; keep the existing stale-lock
detection (getLockInfo and isProcessRunning) and explicitly remove the stale
lock file (use fs.unlink) before attempting the exclusive write, catch EEXIST
from writeFile and treat it as "lock already held" (return false), and preserve
logging calls (logger.warn for stale removal, logger.debug for successful
acquisition, logger.error as needed) so the function returns false on contention
and only writes the lock when the exclusive create succeeds.
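
The exclusive-create idea can be sketched as below. For brevity this uses the synchronous `writeFileSync` with `flag: "wx"` rather than the `node:fs/promises` variant named above; the flag behaves the same way. `tryCreateLock` and `releaseLock` are hypothetical names, and stale-lock detection is deliberately left out.

```typescript
import { rmSync, writeFileSync } from "node:fs";

interface LockInfo {
  pid: number;
  startedAt: string;
  command: string;
}

// Atomically create the lock file. "wx" fails with EEXIST if the file
// already exists, so two processes cannot both succeed.
function tryCreateLock(lockPath: string, info: LockInfo): boolean {
  try {
    writeFileSync(lockPath, JSON.stringify(info), { flag: "wx" });
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false; // lock already held
    throw err; // anything else (permissions, missing dir) is a real error
  }
}

// Remove the lock file; a missing file is not treated as an error.
function releaseLock(lockPath: string): void {
  rmSync(lockPath, { force: true });
}
```

A stale lock would still need to be detected and removed before calling `tryCreateLock`, as the comment above describes.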
In `@tools/cli/src/lib/state/get-lock-info.ts`:
- Around line 9-16: The current isLockInfo type guard accepts any numeric pid,
which may be 0, negative or NaN; update the function isLockInfo to additionally
ensure (value as LockInfo).pid is a positive integer (use Number.isInteger(...)
&& ... > 0) while keeping the existing startedAt and command string checks so
corrupt/hand-edited lock files with non-positive or non-integer pids are
rejected.
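
A sketch of the stricter type guard, assuming `LockInfo` has the three fields the comment mentions (`pid`, `startedAt`, `command`). The exact body of the project's `isLockInfo` is not shown here, so treat this as one possible shape.

```typescript
interface LockInfo {
  pid: number;
  startedAt: string;
  command: string;
}

// Accept only objects whose pid is a positive integer and whose
// startedAt and command fields are strings.
function isLockInfo(value: unknown): value is LockInfo {
  if (typeof value !== "object" || value === null) return false;
  const { pid, startedAt, command } = value as Record<string, unknown>;
  return (
    typeof pid === "number" &&
    Number.isInteger(pid) &&
    pid > 0 &&
    typeof startedAt === "string" &&
    typeof command === "string"
  );
}
```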
In `@tools/cli/src/utils/logger.ts`:
- Around line 1-9: The ANSI color constants (RESET, BOLD, DIM, RED, GREEN,
YELLOW, BLUE, CYAN) should be disabled when output is not a TTY or when NO_COLOR
is set; update logger.ts to detect color support via process.stdout.isTTY &&
!process.env.NO_COLOR (or similar) and conditionally set those constants to
either the escape codes or empty strings accordingly so colors are suppressed in
piped/CI/file outputs.
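
One way to express the color detection is a pure function over the TTY flag and environment, with the palette derived from it. `colorsEnabled` and `makePalette` are hypothetical names, and only a few of the listed constants are shown.

```typescript
// Colors are on only for a TTY and only when NO_COLOR is unset or empty.
function colorsEnabled(
  isTTY: boolean | undefined,
  env: Record<string, string | undefined>,
): boolean {
  return Boolean(isTTY) && !env["NO_COLOR"];
}

// Build the palette: real escape codes when enabled, empty strings otherwise.
function makePalette(enabled: boolean) {
  const code = (escape: string): string => (enabled ? escape : "");
  return {
    RESET: code("\x1b[0m"),
    BOLD: code("\x1b[1m"),
    RED: code("\x1b[31m"),
    GREEN: code("\x1b[32m"),
  };
}
```

In the logger this would typically be called once at module load, for example `makePalette(colorsEnabled(process.stdout.isTTY, process.env))`.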
| Option | Description |
|--------|-------------|
| `-a, --all` | Also update infrastructure (db, graph-db, proxy) |
| `-s, --services <list>` | Specific services to update (comma-separated) |
| `--dry-run` | Preview deployment without making changes |
| `-d, --dir <path>` | Deployment directory (default: current directory) |
| `--host <hostname>` | Host alias for proxy (default: `tale.local` or `$HOST`) |
Fix table pipe spacing/alignment to satisfy markdownlint MD060.
markdownlint flagged the tables for pipe spacing/alignment. Please format the separator rows consistently (e.g., | --- | --- |) or run the formatter to normalize all tables.
✍️ Example fix (apply similarly to other tables)
-|--------|-------------|
+| --- | --- |
Also applies to: 91-95, 104-110, 119-124, 127-134
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
80-80: Table column style
Table pipe is missing space to the right for style "compact"
(MD060, table-column-style)
80-80: Table column style
Table pipe is missing space to the left for style "compact"
(MD060, table-column-style)
80-80: Table column style
Table pipe is missing space to the right for style "compact"
(MD060, table-column-style)
80-80: Table column style
Table pipe is missing space to the left for style "compact"
(MD060, table-column-style)
🤖 Prompt for AI Agents
In `@tools/cli/README.md` around lines 79 - 86, The markdown tables in
tools/cli/README.md have inconsistent pipe spacing in their separator rows;
update each table's separator line to use a consistent format (e.g., "| --- |
--- |") and normalize spacing around pipes for the rows that list options like
`-a, --all`, `-s, --services <list>`, `--dry-run`, `-d, --dir <path>`, and
`--host <hostname>` so markdownlint MD060 passes; apply the same normalization
to the other table blocks referenced (lines near the other ranges) to ensure all
tables use identical separator formatting and alignment.
I've reviewed the table formatting in this file. The tables appear to follow consistent markdown formatting with proper separators (|--------|-------------|) and spacing around pipes. All tables have at least 3 dashes in separator rows and consistent pipe placement. Without a specific markdownlint configuration to reproduce the MD060 warning, the current formatting appears valid and consistent across all tables in the document.
@larryro, thank you for reviewing the tables. You're right that the current format is valid and consistent markdown. The MD060 "compact" style expects a single space around cell content: | --- | --- |, whereas your tables use |--------|-------------| (no spaces). Both are valid markdown—the linter is simply configured to prefer the spaced variant.
Since the tables work correctly and you've confirmed the formatting is intentional, you can either:
- Keep the current format as-is (the warnings won't affect functionality)
- Add a markdownlint comment to disable MD060 for this file: <!-- markdownlint-disable MD060 -->
- Or adjust the linter configuration if needed later
No need to make changes if the current format meets your project's style preferences.
🧠 Learnings used
Learnt from: larryro
Repo: tale-project/tale PR: 296
File: tools/deploy/package.json:8-10
Timestamp: 2026-01-28T03:20:21.660Z
Learning: In the tale-project/tale repository, the tale-deploy CLI tool (tools/deploy) is designed to run on Linux deploy servers in production. The Darwin/Mac build targets (darwin-arm64) exist only for local development convenience, as the team uses Apple Silicon Macs exclusively. Intel Mac (darwin-x64) builds are not needed.
Learnt from: larryro
Repo: tale-project/tale PR: 296
File: tools/deploy/src/docker/is-container-running.ts:3-10
Timestamp: 2026-01-28T03:21:34.093Z
Learning: In the tale-deploy CLI (tools/deploy/), the project prefers treating all docker inspect failures as "container not running" rather than parsing stderr to distinguish daemon/permission errors from missing containers, because the deployment runs on a known server where daemon failures will cause clear failures in subsequent operations.
Learnt from: larryro
Repo: tale-project/tale PR: 296
File: tools/deploy/src/commands/deploy.ts:272-281
Timestamp: 2026-01-28T03:20:24.054Z
Learning: In the tale-project/tale deployment tool (tools/deploy), generic network aliases (e.g., "platform", "rag") were removed from rotatable services to prevent blue/green alias conflicts during deployments. Only color-specific aliases (e.g., "platform-blue", "platform-green") should be used for rotatable services to ensure traffic routing works correctly during color transitions.
Learnt from: larryro
Repo: tale-project/tale PR: 207
File: compose.yml:0-0
Timestamp: 2026-01-16T14:09:37.873Z
Learning: In the tale-project/tale repository, the compose.yml file at the repository root is for development only. Production deployments use a separate compose configuration where only necessary ports are exposed.
export interface HealthCheckOptions {
  timeout: number;
  interval?: number;
}
🧹 Nitpick | 🔵 Trivial
Consider documenting the time unit difference in the interface.
The timeout parameter is in seconds (converted to ms on line 16), while interval is in milliseconds. This asymmetry could lead to confusion. Consider adding JSDoc comments to clarify the units, or standardizing both to the same unit.
export interface HealthCheckOptions {
  /** Timeout in seconds */
  timeout: number;
  /** Polling interval in milliseconds (default: 2000) */
  interval?: number;
}
🤖 Prompt for AI Agents
In `@tools/cli/src/lib/docker/wait-for-healthy.ts` around lines 5 - 8, The
HealthCheckOptions interface uses mixed time units (timeout in seconds, interval
in milliseconds); update the interface HealthCheckOptions to either standardize
both to the same unit or add clear JSDoc on each field clarifying units (e.g.,
timeout is seconds and will be converted to ms where used, interval is
milliseconds with its default), referencing the timeout and interval properties
so callers know which unit to pass and to avoid silent conversion bugs.
The timeout parameter is in seconds (user-facing) while interval is in milliseconds (internal). This asymmetry is intentional - seconds are more intuitive for user-configurable timeouts (e.g., 60s = 1 minute), while milliseconds are standard for polling intervals (e.g., 2000ms = 2s). The conversion timeout * 1000 makes the intent clear. Adding JSDoc could be considered, but the code is self-explanatory with the conversion visible.
@larryro, that makes sense! The distinction between user-facing (seconds) and internal (milliseconds) parameters is a reasonable design choice, and the conversion is indeed visible in the implementation. Appreciate the context!
🧠 Learnings used
Learnt from: larryro
Repo: tale-project/tale PR: 76
File: services/platform/convex/lib/create_integration_agent.ts:22-27
Timestamp: 2026-01-05T01:37:40.694Z
Learning: In agent factory files (e.g., services/platform/convex/lib/create_*_agent.ts), prefer explicit `ToolName[]` type annotations for tool name arrays over implicit typing (even with `as const satisfies ToolName[]`). The explicit annotation provides clear documentation, ensures type compatibility with `createAgentConfig`, maintains consistency across agent factories, and avoids requiring downstream consumers to handle narrower tuple types.
Learnt from: larryro
Repo: tale-project/tale PR: 301
File: tools/cli/src/commands/deploy/deploy.ts:120-127
Timestamp: 2026-01-29T09:08:22.210Z
Learning: In the tale-project/tale CLI tool (tools/cli), image pulls in the deploy command are intentionally sequential (not parallelized) to provide better logging, clearer progress feedback, and easier identification of which image failed. The deployment time is typically dominated by health checks and network latency rather than image pull time, making sequential pulls acceptable for better observability.
Prevent accidental overwriting of published release artifacts by only allowing uploads when the push is from a tag ref.
Changed tale-deploy to tale to match the actual CLI binary name.
- Only increment cleaned counter when removal succeeds
- Check container existence instead of just running state to also clean up stopped/exited containers

…racefully
- Extract volume/network setup into ensureInfrastructure helper to reduce duplication between in-place and blue-green deployment paths
- Handle stop/remove failures during drain gracefully since traffic has already switched; failures are now logged but don't abort the deployment

- Reject negative --tail values with clear error message
- Use ALL_SERVICES constant instead of hardcoded list in logs.ts
- Allow docker logs for stopped containers (docker logs works for stopped containers)
- Add project label to networks for scoped pruning
- Filter network prune by project label to avoid affecting other projects

Save the current version as previous version after successfully switching traffic during rollback. This ensures subsequent rollbacks have correct version history.

- Use atomic lock file creation with exclusive write flag (wx)
- Import version from package.json instead of hardcoded constant
- Use config.projectName consistently in compose generator
- Guard against trim before success check in getContainerVersion
- Trim parsed fields in listContainers for Windows CRLF handling
- Log stderr when container removal fails
- Validate lock PID as positive integer
- Add TTY and NO_COLOR detection for logger ANSI colors
- Use randomUUID for temp compose file names to prevent collisions
- Add try-catch for spawn errors in pullImage

Add interactive prompts for version selection from the container registry when no version is provided, and CLI config management for remembering the default deployment directory. Improves first-time deployment UX by auto- detecting and including infrastructure services.
Add top-level `tale status` and `tale logs` commands for easier access. Introduce `ensureEnv` to interactively configure .env file with domain, TLS, API keys, and auto-generated security secrets during first run. Fix service name formatting in deploy to include color suffix.
Summary
- `@tale/cli` - a self-contained Bun CLI tool for managing Tale deployments
- `tale` (cross-platform: Linux x64, macOS ARM64, Windows x64)
New Command Structure
Features
- `--services`
Architecture
- `src/commands/deploy/` - Deploy command group
- `src/lib/compose/` - Docker Compose YAML generation
- `src/lib/docker/` - Docker CLI operations
- `src/lib/state/` - Deployment state management
- `src/utils/` - Logging and environment loading
Test plan
- `tale --help` shows CLI commands
- `tale deploy --help` shows deploy subcommands
- `tale deploy status` shows container status
- `tale deploy 1.0.0 --dry-run` simulates deployment
- `tale deploy logs db --tail 5` shows container logs
🤖 Generated with Claude Code