Skip to content

feat(platform): env-var source for provider API keys (#1711)#1832

Merged
larryro merged 7 commits into
mainfrom
feat/provider-secrets-env
Jun 5, 2026
Merged

feat(platform): env-var source for provider API keys (#1711)#1832
larryro merged 7 commits into
mainfrom
feat/provider-secrets-env

Conversation

@larryro

@larryro larryro commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Closes #1711.

What

Adds an opt-in secretsEnv field at provider and model level in the public <provider>.json config, so operators whose secrets live in Kubernetes Secrets / Vault / cloud secret managers can inject an LLM provider API key as an environment variable instead of round-tripping it through a *.secrets.json file.

The env-var name is not a secret, so it lives in the committable config (survives tale deploy --override, which leaves *.secrets.json untouched).

Two of the original blockers in the issue already resolved on main: SOPS is optional now (#1672), and the entrypoint syncs all platform env vars to Convex (so process.env.OPENROUTER_API_KEY already reaches Node actions). The "dependent change" to ENV_VARS_TO_SYNC is therefore not needed.

Resolution order

model.secretsEnvprovider.secretsEnv → file modelKeys[id] → file apiKey. Each tier is skipped when it yields nothing, so a configured-but-empty variable falls back to the file.

Strictly additive: env is consulted only when secretsEnv is set, so existing file-only deployments are byte-for-byte unchanged. Env values are trimmed (trailing-newline footgun); file values pass through raw (consistent with the Python loader).

Security — required allowlist

Making env-only providers reachable is a net-new exfiltration primitive: a config-write actor could point secretsEnv at a deployment secret (SOPS_AGE_KEY, BETTER_AUTH_SECRET, …) — all synced into the Node action's process.env — and have it sent as a bearer token to a provider baseUrl (the chat path does not re-check host policy).

Env reads are therefore gated by a required operator allowlist TALE_PROVIDER_SECRET_ENV_ALLOWLIST (comma-separated, empty = locked), enforced identically in TS and Python. Names are capped at 40 chars to match the platform→Convex env-name sync limit.

Recommended follow-up (out of scope here): also call checkProviderHostPolicy(baseUrl) at the chat resolution sites to close a pre-existing file-key gap. The allowlist already neutralizes the env-specific primitive.

Changes

  • secret_resolver.ts (new, pure): allowlist-gated envSecret / resolveApiKey / providerHasEnvKey / envSecretStatus + warn-once on configured-but-unresolved.
  • file_actions.ts: loadAllProviders keeps env-only providers (secrets: null); a shared resolveModelApiKey at the 3 resolution sites throws a failover-eligible MissingApiKeyError; the two direct-read paths log genuine decrypt failures instead of swallowing them; listProviders folds provider-level env into hasApiKey and model-level env into per-model hasApiKeyOverride (a keyless sibling model is never falsely unblocked); readProvider returns a tri-state envSecretStatus.
  • Python providers.py: secrets_env on both dataclasses; _resolve_api_key (3-tier, returns "" not None); all three accessors use it.
  • Settings UI: provider-level and per-model secretsEnv fields (wired through form / hydration / submit so editing a model never wipes the binding) + an env-source indicator (Set / Not set / Not allowlisted) in the API Key section.
  • i18n (en/de/fr; de-CH intentionally skipped), docs (providers.md incl. the <name>.config.json<name>.json filename fix, secrets-with-sops.md, environment-reference.md), and .env.example.

Provenance

This PR was shaped by a two-round, 15-lens adversarial review of the plan before any code was written; the security allowlist, the composer-gate fix, the per-model hydration fix, the 40-char cap, and the docs filename fix all came out of that review.

Tests

  • TS: secret_resolver.test.ts (allowlist gate, precedence, trim semantics, status), schema secretsEnv cases, MissingApiKeyError failover classification.
  • Python: test_providers.py (env-only, allowlist locked, env-overrides-file, empty-env fallback, model-env precedence, trim, empty-key contract).
  • Verified locally: tsc ✓, oxlint ✓, format:check (incl. ruff) ✓, i18n parity ✓, 387 provider-area vitest ✓, 8 pytest ✓, 139 docs tests ✓.

Summary by CodeRabbit

  • New Features

    • Providers can now source API keys from environment variables via new secretsEnv configuration, controlled by the TALE_PROVIDER_SECRET_ENV_ALLOWLIST environment variable allowlist.
    • Settings UI updated to display and configure environment variable-based API key sources for providers and individual models.
  • Documentation

    • Updated configuration guides for environment variable API key sourcing (English, German, French).
    • Provider configuration file naming updated from <name>.config.json to <name>.json.

@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This pull request adds environment-variable allowlist support for provider API keys, addressing the need to source secrets from external systems without writing cleartext files. The implementation includes: a schema contract with secretsEnv fields at provider and model levels validated against a 40-character identifier pattern; allowlist-driven resolution in TypeScript (secret_resolver.ts) and Python (providers.py) with precedence order (model env → provider env → file keys); schema validation with corresponding tests; updated file actions to treat secrets as nullable and apply defensive parsing for env-only providers; a new MissingApiKeyError for failover-eligible scenarios; frontend UI components for editing and displaying env-var status; localization strings in three languages; and comprehensive documentation across configuration guides explaining the allowlist, file format changes (*.config.json*.json), and process-start vs. hot-reload behavior.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • tale-project/tale#1732: Restructures the provider settings UI in provider-detail-drawer.tsx, overlapping with this PR's changes to the same component's API key section rendering and model form state.
  • tale-project/tale#1729: Updates provider API-key resolution and fetchConfiguredProviderModels flow in file_actions.ts, directly related to this PR's refactoring of those same functions for env-key support.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 65.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main feature: adding environment variable sourcing for provider API keys, with the issue number (#1711) for traceability.
Description check ✅ Passed The PR description comprehensively covers the what, why, security considerations, resolution order, changes, and test verification. It addresses all key aspects expected in a description template for a complex feature.
Linked Issues check ✅ Passed The PR fulfills issue #1711's core requirements: opt-in env-var fallback for provider API keys with a configurable allowlist, supports Kubernetes/Vault/cloud secret managers, maintains SOPS as primary, and implements resolution order (model env → provider env → file).
Out of Scope Changes check ✅ Passed All changes are directly aligned with the feature scope: env-var secret resolution, allowlist enforcement, settings UI updates, i18n, documentation, tests, and schema updates. No unrelated refactoring or scope creep detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/provider-secrets-env

Warning

Billing warning: we have not been able to collect payment for this subscription for more than 72 hours. Please update the payment method or pay any pending invoices in Billing to avoid service interruption.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/tale_shared/src/tale_shared/config/providers.py (1)

166-175: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate secretsEnv against the shared provider-config contract.

These assignments accept any JSON string verbatim, but this PR's contract limits secretsEnv to a 40-character identifier. A hand-edited <provider>.json can therefore be rejected by Convex and still be accepted by the Python services, so the same config file resolves differently across runtimes.

Suggested fix
+import re
@@
 _ALLOWLIST_ENV = "TALE_PROVIDER_SECRET_ENV_ALLOWLIST"
+_SECRETS_ENV_RE = re.compile(r"^[A-Z][A-Z0-9_]{0,39}$")
+
+
+def _parse_secrets_env(value: object) -> str | None:
+    if value is None:
+        return None
+    if isinstance(value, str) and _SECRETS_ENV_RE.fullmatch(value):
+        return value
+    logger.warning("Ignoring invalid secretsEnv value: %r", value)
+    return None
@@
                     description=m.get("description", ""),
                     dimensions=m.get("dimensions"),
-                    secrets_env=m.get("secretsEnv"),
+                    secrets_env=_parse_secrets_env(m.get("secretsEnv")),
                 )
             )
@@
                 supports_structured_outputs=data.get("supportsStructuredOutputs", False),
                 api_key=api_key,
                 defaults=defaults,
-                secrets_env=data.get("secretsEnv"),
+                secrets_env=_parse_secrets_env(data.get("secretsEnv")),
             )
         )

Also applies to: 190-200

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/tale_shared/src/tale_shared/config/providers.py` around lines 166 -
175, The ModelConfig construction currently accepts any JSON value for
secretsEnv; validate this value before assigning it to ModelConfig.secrets_env:
if m.get("secretsEnv") is present ensure it is a string of exactly 40 characters
and matches the allowed identifier pattern (e.g. only alphanumeric plus -/_ as
your contract requires), otherwise raise a ValueError (or skip/transform per
project convention). Update the same validation in the other model-parsing block
referenced (the code around the other ModelConfig creation at lines ~190-200) so
both paths enforce the shared provider-config contract; refer to ModelConfig and
the secrets_env/secretsEnv keys to find and change the two locations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/tale_shared/src/tale_shared/config/providers.py`:
- Around line 54-80: _env_secret currently logs a warning every time a
secretsEnv lookup is unresolved which floods logs; change it to warn only once
per unique (name, reason) combination: add a module-level tracking set (e.g.
_env_secret_warnings: set[tuple[str, str]] = set()) and, inside _env_secret,
before each logger.warning call (the "not in allowlist" warning and the
"empty/unset" warning), check if (name, "not_allowlisted") or (name, "empty") is
in that set and only call logger.warning and then add the tuple if it wasn't
present; keep the rest of _env_secret logic unchanged so
get_chat_model/get_embedding_model/get_vision_model get the same fallback
behavior but without per-request log spam.

In `@services/platform/convex/providers/file_actions.ts`:
- Around line 1949-1969: The probe loop currently resolves per-model credentials
but always targets config.baseUrl; update the probe dispatch to honor the
per-model endpoint by using the model-level override when present (i.e., use
model.baseUrl ?? config.baseUrl) wherever probes are constructed or invoked (the
block iterating config.models that calls resolveApiKey and subsequently
dispatches probes), ensuring consistency with resolveModelData/resolveModelByTag
which already use definition.baseUrl ?? provider.config.baseUrl; adjust any
probe helper calls and the endpoint passed to them so each model is probed
against its effective baseUrl.
- Around line 291-304: The console.warn in the env-fallback branch currently
logs raw decrypt/parse error text via the local reason variable which may leak
secrets; update the branch around providerHasEnvKey(result.data) so it does not
emit the raw reason — either call the existing sanitization helper used
elsewhere in this file (e.g., mask/sanitize function used for decrypt/parse
errors) and log that sanitizedReason, or remove reason from the warn and log
only a generic message referencing providerName and that the secrets file could
not be read; update the console.warn invocation and any local variables (reason)
accordingly to ensure no secret fragments are written to logs.

In `@services/platform/lib/shared/schemas/providers.ts`:
- Around line 488-502: The EnvSecretStatus interface lacks runtime validation;
define a Zod schema (e.g., envSecretStatusSchema) that matches EnvSecretStatus
(fields: name?: string, allowlisted: boolean, resolved: boolean), export it
alongside the interface, and then validate backend action responses in the
client code (e.g., in queries.ts) using envSecretStatusSchema.safeParse() and
handle parse failures (throw/log or return a safe fallback) before casting/using
the data.

---

Outside diff comments:
In `@packages/tale_shared/src/tale_shared/config/providers.py`:
- Around line 166-175: The ModelConfig construction currently accepts any JSON
value for secretsEnv; validate this value before assigning it to
ModelConfig.secrets_env: if m.get("secretsEnv") is present ensure it is a string
of exactly 40 characters and matches the allowed identifier pattern (e.g. only
alphanumeric plus -/_ as your contract requires), otherwise raise a ValueError
(or skip/transform per project convention). Update the same validation in the
other model-parsing block referenced (the code around the other ModelConfig
creation at lines ~190-200) so both paths enforce the shared provider-config
contract; refer to ModelConfig and the secrets_env/secretsEnv keys to find and
change the two locations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 701c7655-0a26-47a1-a7a0-3e6df21c8af5

📥 Commits

Reviewing files that changed from the base of the PR and between 296841c and 6110776.

📒 Files selected for processing (26)
  • .env.example
  • docs/de/self-hosted/configuration/environment-reference.md
  • docs/de/self-hosted/configuration/providers.md
  • docs/de/self-hosted/configuration/secrets-with-sops.md
  • docs/en/self-hosted/configuration/environment-reference.md
  • docs/en/self-hosted/configuration/providers.md
  • docs/en/self-hosted/configuration/secrets-with-sops.md
  • docs/fr/self-hosted/configuration/environment-reference.md
  • docs/fr/self-hosted/configuration/providers.md
  • docs/fr/self-hosted/configuration/secrets-with-sops.md
  • packages/tale_shared/src/tale_shared/config/providers.py
  • packages/tale_shared/tests/test_providers.py
  • services/docs/app/content/frontmatter.json
  • services/platform/app/features/settings/providers/components/provider-detail-drawer.tsx
  • services/platform/app/features/settings/providers/components/provider-edit-panel.tsx
  • services/platform/app/features/settings/providers/hooks/queries.ts
  • services/platform/convex/providers/errors.test.ts
  • services/platform/convex/providers/errors.ts
  • services/platform/convex/providers/file_actions.ts
  • services/platform/convex/providers/secret_resolver.test.ts
  • services/platform/convex/providers/secret_resolver.ts
  • services/platform/lib/shared/schemas/providers.test.ts
  • services/platform/lib/shared/schemas/providers.ts
  • services/platform/messages/de.json
  • services/platform/messages/en.json
  • services/platform/messages/fr.json

Comment on lines +54 to +80
def _env_secret(name: str | None) -> str | None:
"""Resolve an env-var name to its trimmed value, honoring the allowlist.

Returns None when the name is missing, not allowlisted, or the env var is
empty/whitespace. Trailing-newline normalization (a common Vault/k8s
injection footgun) is applied to env values here.
"""
if not name:
return None
raw = os.environ.get(_ALLOWLIST_ENV, "")
allowlist = {n.strip() for n in raw.split(",") if n.strip()}
if name not in allowlist:
logger.warning(
"secretsEnv %r is not in %s (empty allowlist disables the env key "
"source) — falling back to the secrets file",
name,
_ALLOWLIST_ENV,
)
return None
value = os.environ.get(name, "").strip()
if not value:
logger.warning(
"secretsEnv %r is set but the env var is empty/unset — falling back to the secrets file",
name,
)
return None
return value

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Don't warn on every unresolved secretsEnv lookup.

get_chat_model(), get_embedding_model(), and get_vision_model() call through this resolver on runtime paths, so a configured-but-empty or non-allowlisted env var will emit one warning per request/job even when the file fallback is working. The TypeScript side already uses warn-once behavior; the Python side should do the same to avoid log flooding.

Suggested fix
 _ALLOWLIST_ENV = "TALE_PROVIDER_SECRET_ENV_ALLOWLIST"
+_WARNED_UNRESOLVED: set[tuple[str, str]] = set()
@@
 def _env_secret(name: str | None) -> str | None:
@@
     if name not in allowlist:
-        logger.warning(
-            "secretsEnv %r is not in %s (empty allowlist disables the env key "
-            "source) — falling back to the secrets file",
-            name,
-            _ALLOWLIST_ENV,
-        )
+        warning_key = (name, "not_allowlisted")
+        if warning_key not in _WARNED_UNRESOLVED:
+            _WARNED_UNRESOLVED.add(warning_key)
+            logger.warning(
+                "secretsEnv %r is not in %s (empty allowlist disables the env key "
+                "source) — falling back to the secrets file",
+                name,
+                _ALLOWLIST_ENV,
+            )
         return None
     value = os.environ.get(name, "").strip()
     if not value:
-        logger.warning(
-            "secretsEnv %r is set but the env var is empty/unset — falling back to the secrets file",
-            name,
-        )
+        warning_key = (name, "empty_or_unset")
+        if warning_key not in _WARNED_UNRESOLVED:
+            _WARNED_UNRESOLVED.add(warning_key)
+            logger.warning(
+                "secretsEnv %r is set but the env var is empty/unset — falling back to the secrets file",
+                name,
+            )
         return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/tale_shared/src/tale_shared/config/providers.py` around lines 54 -
80, _env_secret currently logs a warning every time a secretsEnv lookup is
unresolved which floods logs; change it to warn only once per unique (name,
reason) combination: add a module-level tracking set (e.g. _env_secret_warnings:
set[tuple[str, str]] = set()) and, inside _env_secret, before each
logger.warning call (the "not in allowlist" warning and the "empty/unset"
warning), check if (name, "not_allowlisted") or (name, "empty") is in that set
and only call logger.warning and then add the tuple if it wasn't present; keep
the rest of _env_secret logic unchanged so
get_chat_model/get_embedding_model/get_vision_model get the same fallback
behavior but without per-request log spam.

Comment on lines 291 to +304
} catch (err) {
const reason = err instanceof Error ? err.message : String(err);
// ENOENT on the secrets file means the operator has a provider
// config but no API key yet — the common "I just created an org"
// case. Classify so the UI can point at Settings → Providers.
// No usable secrets file. Before skipping, check whether the config can
// resolve a key from the environment (issue #1711): provider-level
// `secretsEnv` or any model-level `secretsEnv`, gated by the operator
// allowlist. If so, keep the provider with `secrets: null` — the
// per-resolution `resolveApiKey` reads the env value. Otherwise skip as
// before. ENOENT (the common "config but no key yet" case) drives the
// UI's Settings → Providers hint.
if (providerHasEnvKey(result.data)) {
console.warn(
`Provider "${providerName}": no secrets file, using env key source.`,
reason,
);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Sanitize secrets-read failures before logging them.

This new env-fallback branch still logs the raw decrypt/parse error text. Those failures are already sanitized elsewhere in this file because they can carry secret material, so emitting reason here can leak key fragments into server logs.

Suggested fix
     } catch (err) {
       const reason = err instanceof Error ? err.message : String(err);
+      const safeReason = sanitizeError(err);
       // No usable secrets file. Before skipping, check whether the config can
       // resolve a key from the environment (issue `#1711`): provider-level
       // `secretsEnv` or any model-level `secretsEnv`, gated by the operator
@@
       if (providerHasEnvKey(result.data)) {
         console.warn(
           `Provider "${providerName}": no secrets file, using env key source.`,
-          reason,
+          safeReason,
         );
         providers.push({
           name: providerName,
           config: result.data,
           secrets: null,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/convex/providers/file_actions.ts` around lines 291 - 304,
The console.warn in the env-fallback branch currently logs raw decrypt/parse
error text via the local reason variable which may leak secrets; update the
branch around providerHasEnvKey(result.data) so it does not emit the raw reason
— either call the existing sanitization helper used elsewhere in this file
(e.g., mask/sanitize function used for decrypt/parse errors) and log that
sanitizedReason, or remove reason from the warn and log only a generic message
referencing providerName and that the secrets file could not be read; update the
console.warn invocation and any local variables (reason) accordingly to ensure
no secret fragments are written to logs.

Comment on lines 1949 to +1969
for (const model of config.models) {
const apiKey = secrets.modelKeys?.[model.id] ?? secrets.apiKey;
// Resolve the per-model key (env source preferred). On nothing resolved,
// skip this model with a clear reason BEFORE any probe dispatch — keeps
// empty keys out of the probe helpers and the apiKey-keyed listingCache,
// and never throws the whole action for a partially-configured provider.
const apiKey = resolveApiKey({
modelSecretsEnv: model.secretsEnv,
providerSecretsEnv: config.secretsEnv,
fileModelKey: secrets?.modelKeys?.[model.id],
fileApiKey: secrets?.apiKey,
});
if (!apiKey) {
skipped.push({
modelId: model.id,
reason:
model.secretsEnv || config.secretsEnv
? 'No API key resolved (secretsEnv unset/empty or not allowlisted, and no file key)'
: 'No API key configured',
});
continue;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Honor model.baseUrl when probing each model.

This loop now resolves credentials per model, but every probe still targets config.baseUrl. Models with a baseUrl override will be tested against the wrong endpoint, even though runtime resolution in resolveModelData / resolveModelByTag uses definition.baseUrl ?? provider.config.baseUrl.

Suggested fix
     for (const model of config.models) {
+      const baseUrl = model.baseUrl ?? config.baseUrl;
+      checkProviderHostPolicy(baseUrl);
+
       // Resolve the per-model key (env source preferred). On nothing resolved,
       // skip this model with a clear reason BEFORE any probe dispatch — keeps
       // empty keys out of the probe helpers and the apiKey-keyed listingCache,
       // and never throws the whole action for a partially-configured provider.
@@
       if (isChat) {
         probes.push(
           runProbe(
-            buildProbeUrl(config.baseUrl, 'chat/completions'),
+            buildProbeUrl(baseUrl, 'chat/completions'),
             apiKey,
             {
               ...mergedProviderOptions,
@@
       } else if (isEmbedding) {
         probes.push(
           runProbe(
-            buildProbeUrl(config.baseUrl, 'embeddings'),
+            buildProbeUrl(baseUrl, 'embeddings'),
             apiKey,
             { ...mergedProviderOptions, model: model.id, input: 'hi' },
             model.id,
@@
       } else if (isTranscription) {
-        probes.push(runTranscriptionProbe(config.baseUrl, apiKey, model.id));
+        probes.push(runTranscriptionProbe(baseUrl, apiKey, model.id));
       } else if (isTextToSpeech) {
@@
           probes.push(
             runTtsProbe(
-              config.baseUrl,
+              baseUrl,
               apiKey,
               model.id,
               probeVoice,
@@
       } else if (isImageGeneration) {
@@
         probes.push(
-          runImageListingProbe(config.baseUrl, apiKey, model.id, listingCache),
+          runImageListingProbe(baseUrl, apiKey, model.id, listingCache),
         );

Also applies to: 1987-2057

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/convex/providers/file_actions.ts` around lines 1949 - 1969,
The probe loop currently resolves per-model credentials but always targets
config.baseUrl; update the probe dispatch to honor the per-model endpoint by
using the model-level override when present (i.e., use model.baseUrl ??
config.baseUrl) wherever probes are constructed or invoked (the block iterating
config.models that calls resolveApiKey and subsequently dispatches probes),
ensuring consistency with resolveModelData/resolveModelByTag which already use
definition.baseUrl ?? provider.config.baseUrl; adjust any probe helper calls and
the endpoint passed to them so each model is probed against its effective
baseUrl.

Comment on lines +488 to +502

/**
* Resolution status of a `secretsEnv` name for the settings UI (issue #1711).
* Carries no key material — only whether the name is configured, allowlisted,
* and currently resolves. Single canonical shape shared by the server
* (`readProvider` / `secret_resolver.ts`) and the provider settings UI.
*/
export interface EnvSecretStatus {
/** The configured env-var name, if any. */
name?: string;
/** Whether `name` appears in `TALE_PROVIDER_SECRET_ENV_ALLOWLIST`. */
allowlisted: boolean;
/** Whether the env var currently resolves to a non-empty value. */
resolved: boolean;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider using a Zod schema for runtime validation.

EnvSecretStatus is defined as a plain TypeScript interface rather than a Zod schema. Since this is consumed from the backend action response (which declares v.any()), there's no runtime validation of the shape. While acceptable for display-only data, a Zod schema would provide runtime type safety.

♻️ Optional: Define as Zod schema
-export interface EnvSecretStatus {
-  /** The configured env-var name, if any. */
-  name?: string;
-  /** Whether `name` appears in `TALE_PROVIDER_SECRET_ENV_ALLOWLIST`. */
-  allowlisted: boolean;
-  /** Whether the env var currently resolves to a non-empty value. */
-  resolved: boolean;
-}
+export const envSecretStatusSchema = z.object({
+  /** The configured env-var name, if any. */
+  name: z.string().optional(),
+  /** Whether `name` appears in `TALE_PROVIDER_SECRET_ENV_ALLOWLIST`. */
+  allowlisted: z.boolean(),
+  /** Whether the env var currently resolves to a non-empty value. */
+  resolved: z.boolean(),
+});
+
+export type EnvSecretStatus = z.infer<typeof envSecretStatusSchema>;

Then validate in queries.ts with envSecretStatusSchema.safeParse() when casting the action response.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/platform/lib/shared/schemas/providers.ts` around lines 488 - 502,
The EnvSecretStatus interface lacks runtime validation; define a Zod schema
(e.g., envSecretStatusSchema) that matches EnvSecretStatus (fields: name?:
string, allowlisted: boolean, resolved: boolean), export it alongside the
interface, and then validate backend action responses in the client code (e.g.,
in queries.ts) using envSecretStatusSchema.safeParse() and handle parse failures
(throw/log or return a safe fallback) before casting/using the data.

larryro added 4 commits June 5, 2026 15:51
Add an opt-in `secretsEnv` field at provider and model level in the public
provider config, so operators with Kubernetes Secrets / Vault / cloud secret
managers can inject an LLM provider API key as an environment variable instead
of round-tripping it through a *.secrets.json file.

Resolution order (each tier tested for a non-empty value, else falls through):
model `secretsEnv` -> provider `secretsEnv` -> file `modelKeys[id]` -> file
`apiKey`. Strictly additive: env is consulted only when `secretsEnv` is set, so
existing file-only deployments are unchanged. Env values are trimmed; file
values pass through raw (byte-identical to before, and consistent with the
Python loader).

Security: env reads are gated by a required operator allowlist
`TALE_PROVIDER_SECRET_ENV_ALLOWLIST` (comma-separated, empty = locked),
enforced identically in TS and Python. Without it, making env-only providers
reachable would let a config-write actor point `secretsEnv` at a deployment
secret (SOPS_AGE_KEY, BETTER_AUTH_SECRET, ...) and exfiltrate it as a bearer
token via a provider base URL. Names are capped at 40 chars to match the
platform->Convex env-name sync limit.

- secret_resolver.ts: pure, allowlist-gated envSecret/resolveApiKey/
  providerHasEnvKey/envSecretStatus + warn-once on configured-but-unresolved.
- file_actions.ts: loadAllProviders keeps env-only providers (secrets: null);
  shared resolveModelApiKey throws a failover-eligible MissingApiKeyError; the
  two direct-read paths log real decrypt failures instead of swallowing them;
  listProviders folds provider-level env into hasApiKey and model-level env into
  per-model hasApiKeyOverride (so a keyless sibling model is never falsely
  unblocked); readProvider returns a tri-state envSecretStatus.
- Python providers.py: secrets_env on both dataclasses; _resolve_api_key (3-tier,
  returns "" not None); all three accessors use it.
- Settings UI: provider-level and per-model secretsEnv fields (wired through
  form/hydration/submit so edits never wipe the binding) + an env-source
  indicator in the API Key section.
- i18n (en/de/fr), docs (providers.md incl. the <name>.config.json -> <name>.json
  filename fix, secrets-with-sops.md, environment-reference.md), and .env.example.
…t allowlist (#1711)

Replace the operator-managed `TALE_PROVIDER_SECRET_ENV_ALLOWLIST` (empty =
locked) with a hardcoded reserved prefix `TALE_PROVIDER_KEY_`. A `secretsEnv`
name must carry the prefix; any other name is rejected fail-closed, so it can
never point at a deployment secret (SOPS_AGE_KEY, BETTER_AUTH_SECRET, ...) and
have it sent as a bearer token to a provider's baseUrl. This drops the
deployment-config step entirely — operators just name their key vars under the
namespace.

- schemas/providers.ts: add SECRETS_ENV_PREFIX / SECRETS_ENV_REGEX, tighten
  secretsEnv validation; rename EnvSecretStatus.allowlisted -> allowed
- secret_resolver.ts: gate envSecret/envSecretStatus on the prefix; keep all
  three derivations (TS resolver, zod schema, Python loader) in sync
- tale_shared/config/providers.py: prefix check replaces allowlist parsing
- errors.ts / file_actions.ts: update unresolved-key messaging
- provider settings UI: move the secretsEnv editor out of the edit panel into
  the API Key dialog as a file/env key-source chooser; badge "Not allowlisted"
  -> "Invalid prefix"
- messages (en/de/fr) + self-hosted docs (en/de/fr): document the prefix
- add packages/tale_shared/uv.lock
Declare secretsEnv: TALE_PROVIDER_KEY_OPENROUTER on the default
OpenRouter example provider so its API key can be supplied via the
reserved-prefix deployment env var from #1711 instead of a SOPS
secrets file.
@larryro larryro force-pushed the feat/provider-secrets-env branch from e2648b8 to e3364ec Compare June 5, 2026 07:51
larryro added 3 commits June 5, 2026 16:22
…ned API-key i18n keys

The prefix-gating refactor (5d8749b) tightened the secretsEnv schema to
require the reserved TALE_PROVIDER_KEY_ prefix and made the provider API-key
dialog title generic, but missed two follow-ups that broke the Unit job:

- providers.test.ts still asserted pre-prefix names (OPENROUTER_API_KEY,
  MODEL_KEY, _KEY) were accepted. Rewrite the secretsEnv block to validate
  prefix gating: accept prefixed names, reject unprefixed / bare-prefix /
  bad-char names, and exercise the 40-char cap at both boundaries.
- settings.providers.addApiKey / replaceApiKey are no longer referenced by
  source (the dialog now uses providers.apiKey). Remove them from en/de/fr to
  clear the orphan-key i18n check.
…failover, env-key UX)

Two-round review of feat/provider-secrets-env surfaced correctness fixes that
survived adversarial verification. Addresses the High + Medium issues plus the
genuine Low defects (pre-existing ones folded in):

- cascade (High): org deletion ran four ~6K delete sweeps in one mutation,
  which could cross the per-mutation write budget on a large org, throw, and
  leave the org permanently un-deletable (GDPR Art 17). cascadeOnOrgDeleted now
  schedules a self-rescheduling drain that erases at most one table per pass
  (6000 rows for pure-DB tables, 3000 for storage-bearing ones), guaranteeing
  completion. Removes the now-dead memories+prefs-only continuation.

- providers (Medium): resolveModelByTag threw on the first keyless tag-match
  instead of trying a sibling model that holds the env key keeping the provider
  loaded — terminal on the no-failover transcription/TTS paths. It now skips
  keyless candidates and throws the failover-eligible MissingApiKeyError only
  when none resolve. Extracted as a pure, tested selectModelByTag.

- providers UI (Medium): gate the model dialog Save on secretsEnv validity,
  mirroring the provider dialog, so an invalid env-var name can't reach the
  server and fail with an opaque toast.

- notifications (Low): unreadCount now enforces org membership like its
  list/markRead siblings — it previously leaked an arbitrary org's count.

- tale_shared (Low): _env_secret tolerates a non-string secretsEnv from a
  hand-edited config, degrading to the file key instead of raising.

- cascade test (Low): fake ctx keys rows by (table, index) so the prefs delete
  path is actually exercised; adds one-table-per-pass / cap / ordering /
  termination coverage.

- docs (Low): correct the stale `<name>.config.json` provider filename in the
  en/de/fr overview pages.
…ret UI

The API Key dialog presented the env-var override and stored file secret as a
mutually-exclusive radio, but the resolver treats them as layers (env shadows
file; file survives as the fallback). That mismatch silently ignored a file
key entered while an env var resolved — with a false success toast — and hid a
shadowed key that then resurfaced on env-clear.

Drop the radio: show an effective-source banner (new pure helper
computeEffectiveKeyState, mirroring the resolver precedence) plus two
independent rows — stored key (replace) and env override (set/clear). The
read-only card now always surfaces a stored key, tagged as a fallback when an
env var shadows it. UI + i18n only; no backend change. Removing only the stored
key is deferred (no clearProviderSecret action yet).

Remove orphaned keySource*/replaceApiKeyDescription i18n keys; add
effectiveSource*/storedKey*/setEnvVar across en/de/fr.
@larryro larryro merged commit 34b7f39 into main Jun 5, 2026
30 checks passed
@larryro larryro deleted the feat/provider-secrets-env branch June 5, 2026 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature: [Nuvolos, based on release 0.2.67] env-var fallback for provider API keys when SOPS / *.secrets.json is unavailable

1 participant