Skip to content

fix: end-to-end credential flow fixes#1208

Merged
markturansky merged 7 commits intoalphafrom
fix/credential-flow-fixes
Apr 4, 2026
Merged

fix: end-to-end credential flow fixes#1208
markturansky merged 7 commits intoalphafrom
fix/credential-flow-fixes

Conversation

@markturansky
Copy link
Copy Markdown
Contributor

@markturansky markturansky commented Apr 4, 2026

Summary

Six commits fixing the full credential flow from control-plane provisioning through runner token auth and session event streaming.

< /dev/null | Commit | Component | Fix |
|--------|-----------|-----|
| 085028b7 | control-plane | Credential rolebinding and project delete |
| a4cf9427 | control-plane | Default BackendURL to AMBIENT_API_SERVER_URL so runner pods reach the correct API server |
| f82422a2 | control-plane | Token refresh loop: iterate all projects (was calling ForProject("") → SDK rejected empty project) |
| 49dcf935 | control-plane | Refresh runner tokens immediately on startup (eliminates expiry gap after CP restart) |
| 2faf4424 | api-server | Lowercase session ID in runner service DNS hostname (session-{ID}.svc.cluster.local must be lowercase) |
| c16abb1d | control-plane | Reduce token refresh interval 10m → 4m (OIDC TTL is 15m; 10m left too small a margin for runner reconnects) |

Root Causes Fixed

  • Runner couldn't reach API server: BACKEND_API_URL defaulted to wrong value; now falls back to AMBIENT_API_SERVER_URL
  • Token refresh loop silently failed: ForProject("") was rejected by SDK; fixed by listing projects first
  • Token expired after CP restart: Ticker reset on restart created a gap; fixed by refreshing immediately on goroutine start
  • session events always 502: API server built DNS name with raw mixed-case session ID; K8s service names are lowercased by the control-plane
  • Token expiry under load: 10-minute refresh interval left too little margin before 15-minute OIDC TTL; reduced to 4 minutes

Test plan

  • Runner pod fetches credentials successfully (Successfully fetched github credentials from backend)
  • acpctl session events <id> streams without 502
  • Sessions running >15 min do not hit UNAUTHENTICATED: Token is expired on gRPC

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added built-in roles seeded at startup.
    • Added JWT-based username extraction fallback for bearer tokens.
  • Documentation

    • Major CLI README updates: login flow, project context commands, resource examples, declarative apply, and credentials/role-binding guides.
  • Bug Fixes

    • Fail-fast session watch authorization for unauthenticated callers.
    • More frequent and multi-project token refresh with immediate refresh on start.
    • Normalize runner names for streaming URLs.
  • Chores

    • Expanded test initialization imports to include additional plugins.

markturansky and others added 6 commits April 3, 2026 23:28
## Summary

- Seeds `credential:token-reader` and `credential:reader` roles via
migration `202603311216`
- Mounts runner BOT_TOKEN as file with 10-min background refresh loop in
control-plane
- Authorizes runner OIDC service account in `WatchSessionMessages`
- Adds exponential backoff retry in informer error handler
- Adds `acpctl apply -f` credential manifest support and role-binding
commands to CLI

## Test plan

- [ ] Deploy to OSD `ambient-s0` via ArgoCD (gitops MR \!94 already
merged)
- [ ] Verify `credential:token-reader` and `credential:reader` appear in
`acpctl get roles`
- [ ] Create role binding for `github-agent` in `credential-test`
project
- [ ] Start agent session and confirm runner pod retrieves token via
`GET /credentials/{id}/token`

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added role seeding and management capabilities to credentials system.
* Enhanced CLI with updated login, project context, and resource
management commands.
  * Introduced declarative manifest application via `acpctl apply`.
  * Added agent creation and session messaging commands.

* **Bug Fixes**
  * Strengthened authorization validation for session message access.

* **Documentation**
* Expanded CLI reference with comprehensive command examples and usage
patterns.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Ambient Code Bot <bot@ambient-code.local>
Co-authored-by: Claude <noreply@anthropic.com>
## Summary

The `BackendURL` config field defaulted to
`http://backend-service.ambient-code.svc:8080/api` — a legacy service
that no longer exists. Runner pods need `BACKEND_API_URL` set to call
`GET /credentials/{id}/token`. Since `AMBIENT_API_SERVER_URL` is already
set in all deployments, default `BackendURL` to it.

Discovered during E2E testing of the credential flow on OSD
`ambient-s0`: runner logs showed DNS failures fetching credentials from
the old backend URL.

## Test plan

- [ ] Runner pod logs show `Successfully fetched github credentials from
backend` instead of DNS failure on `backend-service.ambient-code.svc`
- [ ] Agent can retrieve GitHub token via `/credentials/{id}/token` and
use it

🤖 Generated with [Claude Code](https://claude.ai/code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Backend URL configuration now uses environment variable fallback logic
(`BACKEND_API_URL` → `AMBIENT_API_SERVER_URL` → default to
`http://localhost:8000`), enabling more flexible configuration across
different deployment environments.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Ambient Code Bot <bot@ambient-code.local>
Co-authored-by: Claude <noreply@anthropic.com>
refreshAllRunningTokens called ForProject(ctx, "") which the SDK
rejects with "project is required". Sessions are project-scoped so
the cross-project list requires iterating all projects first, then
listing running sessions per project.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
The token refresh ticker fires every 10 minutes. If the control-plane
restarts during an active session, existing runner BOT_TOKENs can
expire before the first ticker tick fires. Trigger an immediate refresh
on startup so any in-flight sessions get fresh tokens right away.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
K8s service names are lowercased by the control-plane (safeResourceName),
but StreamRunnerEvents was using the raw mixed-case session ID in the
svc.cluster.local hostname, causing DNS resolution to fail with i/o timeout
and a 502 to the client.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
OIDC tokens from Red Hat SSO have a 15-minute TTL. With a 10-minute
refresh interval the Secret update and the runner's reconnect can race
such that the runner reads an already-expired token. Reducing to 4
minutes ensures the mounted token always has ≥11 minutes of remaining
lifetime when the runner reads it on reconnect.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 4, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

Seeds built-in roles via a new DB migration; tightens session-watch authorization and normalizes runner CR names; adds JWT-based username extraction fallback for bearer tokens; updates CLI docs and backend URL fallback; and refactors token refresh to run per-project more frequently with immediate startup refresh.

Changes

Cohort / File(s) Summary
Roles Migration & Plugin init
components/ambient-api-server/plugins/credentials/migration.go, components/ambient-api-server/plugins/credentials/plugin.go
Adds rolesMigration() that JSON-encodes permissions, generates IDs, seeds two built-in role rows, and supports rollback (deletes seeded names). Registers the migration via db.RegisterMigration(...) in plugin init.
Test imports (plugin side-effects)
components/ambient-api-server/plugins/credentials/testmain_test.go
Adds multiple blank imports to include additional plugins (agents, inbox, projectSettings, projects, rbac, roleBindings, roles, sessions, users, plus rh-trex-ai events/generic) so their init side-effects run during tests.
Session auth & runner URL normalization
components/ambient-api-server/plugins/sessions/grpc_handler.go, components/ambient-api-server/plugins/sessions/handler.go
WatchSessionMessages now returns PermissionDenied when extracted username is empty; StreamRunnerEvents lowercases KubeCrName when building the runner host portion.
Bearer-token gRPC middleware (JWT fallback)
components/ambient-api-server/pkg/middleware/bearer_token_grpc.go
Unary and stream interceptors now attempt usernameFromJWT(token) when bearer token ≠ expected token; a non-empty username is set into context. Implements unverified JWT parsing to prefer preferred_username, username, or sub (ignoring values containing :).
Control plane config fallback & CLI docs
components/ambient-control-plane/internal/config/config.go, components/ambient-cli/README.md
BackendURL fallback now uses AMBIENT_API_SERVER_URL before defaulting to http://localhost:8000. README extensively updated: simplified login, acpctl apply declarative workflow, session/agent/credential/role-binding examples, and new CLI flags/options.
Token refresh loop & per-project traversal
components/ambient-control-plane/internal/reconciler/kube_reconciler.go
Refresh interval reduced from 10m to 4m and performs an immediate refresh at start. Rewrote refresh logic to iterate projects, create per-project SDK clients, paginate project/session lists, and handle per-project failures without aborting the whole run.

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 2 warnings)

Check name Status Explanation Resolution
Security And Secret Handling ❌ Error Critical authentication bypass vulnerability: bearer_token_grpc.go accepts unverified JWTs, enabling identity spoofing and session authorization bypass. Remove JWT fallback or implement proper verification with JWKS, signature, expiration, and issuer validation. Enforce consistent auth patterns across gRPC and HTTP middleware.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Kubernetes Resource Safety ⚠️ Warning Child Kubernetes resources lack ownerReferences, preventing automatic garbage collection and relying on less reliable label-selector cleanup. Add ownerReferences to Secrets, Pods, ServiceAccounts, Services in kube_reconciler.go pointing to Session with blockOwnerDeletion: true.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title follows Conventional Commits format (fix: scope) and directly aligns with PR objectives addressing credential flow fixes.
Performance And Algorithmic Complexity ✅ Passed Token refresh loop uses proper pagination for remote API calls and performs per-session operations against local Kubernetes cluster with bounded execution time.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/credential-flow-fixes
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch fix/credential-flow-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Adds JWT fallback in bearerTokenGRPCUnaryInterceptor and
bearerTokenGRPCStreamInterceptor so CLI users sending OIDC JWTs
get their username extracted via jwt.ParseUnverified and set in
context via auth.SetUsernameContext, enabling WatchSessionMessages
to authorize them correctly.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@markturansky markturansky merged commit f7ef7d7 into alpha Apr 4, 2026
42 of 44 checks passed
@markturansky markturansky deleted the fix/credential-flow-fixes branch April 4, 2026 04:07
@markturansky
Copy link
Copy Markdown
Contributor Author

hello world

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant