enhancement(api)!: replace custom Health RPC with standard gRPC health service #25139

Merged
pront merged 4 commits into master from pavlos/standardize-grpc-health-check
Apr 13, 2026

@pront pront commented Apr 7, 2026

Summary

Replace the custom (unreleased) ObservabilityService/Health RPC on the observability API (port 8686) with the standard grpc.health.v1.Health service. This enables native Kubernetes gRPC health probes, grpc-health-probe, and other standard gRPC health-checking tooling to work out of the box.

  • Added tonic_health standard health service to the gRPC server with reflection support
  • Removed the now-unused running: Arc<AtomicBool> parameter from the service and server

This change is required for vectordotdev/helm-charts#540,
which adds default gRPC readiness probes on port 8686. Kubernetes native gRPC probes call the standard
grpc.health.v1.Health/Check RPC with an empty service name, which tonic_health handles by default.
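
To illustrate the lookup semantics described above, here is a minimal std-only sketch. It is a toy model, not the tonic_health implementation: `HealthRegistry`, `set`, and `check` are hypothetical names, and only two of the protocol's status values are modeled. It assumes the empty service name is registered as SERVING at startup and that unregistered names map to a gRPC NOT_FOUND error (modeled here as `None`).

```rust
use std::collections::HashMap;

/// Subset of the status values defined by the `grpc.health.v1` protocol.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServingStatus {
    Serving,
    NotServing,
}

/// Hypothetical in-memory registry used only for illustration.
pub struct HealthRegistry {
    statuses: HashMap<String, ServingStatus>,
}

impl HealthRegistry {
    /// The empty service name stands for the whole server and starts as SERVING.
    pub fn new() -> Self {
        let mut statuses = HashMap::new();
        statuses.insert(String::new(), ServingStatus::Serving);
        Self { statuses }
    }

    /// Register or update the status of a service name.
    pub fn set(&mut self, service: &str, status: ServingStatus) {
        self.statuses.insert(service.to_string(), status);
    }

    /// Look up a service name; `None` models the NOT_FOUND error
    /// returned for names that were never registered.
    pub fn check(&self, service: &str) -> Option<ServingStatus> {
        self.statuses.get(service).copied()
    }
}
```

A probe that sends an empty service name, as Kubernetes does, corresponds to `check("")` in this model, so it gets an answer without any per-service registration.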

Vector configuration

No configuration changes required. The standard health service is automatically available on the
existing API address (default 0.0.0.0:8686).

How did you test this PR?

  • cargo check --features api,api-client passes
  • Verified tonic_health::server::health_reporter() responds to empty service name checks with
    SERVING by default (same pattern already used and tested in the vector source)
  • minikube local setup

Note: it's safe to ignore the failing proto check here since the affected proto file is not released yet.

Change Type

  • New feature
  • Bug fix
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

@github-actions github-actions bot added the domain: topology Anything related to Vector's topology code label Apr 7, 2026
@pront pront added the no-changelog Changes in this PR do not need user-facing explanations in the release changelog label Apr 7, 2026
@pront pront marked this pull request as ready for review April 7, 2026 18:57
@pront pront requested review from a team as code owners April 7, 2026 18:57
chatgpt-codex-connector[bot]

This comment was marked as outdated.

@domalessi domalessi self-assigned this Apr 7, 2026

…h service

Replace the custom `ObservabilityService/Health` RPC with the standard
`grpc.health.v1.Health` service on the observability API (port 8686).

This enables native Kubernetes gRPC health probes, `grpc-health-probe`,
and other standard tooling to work out of the box. The empty service
name (`""`) is used for whole-server health, matching what Kubernetes
probes and `grpc-health-probe` query by default.

Key changes:
- Remove Health RPC, HealthRequest, HealthResponse from the proto
- Add tonic_health standard health service to the gRPC server
- Switch vector-api-client to use tonic_health HealthClient; the
  client checks ServingStatus and returns NotServing error if not SERVING
- Flip health to NOT_SERVING in TopologyController::stop before draining
  the topology, so Kubernetes removes the pod from endpoints early
- Update docs, changelog, and add integration test
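
The client check and the shutdown ordering in the list above can be sketched as follows. This is a std-only model under stated assumptions, not the code in this PR: `SharedHealth`, `check_health`, and `stop` are hypothetical stand-ins for the tonic_health reporter, the vector-api-client health call, and `TopologyController::stop`, and the `drain` closure is a placeholder for draining the topology.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServingStatus {
    Serving,
    NotServing,
}

#[derive(Debug, PartialEq, Eq)]
pub enum HealthError {
    NotServing,
}

/// Hypothetical shared health state, cloned between the server and its probes.
#[derive(Clone)]
pub struct SharedHealth(Arc<Mutex<ServingStatus>>);

impl SharedHealth {
    /// Starts in the SERVING state.
    pub fn new() -> Self {
        Self(Arc::new(Mutex::new(ServingStatus::Serving)))
    }

    pub fn set(&self, status: ServingStatus) {
        *self.0.lock().unwrap() = status;
    }

    pub fn get(&self) -> ServingStatus {
        *self.0.lock().unwrap()
    }
}

/// Client side: any status other than SERVING becomes an error.
pub fn check_health(health: &SharedHealth) -> Result<(), HealthError> {
    match health.get() {
        ServingStatus::Serving => Ok(()),
        ServingStatus::NotServing => Err(HealthError::NotServing),
    }
}

/// Server side: flip health to NOT_SERVING first, then drain, so probes
/// already fail while in-flight work is still being flushed.
pub fn stop(health: &SharedHealth, drain: impl FnOnce()) {
    health.set(ServingStatus::NotServing);
    drain();
}
```

The ordering is the point of the sketch: because the status flips before `drain` runs, a readiness probe issued during draining already sees NOT_SERVING.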

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pront pront force-pushed the pavlos/standardize-grpc-health-check branch from 127c9af to 69d3136 Compare April 8, 2026 13:25
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8ecc652850


@pront pront left a comment

This is a release blocker, please review when you are back (cc @thomasqueirozb)

@thomasqueirozb thomasqueirozb left a comment

Very nice improvement!

@pront pront enabled auto-merge April 13, 2026 15:09
Co-authored-by: Thomas <thomas.schneider@datadoghq.com>
@pront pront added this pull request to the merge queue Apr 13, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 13, 2026
@pront pront added this pull request to the merge queue Apr 13, 2026
pront commented Apr 13, 2026

Note: datadog-metrics e2e is flaky

Merged via the queue into master with commit c6574fd Apr 13, 2026
59 of 60 checks passed
@pront pront deleted the pavlos/standardize-grpc-health-check branch April 13, 2026 17:07
@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2026

Labels

  • domain: topology (Anything related to Vector's topology code)
  • no-changelog (Changes in this PR do not need user-facing explanations in the release changelog)

3 participants