chore(api)!: replace GraphQL observability API with gRPC, remove async-graphql dependencies #24364

Merged
pront merged 81 commits into master from pront-grpc on Mar 24, 2026

@pront pront commented Dec 10, 2025

Summary

Migrates Vector's internal observability API from GraphQL to gRPC, significantly reducing code complexity while keeping the vector top and vector tap commands working as before.

Removed (GraphQL)

  • Thousands of lines of GraphQL schema code were deleted 🎉
  • GraphQL dependencies: async-graphql, async-graphql-warp

Added (gRPC)

  • Protobuf schema: proto/vector/observability.proto
  • gRPC service implementation
  • gRPC client using tonic/prost (already existing dependencies)
  • Updated vector top and vector tap to use gRPC streaming

How did you test this PR?

  • cargo run -- top
  • cargo run -- tap
  • Updated and extended vector_api integration tests (component discovery, config reload, multi-output per-output metrics, tap across reload)
  • Existing unit tests

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

@github-actions github-actions bot added the "domain: topology" label (Anything related to Vector's topology code) Dec 10, 2025
@pront pront added the "no-changelog" label (Changes in this PR do not need user-facing explanations in the release changelog) Dec 12, 2025
@pront pront force-pushed the pront-grpc branch 7 times, most recently from a317991 to 45e8d0e, December 18, 2025 20:26
@pront pront force-pushed the pront-grpc branch 3 times, most recently from d3b8168 to 5cec09b, December 22, 2025 19:02
@pront pront force-pushed the pront-grpc branch 2 times, most recently from 216a7fc to 5b15df1, January 9, 2026 23:50
pront and others added 5 commits February 25, 2026 18:03
The comment now explicitly mentions that Log.metadata_full preserves event metadata during conversion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Migrate the vector_api integration tests from the deprecated GraphQL API to the new gRPC API.

Changes:
- Updated test infrastructure to use gRPC client instead of GraphQL
- Migrated component queries from GraphQL to gRPC GetComponents RPC
- Replaced GraphQL subscriptions with gRPC streaming for tap functionality
- Added metrics population in gRPC GetComponents handler to support metrics tests
- Fixed config reload tests to work around Vector behavior where components with
  unchanged names but modified connections don't appear during the reload transition
- Increased startup timeout from 2s to 10s to handle slower test environments
- Added 500ms delay before polling after SIGHUP to let Vector process the signal
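
The startup timeout and the post-SIGHUP delay described above amount to a poll-with-deadline pattern. A minimal sketch of that pattern, assuming a hypothetical helper named `poll_until` (not taken from the PR's test code), could look like this:

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: sleeps `initial_delay` once (e.g. to let a signal be
/// processed), then calls `check` every `interval` until it yields a value
/// or `timeout` has elapsed.
fn poll_until<T>(
    initial_delay: Duration,
    interval: Duration,
    timeout: Duration,
    mut check: impl FnMut() -> Option<T>,
) -> Option<T> {
    sleep(initial_delay);
    let deadline = Instant::now() + timeout;
    loop {
        // Always try at least once, even with a zero timeout.
        if let Some(value) = check() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        sleep(interval);
    }
}
```

With the values mentioned above, a test would pass a 500ms initial delay after sending SIGHUP and a 10s timeout for startup; the polling interval is left to the caller.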

Test fixes:
- Updated all assertions to use gRPC proto types (ComponentsResponse, ComponentType, etc.)
- Changed tap tests to collect events inline instead of storing streams to avoid lifetime issues
- Simplified reload tests to completely replace components instead of keeping old names
- All 7 tests now pass: tap (4 tests), top (3 tests)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add breaking change changelog documenting the removal of the GraphQL
Playground and replacement with gRPC API. Includes migration guide with
grpcurl examples for common operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@pront pront changed the title from "chore(api)!: grpc server (remove graphql)" to "chore(api): replace GraphQL observability API with gRPC, remove async-graphql dependencies" Mar 19, 2026
pront and others added 2 commits March 19, 2026 16:06
- Add compatibility note to changelog: 0.55+ clients require 0.55+ server
- Add grpc.io link to api.md
- Link proto file directly in api.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes unused import/dead_code warnings when building without the api
feature (e.g. component-validation-tests).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thomasqueirozb thomasqueirozb changed the title from "chore(api): replace GraphQL observability API with gRPC, remove async-graphql dependencies" to "chore(api)!: replace GraphQL observability API with gRPC, remove async-graphql dependencies" Mar 19, 2026
…etrics

Consolidates StreamComponentReceivedEventsTotal,
StreamComponentSentEventsTotal, StreamComponentReceivedBytesTotal,
StreamComponentSentBytesTotal, StreamComponentErrorsTotal,
StreamComponentReceivedEventsThroughput,
StreamComponentSentEventsThroughput,
StreamComponentReceivedBytesThroughput, and
StreamComponentSentBytesThroughput into a single
StreamComponentMetrics(ComponentMetricStreamRequest) RPC.

The unified response type ComponentMetricResponse uses
oneof value { TotalMetric total; ThroughputMetric throughput; }
to carry per-output breakdowns alongside the aggregated value,
keeping the schema honest about which fields are populated.
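
As an illustration of why a oneof keeps the schema honest, here is a sketch of how such a value maps onto a Rust enum. The types are hand-written stand-ins, not the generated prost code, and every field name (`total`, `per_second`, `per_output`) is an assumption rather than the actual proto definition:

```rust
use std::collections::BTreeMap;

/// Stand-in for the TotalMetric message; field names are assumed.
#[derive(Debug, Clone, PartialEq)]
struct TotalMetric {
    total: u64,
    per_output: BTreeMap<String, u64>,
}

/// Stand-in for the ThroughputMetric message; field names are assumed.
#[derive(Debug, Clone, PartialEq)]
struct ThroughputMetric {
    per_second: f64,
    per_output: BTreeMap<String, f64>,
}

/// Mirrors `oneof value { TotalMetric total; ThroughputMetric throughput; }`:
/// exactly one variant is present, so a consumer never reads an unset field.
#[derive(Debug, Clone, PartialEq)]
enum MetricValue {
    Total(TotalMetric),
    Throughput(ThroughputMetric),
}

/// The compiler forces every consumer to handle both cases.
fn describe(value: &MetricValue) -> String {
    match value {
        MetricValue::Total(t) => {
            format!("total={} outputs={}", t.total, t.per_output.len())
        }
        MetricValue::Throughput(t) => {
            format!("throughput={:.1}/s outputs={}", t.per_second, t.per_output.len())
        }
    }
}
```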

StreamComponentAllocatedBytes remains a separate RPC since it is
semantically distinct and has no per-output breakdown.

Also moves ApiStarted internal event from grpc.rs (gated behind
sources-vector/opentelemetry) into a new api.rs module gated
behind the api feature flag.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pront and others added 6 commits March 20, 2026 14:31
…ust updates

- Add METRIC_NAME_ prefix to MetricName enum values (ENUM_VALUE_PREFIX)
- Rename proto package to vector.observability.v1 (PACKAGE_VERSION_SUFFIX)
- Rename service Observability -> ObservabilityService (SERVICE_SUFFIX)
- Rename all request/response messages to match their RPC names per
  RPC_REQUEST_RESPONSE_UNIQUE convention (e.g. MetaRequest ->
  GetMetaRequest, HeartbeatRequest -> StreamHeartbeatRequest, etc.)
- Add COMPONENT_TYPE_UNSPECIFIED zero value to ComponentType enum
  (ENUM_ZERO_VALUE_SUFFIX)
- Cascade all renames through Rust: service.rs, client.rs, metrics.rs,
  vector-tap, and integration tests
- Update grpcurl examples in changelog
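
To show what the UNSPECIFIED zero value buys on the consuming side, here is a hand-written stand-in for the enum. Only the zero value comes from the list above; the other variants, their numbers, and the `from_wire` helper are assumptions, and mapping unknown numbers to `Unspecified` is a choice made for this sketch rather than something the PR states:

```rust
/// Stand-in for a generated enum. Only `Unspecified = 0` is taken from the
/// PR text; the remaining variants and numbers are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ComponentType {
    Unspecified = 0,
    Source = 1,
    Transform = 2,
    Sink = 3,
}

impl ComponentType {
    /// Hypothetical decoder: an unset proto3 enum field arrives as 0, so a
    /// dedicated zero value keeps "not set" distinct from a real type.
    fn from_wire(raw: i32) -> ComponentType {
        match raw {
            1 => ComponentType::Source,
            2 => ComponentType::Transform,
            3 => ComponentType::Sink,
            // 0 and any number this build does not know about.
            _ => ComponentType::Unspecified,
        }
    }
}
```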

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fix api.md title

- Client::new now takes http::Uri instead of String; Endpoint is built
  eagerly so connect() can no longer fail with InvalidUrl
- Remove unused `url` dep from vector-api-client, add `http`
- Remove test_invalid_url and test_connection_failure (assert nothing useful)
- Update all callers to parse their string URL into Uri
- Fix api.md title to "The Vector Observability API" and update stale
  grpcurl examples to use vector.observability.v1.ObservabilityService

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move `TapRunner`, `EventFormatter`, `OutputChannel`, and `TapExecutorError`
into a new `runner` module gated by the `api` feature, and make the
associated heavy deps (`vector-api-client`, `prost`, `bytes`, `serde_*`, etc.)
optional. Also removes the unused `tokio-tungstenite` dep. Restores
`vector-lib/api = ["vector-tap/api"]` so the feature activation chain is
preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move `#[allow(clippy::print_stderr)]` to the `fn cmd` level in both
`src/top/cmd.rs` and `src/tap/cmd.rs` instead of per-statement blocks,
and inline `{url}` format captures where applicable. Also applies
rustfmt line-wrapping to service.rs and top.rs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The gRPC server already normalizes throughput values to per-second by
dividing by `interval_secs` before streaming them. The `interval` field
in the four `*Throughputs` EventType variants was therefore unused
(`_interval`) and carried over from the old GraphQL client, which had to
do the `v * (1000.0 / interval_ms)` normalization itself.
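
For reference, the two normalizations are the same computation expressed in different units. The function names below are hypothetical; only the formulas come from the description above:

```rust
/// Server-side normalization as described: divide the observed delta by the
/// interval length in seconds.
fn per_second(delta: f64, interval_secs: f64) -> f64 {
    delta / interval_secs
}

/// The formula the old GraphQL client applied, with the interval given in
/// milliseconds: `v * (1000.0 / interval_ms)`.
fn per_second_from_ms(delta: f64, interval_ms: f64) -> f64 {
    delta * (1000.0 / interval_ms)
}
```

For example, 500 events over a 2-second (2000 ms) interval give 250 events per second with either function, so a client that receives already-normalized values has no further use for the interval.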

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pront and others added 2 commits March 23, 2026 16:25
- Use OS entropy (thread RNG) instead of epoch-based seed for reservoir
  sampler RNG in service.rs; removes SystemTime/UNIX_EPOCH boilerplate
- Return exitcode::USAGE (bad input) instead of exitcode::UNAVAILABLE
  when URL fails to parse in tap/top cmd
- Rename tap_internal parameter initial_client -> mut client_opt,
  eliminating the intermediate local binding
- Thread http::Uri through top::cmd -> subscription -> metrics::subscribe
  and metrics::init_components, removing the String -> Uri round-trip;
  re-export http::Uri from vector-api-client for callers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@thomasqueirozb thomasqueirozb left a comment

Very nice! Happy to see these abandoned graphql libraries being removed

pront and others added 2 commits March 23, 2026 16:38
…i-client

Add http as a direct dep to vector-top and import http::Uri directly in
metrics.rs and top/cmd.rs, rather than leaking it through a re-export
in vector-api-client.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@pront pront enabled auto-merge March 23, 2026 20:44
@datadog-vectordotdev

datadog-vectordotdev bot commented Mar 23, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🔗 Commit SHA: a9c56a5

@pront pront added this pull request to the merge queue Mar 23, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 23, 2026
The multi_output_transform_reports_per_output_sent_events test uses a
route transform, but the feature was not included in vector-api-tests,
causing Vector to exit with CONFIG error (78) when starting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@pront pront enabled auto-merge March 24, 2026 13:00
@pront pront added this pull request to the merge queue Mar 24, 2026
Merged via the queue into master with commit 93a9771 Mar 24, 2026
59 checks passed
@pront pront deleted the pront-grpc branch March 24, 2026 14:04
@github-actions github-actions bot locked and limited conversation to collaborators Mar 24, 2026
Labels

  • domain: ci (Anything related to Vector's CI environment)
  • domain: core (Anything related to core crates i.e. vector-core, core-common, etc)
  • domain: external docs (Anything related to Vector's external, public documentation)
  • domain: topology (Anything related to Vector's topology code)
  • no-changelog (Changes in this PR do not need user-facing explanations in the release changelog)

4 participants