metrics: fix input metrics double-counting with conditional routing#11948
metrics: fix input metrics double-counting with conditional routing#11948zshuang0316 wants to merge 3 commits into
Conversation
Conditional and per-record routing for logs creates one chunk copy per matching route and then performs an additional unconditional append of the original buffer. Every append incremented the input-level counters fluentbit_input_records_total and fluentbit_input_bytes_total, so the totals were inflated proportionally to the number of matching routes. Add an FLB_INPUT_CHUNK_SKIP_INPUT_METRICS flag and thread it through the internal append path via new flags-aware helpers so callers can suppress input-level accounting for these internal copies. The original ingestion is still counted once by the unconditional append, which keeps the existing pause and empty-buffer guards intact. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: zshuang0316 <zshuang0316@163.com>
split_and_append_route_payloads() appended one chunk copy per matching route through the metric-bearing append helpers, double counting the input records and bytes that are already accounted for by the unconditional append in input_log_append_processed_internal(). Use the new flags-aware append helpers with FLB_INPUT_CHUNK_SKIP_INPUT_METRICS for the per-route copies so the input-level totals reflect the original ingestion exactly once, regardless of how many routes match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: zshuang0316 <zshuang0316@163.com>
|
@codex please review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b956a77dd3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| ret = flb_input_chunk_append_raw_local(ins, | ||
| ret = flb_input_chunk_append_raw_local_flags(ins, | ||
| FLB_INPUT_LOGS, | ||
| FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, |
There was a problem hiding this comment.
Avoid dropping metrics when the base append is skipped
In conditional routing under mem_buf_limit or storage pause, one of these routed appends can successfully create a route-specific chunk and then pause the input via flb_input_chunk_protect; because the route copy is marked FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, the only planned input-counter update is the later unconditional append. That append runs after this loop and returns before counting when flb_input_buf_paused() is true, so chunks already accepted for conditional routes are delivered without ever being reflected in fluentbit_input_records_total / _bytes_total.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b956a77dd3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| ret = flb_input_chunk_append_raw_local(ins, | ||
| ret = flb_input_chunk_append_raw_local_flags(ins, | ||
| FLB_INPUT_LOGS, | ||
| FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, |
There was a problem hiding this comment.
Count route copies if the base append can fail
When conditional/per-record routing is active and a route-copy append succeeds but the later unconditional append fails (for example, the route copy pushes the input over mem_buf_limit/storage limits and flb_input_chunk_protect() pauses it before the base append runs), this flag suppresses the only input metrics for data that remains queued for that route. The function returns an error, but the per-route chunk has already been appended and had its direct outputs applied, so fluentbit_input_records_total / bytes can under-report delivered records on that failure path; count the original ingestion before splitting or account/undo the successful route copies when the base append fails.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Codex Review: Didn't find any major issues. More of your lovely PRs please.
Reviewed commit: a06ca3ca09
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Counting the input metrics on the base append while the per-route copies are skipped leaves a gap under backpressure. A route copy can succeed, push the input over its memory or storage limit and pause it via flb_input_chunk_protect(), or the base append can fail after some copies were already delivered. In both cases the base append returns before counting, so records already queued for a route go unreflected in fluentbit_input_records_total and fluentbit_input_bytes_total. Account the original ingestion exactly once at the routing boundary, before the split, gated by the entry pause state, and mark the base append with FLB_INPUT_CHUNK_SKIP_INPUT_METRICS as the per-route copies already are. Counting up front keeps the totals correct regardless of which downstream append later pauses or fails. The plain non-routed path is unchanged and keeps counting inside flb_input_chunk_append_raw(). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: zshuang0316 <zshuang0316@163.com>
In
input_log_append_processed_internal()the ingestion is additive:split_and_append_route_payloads()creates one chunk copy per matching route, and then an additional unconditional append of the original buffer
always runs so non-conditional routes still receive the data. Each of those appends incremented the input counters
inside
input_chunk_append_raw(), so a record matching N routes was counted N+1 times. This is most visible within_opentelemetry(OTLP logs) when more than one route matches, where the totals roughly double.This is a metrics-accounting fix only — routing behavior and chunk delivery are unchanged.
Fix
FLB_INPUT_CHUNK_SKIP_INPUT_METRICSflag and thread it through the internal append path via newflb_input_chunk_append_raw_flags()/flb_input_chunk_append_raw_local_flags()helpers. The existing publicfunctions are unchanged (they forward
flags = 0), so no other callers are affected.split_and_append_route_payloads(). Theoriginal ingestion is still counted exactly once by the unconditional append, which preserves the existing pause /
empty-buffer guards in
input_chunk_append_raw().After the change,
fluentbit_input_records_totalequals the original record count andfluentbit_input_bytes_totalequals the original byte size regardless of how many routes match. Metrics, traces,profiles and blobs inputs are unaffected — they append once with no route splitting.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.