metrics: fix input metrics double-counting with conditional routing#11948

Draft
zshuang0316 wants to merge 3 commits into fluent:master from zshuang0316:fix-input-counter

Conversation

@zshuang0316

Contributor

In input_log_append_processed_internal() the ingestion is additive: split_and_append_route_payloads()
creates one chunk copy per matching route, and then an additional unconditional append of the original buffer
always runs so non-conditional routes still receive the data. Each of those appends incremented the input counters
inside input_chunk_append_raw(), so a record matching N routes was counted N+1 times. This is most visible with
in_opentelemetry (OTLP logs) when more than one route matches, where the totals roughly double.
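
The N+1 arithmetic can be illustrated with a toy model. This is only a sketch: `struct toy_input`, `toy_append()` and `toy_ingest()` are hypothetical stand-ins, not Fluent Bit code, and the model assumes every record matches every route.

```c
#include <stddef.h>

/* Toy stand-in for an input instance; NOT the real Fluent Bit struct. */
struct toy_input {
    size_t records_total;   /* models fluentbit_input_records_total */
    size_t bytes_total;     /* models fluentbit_input_bytes_total */
};

/* Models the pre-fix append: every call bumps the input counters. */
void toy_append(struct toy_input *in, size_t records, size_t bytes)
{
    in->records_total += records;
    in->bytes_total += bytes;
}

/*
 * Models the additive ingestion: one copy per matching route, followed by
 * the unconditional append of the original buffer.
 */
void toy_ingest(struct toy_input *in, size_t matching_routes,
                size_t records, size_t bytes)
{
    size_t i;

    for (i = 0; i < matching_routes; i++) {
        toy_append(in, records, bytes);   /* per-route copy */
    }
    toy_append(in, records, bytes);       /* original buffer */
}
```

With two matching routes, ingesting 10 records of 1000 bytes leaves the toy counters at 30 records and 3000 bytes, that is (N+1) times the real volume.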

This is a metrics-accounting fix only — routing behavior and chunk delivery are unchanged.

Fix

  • Add an FLB_INPUT_CHUNK_SKIP_INPUT_METRICS flag and thread it through the internal append path via new
    flb_input_chunk_append_raw_flags() / flb_input_chunk_append_raw_local_flags() helpers. The existing public
    functions are unchanged (they forward flags = 0), so no other callers are affected.
  • Suppress input-level accounting on the per-route copy appends in split_and_append_route_payloads(). The
    original ingestion is still counted exactly once by the unconditional append, which preserves the existing pause /
    empty-buffer guards in input_chunk_append_raw().

After the change, fluentbit_input_records_total equals the original record count and
fluentbit_input_bytes_total equals the original byte size regardless of how many routes match. Metrics, traces,
profiles and blobs inputs are unaffected: they append once with no route splitting.
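
The flag-threading pattern described above could look roughly like the following. It is a simplified sketch under stated assumptions: the `toy_*` names and the bit value of `TOY_CHUNK_SKIP_INPUT_METRICS` are hypothetical, and the real helpers take more parameters than shown here.

```c
#include <stddef.h>

/* Hypothetical bit value; the real definition is not shown in this PR text. */
#define TOY_CHUNK_SKIP_INPUT_METRICS (1 << 0)

/* Toy stand-in for an input instance; NOT the real Fluent Bit struct. */
struct toy_input {
    size_t records_total;
    size_t bytes_total;
};

/* Flags-aware variant: counts unless the caller opts out. */
int toy_append_raw_flags(struct toy_input *in, int flags,
                         size_t records, size_t bytes)
{
    if (!(flags & TOY_CHUNK_SKIP_INPUT_METRICS)) {
        in->records_total += records;
        in->bytes_total += bytes;
    }
    /* the chunk write itself would happen here */
    return 0;
}

/* Existing entry point keeps its signature and forwards flags = 0. */
int toy_append_raw(struct toy_input *in, size_t records, size_t bytes)
{
    return toy_append_raw_flags(in, 0, records, bytes);
}

/* Route copies skip accounting; the unconditional append counts once. */
int toy_ingest(struct toy_input *in, size_t matching_routes,
               size_t records, size_t bytes)
{
    size_t i;

    for (i = 0; i < matching_routes; i++) {
        toy_append_raw_flags(in, TOY_CHUNK_SKIP_INPUT_METRICS,
                             records, bytes);
    }
    return toy_append_raw(in, records, bytes);
}
```

Keeping the old entry point as a thin wrapper means callers that never route per record see no behavior change, which matches the claim that other callers are unaffected.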

Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

zshuang0316 and others added 2 commits June 15, 2026 20:07
Conditional and per-record routing for logs creates one chunk copy per
matching route and then performs an additional unconditional append of
the original buffer. Every append incremented the input-level counters
fluentbit_input_records_total and fluentbit_input_bytes_total, so the
totals were inflated proportionally to the number of matching routes.

Add an FLB_INPUT_CHUNK_SKIP_INPUT_METRICS flag and thread it through the
internal append path via new flags-aware helpers so callers can suppress
input-level accounting for these internal copies. The original ingestion
is still counted once by the unconditional append, which keeps the
existing pause and empty-buffer guards intact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: zshuang0316 <zshuang0316@163.com>
split_and_append_route_payloads() appended one chunk copy per matching
route through the metric-bearing append helpers, double counting the
input records and bytes that are already accounted for by the
unconditional append in input_log_append_processed_internal().

Use the new flags-aware append helpers with
FLB_INPUT_CHUNK_SKIP_INPUT_METRICS for the per-route copies so the
input-level totals reflect the original ingestion exactly once,
regardless of how many routes match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: zshuang0316 <zshuang0316@163.com>
@zshuang0316

Contributor Author

@codex please review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b956a77dd3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/flb_input_log.c
-            ret = flb_input_chunk_append_raw_local(ins,
+            ret = flb_input_chunk_append_raw_local_flags(ins,
                                                          FLB_INPUT_LOGS,
+                                                         FLB_INPUT_CHUNK_SKIP_INPUT_METRICS,


P2: Avoid dropping metrics when the base append is skipped

In conditional routing under mem_buf_limit or a storage pause, one of these routed appends can successfully create a route-specific chunk and then pause the input via flb_input_chunk_protect(). Because the route copy is marked FLB_INPUT_CHUNK_SKIP_INPUT_METRICS, the only planned input-counter update is the later unconditional append. That append runs after this loop and returns before counting when flb_input_buf_paused() is true, so chunks already accepted for conditional routes are delivered without ever being reflected in fluentbit_input_records_total or fluentbit_input_bytes_total.
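
The under-count path described in this comment can be reproduced in a toy model. This is a sketch only: the `toy_*` names, the flag value and the single `mem_limit` check are hypothetical simplifications of the pause logic, not the actual Fluent Bit implementation.

```c
#include <stddef.h>

/* Hypothetical bit value, for illustration only. */
#define TOY_CHUNK_SKIP_INPUT_METRICS (1 << 0)

/* Toy stand-in for an input instance; NOT the real Fluent Bit struct. */
struct toy_input {
    size_t mem_used;
    size_t mem_limit;        /* models mem_buf_limit */
    int paused;
    size_t records_total;    /* models fluentbit_input_records_total */
    size_t queued_records;   /* records accepted into chunks */
};

int toy_append_flags(struct toy_input *in, int flags,
                     size_t records, size_t bytes)
{
    if (in->paused) {
        return -1;           /* pause guard: bail out before counting */
    }
    if (!(flags & TOY_CHUNK_SKIP_INPUT_METRICS)) {
        in->records_total += records;
    }
    in->queued_records += records;
    in->mem_used += bytes;
    if (in->mem_used >= in->mem_limit) {
        in->paused = 1;      /* models the protect step pausing the input */
    }
    return 0;
}

/* One matching route: uncounted route copy first, then the base append. */
int toy_ingest_one_route(struct toy_input *in, size_t records, size_t bytes)
{
    toy_append_flags(in, TOY_CHUNK_SKIP_INPUT_METRICS, records, bytes);
    return toy_append_flags(in, 0, records, bytes);
}
```

With a limit of 500 bytes and a 600-byte payload, the route copy is queued and pauses the input, the base append then returns -1, and the toy counter stays at zero even though 10 records are queued for delivery.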


Contributor Author


@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b956a77dd3


Comment thread src/flb_input_log.c
-            ret = flb_input_chunk_append_raw_local(ins,
+            ret = flb_input_chunk_append_raw_local_flags(ins,
                                                          FLB_INPUT_LOGS,
+                                                         FLB_INPUT_CHUNK_SKIP_INPUT_METRICS,


P2: Count route copies if the base append can fail

When conditional or per-record routing is active, a route-copy append can succeed while the later unconditional append fails (for example, the route copy pushes the input over mem_buf_limit or the storage limits, and flb_input_chunk_protect() pauses it before the base append runs). In that case this flag suppresses the only input metrics for data that remains queued for that route. The function returns an error, but the per-route chunk has already been appended and had its direct outputs applied, so fluentbit_input_records_total and fluentbit_input_bytes_total can under-report delivered records on that failure path. Either count the original ingestion before splitting, or account for (or undo) the successful route copies when the base append fails.


Contributor Author


@codex review


Codex Review: Didn't find any major issues. More of your lovely PRs please.

Reviewed commit: a06ca3ca09


@zshuang0316 zshuang0316 marked this pull request as draft June 15, 2026 15:46
@coderabbitai

coderabbitai Bot commented Jun 15, 2026


Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 93791211-21a4-45fa-8c31-ad954e64781c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Counting the input metrics on the base append while the per-route copies
are skipped leaves a gap under backpressure. A route copy can succeed,
push the input over its memory or storage limit and pause it via
flb_input_chunk_protect(), or the base append can fail after some copies
were already delivered. In both cases the base append returns before
counting, so records already queued for a route go unreflected in
fluentbit_input_records_total and fluentbit_input_bytes_total.

Account for the original ingestion exactly once at the routing boundary,
before the split, gated by the pause state on entry, and mark the base
append with FLB_INPUT_CHUNK_SKIP_INPUT_METRICS as the per-route copies
already are. Counting up front keeps the totals correct regardless of
which downstream append later pauses or fails. The plain non-routed path
is unchanged and keeps counting inside flb_input_chunk_append_raw().

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: zshuang0316 <zshuang0316@163.com>
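
The count-up-front approach from this last commit message can be sketched in the same kind of toy model. The actual diff is not shown on this page, so every name below (`toy_*`, the flag value, the `mem_limit` pause check) is a hypothetical simplification rather than the code in the commit.

```c
#include <stddef.h>

/* Hypothetical bit value, for illustration only. */
#define TOY_CHUNK_SKIP_INPUT_METRICS (1 << 0)

/* Toy stand-in for an input instance; NOT the real Fluent Bit struct. */
struct toy_input {
    size_t mem_used;
    size_t mem_limit;        /* models mem_buf_limit */
    int paused;
    size_t records_total;
    size_t bytes_total;
    size_t queued_records;   /* records accepted into chunks */
};

int toy_append_flags(struct toy_input *in, int flags,
                     size_t records, size_t bytes)
{
    if (in->paused) {
        return -1;           /* pause guard: bail out before counting */
    }
    if (!(flags & TOY_CHUNK_SKIP_INPUT_METRICS)) {
        in->records_total += records;
        in->bytes_total += bytes;
    }
    in->queued_records += records;
    in->mem_used += bytes;
    if (in->mem_used >= in->mem_limit) {
        in->paused = 1;      /* models the protect step pausing the input */
    }
    return 0;
}

/* Routed path: count once at the boundary, then skip metrics everywhere. */
int toy_ingest_routed(struct toy_input *in, size_t matching_routes,
                      size_t records, size_t bytes)
{
    size_t i;

    /* Gate on the pause state at entry, before any copy can change it. */
    if (!in->paused) {
        in->records_total += records;
        in->bytes_total += bytes;
    }
    for (i = 0; i < matching_routes; i++) {
        toy_append_flags(in, TOY_CHUNK_SKIP_INPUT_METRICS, records, bytes);
    }
    return toy_append_flags(in, TOY_CHUNK_SKIP_INPUT_METRICS, records, bytes);
}
```

In this model a route copy that pauses the input no longer hides the ingestion: the counters already hold the original 10 records and 600 bytes when the base append returns -1, and a later call made while the input is still paused adds nothing.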

1 participant