in_ebpf: Implement sched trace#11743
Conversation
📝 WalkthroughWalkthroughThis PR adds comprehensive eBPF scheduler event tracing to Fluent Bit. The change defines a new ChangesScheduler trace support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
a8dc2f7 to
5f38fd5
Compare
5f38fd5 to
8235ef9
Compare
8235ef9 to
4916257
Compare
4916257 to
d98bc64
Compare
d98bc64 to
fa08b76
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fa08b76c7f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/in_ebpf/traces/sched/bpf.c`:
- Around line 79-103: The code is reading namespace/UID/GID from the current
(outgoing) task but builds the event for the next task; change the sched_switch
handler so mntns_id and uid_gid are read from the next task instead of using
gadget_get_mntns_id() and bpf_get_current_uid_gid(): locate where
next_pid/ctx->next_comm are used, obtain the next task pointer (the sched_switch
context field for the next task), use bpf_core_read to fetch next task's
nsproxy->mnt_ns inum (or other mntns id field) and its credentials (uid/gid)
into local vars, call gadget_should_discard_mntns_id() with that next-task
mntns_id, and assign event->common.uid/gid/mntns_id from those next-task values
before filling pid/tid/comm.
- Around line 28-31: The sched handler is reserving the full union-sized struct
event (via gadget_reserve_buf(&events, sizeof(*event))) which makes each
sched_switch sample huge; create a compact sched-specific record path: define a
dedicated compact struct sched_event (not the large union) and either use a
separate ringbuf map or ensure gadget_reserve_buf is called with sizeof(struct
sched_event) when handling sched_switch; update the sched event
producer/consumer functions (the code around the events map, gadget_reserve_buf
usage, and the sched_switch handler) to allocate, populate, and submit only the
small sched_event payload so the ring buffer capacity is not consumed by the
large execve union members.
In `@plugins/in_ebpf/traces/sched/handler.c`:
- Around line 138-153: trace_sched_handler currently returns early on failures
from encode_sched_event and flb_input_log_append without calling
flb_log_event_encoder_reset, risking leftover msgpack buffer contamination;
update the failure paths in trace_sched_handler so that immediately before each
early return (after encode_sched_event returns non-zero and after
flb_input_log_append returns -1) you call flb_log_event_encoder_reset(encoder)
to fully clear the encoder state (not just flb_log_event_encoder_reset_record),
ensuring the underlying buffer is cleared before returning.
In `@tests/runtime/in_ebpf_sched_handler.c`:
- Around line 97-117: The test currently only checks values for keys when they
appear in the loop inside verify_decoded_values, so missing keys will not fail
the test; add boolean flags (e.g., found_event_type, found_cpu, found_prev_pid,
found_next_pid, found_runq_latency_ns) and set the appropriate flag inside each
branch that matches key_matches(...) for "event_type", "cpu", "prev_pid",
"next_pid", and "runq_latency_ns"; after the loop, assert each flag is true
using TEST_CHECK (or equivalent) to ensure all required keys were present and
validated against original->details.sched fields.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b41a2fde-b47a-4623-9e1c-dd3abcb22bd7
📒 Files selected for processing (8)
plugins/in_ebpf/traces/includes/common/encoder.hplugins/in_ebpf/traces/includes/common/events.hplugins/in_ebpf/traces/sched/bpf.cplugins/in_ebpf/traces/sched/handler.cplugins/in_ebpf/traces/sched/handler.hplugins/in_ebpf/traces/traces.htests/runtime/CMakeLists.txttests/runtime/in_ebpf_sched_handler.c
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
fa08b76 to
82e0ffa
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/in_ebpf/traces/sched/bpf.c`:
- Line 76: The code is reading from next->pid which represents the
thread-specific ID in the Linux kernel, but it should read from next->tgid which
is the actual process ID (thread group ID). Change the bpf_core_read call that
reads next_pid to use &next->tgid as the source address instead of &next->pid to
correctly report the process ID rather than the thread ID for multi-threaded
tasks. Apply the same fix to the similar code at lines 104-105.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b376d7e2-f6a0-46a7-a72e-aafb90eef46c
📒 Files selected for processing (8)
plugins/in_ebpf/traces/includes/common/encoder.hplugins/in_ebpf/traces/includes/common/events.hplugins/in_ebpf/traces/sched/bpf.cplugins/in_ebpf/traces/sched/handler.cplugins/in_ebpf/traces/sched/handler.hplugins/in_ebpf/traces/traces.htests/runtime/CMakeLists.txttests/runtime/in_ebpf_sched_handler.c
✅ Files skipped from review due to trivial changes (1)
- plugins/in_ebpf/traces/includes/common/encoder.h
🚧 Files skipped from review as they are similar to previous changes (6)
- tests/runtime/CMakeLists.txt
- plugins/in_ebpf/traces/sched/handler.h
- plugins/in_ebpf/traces/includes/common/events.h
- plugins/in_ebpf/traces/traces.h
- tests/runtime/in_ebpf_sched_handler.c
- plugins/in_ebpf/traces/sched/handler.c
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/in_ebpf/traces/sched/bpf.c (1)
113-116:⚠️ Potential issue | 🟡 MinorAsymmetric ID types in
prev_pidvsnext_pidfields.
prev_pidis populated fromprev->pid(kernel thread ID), whilenext_pidis populated fromnext->tgid(process ID). This creates inconsistent semantics:
event->details.prev_pidcontains the thread ID of the previous taskevent->details.next_pidcontains the process ID of the next taskThe
commonstruct correctly distinguishes these as separatepidandtidfields, but thedetailsstruct has noprev_tidfield and no separateprev_tgid. Either populateprev_pidfromprev->tgidfor consistency, or add aprev_tidfield to match the pattern used incommon.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/in_ebpf/traces/sched/bpf.c` around lines 113 - 116, The prev_pid and next_pid fields have inconsistent semantics: prev_pid is populated from prev->pid (thread ID) while next_pid is populated from next->tgid (process ID). Fix this asymmetry by ensuring both fields use the same type of identifier. Change the assignment of event->details.prev_pid to use prev->tgid instead of prev->pid, making it consistent with how next_pid is populated from next->tgid. This ensures both fields represent process IDs rather than mixing thread IDs and process IDs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@plugins/in_ebpf/traces/sched/bpf.c`:
- Around line 113-116: The prev_pid and next_pid fields have inconsistent
semantics: prev_pid is populated from prev->pid (thread ID) while next_pid is
populated from next->tgid (process ID). Fix this asymmetry by ensuring both
fields use the same type of identifier. Change the assignment of
event->details.prev_pid to use prev->tgid instead of prev->pid, making it
consistent with how next_pid is populated from next->tgid. This ensures both
fields represent process IDs rather than mixing thread IDs and process IDs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 944d01d6-172e-4f09-98fa-a25472270811
📒 Files selected for processing (1)
plugins/in_ebpf/traces/sched/bpf.c
In this PR, I implemented sched eBPF trace.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
sched) event type and payload fields, and improved string rendering for the scheduler event type.