feat: plugins: add new filter llm_tag plugin by niedbalski · Pull Request #123 · telemetryforge/agent

niedbalski · 2025-11-26T09:34:04Z

User description

Add new LLM-based log tagging filter plugin with OpenAI-compatible API support.

Example config with OpenAI:

filters:
  - name: llm_tag
    match: logs
    tags_match_mode: all
    model_endpoint: https://api.openai.com
    model_id: gpt-3.5-turbo
    model_timeout: 5000
    model_api_key: sk-...
    keep_record: true
    tags:
      - tag: security
        prompt: "This log indicates a security incident or authentication failure"
      - tag: phishing
        prompt: "This log contains a phishing attempt or credential request"

Example config with local Ollama:

filters:
  - name: llm_tag
    match: logs
    tags_match_mode: all
    model_endpoint: http://127.0.0.1:11434
    model_id: phi3:mini
    model_timeout: 10000
    model_api_key: ""
    keep_record: true
    tags:
      - tag: security
        prompt: "This log indicates a security incident or authentication failure"
      - tag: phishing
        prompt: "This log contains a phishing attempt or credential request"

Summary by cubic

Adds a new llm_tag filter to classify logs with LLMs and retag records, and updates config parsing to support arrays of objects in filter properties. Adds safeguards to preserve records on LLM failures.

New Features
- New filter_llm_tag plugin for LLM-based log classification and tag rewriting.
- Supports OpenAI-compatible endpoints (OpenAI API, Ollama, vLLM, TGI, llama.cpp) with TLS and optional API key.
- Batch evaluates multiple tag rules per record; match modes: first or all; optional keep_record; emits via a shared emitter.
- Includes a lightweight OpenAI client for chat completions, timeout handling, and basic metrics.
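
The difference between the `first` and `all` match modes can be pictured with a small sketch. This is an illustration only: the enum, the function name, and the idea of one boolean result per rule are assumptions, not the plugin's actual code.

```c
#include <stddef.h>

/* Hypothetical match modes; the plugin's real identifiers are not shown here. */
enum match_mode { MATCH_FIRST, MATCH_ALL };

/*
 * results[i] is non-zero when the LLM reported that rule i matched.
 * Writes the indexes of the rules whose tags should be emitted into `out`
 * (which must have room for `n` entries) and returns how many were selected.
 */
size_t select_matches(const int *results, size_t n,
                      enum match_mode mode, size_t *out)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!results[i]) {
            continue;
        }
        out[count++] = i;
        if (mode == MATCH_FIRST) {
            break;  /* stop at the first matching rule */
        }
    }
    return count;
}
```

With results `{0, 1, 1}`, `MATCH_FIRST` selects only rule 1, while `MATCH_ALL` selects rules 1 and 2, so the record would be emitted once per selected tag.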
Bug Fixes
- Preserve original records when LLM API requests fail to avoid data loss.
- Fix YAML variant deep copy and nested object parsing for filters; fix JSON parsing of nested response objects; prevent TLS context leaks; remove API key from logs.
- Improve Windows build compatibility (portable time, case-insensitive search, and strtok helpers).
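
For context on the API key fix: the commits in this PR remove the property-dumping debug log entirely. Where a log line must still mention a credential, masking is a common alternative. The helper below is a hypothetical sketch of that alternative, not code from this PR.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a log-safe form of `secret` into `out`.
 * Long values keep a 3-character prefix, short values are fully masked,
 * and NULL or empty values are reported as unset.
 */
void mask_secret(const char *secret, char *out, size_t out_size)
{
    size_t len = secret ? strlen(secret) : 0;

    if (out_size == 0) {
        return;
    }
    if (len == 0) {
        snprintf(out, out_size, "(unset)");
    }
    else if (len <= 8) {
        snprintf(out, out_size, "***");
    }
    else {
        snprintf(out, out_size, "%.3s***", secret);
    }
}
```

A call such as `mask_secret(api_key, buf, sizeof(buf))` before logging yields something like `sk-***` instead of the full key.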

Risk: 3/5

Written for commit 3e71878. Summary will update automatically on new commits.

CodeAnt-AI Description

LLM-based log tagging filter classifies logs via OpenAI-compatible endpoints and re-emits them with configured tags

What Changed

- Added the llm_tag filter, which forwards log messages to OpenAI-compatible models, applies per-rule prompts, emits re-tagged records through a shared emitter, and can keep the original log if desired.
- Batch classification now evaluates all rules in one call, honors first/all match modes, and preserves the original log when the LLM request fails to prevent data loss.
- YAML parsing now preserves nested arrays and objects for filter/processor configs so tag rules can be defined as variant arrays of objects.
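
The record-preservation behavior described above can be summarized as a small decision function. This is a sketch under stated assumptions: the names are hypothetical, and the no-match case (keeping the record, as a rewrite-style filter typically would) is an assumption rather than something this PR page confirms.

```c
/* Hypothetical outcome of one batch classification call. */
enum classify_status { CLASSIFY_OK, CLASSIFY_FAILED };

/*
 * Return 1 if the original record should stay in the pipeline,
 * 0 if it may be dropped after its re-tagged copies were emitted.
 */
int keep_original(enum classify_status status, int matched_rules, int keep_record)
{
    if (status == CLASSIFY_FAILED) {
        return 1;   /* never lose data on an LLM/API failure */
    }
    if (matched_rules == 0) {
        return 1;   /* assumption: unmatched records pass through */
    }
    return keep_record ? 1 : 0;  /* matched: honor the keep_record option */
}
```

The important property is the first branch: a timeout or HTTP error must never turn into a silently dropped record.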

Impact

✅ LLM-based tagging for security logs
✅ Original logs preserved when LLM API fails
✅ Filter configs accept nested rule arrays

codeant-ai · 2025-11-26T09:34:08Z

CodeAnt AI is reviewing your PR.

codeant-ai · 2025-11-26T09:40:18Z

CodeAnt AI finished reviewing your PR.

cubic-dev-ai

7 issues found across 9 files

Prompt for AI agents (all 7 issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="source/plugins/filter_llm_tag/llm_tag.h">

<violation number="1" location="source/plugins/filter_llm_tag/llm_tag.h:20">
Rename the include guard macro to FLB_FILTER_LLM_TAG_H so it matches this header and cannot collide with a future llm_classify guard.</violation>
</file>

<file name="source/src/flb_config.c">

<violation number="1" location="source/src/flb_config.c:899">
Storing a cfl_variant pointer inside flb_kv.val (which is freed with flb_sds_destroy) will corrupt memory when the filter instance is destroyed. Use a container that does not assume string ownership for variant values instead of writing the pointer into kv->val.</violation>
</file>

<file name="source/plugins/filter_llm_tag/llm_tag.c">

<violation number="1" location="source/plugins/filter_llm_tag/llm_tag.c:246">
LLM request failures drop the original log record, causing data loss</violation>

<violation number="2" location="source/plugins/filter_llm_tag/llm_tag.c:520">
Debug logging prints every filter property value, leaking secrets such as model_api_key into logs. Remove this property-dumping block or at least mask sensitive values before logging to avoid exposing credentials.</violation>
</file>

<file name="source/src/config_format/flb_cf_yaml.c">

<violation number="1" location="source/src/config_format/flb_cf_yaml.c:851">
`kvlist_deep_copy` reuses the original variant pointer for non-string/array/kvlist entries, causing both lists to own the same object and leading to use-after-free/double-free when one is destroyed. Clone the variant before inserting it into the copy.</violation>
</file>

<file name="source/src/flb_openai_client.c">

<violation number="1" location="source/src/flb_openai_client.c:238">
Internally created TLS contexts leak when upstream creation fails because the cleanup path never destroys `client->tls`.</violation>

<violation number="2" location="source/src/flb_openai_client.c:589">
Destroying the caller-owned TLS context causes double free/invalid reuse when the same TLS is shared outside this client.</violation>
</file>
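
Both TLS findings for flb_openai_client.c come down to ownership: the client must free a context it created itself and must leave a caller-supplied one alone. One common way to express this is an ownership flag. The sketch below uses stand-in types and hypothetical names; it is not the Fluent Bit TLS API.

```c
#include <stdlib.h>

/* Stand-in for a TLS context; the real type belongs to the TLS layer. */
struct tls_ctx { int refs; };

struct client {
    struct tls_ctx *tls;
    int owns_tls;   /* 1 only when the client allocated `tls` itself */
};

void client_destroy(struct client *c)
{
    if (!c) {
        return;
    }
    if (c->owns_tls) {
        free(c->tls);   /* internally created: ours to release */
    }
    free(c);            /* caller-owned contexts are left untouched */
}

/* Pass NULL to let the client allocate its own context. */
struct client *client_create(struct tls_ctx *caller_tls)
{
    struct client *c = calloc(1, sizeof(*c));

    if (!c) {
        return NULL;
    }
    if (caller_tls) {
        c->tls = caller_tls;
        c->owns_tls = 0;
    }
    else {
        c->tls = calloc(1, sizeof(*c->tls));
        if (!c->tls) {
            client_destroy(c);  /* single cleanup path, no leak */
            return NULL;
        }
        c->owns_tls = 1;
    }
    return c;
}
```

Routing every failure in the constructor through the same destroy function is what prevents the leak on a later setup error, and the flag is what prevents the double free.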

cubic-dev-ai

5 issues found across 9 files

Prompt for AI agents (all 5 issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="source/src/flb_config.c">

<violation number="1" location="source/src/flb_config.c:899">
Storing a `struct cfl_variant *` in `kv_prop->val` means the property cleanup path calls `flb_sds_destroy` on a non-SDS pointer, leading to invalid free/memory corruption when the filter instance is destroyed. Use a structure that owns the variant or avoid the generic kv list for non-string data.</violation>
</file>

<file name="source/src/flb_openai_client.c">

<violation number="1" location="source/src/flb_openai_client.c:307">
`json_get_key` assumes each key/value only consumes two tokens, so it fails to find keys that appear after nested values (e.g., `choices` after `usage`), causing valid OpenAI responses to be rejected.</violation>

<violation number="2" location="source/src/flb_openai_client.c:589">
Destroying `client->tls` here frees TLS contexts that were passed in by the caller, causing double frees/use-after-free for shared TLS objects.</violation>
</file>

<file name="source/plugins/filter_llm_tag/llm_tag.c">

<violation number="1" location="source/plugins/filter_llm_tag/llm_tag.c:397">
Records whose LLM classification fails are dropped because the code continues without re-emitting or keeping the original record, leading to data loss on transient API failures.</violation>
</file>

<file name="source/src/config_format/flb_cf_yaml.c">

<violation number="1" location="source/src/config_format/flb_cf_yaml.c:851">
Primitive variants are not really copied in kvlist_deep_copy: the default branch inserts the original variant pointer, so source and copy share ownership and freeing either list frees the same variant. Allocate a fresh variant per primitive instead of reusing the pointer.</violation>
</file>
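
The kvlist_deep_copy finding is a shared-ownership bug: copying a pointer is not copying the object. A minimal sketch of the fix pattern follows, using a simplified stand-in variant type rather than the real cfl_variant API; all names here are hypothetical.

```c
#include <stdlib.h>

/* Simplified stand-in for a primitive variant. */
enum vtype { V_INT, V_BOOL };
struct variant { enum vtype type; long value; };

/* Allocate a fresh variant holding the same primitive value. */
struct variant *variant_clone(const struct variant *src)
{
    struct variant *dst = malloc(sizeof(*dst));

    if (!dst) {
        return NULL;
    }
    *dst = *src;
    return dst;
}

/*
 * Clone `n` variants from src[] into dst[] so each list owns its entries.
 * Returns 0 on success; on failure frees the clones made so far, returns -1.
 */
int variants_deep_copy(struct variant **dst, struct variant *const *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        dst[i] = variant_clone(src[i]);
        if (!dst[i]) {
            while (i > 0) {
                free(dst[--i]);
            }
            return -1;
        }
    }
    return 0;
}
```

After a copy like this, destroying either list cannot invalidate the other, because no pointer appears in both.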

- filter_llm_tag: preserve original records on LLM API failures
- filter_llm_tag: remove debug logging that exposed API keys
- flb_cf_yaml: properly clone primitive variants in kvlist_deep_copy
- flb_openai_client: fix TLS context leak on upstream creation failure
- flb_openai_client: fix JSON parsing for nested objects in responses
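
The nested-JSON fix is easier to follow with a sketch of key lookup over a flat token array. The token layout below is modeled on jsmn-style tokens (a `size` field counting direct children, with a key owning its value), which is an assumption about the parser; the struct and function names are hypothetical.

```c
#include <string.h>

/* jsmn-style token (assumed layout): `size` counts direct children. */
enum tok_type { TOK_OBJECT, TOK_ARRAY, TOK_STRING, TOK_PRIMITIVE };
struct tok { enum tok_type type; int start; int end; int size; };

/* Number of tokens in the subtree rooted at t[i], including t[i] itself. */
static int subtree_len(const struct tok *t, int i)
{
    int n = 1;
    int k;

    for (k = 0; k < t[i].size; k++) {
        n += subtree_len(t, i + n);
    }
    return n;
}

/*
 * Find `key` among the top-level keys of the object at t[0].
 * Returns the index of the value token, or -1 if the key is absent.
 * Assumes a complete, well-formed token array.
 */
int find_key(const char *json, const struct tok *t, int ntok, const char *key)
{
    int i = 1;
    int k;

    if (ntok < 1 || t[0].type != TOK_OBJECT) {
        return -1;
    }
    for (k = 0; k < t[0].size && i + 1 < ntok; k++) {
        size_t len = (size_t)(t[i].end - t[i].start);

        if (t[i].type == TOK_STRING && strlen(key) == len &&
            strncmp(json + t[i].start, key, len) == 0) {
            return i + 1;
        }
        i += subtree_len(t, i);  /* skip the key and its whole value subtree */
    }
    return -1;
}
```

For `{"usage":[1],"choices":[]}` the `choices` key sits at token index 4. A loop that advances two tokens per key visits indexes 1, 3 and 5 and never sees it, whereas skipping the full subtree lands on it.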

codeant-ai · 2025-11-26T14:54:45Z

CodeAnt AI is running Incremental review

codeant-ai · 2025-11-26T15:00:36Z

CodeAnt AI Incremental review completed.

patrick-stephens · 2025-11-28T16:56:15Z

@niedbalski looks like Windows builds are failing:

2025-11-28T16:43:03.1848016Z D:\a\agent\agent\source\plugins\filter_llm_tag\llm_tag.c(195): error C2065: 'CLOCK_MONOTONIC': undeclared identifier
2025-11-28T16:43:03.1849519Z D:\a\agent\agent\source\plugins\filter_llm_tag\llm_tag.c(242): error C2065: 'CLOCK_MONOTONIC': undeclared identifier
2025-11-28T16:43:03.1852059Z D:\a\agent\agent\source\plugins\filter_llm_tag\llm_tag.c(263): warning C4047: '=': 'char *' differs in levels of indirection from 'int'
2025-11-28T16:43:03.1854219Z D:\a\agent\agent\source\plugins\filter_llm_tag\llm_tag.c(280): warning C4047: '=': 'char *' differs in levels of indirection from 'int'

- Replace POSIX clock_gettime with flb_time_get for cross-platform timing
- Add flb_strcasestr helper for case-insensitive search (strcasestr is GNU extension)
- Add flb_strtok_r helper to use strtok_s on Windows, strtok_r elsewhere
- Include flb_compat.h for strncasecmp macro on Windows
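
Since `strcasestr` is not part of standard C, a portable replacement has to be written by hand. The function below is a minimal sketch of what such a helper could look like; it is not the `flb_strcasestr` implementation from this PR, and the name `ci_strstr` is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Case-insensitive substring search using only standard C.
 * Returns a pointer to the first match in `haystack`, or NULL.
 */
const char *ci_strstr(const char *haystack, const char *needle)
{
    size_t i;

    if (*needle == '\0') {
        return haystack;    /* empty needle matches at the start */
    }
    for (; *haystack != '\0'; haystack++) {
        for (i = 0; needle[i] != '\0'; i++) {
            if (haystack[i] == '\0' ||
                tolower((unsigned char) haystack[i]) !=
                tolower((unsigned char) needle[i])) {
                break;
            }
        }
        if (needle[i] == '\0') {
            return haystack;    /* whole needle matched here */
        }
    }
    return NULL;
}
```

The casts to `unsigned char` matter: passing a negative `char` value to `tolower` is undefined behavior, which is easy to hit with non-ASCII bytes in model responses.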

coveralls · 2025-11-29T07:44:39Z

coverage: 56.268% (-0.3%) from 56.596%
when pulling 3e71878 on feat/llm_rewrite_tag
into 0390f9f on main.

niedbalski added 4 commits November 26, 2025 10:22

feat(plugins): add filter_llm_tag plugin

09c61af

Add new LLM-based log tagging filter plugin with OpenAI-compatible API support

build(cmake): register filter_llm_tag plugin

df312e0

Add build configuration for LLM tagging filter

feat(config_format): add variant support for filter plugins

77c566e

Extend YAML config parser to support complex nested structures in filter plugins

feat(config): handle variant properties in filter configuration

b34ec4a

Add support for complex array properties containing objects in filter plugins

niedbalski requested a review from a team November 26, 2025 09:34

codeant-ai Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files label Nov 26, 2025

Merge branch 'main' into feat/llm_rewrite_tag

618d928

niedbalski self-assigned this Nov 26, 2025

cubic-dev-ai Bot reviewed Nov 26, 2025

Comment thread source/src/flb_config.c

Comment thread source/src/flb_openai_client.c

Comment thread source/src/flb_openai_client.c Outdated

Comment thread source/plugins/filter_llm_tag/llm_tag.c Outdated

Comment thread source/src/config_format/flb_cf_yaml.c Outdated

codeant-ai Bot added size:XXL This PR changes 1000+ lines, ignoring generated files and removed size:XXL This PR changes 1000+ lines, ignoring generated files labels Nov 26, 2025

niedbalski and others added 4 commits November 26, 2025 16:03

Merge branch 'main' into feat/llm_rewrite_tag

5f51c82

Merge branch 'main' into feat/llm_rewrite_tag

b12babe

fix(tests): use static callback to avoid stack-use-after-return in in…

94f07d8

…_forward tests

Merge branch 'main' into feat/llm_rewrite_tag

f4e08b4

patrick-stephens previously approved these changes Nov 28, 2025

patrick-stephens linked an issue Nov 28, 2025 that may be closed by this pull request

Resolve unit testing failures #61

Closed

patrick-stephens added the build-packages Option to enable all package builds for a PR to test label Nov 28, 2025

niedbalski added build-linux Option to enable building of Linux packages only for a PR to test build-windows Option to enable building of Windows packages only for a PR to test labels Nov 28, 2025

patrick-stephens removed build-packages Option to enable all package builds for a PR to test build-linux Option to enable building of Linux packages only for a PR to test size:XXL This PR changes 1000+ lines, ignoring generated files labels Nov 28, 2025

niedbalski dismissed patrick-stephens’s stale review via 937e1b6 November 28, 2025 17:34

niedbalski changed the title ~~feat: add filter_llm_tag plugin~~ plugins: add new filter llm_tag plugin Nov 29, 2025

Merge branch 'main' into feat/llm_rewrite_tag

3e71878

niedbalski changed the title ~~plugins: add new filter llm_tag plugin~~ feat: plugins: add new filter llm_tag plugin Nov 29, 2025

niedbalski added build-macos Option to enable building of macOS packages only for a PR to test build-linux Option to enable building of Linux packages only for a PR to test labels Dec 1, 2025

patrick-stephens merged commit 47c83e0 into main Dec 1, 2025
270 of 284 checks passed

patrick-stephens deleted the feat/llm_rewrite_tag branch December 1, 2025 13:51

This was referenced May 8, 2026

feat: sync upstream Fluent Bit from v4.1.0 to v4.2.4 #277

Merged

feat: add llm_tag documentation telemetryforge/documentation#129

Merged
