
feat: plugins: add new filter llm_tag plugin #123

Merged
patrick-stephens merged 12 commits into main from feat/llm_rewrite_tag
Dec 1, 2025

Conversation

@niedbalski

@niedbalski niedbalski commented Nov 26, 2025

Contributor

User description

Add new LLM-based log tagging filter plugin with OpenAI-compatible API support.

Example config with OpenAI:

filters:
  - name: llm_tag
    match: logs
    tags_match_mode: all
    model_endpoint: https://api.openai.com
    model_id: gpt-3.5-turbo
    model_timeout: 5000
    model_api_key: sk-...
    keep_record: true
    tags:
      - tag: security
        prompt: "This log indicates a security incident or authentication failure"
      - tag: phishing
        prompt: "This log contains phishing attempt or credential request"

Example config with local Ollama:

filters:
  - name: llm_tag
    match: logs
    tags_match_mode: all
    model_endpoint: http://127.0.0.1:11434
    model_id: phi3:mini
    model_timeout: 10000
    model_api_key: ""
    keep_record: true
    tags:
      - tag: security
        prompt: "This log indicates a security incident or authentication failure"
      - tag: phishing
        prompt: "This log contains phishing attempt or credential request"

Summary by cubic

Adds a new llm_tag filter to classify logs with LLMs and retag records, and updates config parsing to support arrays of objects in filter properties. Adds safeguards to preserve records on LLM failures.

  • New Features

    • New filter_llm_tag plugin for LLM-based log classification and tag rewriting.
    • Supports OpenAI-compatible endpoints (OpenAI API, Ollama, vLLM, TGI, llama.cpp) with TLS and optional API key.
    • Batch-evaluates multiple tag rules per record; supports the first and all match modes and an optional keep_record; emits retagged records via a shared emitter.
    • Includes a lightweight OpenAI client for chat completions, timeout handling, and basic metrics.
  • Bug Fixes

    • Preserve original records when LLM API requests fail to avoid data loss.
    • Fix YAML variant deep copy and nested object parsing for filters; fix JSON parsing of nested response objects; prevent TLS context leaks; remove API key from logs.
    • Improve Windows build compatibility (portable time, case-insensitive search, and strtok helpers).
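To make the first/all match modes concrete, here is a minimal C sketch of how per-rule verdicts from one batched classification call could be turned into output tags. struct tag_rule, enum match_mode and select_tags are hypothetical names, not the plugin's actual code, and the sketch assumes the model's answer has already been parsed into one boolean per rule.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of one entry under `tags:` in the filter config. */
struct tag_rule {
    const char *tag;     /* tag to emit when the rule matches */
    const char *prompt;  /* natural-language condition sent to the model */
};

/* Hypothetical equivalent of tags_match_mode. */
enum match_mode {
    MATCH_FIRST,  /* stop at the first matching rule */
    MATCH_ALL     /* emit one tag per matching rule */
};

/*
 * matched[i] holds the model's verdict for rules[i], parsed from a single
 * batched response. Writes the selected tags into out (capacity out_cap)
 * in rule order and returns how many were written.
 */
size_t select_tags(const struct tag_rule *rules, const bool *matched,
                   size_t n_rules, enum match_mode mode,
                   const char **out, size_t out_cap)
{
    size_t count = 0;

    for (size_t i = 0; i < n_rules && count < out_cap; i++) {
        if (!matched[i]) {
            continue;
        }
        out[count++] = rules[i].tag;
        if (mode == MATCH_FIRST) {
            break;  /* first mode: only the earliest matching rule wins */
        }
    }
    return count;
}
```

With the security/phishing rules from the example configs and both verdicts true, all mode yields two tags and first mode yields only security.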

Risk: 3/5

Written for commit 3e71878. Summary will update automatically on new commits.


CodeAnt-AI Description

LLM-based log tagging filter classifies logs via OpenAI-compatible endpoints and re-emits them with configured tags

What Changed

  • Added the llm_tag filter that forwards log messages to OpenAI-compatible models, applies per-rule prompts, emits re-tagged records through a shared emitter, and can keep the original log if desired
  • Batch classification now evaluates all rules in one call, honors first/all match modes, and preserves the original log when the LLM request fails to prevent data loss
  • YAML parsing now preserves nested arrays and objects for filter/processor configs so tag rules can be defined as variant arrays of objects
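The failure-handling rule described above (pass the original record through when the LLM request fails; otherwise emit retagged copies and keep the original only when keep_record is set) fits in a small decision function. This is a hedged sketch: enum llm_status, struct record_action and decide_record_action are hypothetical names, and the behaviour for records that match no rule (kept under their original tag) is an assumption, not something stated in this PR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one classification request. */
enum llm_status {
    LLM_OK,
    LLM_ERROR  /* timeout, HTTP error, or unparsable response */
};

/* What the filter should do with one incoming record. */
struct record_action {
    bool keep_original;  /* pass the record through under its original tag */
    size_t emit_count;   /* number of retagged copies to emit */
};

struct record_action decide_record_action(enum llm_status status,
                                          size_t matched_tags,
                                          bool keep_record)
{
    struct record_action a = { .keep_original = true, .emit_count = 0 };

    if (status != LLM_OK) {
        /* API failure: never drop data, just pass the record through */
        return a;
    }
    if (matched_tags == 0) {
        /* assumption: unmatched records keep flowing under their original tag */
        return a;
    }
    a.emit_count = matched_tags;
    a.keep_original = keep_record;  /* original kept only if configured */
    return a;
}
```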

Impact

✅ LLM-based tagging for security logs
✅ Original logs preserved when LLM API fails
✅ Filter configs accept nested rule arrays


Add new LLM-based log tagging filter plugin with OpenAI-compatible API support
Add build configuration for LLM tagging filter
Extend YAML config parser to support complex nested structures in filter plugins
Add support for complex array properties containing objects in filter plugins
@niedbalski niedbalski requested a review from a team November 26, 2025 09:34
@codeant-ai

codeant-ai Bot commented Nov 26, 2025


CodeAnt AI is reviewing your PR.

@codeant-ai codeant-ai Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files label Nov 26, 2025
@codeant-ai

codeant-ai Bot commented Nov 26, 2025


CodeAnt AI finished reviewing your PR.

@niedbalski niedbalski self-assigned this Nov 26, 2025

@cubic-dev-ai cubic-dev-ai Bot left a comment


7 issues found across 9 files

Prompt for AI agents (all 7 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="source/plugins/filter_llm_tag/llm_tag.h">

<violation number="1" location="source/plugins/filter_llm_tag/llm_tag.h:20">
Rename the include guard macro to FLB_FILTER_LLM_TAG_H so it matches this header and cannot collide with a future llm_classify guard.</violation>
</file>

<file name="source/src/flb_config.c">

<violation number="1" location="source/src/flb_config.c:899">
Storing a cfl_variant pointer inside flb_kv.val (which is freed with flb_sds_destroy) will corrupt memory when the filter instance is destroyed. Use a container that does not assume string ownership for variant values instead of writing the pointer into kv->val.</violation>
</file>

<file name="source/plugins/filter_llm_tag/llm_tag.c">

<violation number="1" location="source/plugins/filter_llm_tag/llm_tag.c:246">
LLM request failures drop the original log record, causing data loss</violation>

<violation number="2" location="source/plugins/filter_llm_tag/llm_tag.c:520">
Debug logging prints every filter property value, leaking secrets such as model_api_key into logs. Remove this property-dumping block or at least mask sensitive values before logging to avoid exposing credentials.</violation>
</file>

<file name="source/src/config_format/flb_cf_yaml.c">

<violation number="1" location="source/src/config_format/flb_cf_yaml.c:851">
`kvlist_deep_copy` reuses the original variant pointer for non-string/array/kvlist entries, causing both lists to own the same object and leading to use-after-free/double-free when one is destroyed. Clone the variant before inserting it into the copy.</violation>
</file>

<file name="source/src/flb_openai_client.c">

<violation number="1" location="source/src/flb_openai_client.c:238">
Internally created TLS contexts leak when upstream creation fails because the cleanup path never destroys `client->tls`.</violation>

<violation number="2" location="source/src/flb_openai_client.c:589">
Destroying the caller-owned TLS context causes double free/invalid reuse when the same TLS is shared outside this client.</violation>
</file>
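Both reviews flag the same ownership bug in kvlist_deep_copy: inserting the original pointer for primitive values makes the source and the copy share one allocation, so destroying both frees it twice. The sketch below shows the fix pattern on a deliberately tiny, hypothetical variant type (struct mini_variant, mv_clone and mv_destroy are not the real cfl_variant API): every node, primitives included, gets its own allocation in the copy. The flb_config.c finding is the same class of bug from the other side: whatever frees a stored value must match how it was allocated, which is why each type here has its own destroy logic.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified variant; the real code uses cfl_variant/cfl_kvlist. */
enum mv_type { MV_INT, MV_BOOL, MV_STRING, MV_ARRAY };

struct mini_variant {
    enum mv_type type;
    union {
        int64_t i;
        int b;
        char *s;
        struct {
            struct mini_variant **items;
            size_t len;
        } arr;
    } data;
};

void mv_destroy(struct mini_variant *v)
{
    if (!v) {
        return;
    }
    if (v->type == MV_STRING) {
        free(v->data.s);
    }
    else if (v->type == MV_ARRAY) {
        for (size_t i = 0; i < v->data.arr.len; i++) {
            mv_destroy(v->data.arr.items[i]);
        }
        free(v->data.arr.items);
    }
    free(v);
}

/* Deep copy: returns a tree that shares no memory with src, or NULL on OOM. */
struct mini_variant *mv_clone(const struct mini_variant *src)
{
    struct mini_variant *dst = calloc(1, sizeof(*dst));
    if (!dst) {
        return NULL;
    }
    dst->type = src->type;

    switch (src->type) {
    case MV_INT:
    case MV_BOOL:
        /* primitives: copy the value into a new node, never reuse src */
        dst->data = src->data;
        break;
    case MV_STRING:
        dst->data.s = malloc(strlen(src->data.s) + 1);
        if (!dst->data.s) {
            free(dst);
            return NULL;
        }
        strcpy(dst->data.s, src->data.s);
        break;
    case MV_ARRAY:
        dst->data.arr.items = calloc(src->data.arr.len ? src->data.arr.len : 1,
                                     sizeof(*dst->data.arr.items));
        if (!dst->data.arr.items) {
            free(dst);
            return NULL;
        }
        for (size_t i = 0; i < src->data.arr.len; i++) {
            dst->data.arr.items[i] = mv_clone(src->data.arr.items[i]);
            if (!dst->data.arr.items[i]) {
                dst->data.arr.len = i;  /* free only what was copied */
                mv_destroy(dst);
                return NULL;
            }
        }
        dst->data.arr.len = src->data.arr.len;
        break;
    }
    return dst;
}
```

After mv_clone, destroying the source leaves the copy fully usable, which is exactly the property the shared-pointer version lacked.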



@cubic-dev-ai cubic-dev-ai Bot left a comment


5 issues found across 9 files

Prompt for AI agents (all 5 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="source/src/flb_config.c">

<violation number="1" location="source/src/flb_config.c:899">
Storing a `struct cfl_variant *` in `kv_prop->val` means the property cleanup path calls `flb_sds_destroy` on a non-SDS pointer, leading to invalid free/memory corruption when the filter instance is destroyed. Use a structure that owns the variant or avoid the generic kv list for non-string data.</violation>
</file>

<file name="source/src/flb_openai_client.c">

<violation number="1" location="source/src/flb_openai_client.c:307">
`json_get_key` assumes each key/value only consumes two tokens, so it fails to find keys that appear after nested values (e.g., `choices` after `usage`), causing valid OpenAI responses to be rejected.</violation>

<violation number="2" location="source/src/flb_openai_client.c:589">
Destroying `client->tls` here frees TLS contexts that were passed in by the caller, causing double frees/use-after-free for shared TLS objects.</violation>
</file>

<file name="source/plugins/filter_llm_tag/llm_tag.c">

<violation number="1" location="source/plugins/filter_llm_tag/llm_tag.c:397">
Records whose LLM classification fails are dropped because the code continues without re-emitting or keeping the original record, leading to data loss on transient API failures.</violation>
</file>

<file name="source/src/config_format/flb_cf_yaml.c">

<violation number="1" location="source/src/config_format/flb_cf_yaml.c:851">
Primitive variants are not really copied in kvlist_deep_copy: the default branch inserts the original variant pointer, so source and copy share ownership and freeing either list frees the same variant. Allocate a fresh variant per primitive instead of reusing the pointer.</violation>
</file>

- filter_llm_tag: preserve original records on LLM API failures
- filter_llm_tag: remove debug logging that exposed API keys
- flb_cf_yaml: properly clone primitive variants in kvlist_deep_copy
- flb_openai_client: fix TLS context leak on upstream creation failure
- flb_openai_client: fix JSON parsing for nested objects in responses
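Both TLS findings come down to one ownership question: the client must free a TLS context it created (including on the upstream-failure path) and must never free one the caller lent it. A common way to express this is an explicit ownership flag. The sketch below uses hypothetical stand-in types and functions (struct tls_ctx, struct upstream, client_create, client_destroy), not the real flb_openai_client API, and does not claim the merged fix is structured this way.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the Fluent Bit flb_tls / flb_upstream types. */
struct tls_ctx { int handshakes; };
struct upstream { int fd; };

struct llm_client {
    struct tls_ctx *tls;
    bool owns_tls;       /* true only if this client created tls */
    struct upstream *u;
};

struct tls_ctx *tls_create(void) { return calloc(1, sizeof(struct tls_ctx)); }
void tls_destroy(struct tls_ctx *t) { free(t); }

/* Simulated upstream constructor; `fail` forces the error path. */
static struct upstream *upstream_create(bool fail)
{
    return fail ? NULL : calloc(1, sizeof(struct upstream));
}

void client_destroy(struct llm_client *c)
{
    if (!c) {
        return;
    }
    free(c->u);
    if (c->owns_tls) {
        tls_destroy(c->tls);  /* never free a caller-provided context */
    }
    free(c);
}

/*
 * caller_tls: borrowed context (may be NULL). If NULL and use_tls is set,
 * the client creates and owns its own context.
 */
struct llm_client *client_create(struct tls_ctx *caller_tls, bool use_tls,
                                 bool fail_upstream)
{
    struct llm_client *c = calloc(1, sizeof(*c));
    if (!c) {
        return NULL;
    }
    if (caller_tls) {
        c->tls = caller_tls;
        c->owns_tls = false;
    }
    else if (use_tls) {
        c->tls = tls_create();
        if (!c->tls) {
            free(c);
            return NULL;
        }
        c->owns_tls = true;
    }
    c->u = upstream_create(fail_upstream);
    if (!c->u) {
        /* the shared cleanup path also releases an internally created tls */
        client_destroy(c);
        return NULL;
    }
    return c;
}
```

Routing the error path through client_destroy keeps a single place that knows the ownership rule, so the leak and the double free cannot be fixed in one path and reintroduced in the other.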
@codeant-ai

codeant-ai Bot commented Nov 26, 2025


CodeAnt AI is running Incremental review

@codeant-ai codeant-ai Bot added size:XXL This PR changes 1000+ lines, ignoring generated files and removed size:XXL This PR changes 1000+ lines, ignoring generated files labels Nov 26, 2025
@codeant-ai

codeant-ai Bot commented Nov 26, 2025


CodeAnt AI Incremental review completed.

@patrick-stephens patrick-stephens linked an issue Nov 28, 2025 that may be closed by this pull request
@patrick-stephens patrick-stephens added the build-packages Option to enable all package builds for a PR to test label Nov 28, 2025
@niedbalski niedbalski added build-linux Option to enable building of Linux packages only for a PR to test build-windows Option to enable building of Windows packages only for a PR to test labels Nov 28, 2025
@patrick-stephens patrick-stephens removed build-packages Option to enable all package builds for a PR to test build-linux Option to enable building of Linux packages only for a PR to test size:XXL This PR changes 1000+ lines, ignoring generated files labels Nov 28, 2025
@patrick-stephens

Contributor

@niedbalski looks like Windows builds are failing:

2025-11-28T16:43:03.1848016Z D:\a\agent\agent\source\plugins\filter_llm_tag\llm_tag.c(195): error C2065: 'CLOCK_MONOTONIC': undeclared identifier
2025-11-28T16:43:03.1849519Z D:\a\agent\agent\source\plugins\filter_llm_tag\llm_tag.c(242): error C2065: 'CLOCK_MONOTONIC': undeclared identifier
2025-11-28T16:43:03.1852059Z D:\a\agent\agent\source\plugins\filter_llm_tag\llm_tag.c(263): warning C4047: '=': 'char *' differs in levels of indirection from 'int'
2025-11-28T16:43:03.1854219Z D:\a\agent\agent\source\plugins\filter_llm_tag\llm_tag.c(280): warning C4047: '=': 'char *' differs in levels of indirection from 'int'
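MSVC provides neither clock_gettime nor CLOCK_MONOTONIC, which is what the two C2065 errors report. The C4047 warnings about 'char *' versus 'int' are most likely the same problem in another form: a function MSVC does not declare (such as strcasestr or strtok_r) is implicitly assumed to return int. The fix commit switched to Fluent Bit's own flb_time_get; as a self-contained illustration of the underlying technique only, here is a generic monotonic clock wrapper using QueryPerformanceCounter on Windows and clock_gettime(CLOCK_MONOTONIC) elsewhere. monotonic_ms and timed_out are hypothetical names, not Fluent Bit APIs.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime on strict C builds */
#endif

#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Milliseconds from an arbitrary fixed point; only differences are meaningful. */
uint64_t monotonic_ms(void)
{
#ifdef _WIN32
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&now);
    /* split the division to avoid overflowing now * 1000 */
    return (uint64_t) (now.QuadPart / freq.QuadPart) * 1000u +
           (uint64_t) (now.QuadPart % freq.QuadPart) * 1000u /
           (uint64_t) freq.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000u + (uint64_t) ts.tv_nsec / 1000000u;
#endif
}

/* True once `timeout_ms` has elapsed since `start_ms`, e.g. for model_timeout. */
int timed_out(uint64_t start_ms, uint64_t timeout_ms)
{
    return monotonic_ms() - start_ms >= timeout_ms;
}
```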

- Replace POSIX clock_gettime with flb_time_get for cross-platform timing
- Add flb_strcasestr helper for case-insensitive search (strcasestr is GNU extension)
- Add flb_strtok_r helper to use strtok_s on Windows, strtok_r elsewhere
- Include flb_compat.h for strncasecmp macro on Windows
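The commit adds flb_strcasestr and flb_strtok_r, which select platform functions (strtok_s on Windows, strtok_r elsewhere). As a hedged alternative sketch of the same portability goal, both behaviours can also be written in standard C only, with no platform split and no GNU extension. ci_strstr and split_next are hypothetical names, case folding is ASCII-only via tolower, and this is a swapped-in technique rather than the helpers the commit adds.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Case-insensitive substring search (ASCII), a portable stand-in for the
 * GNU strcasestr extension. Returns a pointer into haystack or NULL.
 */
const char *ci_strstr(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);

    if (n == 0) {
        return haystack;
    }
    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;
        while (i < n && haystack[i] != '\0' &&
               tolower((unsigned char) haystack[i]) ==
               tolower((unsigned char) needle[i])) {
            i++;
        }
        if (i == n) {
            return haystack;
        }
    }
    return NULL;
}

/*
 * Reentrant tokenizer with strtok_r-like semantics, built only from the
 * standard strspn/strcspn. Pass str on the first call and NULL afterwards;
 * *save carries the position between calls.
 */
char *split_next(char *str, const char *delims, char **save)
{
    char *start = str ? str : *save;
    char *end;

    if (start == NULL) {
        return NULL;
    }
    start += strspn(start, delims);        /* skip leading delimiters */
    if (*start == '\0') {
        *save = NULL;
        return NULL;
    }
    end = start + strcspn(start, delims);  /* find the end of the token */
    if (*end != '\0') {
        *end = '\0';
        *save = end + 1;
    }
    else {
        *save = NULL;
    }
    return start;
}
```

Splitting "security, phishing,,other" on ", " yields security, phishing and other, skipping the empty field, just as strtok_r would.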
@niedbalski niedbalski changed the title feat: add filter_llm_tag plugin plugins: add new filter llm_tag plugin Nov 29, 2025
@niedbalski niedbalski changed the title plugins: add new filter llm_tag plugin feat: plugins: add new filter llm_tag plugin Nov 29, 2025
@coveralls


Coverage Status

coverage: 56.268% (-0.3%) from 56.596% when pulling 3e71878 on feat/llm_rewrite_tag into 0390f9f on main.

@niedbalski niedbalski added build-macos Option to enable building of macOS packages only for a PR to test build-linux Option to enable building of Linux packages only for a PR to test labels Dec 1, 2025
@patrick-stephens patrick-stephens merged commit 47c83e0 into main Dec 1, 2025
270 of 284 checks passed
@patrick-stephens patrick-stephens deleted the feat/llm_rewrite_tag branch December 1, 2025 13:51

Labels

build-linux: Option to enable building of Linux packages only for a PR to test
build-macos: Option to enable building of macOS packages only for a PR to test
build-windows: Option to enable building of Windows packages only for a PR to test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Resolve unit testing failures

3 participants