tls: openssl: Reload and watch certificates mainly on Linux #11950

Open
cosmo0920 wants to merge 4 commits into master from cosmo0920-reload-and-watch-certificates
Conversation

@cosmo0920 cosmo0920 commented Jun 16, 2026

Contributor

With the current proposals to shorten certificate validity periods, Fluent Bit needs to handle certificate reloading on Linux.
Shorter validity periods would otherwise force users to relaunch the Fluent Bit executable before the certificates expire.
A relaunch, however, means taking care of buffered chunks before flushing and terminating the Fluent Bit executable.
If Fluent Bit can instead reload certificates when they change, that is quite useful for users who work with short-lived certificates.
This patch has one limitation: reloaded certificates are loaded with the same settings as the previous ones, such as the passphrase.

Closes #10692.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features
    • Added automatic runtime TLS certificate/key reload by detecting changes to configured CA/cert/key files.
    • Introduced thread-safe reload handling with tracked file metadata to determine when a reload is needed.
    • Improved system certificate refresh behavior on Windows and macOS during TLS context reloads, including certstore/thumbprint settings.
  • Tests
    • Added a TLS reload test covering initial stability and triggering after certificate/key file changes.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
@coderabbitai

coderabbitai Bot commented Jun 16, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f55160be-8d46-4503-ab08-7ca2f11751e1

📥 Commits

Reviewing files that changed from the base of the PR and between 8ce04a2 and 78f867c.

📒 Files selected for processing (3)
  • src/tls/flb_tls.c
  • src/tls/openssl.c
  • tests/internal/upstream_tls.c
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/tls/openssl.c
  • src/tls/flb_tls.c
  • tests/internal/upstream_tls.c

📝 Walkthrough

Adds runtime TLS certificate reload support. A new flb_tls_file_status struct tracks file metadata (size, mtime, ctime). flb_tls_reload_if_needed() checks whether any watched certificate file has changed and, if so, invokes a new context_reload callback on the OpenSSL backend to atomically swap the SSL_CTX. This is triggered before each TLS session handshake.
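
The stat-based change detection described in the walkthrough can be sketched as follows. This is a minimal illustration, not the PR's code: `struct file_status`, `file_status_get`, and `file_status_changed` are hypothetical stand-ins for `flb_tls_file_status` and its helpers, and the field names are assumptions.

```c
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical stand-in for flb_tls_file_status; field names are assumed. */
struct file_status {
    int    exists;     /* 1 if stat() succeeded for the path */
    off_t  size;       /* file size in bytes */
    time_t mtime_sec;  /* last content modification time */
    time_t ctime_sec;  /* last inode status change time */
};

/* Take a metadata snapshot; a missing file is recorded as exists == 0. */
static void file_status_get(const char *path, struct file_status *st)
{
    struct stat sb;

    assert(st != NULL);
    memset(st, 0, sizeof(*st));
    if (path != NULL && stat(path, &sb) == 0) {
        st->exists    = 1;
        st->size      = sb.st_size;
        st->mtime_sec = sb.st_mtime;
        st->ctime_sec = sb.st_ctime;
    }
}

/* Return 1 when any tracked field differs between two snapshots. */
static int file_status_changed(const struct file_status *a,
                               const struct file_status *b)
{
    return a->exists    != b->exists    ||
           a->size      != b->size      ||
           a->mtime_sec != b->mtime_sec ||
           a->ctime_sec != b->ctime_sec;
}
```

Comparing size alongside the timestamps helps catch a rewrite that lands within the same second as the previous snapshot, since `st_mtime` has one-second resolution in this sketch.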

Changes

TLS Runtime Certificate Reload

Layer / File(s) | Summary
Public structs and API declarations (include/fluent-bit/tls/flb_tls.h) | Adds flb_tls_file_status struct to track file metadata, context_reload function pointer to flb_tls_backend, new fields in flb_tls for config strings (ca_path, ca_file, crt_file, key_file), per-file status tracking, system_certificates_loaded flag, reload_mutex, and declares flb_tls_reload_if_needed.
File metadata helpers and string storage utility (src/tls/flb_tls.c) | Adds includes for file and memory operations. Updates flb_tls_load_system_certificates to set system_certificates_loaded flag on success. Implements tls_file_status_get (stat-based metadata fetch), tls_file_status_changed (comparison), tls_file_status_refresh, tls_file_status_has_changed, tls_should_reload_context (reload decision chain including Windows/macOS system cert logic), and tls_store_string managed string helper.
flb_tls_create, destroy, and configuration setters (src/tls/flb_tls.c) | Extends flb_tls_create to initialize reload_mutex, store config strings via tls_store_string, set system_certificates_loaded, and refresh initial file statuses. Implements flb_tls_reload_if_needed with mutex locking, should-reload check, backend context_reload invocation, file status refresh, and return codes. Updates flb_tls_destroy to free all stored strings and destroy reload_mutex. Updates setters for minmax_proto, ciphers, alpn, verify_client, and Windows certstore_name/use_enterprise_store/client_thumbprints to persist values and handle allocation errors.
Session creation reload integration (src/tls/flb_tls.c) | Calls flb_tls_reload_if_needed at the start of flb_tls_session_create before handshake initiation to apply any detected certificate/config changes.
OpenSSL tls_context_reload callback (src/tls/openssl.c) | Implements tls_context_reload: builds a new SSL_CTX via tls_context_create, applies verification, protocol bounds, cipher list, ALPN, Windows certstore and thumbprints, and optionally reloads system certificates. Atomically swaps ctx->ctx under mutex, reinstalls server ALPN callback, and frees superseded context. Registers callback in tls_openssl backend struct.
TLS reload test and file helpers (tests/internal/upstream_tls.c) | Adds stdio.h and string.h includes. Implements copy_file and append_file static helpers for test setup/teardown via file operations. Implements test_tls_reload_when_certificate_file_changes to verify no-reload before file change and reload-triggered-by-key-file-modification behavior, then cleans up temporary files. Registers test under FLB_HAVE_TLS.

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant flb_tls_session_create
  participant flb_tls_reload_if_needed
  participant tls_should_reload_context
  participant tls_context_reload

  Caller->>flb_tls_session_create: initiate TLS session
  flb_tls_session_create->>flb_tls_reload_if_needed: pre-handshake reload check
  flb_tls_reload_if_needed->>flb_tls_reload_if_needed: lock reload_mutex
  flb_tls_reload_if_needed->>tls_should_reload_context: file metadata changed?
  tls_should_reload_context-->>flb_tls_reload_if_needed: true/false
  alt reload needed
    flb_tls_reload_if_needed->>tls_context_reload: context_reload(tls)
    tls_context_reload->>tls_context_reload: build new SSL_CTX, apply settings
    tls_context_reload->>tls_context_reload: mutex-swap ctx->ctx
    tls_context_reload-->>flb_tls_reload_if_needed: 0 (success) or -1 (failure)
    flb_tls_reload_if_needed->>flb_tls_reload_if_needed: refresh file statuses
  end
  flb_tls_reload_if_needed->>flb_tls_reload_if_needed: unlock reload_mutex
  flb_tls_reload_if_needed-->>flb_tls_session_create: 1/0/-1
  flb_tls_session_create-->>Caller: session handle or error
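
The control flow in the diagram can be sketched in a few lines of C. Everything here is hypothetical and simplified: `struct tls_like`, `files_changed`, `generation`, and `demo_context_reload` are illustrative names, and the return-code mapping (1 reloaded, 0 unchanged, -1 failure) is an assumption based on the `1/0/-1` shown in the diagram. Whether the real function refreshes file statuses after a failed reload is not stated in the source; this sketch refreshes only on success.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct flb_tls. */
struct tls_like {
    pthread_mutex_t reload_mutex;
    int files_changed;  /* stand-in for the stat() comparison result */
    int generation;     /* bumped each time a new context is swapped in */
    int (*context_reload)(struct tls_like *tls);  /* 0 on success, -1 on failure */
};

/* Demo backend callback: pretends to build and swap in a new context. */
static int demo_context_reload(struct tls_like *tls)
{
    tls->generation++;
    return 0;
}

/* Assumed contract: 1 = reloaded, 0 = nothing to do, -1 = failure. */
static int reload_if_needed(struct tls_like *tls)
{
    int ret = 0;

    if (tls == NULL) {
        return -1;
    }
    assert(tls->context_reload != NULL);

    pthread_mutex_lock(&tls->reload_mutex);
    if (tls->files_changed) {
        if (tls->context_reload(tls) == 0) {
            /* Refresh the snapshot so the same change is not seen twice. */
            tls->files_changed = 0;
            ret = 1;
        }
        else {
            ret = -1;
        }
    }
    pthread_mutex_unlock(&tls->reload_mutex);

    return ret;
}
```

Holding the mutex across both the check and the swap means two sessions created concurrently cannot both decide to rebuild the context for the same file change.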

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • edsiper
  • fujimotos
  • koleini

🐇 A certificate changed, the rabbit did say,
"Reload it now — no downtime today!"
With a mutex locked tight and a stat() call clear,
New SSL_CTX swapped in, no need to fear.
Hop hop, fresh certs on every new peer! 🔐✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 4.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately describes the main change: implementing TLS certificate reloading primarily on Linux.
Linked Issues check | ✅ Passed | The PR implements automatic TLS certificate reloading by watching certificate files and reloading when changes are detected, directly addressing issue #10692's core requirement.
Out of Scope Changes check | ✅ Passed | All changes focus on TLS certificate reload functionality: header extensions, reload implementation, OpenSSL backend support, and related tests remain within scope.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09c64c5705

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/tls/openssl.c

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
tests/internal/upstream_tls.c (1)

253-255: ⚡ Quick win

Add a post-reload assertion to verify status refresh behavior.

After asserting reload returns 1, add one more flb_tls_reload_if_needed(tls) == 0 check immediately. This locks in the contract that file status is refreshed after reload and prevents repeated false-positive reloads without further file changes.

Suggested enhancement
     TEST_CHECK(append_file(dst_key, "\n") == 0);
     ret = flb_tls_reload_if_needed(tls);
     TEST_CHECK(ret == 1);
+    TEST_CHECK(flb_tls_reload_if_needed(tls) == 0);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/internal/upstream_tls.c` around lines 253 - 255, After the existing
TEST_CHECK assertion that verifies flb_tls_reload_if_needed(tls) returns 1, add
an immediate follow-up TEST_CHECK assertion that calls
flb_tls_reload_if_needed(tls) again and verifies it returns 0. This confirms
that the file status has been refreshed after the reload and prevents
false-positive repeated reloads without further file changes.
src/tls/flb_tls.c (2)

340-342: 💤 Low value

Consider checking pthread_mutex_init return value.

While pthread_mutex_init rarely fails, it can return ENOMEM or EAGAIN under resource pressure. Currently, a silent failure would leave the mutex in an undefined state, potentially causing undefined behavior when pthread_mutex_lock is called in flb_tls_reload_if_needed.

🔧 Suggested defensive check
     tls->ctx = backend;
     tls->api = &tls_openssl;
-    pthread_mutex_init(&tls->reload_mutex, NULL);
+    if (pthread_mutex_init(&tls->reload_mutex, NULL) != 0) {
+        flb_errno();
+        tls_context_destroy(backend);
+        flb_free(tls);
+        return NULL;
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/tls/flb_tls.c` around lines 340 - 342, The pthread_mutex_init call
initializing tls->reload_mutex does not check its return value, which could
leave the mutex in an undefined state if initialization fails due to resource
constraints. Add a check for the return value of pthread_mutex_init in the
initialization block and handle the error appropriately, such as by logging an
error and returning a failure status to prevent subsequent mutex operations like
pthread_mutex_lock in flb_tls_reload_if_needed from operating on an improperly
initialized mutex.

595-595: ⚡ Quick win

Function opening brace should be on the next line.

Per the project's coding guidelines, function opening braces should be on a new line. The brace is currently on the same line as the function signature.

🔧 Suggested fix
-int flb_tls_set_client_thumbprints(struct flb_tls *tls, const char *thumbprints) {
+int flb_tls_set_client_thumbprints(struct flb_tls *tls, const char *thumbprints)
+{
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/tls/flb_tls.c` at line 595, Move the opening brace to a new line in the
flb_tls_set_client_thumbprints function definition to comply with the project's
coding guidelines. The function signature should end on one line, and the
opening brace should be placed on the following line by itself.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/tls/openssl.c`:
- Around line 1362-1366: The ALPN callback is registered on new_ctx during the
tls_context_alpn_set call with new_ctx as the user data pointer, but after the
context swap operation exchanges the SSL_CTX pointers and tls_context_destroy
frees new_ctx, the callback still references the freed pointer. Fix this by
re-registering the ALPN callback after the context swap (after the operation
that exchanges SSL_CTX pointers but before tls_context_destroy is called) so
that the callback's user data pointer references the correct active context
instead of the freed one. This ensures subsequent TLS handshakes with ALPN will
dereference valid memory.

In `@tests/internal/upstream_tls.c`:
- Around line 45-56: The `copy_file` function does not distinguish between
end-of-file and actual read errors when `fread()` returns 0. After the while
loop in `copy_file` exits, add an explicit check using `ferror(in)` to detect if
a read error occurred during the file reading process. If `ferror(in)` indicates
an error condition, close both file handles and return -1 to signal failure;
otherwise proceed with closing the files and returning 0 to indicate successful
completion.
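
The `ferror()` check requested for `copy_file` could look roughly like the sketch below. It is a hypothetical reconstruction of the test helper, since the PR's actual `copy_file` body is not shown here; the buffer size and the `failed` flag are assumptions.

```c
#include <stdio.h>

/* Hypothetical reconstruction of the test helper; the PR's version may differ. */
static int copy_file(const char *src, const char *dst)
{
    char buf[4096];
    size_t n;
    int failed = 0;
    FILE *in;
    FILE *out;

    in = fopen(src, "rb");
    if (in == NULL) {
        return -1;
    }
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            failed = 1;
            break;
        }
    }
    /* fread() returns 0 for both EOF and error; ferror() tells them apart. */
    if (ferror(in)) {
        failed = 1;
    }

    fclose(in);
    /* fclose() flushes buffered output, so its result is checked as well. */
    if (fclose(out) != 0) {
        failed = 1;
    }

    return failed ? -1 : 0;
}
```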

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f8758233-5637-4f8c-bfb7-970f1a7a4c3c

📥 Commits

Reviewing files that changed from the base of the PR and between 4ddcb11 and 09c64c5.

📒 Files selected for processing (4)
  • include/fluent-bit/tls/flb_tls.h
  • src/tls/flb_tls.c
  • src/tls/openssl.c
  • tests/internal/upstream_tls.c

Comment thread src/tls/openssl.c
Comment thread tests/internal/upstream_tls.c
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
@cosmo0920

Contributor Author

@patrick-stephens FYI: we finally revived the capability of executing package test on our workflows! 🙌

Labels

docs-required ok-package-test Run PR packaging tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Watch TLS certificates and reload on changes

1 participant