multiline: ml: Propagate metadata even if on exceptions#11469
multiline: ml: Propagate metadata even if on exceptions#11469
Conversation
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
📝 WalkthroughWalkthroughAdds per-stream-group in-memory metadata snapshot storage and management for multiline processing: two new fields on flb_ml_stream_group and new APIs to add and purge deep-copied msgpack metadata, plus calls to preserve and clear metadata across multiline lifecycle and on destruction. (≤50 words) Changes
Sequence DiagramsequenceDiagram
participant Producer as Content Producer
participant MetaMgr as Metadata Manager
participant StreamGrp as Stream Group
participant Flusher as Flush Handler
Producer->>MetaMgr: flb_ml_stream_group_add_metadata(metadata)
MetaMgr->>MetaMgr: deep_copy_msgpack_object(metadata)
MetaMgr->>StreamGrp: append metadata_objects entry
Note over StreamGrp: metadata snapshots retained per-group until flush or destroy
Flusher->>StreamGrp: request metadata for emit
StreamGrp->>MetaMgr: get/deduplicate/pack metadata
MetaMgr->>Flusher: provide packed/deduped metadata
Flusher->>MetaMgr: flb_ml_stream_group_purge_metadata(group)
MetaMgr->>StreamGrp: clear list & reset initialized flag
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/multiline/flb_ml.c (1)
1610-1623:⚠️ Potential issue | 🟡 MinorMetadata not purged on early error returns.
When the first-line state buffer unpack fails (lines 1610-1614 or 1618-1623), the function returns
-1without reaching theflb_ml_stream_group_purge_metadata(group)at line 1767. The accumulated metadata entries remain in the group's list and will be incorrectly included in the next successful flush.🔧 Proposed fix
if (ret != MSGPACK_UNPACK_SUCCESS) { flb_error("[multiline] could not unpack first line state buffer"); msgpack_unpacked_destroy(&result); group->mp_sbuf.size = 0; + flb_ml_stream_group_purge_metadata(group); return -1; } map = result.data; if (map.type != MSGPACK_OBJECT_MAP) { flb_error("[multiline] expected MAP type in first line state buffer"); msgpack_unpacked_destroy(&result); group->mp_sbuf.size = 0; + flb_ml_stream_group_purge_metadata(group); return -1; }
🤖 Fix all issues with AI agents
In `@src/multiline/flb_ml.c`:
- Around line 1473-1478: The deep-copy failure path leaks partially-allocated
sub-objects in entry->obj; after flb_ml_msgpack_object_deep_copy_convert(...)
returns non-zero, call the routine that frees/destroys the msgpack object (e.g.,
flb_ml_msgpack_object_destroy(&entry->obj) or the project’s equivalent
destructor) before flb_free(entry) so all heap allocations from the failed copy
are released, then return -1; keep mk_list_entry_init(&entry->_head) as-is.
- Around line 1329-1339: The MSGPACK_OBJECT_STR path currently calls
flb_malloc(len) and treats NULL as failure, which breaks when src->via.str.size
== 0 because flb_malloc(0) can return NULL; change the logic in the
MSGPACK_OBJECT_STR handling so that when src->via.str.size == 0 you do not call
flb_malloc and instead set dst->via.str.ptr to an immutable empty buffer (or
literal "") and dst->via.str.size = 0, otherwise allocate and copy as before
(use the same pattern for MSGPACK_OBJECT_BIN and MSGPACK_OBJECT_EXT referring to
dst->via.bin.ptr/dst->via.ext.ptr and src->via.bin.size/src->via.ext.size).
- Around line 1374-1419: The array and map deep-copy loops leak memory on
mid-loop failure; after allocating dst->via.array.ptr or dst->via.map.ptr and
before returning -1 when flb_ml_msgpack_object_deep_copy_convert fails, call
flb_ml_msgpack_object_destroy(dst) to free any partially-copied children and the
allocated ptr, then return -1. Apply the same fix in both the
MSGPACK_OBJECT_ARRAY loop (checking failure inside the for loop that copies into
dst->via.array.ptr[i]) and in the MSGPACK_OBJECT_MAP loop (for failures copying
dst->via.map.ptr[i].key or .val), ensuring you clean up dst via
flb_ml_msgpack_object_destroy before returning.
🧹 Nitpick comments (4)
src/multiline/flb_ml.c (4)
1563-1564: Double purge of metadata on success path.
flb_ml_flush_metadata_bufferpurges metadata at line 1564, and thenflb_ml_flush_stream_grouppurges again at line 1767. The second call is a harmless no-op but makes ownership semantics unclear. Consider removing the purge fromflb_ml_flush_metadata_bufferand letting the caller own cleanup, or vice versa.Also applies to: 1767-1767
1421-1424: Shallow copy in the default case could cause double-free for pointer-bearing types.
*dst = *srccopies raw union bytes. If an unhandled type contains a pointer (unlikely today, but possible with future msgpack extensions), bothsrcanddstwould reference the same memory, leading to a double-free whenflb_ml_msgpack_object_destroyis called. Consider settingdst->type = MSGPACK_OBJECT_NILinstead or returning an error.
349-351: Return value offlb_ml_stream_group_add_metadatais silently discarded.If the deep-copy allocation fails (OOM), the metadata for this line is lost without any log message. Consider at least logging a warning on failure so operators can diagnose metadata loss.
1501-1505: Usemk_list_is_empty()instead of manual empty-list check.Replace
(group->metadata_objects.next == &group->metadata_objects)with(mk_list_is_empty(&group->metadata_objects) == 0)for consistency with the rest of the codebase.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f9b0a94ed5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/multiline/flb_ml.c (1)
1632-1644:⚠️ Potential issue | 🟠 MajorMemory leak: early returns bypass metadata purge.
These two early-return paths (
msgpack_unpack_nextfailure at line 1636 and non-MAP check at line 1644) were harmless before this PR because metadata was stored inmp_md_sbuf(just a size reset). Now that metadata is a linked list of heap-allocated deep-copiedmsgpack_objectsnapshots, returning here leaks every entry ingroup->metadata_objects.Additionally, the local
mp_sbufinitialized at line 1615 is not destroyed on these paths (pre-existing).🔧 Proposed fix — purge metadata and destroy local sbuf on early returns
if (ret != MSGPACK_UNPACK_SUCCESS) { flb_error("[multiline] could not unpack first line state buffer"); msgpack_unpacked_destroy(&result); + msgpack_sbuffer_destroy(&mp_sbuf); + flb_ml_stream_group_purge_metadata(group); group->mp_sbuf.size = 0; return -1; } map = result.data; if (map.type != MSGPACK_OBJECT_MAP) { flb_error("[multiline] expected MAP type in first line state buffer"); msgpack_unpacked_destroy(&result); + msgpack_sbuffer_destroy(&mp_sbuf); + flb_ml_stream_group_purge_metadata(group); group->mp_sbuf.size = 0; return -1; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/multiline/flb_ml.c` around lines 1632 - 1644, On the two early-return paths that follow msgpack_unpack_next and the map-type check, ensure you free the per-group metadata list and destroy the local sbuf before returning: iterate and free every heap-allocated entry in group->metadata_objects (purge the linked list and reset any count/head pointers) and call msgpack_sbuffer_destroy(&mp_sbuf) for the locally-initialized mp_sbuf, then proceed to msgpack_unpacked_destroy(&result) and set group->mp_sbuf.size = 0 before returning -1; update both failure branches (the branch after msgpack_unpack_next failure and the non-MAP branch) to perform these cleanup steps so no metadata or sbuf memory is leaked.
🧹 Nitpick comments (2)
src/multiline/flb_ml.c (2)
1523-1527: Consider usingmk_list_is_emptyfor readability.The raw pointer comparison
group->metadata_objects.next == &group->metadata_objectsis a correct empty-list check, but using a list API macro (if available) would be clearer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/multiline/flb_ml.c` around lines 1523 - 1527, Replace the raw empty-list pointer comparison in the early-return check with the list API macro for clarity: change the condition that tests group->metadata_objects.next == &group->metadata_objects to use mk_list_is_empty(&group->metadata_objects) (keeping the surrounding checks for group == NULL and group->metadata_objects_initialized). Update the condition in the function that contains group, metadata_objects_initialized and metadata_objects to use mk_list_is_empty so the intent is clearer and consistent with other list usage.
1785-1790: Double metadata purge is safe but redundant.
flb_ml_flush_metadata_buffer(line 1586) unconditionally purges the metadata list. The purge at line 1789 runs on an already-empty list. This is harmless and defensive — just noting it for awareness. No action needed.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/multiline/flb_ml.c` around lines 1785 - 1790, The code calls flb_ml_stream_group_purge_metadata twice: once unconditionally in flb_ml_flush_metadata_buffer and again later where group->mp_md_sbuf.size is reset; remove the redundant second call to flb_ml_stream_group_purge_metadata (the one after resetting group->truncated and before setting group->mp_md_sbuf.size = 0) to avoid unnecessary double-purge while leaving the original purge in flb_ml_flush_metadata_buffer intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/multiline/flb_ml.c`:
- Around line 1759-1766: The error path after the event encoder failure in
flb_ml.c fails to destroy the local msgpack sbuffer; ensure that when ret !=
FLB_EVENT_ENCODER_SUCCESS you call msgpack_sbuffer_destroy(&mp_sbuf) before
purging metadata and returning -1. Update the error branch that currently logs
flb_error(...) and calls flb_ml_stream_group_purge_metadata(group) /
group->mp_md_sbuf.size = 0 so it first destroys mp_sbuf (via
msgpack_sbuffer_destroy) to avoid leaking resources, then proceed with the
existing cleanup and return.
- Around line 1451-1505: stream_group_destroy currently frees the group without
cleaning up metadata entries added by flb_ml_stream_group_add_metadata, leaking
flb_ml_metadata_object_entry and their msgpack objects; update
stream_group_destroy to call flb_ml_stream_group_purge_metadata(group) before
flb_free(group) so all metadata entries are removed and their resources freed
(ensure you reference the flb_ml_stream_group_purge_metadata function and call
it at the start of stream_group_destroy's teardown path).
---
Outside diff comments:
In `@src/multiline/flb_ml.c`:
- Around line 1632-1644: On the two early-return paths that follow
msgpack_unpack_next and the map-type check, ensure you free the per-group
metadata list and destroy the local sbuf before returning: iterate and free
every heap-allocated entry in group->metadata_objects (purge the linked list and
reset any count/head pointers) and call msgpack_sbuffer_destroy(&mp_sbuf) for
the locally-initialized mp_sbuf, then proceed to
msgpack_unpacked_destroy(&result) and set group->mp_sbuf.size = 0 before
returning -1; update both failure branches (the branch after msgpack_unpack_next
failure and the non-MAP branch) to perform these cleanup steps so no metadata or
sbuf memory is leaked.
---
Nitpick comments:
In `@src/multiline/flb_ml.c`:
- Around line 1523-1527: Replace the raw empty-list pointer comparison in the
early-return check with the list API macro for clarity: change the condition
that tests group->metadata_objects.next == &group->metadata_objects to use
mk_list_is_empty(&group->metadata_objects) (keeping the surrounding checks for
group == NULL and group->metadata_objects_initialized). Update the condition in
the function that contains group, metadata_objects_initialized and
metadata_objects to use mk_list_is_empty so the intent is clearer and consistent
with other list usage.
- Around line 1785-1790: The code calls flb_ml_stream_group_purge_metadata
twice: once unconditionally in flb_ml_flush_metadata_buffer and again later
where group->mp_md_sbuf.size is reset; remove the redundant second call to
flb_ml_stream_group_purge_metadata (the one after resetting group->truncated and
before setting group->mp_md_sbuf.size = 0) to avoid unnecessary double-purge
while leaving the original purge in flb_ml_flush_metadata_buffer intact.
f744f6f to
30e9e98
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/multiline/flb_ml.c (3)
1451-1457: Lazy initialization pattern is correct but could be set eagerly instream_group_create.The lazy-init via
metadata_objects_initializedflag works because every access path (add, purge, flush) checks the flag. However, initializing the list instream_group_create(inflb_ml_stream.c) would eliminate the flag entirely and simplify all call sites.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/multiline/flb_ml.c` around lines 1451 - 1457, The metadata list lazy-init adds unnecessary complexity; instead eagerly initialize the list in stream_group_create and remove the metadata_objects_initialized flag and the helper flb_ml_stream_group_metadata_list_init. In practice: in the constructor flb_ml_stream_group_create (in flb_ml_stream.c) call mk_list_init(&group->metadata_objects) and stop setting or checking metadata_objects_initialized; then remove the metadata_objects_initialized field from struct flb_ml_stream_group and delete calls to flb_ml_stream_group_metadata_list_init and its definition, and simplify callers such as flb_ml_stream_group_add_metadata_object, flb_ml_stream_group_purge_metadata and flb_ml_stream_group_flush_metadata to assume metadata_objects is always initialized.
1585-1586: Double purge: metadata is purged both insideflb_ml_flush_metadata_buffer(line 1586) and again at end-of-flush (line 1790).The second purge at line 1790 iterates an already-empty list and is harmless, but it's redundant on the success path. If you'd like to keep both for defensive safety that's fine — just noting it.
Also applies to: 1790-1790
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/multiline/flb_ml.c` around lines 1585 - 1586, The metadata purge is being called twice: once inside flb_ml_flush_metadata_buffer and again via flb_ml_stream_group_purge_metadata at the end-of-flush; remove the redundant second call (the end-of-flush invocation) or guard it so it only runs when metadata remains (e.g., check the group's metadata list emptiness before calling flb_ml_stream_group_purge_metadata). Update the flush routine that currently calls flb_ml_stream_group_purge_metadata at the end so it either deletes that call or wraps it in a conditional, leaving the single purge inside flb_ml_flush_metadata_buffer as the canonical cleanup.
1442-1445: Default case performs a shallow copy — safe for known msgpack scalar types only.The
*dst = *srcshallow copy is fine for the remaining scalar types in the msgpack spec (which have no heap pointers). However, if a future msgpack-c version adds a new pointer-bearing type,flb_ml_msgpack_object_destroywould not free it (itsdefaultis a no-op), so the data would just be an opaque copy rather than a leak. Consider adding a log warning here for unknown types to ease future debugging.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/multiline/flb_ml.c` around lines 1442 - 1445, The switch default in the msgpack copy (where "*dst = *src" is used) silently performs a shallow copy for unknown/future types; change it to emit a warning so unexpected pointer-bearing types are easier to detect: in the default branch (around the copy logic in the function handling msgpack object copy), call the project's logging API (e.g., flb_warn or equivalent) to log the msgpack type tag or a brief message identifying the unknown type and the code path, then still perform the shallow copy to preserve current behavior; this makes issues visible to developers and ties into flb_ml_msgpack_object_destroy investigations if a leak appears.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/multiline/flb_ml_stream.c`:
- Around line 150-151: The call to flb_ml_stream_group_purge_metadata from
flb_ml_stream.c fails because the function is defined as static in flb_ml.c and
not visible across translation units; either remove the static from the flb_ml.c
definition so it has external linkage and add a prototype to the public
multiline header (declare extern flb_ml_stream_group_purge_metadata(...) in
include header), or if the function must remain internal, move its
implementation into flb_ml_stream.c and call the local helper instead; update
the declaration/usage to keep a single non-static definition and ensure the
header contains the prototype for callers like flb_ml_stream.c.
---
Nitpick comments:
In `@src/multiline/flb_ml.c`:
- Around line 1451-1457: The metadata list lazy-init adds unnecessary
complexity; instead eagerly initialize the list in stream_group_create and
remove the metadata_objects_initialized flag and the helper
flb_ml_stream_group_metadata_list_init. In practice: in the constructor
flb_ml_stream_group_create (in flb_ml_stream.c) call
mk_list_init(&group->metadata_objects) and stop setting or checking
metadata_objects_initialized; then remove the metadata_objects_initialized field
from struct flb_ml_stream_group and delete calls to
flb_ml_stream_group_metadata_list_init and its definition, and simplify callers
such as flb_ml_stream_group_add_metadata_object,
flb_ml_stream_group_purge_metadata and flb_ml_stream_group_flush_metadata to
assume metadata_objects is always initialized.
- Around line 1585-1586: The metadata purge is being called twice: once inside
flb_ml_flush_metadata_buffer and again via flb_ml_stream_group_purge_metadata at
the end-of-flush; remove the redundant second call (the end-of-flush invocation)
or guard it so it only runs when metadata remains (e.g., check the group's
metadata list emptiness before calling flb_ml_stream_group_purge_metadata).
Update the flush routine that currently calls flb_ml_stream_group_purge_metadata
at the end so it either deletes that call or wraps it in a conditional, leaving
the single purge inside flb_ml_flush_metadata_buffer as the canonical cleanup.
- Around line 1442-1445: The switch default in the msgpack copy (where "*dst =
*src" is used) silently performs a shallow copy for unknown/future types; change
it to emit a warning so unexpected pointer-bearing types are easier to detect:
in the default branch (around the copy logic in the function handling msgpack
object copy), call the project's logging API (e.g., flb_warn or equivalent) to
log the msgpack type tag or a brief message identifying the unknown type and the
code path, then still perform the shallow copy to preserve current behavior;
this makes issues visible to developers and ties into
flb_ml_msgpack_object_destroy investigations if a leak appears.
98e7fc0 to
fb6f031
Compare
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
fb6f031 to
8d0ac8d
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/multiline/flb_ml.c (2)
1786-1789: Redundant purge afterflb_ml_flush_metadata_bufferalready purges.
flb_ml_flush_metadata_buffer(line 1580) unconditionally callsflb_ml_stream_group_purge_metadata(group)before returning. The purge here on line 1788 is therefore always operating on an already-empty list. It's harmless (the iteration is a no-op on an emptymk_list), but worth noting it's defensive duplication.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/multiline/flb_ml.c` around lines 1786 - 1789, The call to flb_ml_stream_group_purge_metadata(group) in the block that sets flb_sds_len_set(group->buf, 0), group->truncated = FLB_FALSE and group->mp_md_sbuf.size = 0 is redundant because flb_ml_flush_metadata_buffer already unconditionally purges metadata; remove this duplicate flb_ml_stream_group_purge_metadata(group) invocation (leave the buffer resets and mp_md_sbuf.size assignment intact) to avoid unnecessary no-op iteration and keep a single canonical purge point in flb_ml_flush_metadata_buffer.
1436-1440: Shallow-copy fallback for unknown msgpack types could silently alias pointers.The
defaultbranch does*dst = *src, which is a shallow (bitwise) copy. While all known msgpack object types with heap pointers (STR, BIN, EXT, ARRAY, MAP) are handled above, if a future msgpack-c version adds a new pointer-bearing type, this would create a double-free on destroy. Consider logging a warning or returning an error for truly unexpected types to make debugging easier.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/multiline/flb_ml.c` around lines 1436 - 1440, The default switch branch currently does a shallow bitwise copy (`*dst = *src`) which can alias heap pointers; instead, in the default branch for unknown msgpack types (where `dst` and `src` are used) refuse to shallow-copy: log a warning including the unexpected `src->type`, zero/initialize `*dst` to a safe empty state (e.g. set `dst->type = MSGPACK_OBJECT_NIL` or memset to 0) and return/propagate an error code so callers know cloning failed; update the function containing that switch (the block with `default:` handling `dst`/`src`) to avoid double-free risk and ensure callers handle the new error path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/multiline/flb_ml.c`:
- Around line 343-345: The bug is that flb_ml_stream_group_add_metadata is
called after flb_ml_flush_stream_group may have purged metadata, leaving an
orphaned snapshot for the next record; to fix, ensure metadata is added before
any flush or avoid re-adding after a flush: move the call to
flb_ml_stream_group_add_metadata in package_content so it runs prior to any
calls to flb_ml_flush_stream_group (including those inside flb_ml_rule_process),
or alternately add a guard (e.g., track a local "flushed" flag or check
stream_group->mp_md_sbuf.size) and only call flb_ml_stream_group_add_metadata
when the group has not just been flushed; update package_content accordingly so
metadata cannot be appended after a purge.
---
Nitpick comments:
In `@src/multiline/flb_ml.c`:
- Around line 1786-1789: The call to flb_ml_stream_group_purge_metadata(group)
in the block that sets flb_sds_len_set(group->buf, 0), group->truncated =
FLB_FALSE and group->mp_md_sbuf.size = 0 is redundant because
flb_ml_flush_metadata_buffer already unconditionally purges metadata; remove
this duplicate flb_ml_stream_group_purge_metadata(group) invocation (leave the
buffer resets and mp_md_sbuf.size assignment intact) to avoid unnecessary no-op
iteration and keep a single canonical purge point in
flb_ml_flush_metadata_buffer.
- Around line 1436-1440: The default switch branch currently does a shallow
bitwise copy (`*dst = *src`) which can alias heap pointers; instead, in the
default branch for unknown msgpack types (where `dst` and `src` are used) refuse
to shallow-copy: log a warning including the unexpected `src->type`,
zero/initialize `*dst` to a safe empty state (e.g. set `dst->type =
MSGPACK_OBJECT_NIL` or memset to 0) and return/propagate an error code so
callers know cloning failed; update the function containing that switch (the
block with `default:` handling `dst`/`src`) to avoid double-free risk and ensure
callers handle the new error path.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
macOS's leaks command result is:
It reported no leaks.
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
Performance
Reliability