feat(out_azure_blob): add support for managed identity#11474
feat(out_azure_blob): add support for managed identity#11474temporaer wants to merge 5 commits intofluent:masterfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Managed Identity (system/user) and Workload Identity authentication to the out_azure_blob plugin: new MSI/WI token retrieval and exchange logic, OAuth2 context handling and token caching with mutexes, build/test updates, and new runtime tests. No changes to public API signatures beyond added helper declarations. Changes
Sequence Diagram(s)sequenceDiagram
participant FB as Fluent Bit (out_azure_blob)
participant HTTP as azure_blob_http.c
participant OAuth as OAuth2 Context
participant IMDS as Azure IMDS / MSAL
participant ABS as Azure Blob Storage
FB->>HTTP: Prepare request
HTTP->>OAuth: Check cached token / expiry
alt token missing or expired
OAuth->>IMDS: Request token (IMDS GET or MSAL POST / WI exchange)
IMDS-->>OAuth: Return access_token + expiry
OAuth->>OAuth: Cache token (expires_at)
else token valid
OAuth-->>HTTP: Return cached token
end
HTTP->>HTTP: Build "Authorization: Bearer <token>"
HTTP->>ABS: Send HTTP request with Bearer header
ABS-->>HTTP: Response
HTTP-->>FB: Complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
fix fluent#10777 Signed-off-by: Hannes Schulz <Hannes.Schulz@microsoft.com>
f0d51b8 to
73c62d3
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/out_azure_blob/azure_blob_conf.c`:
- Around line 782-833: The OAuth2 context (created by flb_oauth2_create),
ctx->oauth_url SDS and the pthread_mutex_init in flb_azure_blob_conf_create are
not cleaned up on subsequent error returns (e.g., after azb_db_open fails),
leaking resources; fix by adding a unified cleanup path (e.g., an error/cleanup
label) and replace early returns after these allocations with a jump to that
label which calls flb_azure_blob_conf_destroy(ctx) (or otherwise destroys
ctx->o, frees ctx->oauth_url and destroys the mutex) before returning NULL so
flb_oauth2_create, ctx->oauth_url and the initialized token mutex are always
released on failure.
In `@plugins/out_azure_blob/azure_blob_http.c`:
- Around line 365-411: The MI and WI paths allocate `auth` with
flb_sds_create_size without checking for NULL, risking dereference in
flb_sds_cat_safe/flb_http_add_header; add a NULL check after flb_sds_create_size
in both the AZURE_BLOB_AUTH_MI_SYSTEM/AZURE_BLOB_AUTH_MI_USER block and the
AZURE_BLOB_AUTH_WI block: if flb_sds_create_size returns NULL, log an error with
flb_plg_error (include context and that allocation failed), unlock
ctx->token_mutex and return -1, otherwise proceed to flb_sds_cat_safe,
flb_http_add_header and flb_sds_destroy as before.
In `@plugins/out_azure_blob/azure_blob_msiauth.c`:
- Around line 172-187: The chain of flb_sds_cat calls building the request body
can return NULL mid-chain causing a NULL dereference; update the code around the
flb_sds_cat calls (the body variable concatenations) to check that body is
non-NULL immediately after each flb_sds_cat (or at least after each logical
group of concatenations) and on NULL perform the same cleanup
(flb_sds_destroy(federated_token)) and return -1; ensure you never call
flb_sds_cat with a NULL body and keep the existing error log message via
flb_error("[azure blob workload identity] failed to build request body") when
handling the failure.
- Around line 106-138: In read_token_from_file, handle the case where the
4096-byte stack buffer may truncate the token: after fread, check if bytes_read
== sizeof(buf) - 1 and emit a warning via flb_warn or flb_error indicating the
token may have been truncated (include token_file in the message); also change
the emptiness check from "bytes_read <= 0" to "bytes_read == 0" (since
bytes_read is size_t) so only zero-length reads are treated as errors, and
ensure you still null-terminate buf and create the flb_sds_t token as before.
In `@tests/runtime/out_azure_blob.c`:
- Around line 53-374: Many tests (e.g., flb_test_azure_blob_json_invalid,
flb_test_azure_blob_managed_identity_system,
flb_test_azure_blob_workload_identity, flb_test_azure_blob_mi_missing_client_id,
etc.) repeat identical setup steps; extract that into a small helper (e.g.,
azure_blob_test_setup) that creates and configures ctx, in_ffd and out_ffd
(calls flb_create, flb_service_set, flb_input + flb_input_set(tag), flb_output +
flb_output_set for match, account_name, container_name) and returns or outputs
those handles, then update each test to call this helper and only set the
auth-specific flb_output_set options and assertions thereafter to reduce
boilerplate and make intent clearer.
🧹 Nitpick comments (3)
🤖 Fix all nitpicks with AI agents
Verify each finding against the current code and only fix it if needed. In `@plugins/out_azure_blob/azure_blob_conf.c`: - Around line 782-833: The OAuth2 context (created by flb_oauth2_create), ctx->oauth_url SDS and the pthread_mutex_init in flb_azure_blob_conf_create are not cleaned up on subsequent error returns (e.g., after azb_db_open fails), leaking resources; fix by adding a unified cleanup path (e.g., an error/cleanup label) and replace early returns after these allocations with a jump to that label which calls flb_azure_blob_conf_destroy(ctx) (or otherwise destroys ctx->o, frees ctx->oauth_url and destroys the mutex) before returning NULL so flb_oauth2_create, ctx->oauth_url and the initialized token mutex are always released on failure. In `@plugins/out_azure_blob/azure_blob_msiauth.c`: - Around line 106-138: In read_token_from_file, handle the case where the 4096-byte stack buffer may truncate the token: after fread, check if bytes_read == sizeof(buf) - 1 and emit a warning via flb_warn or flb_error indicating the token may have been truncated (include token_file in the message); also change the emptiness check from "bytes_read <= 0" to "bytes_read == 0" (since bytes_read is size_t) so only zero-length reads are treated as errors, and ensure you still null-terminate buf and create the flb_sds_t token as before. In `@tests/runtime/out_azure_blob.c`: - Around line 53-374: Many tests (e.g., flb_test_azure_blob_json_invalid, flb_test_azure_blob_managed_identity_system, flb_test_azure_blob_workload_identity, flb_test_azure_blob_mi_missing_client_id, etc.) repeat identical setup steps; extract that into a small helper (e.g., azure_blob_test_setup) that creates and configures ctx, in_ffd and out_ffd (calls flb_create, flb_service_set, flb_input + flb_input_set(tag), flb_output + flb_output_set for match, account_name, container_name) and returns or outputs those handles, then update each test to call this helper and only set the auth-specific flb_output_set options and assertions thereafter to reduce boilerplate and make intent clearer.plugins/out_azure_blob/azure_blob_msiauth.c (1)
106-138: Token file read may silently truncate large tokens.The 4096-byte stack buffer limits the token to 4095 bytes. While typical Azure federated identity JWTs are well within this limit, a silently truncated token would cause hard-to-diagnose auth failures. Consider logging a warning when
bytes_read == sizeof(buf) - 1to aid debugging.Also,
bytes_readissize_t(unsigned), so<= 0on line 128 is effectively== 0.🔧 Proposed improvement
bytes_read = fread(buf, 1, sizeof(buf) - 1, fp); fclose(fp); - if (bytes_read <= 0) { + if (bytes_read == 0) { flb_error("[azure blob workload identity] could not read token from file: %s", token_file); return NULL; } + if (bytes_read == sizeof(buf) - 1) { + flb_warn("[azure blob workload identity] token file may be truncated " + "(read %zu bytes): %s", bytes_read, token_file); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/out_azure_blob/azure_blob_msiauth.c` around lines 106 - 138, In read_token_from_file, handle the case where the 4096-byte stack buffer may truncate the token: after fread, check if bytes_read == sizeof(buf) - 1 and emit a warning via flb_warn or flb_error indicating the token may have been truncated (include token_file in the message); also change the emptiness check from "bytes_read <= 0" to "bytes_read == 0" (since bytes_read is size_t) so only zero-length reads are treated as errors, and ensure you still null-terminate buf and create the flb_sds_t token as before.plugins/out_azure_blob/azure_blob_conf.c (1)
782-833: Resource leak on early-return after OAuth2 context creation.After the OAuth2 context,
oauth_url, andtoken_mutexare created (e.g., line 813 or 832), if a subsequent step likeazb_db_open(line 844) fails, the function returnsNULLwithout callingflb_azure_blob_conf_destroy. This leaks the OAuth2 context,oauth_urlSDS, and leaves the mutex initialized.This is a pre-existing pattern throughout
flb_azure_blob_conf_create(other resources also leak on early returns), but the OAuth2 upstream + TLS context is a heavier resource to leak. Consider either a cleanupgotolabel or callingflb_azure_blob_conf_destroybefore returningNULL.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/out_azure_blob/azure_blob_conf.c` around lines 782 - 833, The OAuth2 context (created by flb_oauth2_create), ctx->oauth_url SDS and the pthread_mutex_init in flb_azure_blob_conf_create are not cleaned up on subsequent error returns (e.g., after azb_db_open fails), leaking resources; fix by adding a unified cleanup path (e.g., an error/cleanup label) and replace early returns after these allocations with a jump to that label which calls flb_azure_blob_conf_destroy(ctx) (or otherwise destroys ctx->o, frees ctx->oauth_url and destroys the mutex) before returning NULL so flb_oauth2_create, ctx->oauth_url and the initialized token mutex are always released on failure.tests/runtime/out_azure_blob.c (1)
53-374: Consider extracting shared setup into a helper to reduce boilerplate.Every test function repeats the same
flb_create→flb_service_set→flb_input→flb_input_set→flb_output→flb_output_set(account_name, container_name, match) sequence. A small helper returningctx/in_ffd/out_ffdwould cut ~8 lines per test and make the auth-specific config stand out more clearly.That said, this pattern is common in existing Fluent Bit test files, so this is purely a readability nit.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/runtime/out_azure_blob.c` around lines 53 - 374, Many tests (e.g., flb_test_azure_blob_json_invalid, flb_test_azure_blob_managed_identity_system, flb_test_azure_blob_workload_identity, flb_test_azure_blob_mi_missing_client_id, etc.) repeat identical setup steps; extract that into a small helper (e.g., azure_blob_test_setup) that creates and configures ctx, in_ffd and out_ffd (calls flb_create, flb_service_set, flb_input + flb_input_set(tag), flb_output + flb_output_set for match, account_name, container_name) and returns or outputs those handles, then update each test to call this helper and only set the auth-specific flb_output_set options and assertions thereafter to reduce boilerplate and make intent clearer.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f0d51b8c55
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/out_azure_blob/azure_blob_msiauth.c`:
- Around line 74-82: The current logging logs IMDS non-200 response payloads at
flb_info; change that to flb_debug so both success (when c->resp.status == 200)
and error payloads use flb_debug for the payload content. Locate the block
referencing c->resp.status and c->resp.payload in azure_blob_msiauth.c (the
flb_info call that prints the error payload) and replace that flb_info payload
log with flb_debug while keeping the existing flb_info that logs the HTTP status
line (flb_info("[azure blob msi auth] HTTP Status=%i", c->resp.status)).
🧹 Nitpick comments (1)
🤖 Fix all nitpicks with AI agents
Verify each finding against the current code and only fix it if needed. In `@plugins/out_azure_blob/azure_blob_msiauth.c`: - Around line 74-82: The current logging logs IMDS non-200 response payloads at flb_info; change that to flb_debug so both success (when c->resp.status == 200) and error payloads use flb_debug for the payload content. Locate the block referencing c->resp.status and c->resp.payload in azure_blob_msiauth.c (the flb_info call that prints the error payload) and replace that flb_info payload log with flb_debug while keeping the existing flb_info that logs the HTTP status line (flb_info("[azure blob msi auth] HTTP Status=%i", c->resp.status)).plugins/out_azure_blob/azure_blob_msiauth.c (1)
74-82: Consider lowering the log level for error-response payloads.When IMDS returns a non-200 status, the full payload is logged at
flb_infolevel (line 80). Error responses from IMDS may occasionally contain request metadata. Usingflb_debugfor the payload (same as the success case at line 77) would be more consistent and cautious.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/out_azure_blob/azure_blob_msiauth.c` around lines 74 - 82, The current logging logs IMDS non-200 response payloads at flb_info; change that to flb_debug so both success (when c->resp.status == 200) and error payloads use flb_debug for the payload content. Locate the block referencing c->resp.status and c->resp.payload in azure_blob_msiauth.c (the flb_info call that prints the error payload) and replace that flb_info payload log with flb_debug while keeping the existing flb_info that logs the HTTP status line (flb_info("[azure blob msi auth] HTTP Status=%i", c->resp.status)).
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
plugins/out_azure_blob/azure_blob_conf.c (1)
778-829: Hardcoded OAuth2 timeout of 3000 — consider making it configurable.The
flb_oauth2_createcalls on lines 803 and 822 both use a hardcoded timeout of3000(milliseconds). In environments with slow IMDS or AAD endpoints (e.g., high-latency networks, overloaded IMDS), this may be insufficient or, conversely, too generous for fast-fail scenarios. Consider exposing this as a configuration property for flexibility, consistent with how other plugins handle OAuth2 timeouts.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/out_azure_blob/azure_blob_conf.c` around lines 778 - 829, The two flb_oauth2_create calls currently pass a hardcoded timeout (3000); add a configurable timeout property (e.g., oauth2_timeout_ms) to the plugin config, parse/store it into the context (ctx->oauth_timeout_ms with a sensible default of 3000) during configuration setup, and replace the literal 3000 in both flb_oauth2_create(config, ctx->oauth_url, 3000) calls with ctx->oauth_timeout_ms so both IMDS (AZURE_BLOB_AUTH_MI_SYSTEM / AZURE_BLOB_AUTH_MI_USER) and workload identity (AZURE_BLOB_AUTH_WI) flows use the configured timeout.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/out_azure_blob/azure_blob_conf.c`:
- Around line 914-918: The code currently calls
pthread_mutex_destroy(&ctx->token_mutex) whenever ctx->atype indicates MI/WI
even if initialization may have failed; change the teardown to only destroy the
mutex if it was actually initialized by guarding the destroy with a check that
ctx->o (the oauth2 handle created by flb_oauth2_create) is non-NULL (or another
explicit "initialized" flag), so that pthread_mutex_destroy is only called when
pthread_mutex_init succeeded; update the error/cleanup path that references
ctx->atype, pthread_mutex_init, flb_oauth2_create, and flb_sds_create_size to
use this guard.
---
Nitpick comments:
In `@plugins/out_azure_blob/azure_blob_conf.c`:
- Around line 778-829: The two flb_oauth2_create calls currently pass a
hardcoded timeout (3000); add a configurable timeout property (e.g.,
oauth2_timeout_ms) to the plugin config, parse/store it into the context
(ctx->oauth_timeout_ms with a sensible default of 3000) during configuration
setup, and replace the literal 3000 in both flb_oauth2_create(config,
ctx->oauth_url, 3000) calls with ctx->oauth_timeout_ms so both IMDS
(AZURE_BLOB_AUTH_MI_SYSTEM / AZURE_BLOB_AUTH_MI_USER) and workload identity
(AZURE_BLOB_AUTH_WI) flows use the configured timeout.
Valgrind resultsBuilt with cmake .. -DFLB_DEV=On -DFLB_VALGRIND=On
make flb-rt-out_azure_blobCommand cd $HOME/r/fluent-bit/build && conda run -n flb-build valgrind \
--leak-check=full \
--show-leak-kinds=all \
--track-origins=yes \
--suppressions=../valgrind.supp \
--error-exitcode=1 \
./bin/flb-rt-out_azure_blobOutput |
Signed-off-by: Hannes Schulz <Hannes.Schulz@microsoft.com>
8cc89d3 to
0ddb7f0
Compare
There was a problem hiding this comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@plugins/out_azure_blob/azure_blob_conf.c`:
- Around line 914-918: The code currently guards pthread_mutex_destroy by
ctx->atype, but that can be set to MI/WI before pthread_mutex_init is called
(e.g., if flb_sds_create_size or flb_oauth2_create fails), causing undefined
behavior; change the guard to check that ctx->o is non-NULL (since
pthread_mutex_init is invoked only after a successful flb_oauth2_create which
populates ctx->o) before calling pthread_mutex_destroy(&ctx->token_mutex), and
keep the existing ctx->atype check as needed for clarity.
Add missing NULL checks after flb_sds_create_size() calls in the shared key, managed identity, and workload identity auth blocks of azb_http_client_setup(). If the allocation fails, the subsequent flb_sds_cat_safe and flb_http_add_header calls would dereference a NULL pointer. On failure, properly clean up resources (destroy buffers, unlock mutexes) and return -1. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Hannes Schulz <Hannes.Schulz@microsoft.com>
Move the auth_type parsing block before the flb_azure_blob_apply_remote_configuration() call. Previously, atype was still zero-initialized (AZURE_BLOB_AUTH_KEY) when remote config ran, causing it to always try to extract shared_key from the remote payload. This broke managed_identity and workload_identity setups that intentionally do not provide key material. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Hannes Schulz <Hannes.Schulz@microsoft.com>
Add a check after fread to detect when the token file exceeds the 4095-byte buffer. Previously the token would be silently truncated, causing opaque token-exchange failures. Now an error is logged and the read is rejected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Hannes Schulz <Hannes.Schulz@microsoft.com>
This PR adds three new authentication methods to the
azure_bloboutput plugin: system-assigned managed identity, user-assigned managed identity (UAMI), and workload identity (OIDC token exchange). Previously onlykey(SharedKey HMAC-SHA256) andsas(SAS token) auth were supported.The implementation follows the same patterns already used in the
azure_kustoplugin, adapted for Azure Blob Storage's resource scope (https://storage.azure.com/).fix #10777
Full disclosure: I got significant help from github copilot when implementing this feature.
Motivation
Users running Fluent Bit on Azure VMs, VMSS, or AKS often prefer managed identities over static credentials (shared keys / SAS tokens). Managed identities eliminate the need to rotate secrets and reduce the attack surface. This brings
azure_blobto parity withazure_kusto, which already supports all three token-based auth methods.Configuration
System-assigned managed identity
[OUTPUT] Name azure_blob Match * account_name mystorageaccount container_name mycontainer auth_type managed_identity client_id systemUser-assigned managed identity (UAMI)
[OUTPUT] Name azure_blob Match * account_name mystorageaccount container_name mycontainer auth_type managed_identity client_id <managed-identity-client-id>Workload identity (AKS / OIDC federation)
[OUTPUT] Name azure_blob Match * account_name mystorageaccount container_name mycontainer auth_type workload_identity client_id <app-client-id> tenant_id <azure-tenant-id> workload_identity_token_file /var/run/secrets/azure/tokens/azure-identity-tokenNew config properties
auth_typekey,sas,managed_identity,workload_identityclient_idsystem(system-assigned) or the UAMI client ID. For WI: the app registration client IDtenant_idclient_secretworkload_identity_token_file/var/run/secrets/azure/tokens/azure-identity-tokenImplementation details
169.254.169.254) withMetadata: trueheader. WI reads an OIDC token from a projected file and exchanges it at the Azure AD token endpoint viaclient_credentialsgrant.Authorization: Bearer <token>instead of the SharedKey canonical request signature. The existingx-ms-version: 2019-12-12already supports Bearer auth.pthread_mutex_t(same pattern asazure_kusto).flb_oauth2context and refreshed only when expired (flb_oauth2_token_expired()).keyandsasauth paths are completely unchanged.Files changed
plugins/out_azure_blob/azure_blob_msiauth.hplugins/out_azure_blob/azure_blob_msiauth.cplugins/out_azure_blob/azure_blob.hplugins/out_azure_blob/azure_blob.cplugins/out_azure_blob/azure_blob_conf.cplugins/out_azure_blob/azure_blob_http.cazb_http_client_setup()plugins/out_azure_blob/CMakeLists.txtazure_blob_msiauth.cto source listTesting
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Debug log output
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Chores
Tests