MINIFICPP-2705 - Configurable timeout, download assets directly to disk #2088
Pull request overview
This PR implements configurable timeout for asset downloads and changes the download mechanism to stream data directly to disk rather than loading it entirely in memory first. This improves memory efficiency for large asset downloads.
Changes:
- Added new configuration property `nifi.c2.asset.download.timeout` for controlling asset download timeouts
- Refactored HTTP read callback architecture to support streaming chunk-based processing
- Modified asset and flow download implementations to write directly to disk/strings via chunk callbacks
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| minifi-api/include/minifi-cpp/properties/Configuration.h | Added new configuration constant for asset download timeout |
| libminifi/include/c2/C2Protocol.h | Changed fetch interface to accept chunk callback instead of returning payload |
| libminifi/include/c2/protocols/RESTSender.h | Added asset_download_timeout member and updated fetch signature |
| libminifi/include/utils/file/AssetManager.h | Updated sync method signature to accept callback that writes to file path |
| libminifi/src/c2/C2Agent.cpp | Implemented streaming download for assets and flows using chunk callbacks |
| libminifi/src/c2/protocols/RESTSender.cpp | Implemented new fetch method with timeout support and chunk-based processing |
| libminifi/src/utils/file/AssetManager.cpp | Updated to use fetch callback that writes directly to disk |
| core-framework/include/http/BaseHTTPClient.h | Refactored HTTPReadCallback hierarchy for chunk-based processing |
| core-framework/include/http/HTTPClient.h | Added setAbsoluteTimeout method and absolute_timeout member |
| core-framework/include/http/HTTPStream.h | Updated to use dynamic_cast for ByteOutputCallback access |
| core-framework/include/utils/ByteArrayCallback.h | Changed write methods to accept const char* |
| core-framework/src/http/BaseHTTPClient.cpp | Updated receiveWrite to use new process method |
| core-framework/src/http/HTTPClient.cpp | Updated getResponseBody to use dynamic_cast |
| core-framework/src/http/HTTPStream.cpp | Updated close and read to use dynamic_cast |
| core-framework/src/utils/ByteArrayCallback.cpp | Updated write methods to accept const char* |
| C2.md | Added documentation for new timeout configuration property |
    return dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())
        && dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())->getSize() > 0;
Copilot
AI
Jan 19, 2026
The dynamic_cast is performed twice in this expression without caching the result. This is inefficient and could be optimized by storing the cast result in a local variable.
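
One way to apply this suggestion is to cast once inside an `if` initializer and reuse the pointer. The sketch below uses simplified stand-in types; `ReadCallback`, `OtherCallback`, and `hasBufferedData` are hypothetical, and the `ByteOutputCallback` here is not the real `utils::ByteOutputCallback`.

```cpp
#include <cstddef>

// Simplified stand-ins for the callback hierarchy; not the project's real classes.
struct ReadCallback {
  virtual ~ReadCallback() = default;
};

struct ByteOutputCallback : ReadCallback {
  std::size_t size = 0;
  std::size_t getSize() const { return size; }
};

struct OtherCallback : ReadCallback {};

// Casts once and reuses the result; a null or mismatched callback yields false.
bool hasBufferedData(ReadCallback* callback) {
  if (const auto* byte_output = dynamic_cast<const ByteOutputCallback*>(callback)) {
    return byte_output->getSize() > 0;
  }
  return false;
}
```

Besides avoiding repeated casts, this calls the getter on the client only once, so the check and the use are guaranteed to refer to the same object.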
    return;
    }

    std::filesystem::rename(tmp_file, file_path);
Copilot
AI
Jan 19, 2026
The std::filesystem::rename function can throw an exception on error. This call should be wrapped in error handling (try-catch or use the error_code overload) to prevent unhandled exceptions from propagating and ensure proper error reporting to the user.
    std::ofstream{root_ / path, std::ios::binary}.write(reinterpret_cast<const char*>(content.data()), gsl::narrow<std::streamsize>(content.size()));
    for (auto& path : new_file_paths) {
    create_dir(path.parent_path());
    std::filesystem::rename(path.string() + ".part", path);
Copilot
AI
Jan 19, 2026
The std::filesystem::rename function can throw an exception on error. This call should be wrapped in error handling (try-catch or use the error_code overload) to prevent unhandled exceptions from propagating, especially since this is called in a loop and could fail for some files but not others.
    }
    asset_download_timeout_ = (configure->get(Configuration::nifi_c2_asset_download_timeout) | utils::andThen([] (const auto& s) { return parsing::parseDuration(s) | utils::toOptional(); })).value_or(0s);
Copilot
AI
Jan 19, 2026
The asset_download_timeout_ initialization occurs outside the configure null check block, which could lead to a null pointer dereference if configure is nullptr. This line should be moved inside the if (nullptr != configure) block at line 45 to prevent potential crashes.
Suggested change:

    - }
    - asset_download_timeout_ = (configure->get(Configuration::nifi_c2_asset_download_timeout) | utils::andThen([] (const auto& s) { return parsing::parseDuration(s) | utils::toOptional(); })).value_or(0s);
    + asset_download_timeout_ = (configure->get(Configuration::nifi_c2_asset_download_timeout) | utils::andThen([] (const auto& s) { return parsing::parseDuration(s) | utils::toOptional(); })).value_or(0s);
    + } else {
    + asset_download_timeout_ = 0s;
    + }
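
The same null-safe pattern can be sketched in isolation. Everything below is a stand-in: this `Configure` is a toy class rather than MiNiFi's, `parseSeconds` is a hypothetical parser that only accepts whole seconds (the real `parsing::parseDuration` is not reproduced), and the fallback of zero simply mirrors the `value_or(0s)` in the reviewed line.

```cpp
#include <chrono>
#include <cstddef>
#include <exception>
#include <map>
#include <optional>
#include <string>

// Toy configuration store; not the project's Configure class.
struct Configure {
  std::map<std::string, std::string> values;
  std::optional<std::string> get(const std::string& key) const {
    const auto it = values.find(key);
    if (it == values.end()) { return std::nullopt; }
    return it->second;
  }
};

// Hypothetical parser: accepts a non-negative whole number of seconds, e.g. "30".
std::optional<std::chrono::milliseconds> parseSeconds(const std::string& s) {
  try {
    std::size_t pos = 0;
    const long long secs = std::stoll(s, &pos);
    if (pos != s.size() || secs < 0) { return std::nullopt; }
    return std::chrono::milliseconds(std::chrono::seconds(secs));
  } catch (const std::exception&) {
    return std::nullopt;  // not a number, or out of range
  }
}

// Falls back to zero when the configuration is absent, the key is unset, or the value is invalid.
std::chrono::milliseconds readAssetDownloadTimeout(const Configure* configure) {
  using namespace std::chrono_literals;
  if (configure == nullptr) { return 0ms; }
  const auto raw = configure->get("nifi.c2.asset.download.timeout");
  if (!raw) { return 0ms; }
  return parseSeconds(*raw).value_or(0ms);
}
```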
    bool success = protocol_->fetch(resolved_url.value(), update_sink_->getSupportedConfigurationFormats(), [&] (std::span<const char> chunk) {
    flow_content.append(chunk.data(), chunk.size());
    return true;
Copilot
AI
Jan 19, 2026
The flow_content string is accumulated in memory as chunks arrive. For large flows, this could consume significant memory. Consider if there's a maximum expected flow size, or if this should stream to a file instead (similar to how asset downloads are now handled).
    && dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())
    && dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())->getSize() == 0
    && dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())->waitingOps();
Copilot
AI
Jan 19, 2026
The dynamic_cast is performed three times in this expression without caching the result. This is inefficient and could be optimized by storing the cast result in a local variable.
    && dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())
    && dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())->getSize() == 0);
Copilot
AI
Jan 19, 2026
The dynamic_cast is performed twice in this expression without caching the result. This is inefficient and could be optimized by storing the cast result in a local variable.
c5f565e to f3663a3
Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.
In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:
For all changes:
- Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?
- Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
- Has your PR been rebased against the latest commit within the target branch (typically main)?
- Is your initial contribution a single, squashed commit?
For code changes:
For documentation related changes:
Note:
Please ensure that once the PR is submitted, you check GitHub Actions CI results for build issues and submit an update to your PR as soon as possible.