@adamdebreceni
Contributor

Thank you for submitting a contribution to Apache NiFi - MiNiFi C++.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced
    in the commit message?

  • Does your PR title start with MINIFICPP-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit?

For code changes:

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file?
  • If applicable, have you updated the NOTICE file?

For documentation related changes:

  • Have you ensured that the format looks appropriate for the output in which it is rendered?

Note:

Please ensure that once the PR is submitted, you check GitHub Actions CI results for build issues and submit an update to your PR as soon as possible.

Copilot AI left a comment

Pull request overview

This PR adds a configurable timeout for asset downloads and changes the download mechanism to stream data directly to disk instead of first loading the entire payload into memory. This reduces peak memory usage for large asset downloads.

Changes:

  • Added new configuration property nifi.c2.asset.download.timeout for controlling asset download timeouts
  • Refactored HTTP read callback architecture to support streaming chunk-based processing
  • Modified asset and flow download implementations to write directly to disk/strings via chunk callbacks
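
A minimal sketch of the stream-to-disk pattern described in the changes above, assuming a fetch function that pushes each received chunk into a callback. `ChunkCallback`, `FetchFn`, and `downloadToFile` are hypothetical names used for illustration, and `std::string_view` stands in for whatever chunk type the PR actually uses; this is not the PR's implementation.

```cpp
#include <filesystem>
#include <fstream>
#include <functional>
#include <string_view>
#include <system_error>

// Hypothetical types: illustrative only, not MiNiFi's actual interfaces.
using ChunkCallback = std::function<bool(std::string_view)>;
using FetchFn = std::function<bool(const ChunkCallback&)>;

// Streams a download into "<target>.part" and renames it into place on success.
inline bool downloadToFile(const FetchFn& fetch, const std::filesystem::path& target) {
  std::filesystem::path tmp = target;
  tmp += ".part";
  bool ok = false;
  {
    std::ofstream out{tmp, std::ios::binary | std::ios::trunc};
    if (!out) {
      return false;
    }
    ok = fetch([&out](std::string_view chunk) {
      out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
      return out.good();  // returning false asks the fetcher to abort
    });
    out.flush();
    ok = ok && out.good();
  }  // the stream is closed here, before the rename
  std::error_code ec;
  if (ok) {
    std::filesystem::rename(tmp, target, ec);
    ok = !ec;
  }
  if (!ok) {
    std::filesystem::remove(tmp, ec);  // best-effort cleanup of the partial file
  }
  return ok;
}
```

Only one chunk is held in memory at a time, and a partially written file never appears under the final name.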

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| minifi-api/include/minifi-cpp/properties/Configuration.h | Added new configuration constant for asset download timeout |
| libminifi/include/c2/C2Protocol.h | Changed fetch interface to accept chunk callback instead of returning payload |
| libminifi/include/c2/protocols/RESTSender.h | Added asset_download_timeout member and updated fetch signature |
| libminifi/include/utils/file/AssetManager.h | Updated sync method signature to accept callback that writes to file path |
| libminifi/src/c2/C2Agent.cpp | Implemented streaming download for assets and flows using chunk callbacks |
| libminifi/src/c2/protocols/RESTSender.cpp | Implemented new fetch method with timeout support and chunk-based processing |
| libminifi/src/utils/file/AssetManager.cpp | Updated to use fetch callback that writes directly to disk |
| core-framework/include/http/BaseHTTPClient.h | Refactored HTTPReadCallback hierarchy for chunk-based processing |
| core-framework/include/http/HTTPClient.h | Added setAbsoluteTimeout method and absolute_timeout member |
| core-framework/include/http/HTTPStream.h | Updated to use dynamic_cast for ByteOutputCallback access |
| core-framework/include/utils/ByteArrayCallback.h | Changed write methods to accept const char* |
| core-framework/src/http/BaseHTTPClient.cpp | Updated receiveWrite to use new process method |
| core-framework/src/http/HTTPClient.cpp | Updated getResponseBody to use dynamic_cast |
| core-framework/src/http/HTTPStream.cpp | Updated close and read to use dynamic_cast |
| core-framework/src/utils/ByteArrayCallback.cpp | Updated write methods to accept const char* |
| C2.md | Added documentation for new timeout configuration property |

Comment on lines 133 to 134
return dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())
&& dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())->getSize() > 0;

Copilot AI Jan 19, 2026

The dynamic_cast is performed twice in this expression without caching the result. This is inefficient and could be optimized by storing the cast result in a local variable.
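
A minimal sketch of the suggested fix, assuming simplified stand-ins for the callback classes. `ReadCallback`, this reduced `ByteOutputCallback`, and `hasBufferedData` are hypothetical and only mirror the shape of the expression under review.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the real callback hierarchy.
struct ReadCallback {
  virtual ~ReadCallback() = default;
};

struct ByteOutputCallback : ReadCallback {
  std::size_t size = 0;
  std::size_t getSize() const { return size; }
};

inline bool hasBufferedData(ReadCallback* callback) {
  // Cast once; the result is null when callback is null or has a different dynamic type.
  const auto* byte_output = dynamic_cast<const ByteOutputCallback*>(callback);
  return byte_output != nullptr && byte_output->getSize() > 0;
}
```

Besides saving a repeated runtime type check, the local variable guarantees that the null test and the member call refer to the same object.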

Copilot uses AI. Check for mistakes.
return;
}

std::filesystem::rename(tmp_file, file_path);

Copilot AI Jan 19, 2026

The std::filesystem::rename function can throw an exception on error. This call should be wrapped in error handling (try-catch or use the error_code overload) to prevent unhandled exceptions from propagating and ensure proper error reporting to the user.

std::ofstream{root_ / path, std::ios::binary}.write(reinterpret_cast<const char*>(content.data()), gsl::narrow<std::streamsize>(content.size()));
for (auto& path : new_file_paths) {
create_dir(path.parent_path());
std::filesystem::rename(path.string() + ".part", path);

Copilot AI Jan 19, 2026

The std::filesystem::rename function can throw an exception on error. This call should be wrapped in error handling (try-catch or use the error_code overload) to prevent unhandled exceptions from propagating, especially since this is called in a loop and could fail for some files but not others.

Comment on lines 78 to 79
}
asset_download_timeout_ = (configure->get(Configuration::nifi_c2_asset_download_timeout) | utils::andThen([] (const auto& s) { return parsing::parseDuration(s) | utils::toOptional(); })).value_or(0s);

Copilot AI Jan 19, 2026

The asset_download_timeout_ initialization occurs outside the configure null check block, which could lead to a null pointer dereference if configure is nullptr. This line should be moved inside the if (nullptr != configure) block at line 45 to prevent potential crashes.

Suggested change
- }
- asset_download_timeout_ = (configure->get(Configuration::nifi_c2_asset_download_timeout) | utils::andThen([] (const auto& s) { return parsing::parseDuration(s) | utils::toOptional(); })).value_or(0s);
+ asset_download_timeout_ = (configure->get(Configuration::nifi_c2_asset_download_timeout) | utils::andThen([] (const auto& s) { return parsing::parseDuration(s) | utils::toOptional(); })).value_or(0s);
+ } else {
+ asset_download_timeout_ = 0s;
+ }
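
The same null-safe defaulting can be sketched in isolation. Everything here is an assumption for illustration: `Configure` is a stand-in map-backed type, `parseSeconds` accepts only a plain non-negative integer (the real property presumably accepts richer duration strings via `parsing::parseDuration`), and `readAssetDownloadTimeout` is a hypothetical helper.

```cpp
#include <charconv>
#include <chrono>
#include <map>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical stand-in for the configuration object.
struct Configure {
  std::map<std::string, std::string> values;
  std::optional<std::string> get(const std::string& key) const {
    const auto it = values.find(key);
    if (it == values.end()) {
      return std::nullopt;
    }
    return it->second;
  }
};

// Simplified parser: whole, non-negative seconds only.
inline std::optional<std::chrono::seconds> parseSeconds(const std::string& text) {
  long long value = 0;
  const char* end = text.data() + text.size();
  const auto [ptr, ec] = std::from_chars(text.data(), end, value);
  if (ec != std::errc{} || ptr != end || value < 0) {
    return std::nullopt;
  }
  return std::chrono::seconds{value};
}

// Falls back to 0s when the config is null, the key is missing, or parsing fails.
inline std::chrono::seconds readAssetDownloadTimeout(const Configure* configure) {
  using namespace std::chrono_literals;
  if (configure == nullptr) {
    return 0s;
  }
  const auto raw = configure->get("nifi.c2.asset.download.timeout");
  if (!raw) {
    return 0s;
  }
  return parseSeconds(*raw).value_or(0s);
}
```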

Comment on lines +1025 to +1033
bool success = protocol_->fetch(resolved_url.value(), update_sink_->getSupportedConfigurationFormats(), [&] (std::span<const char> chunk) {
flow_content.append(chunk.data(), chunk.size());
return true;

Copilot AI Jan 19, 2026

The flow_content string is accumulated in memory as chunks arrive. For large flows, this could consume significant memory. Consider if there's a maximum expected flow size, or if this should stream to a file instead (similar to how asset downloads are now handled).
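
If the flow has to stay in memory, one option is a size cap enforced inside the chunk callback. This is a sketch under assumptions: `makeBoundedAppender` and the limit are hypothetical, and `std::string_view` is used in place of the `std::span<const char>` from the snippet so the example stays C++17.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>

// Hypothetical helper: builds a chunk callback that appends to `buffer`
// and returns false (requesting an abort) once `max_bytes` would be exceeded.
// The returned callback must not outlive `buffer`.
inline std::function<bool(std::string_view)> makeBoundedAppender(std::string& buffer,
                                                                 std::size_t max_bytes) {
  return [&buffer, max_bytes](std::string_view chunk) {
    if (buffer.size() > max_bytes || chunk.size() > max_bytes - buffer.size()) {
      return false;  // reject the chunk; the buffer is left unchanged
    }
    buffer.append(chunk.data(), chunk.size());
    return true;
  };
}
```

The remaining capacity is computed by subtraction so that the comparison cannot overflow for very large chunk sizes.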

Comment on lines 118 to 120
&& dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())
&& dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())->getSize() == 0
&& dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())->waitingOps();

Copilot AI Jan 19, 2026

The dynamic_cast is performed three times in this expression without caching the result. This is inefficient and could be optimized by storing the cast result in a local variable.

Comment on lines 130 to 131
&& dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())
&& dynamic_cast<utils::ByteOutputCallback*>(http_client_->getReadCallback())->getSize() == 0);

Copilot AI Jan 19, 2026

The dynamic_cast is performed twice in this expression without caching the result. This is inefficient and could be optimized by storing the cast result in a local variable.
