branch-4.0: [feature](multi-catalog) Add max_file_split_num session variable to prevent OOM in file scan #58759 #60732
Merged
yiguolei merged 2 commits into apache:branch-4.0 on Feb 14, 2026
suxiaogang223 (Contributor) commented on Feb 13, 2026
- Cherry-picked from [feature](multi-catalog) Add max_file_split_num session variable to prevent OOM in file scan #58759
…revent OOM in file scan (apache#58759)

### What problem does this PR solve?

- Related PR: apache#58858

## Problem Summary

When querying tables in an external catalog (Hive, Iceberg, Paimon, etc.), Doris divides files into multiple splits for parallel processing. In some cases, especially with numerous small files, this can generate an excessive number of splits, potentially causing:

1. **Memory pressure**: Too many splits consume significant memory in FE
2. **OOM issues**: Excessive split generation can lead to OutOfMemoryError
3. **Performance degradation**: Managing too many splits increases query planning overhead

Previously, there was no upper limit on the number of splits in non-batch mode, which could cause problems when querying tables with many small files.

## Solution

This PR introduces a new session variable `max_file_split_num` to limit the maximum number of splits allowed per table scan in non-batch mode.

### Changes

1. **New Session Variable**: `max_file_split_num`
   - Type: `int`
   - Default: `100000`
   - Description: "In non-batch mode, the maximum number of splits allowed per table scan, to prevent too many splits from causing OOM."
   - Forward to BE: `true`
2. **Implementation in FileQueryScanNode**:
   - Added method `applyMaxFileSplitNumLimit(long targetSplitSize, long totalFileSize)`
   - Dynamically calculates the minimum split size so that the split count does not exceed the limit
   - Formula: `minSplitSizeForMaxNum = (totalFileSize + maxFileSplitNum - 1) / maxFileSplitNum`
   - Returns: `Math.max(targetSplitSize, minSplitSizeForMaxNum)`
3. **Applied to multiple scan nodes**:
   - `HiveScanNode`
   - `IcebergScanNode`
   - `PaimonScanNode`
   - `TVFScanNode`
4. **Unit Tests**:
   - `FileQueryScanNodeTest`: Test base logic
   - `HiveScanNodeTest`: Test Hive-specific implementation
   - `IcebergScanNodeTest`: Test Iceberg-specific implementation
   - `PaimonScanNodeTest`: Test Paimon-specific implementation
   - `TVFScanNodeTest`: Test TVF-specific implementation

## Usage

Users can now control the maximum number of splits per table scan by setting the session variable:

```sql
-- Set to 50000 splits maximum
SET max_file_split_num = 50000;

-- Disable the limit (set to 0 or negative)
SET max_file_split_num = 0;
```

(cherry picked from commit 3e5a70f)
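The split-size formula in the description is a ceiling division followed by a `Math.max`. The sketch below is a standalone illustration of that arithmetic, not the code from `FileQueryScanNode`: the class name `SplitLimitSketch` and the extra `maxFileSplitNum` parameter are assumptions made so the example is self-contained (the PR describes a two-argument method that reads the limit from the session variable), and the treatment of zero or negative limits follows the Usage section.

```java
// Hypothetical standalone sketch of the calculation described in this PR.
// It is NOT the Doris source; names other than the formula's are illustrative.
class SplitLimitSketch {

    /**
     * Returns the split size to use so that totalFileSize / splitSize
     * stays at or below maxFileSplitNum (assuming splits are cut purely by size).
     */
    static long applyMaxFileSplitNumLimit(long targetSplitSize, long totalFileSize, int maxFileSplitNum) {
        // Per the Usage section, 0 or a negative value disables the limit.
        if (maxFileSplitNum <= 0 || totalFileSize <= 0) {
            return targetSplitSize;
        }
        // Ceiling division: the smallest split size that yields at most maxFileSplitNum splits.
        long minSplitSizeForMaxNum = (totalFileSize + maxFileSplitNum - 1) / maxFileSplitNum;
        // Never shrink the split size below the target; only grow it when the limit requires.
        return Math.max(targetSplitSize, minSplitSizeForMaxNum);
    }

    public static void main(String[] args) {
        long oneTiB = 1L << 40;      // total bytes to scan
        long targetSplit = 1L << 20; // 1 MiB target split size
        long splitSize = applyMaxFileSplitNumLimit(targetSplit, oneTiB, 100000);
        long approxSplits = (oneTiB + splitSize - 1) / splitSize;
        System.out.println("split size = " + splitSize + ", approx splits = " + approxSplits);
    }
}
```

With a 1 MiB target over 1 TiB of data, the unconstrained split count would be about one million, so the limit of 100000 raises the split size to roughly 10.5 MiB. When the target split size already keeps the count under the limit, the function returns the target unchanged.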
Contributor:
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Contributor (Author):
run buildall
Contributor (Author):
run buildall
4f944e4 to 75641e0
Contributor (Author):
run buildall
morningman approved these changes on Feb 13, 2026
Contributor:
PR approved by at least one committer and no changes requested.
Contributor:
PR approved by anyone and no changes requested.
yiguolei approved these changes on Feb 14, 2026