[Improvement] add zlib_crc32_fixed#60735
Open
BiteTheDDDDt wants to merge 3 commits intoapache:masterfrom
Open
Conversation
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
Contributor
Author
|
run buildall |
There was a problem hiding this comment.
Pull request overview
This PR introduces an optimized zlib_crc32_fixed function to improve CRC32 hash computation performance for fixed-size types. The implementation uses the Slicing-by-4 algorithm, which processes 4 bytes at a time using precomputed lookup tables, achieving a significant performance improvement from 269ms to 62ms for hash computation.
Changes:
- Adds
zlib_crc32_fixedtemplate function using Slicing-by-4 algorithm for 1, 2, 4, and 8-byte types, with fallback to zlib for larger types - Implements constexpr
CRC32SliceBy4Tablefor compile-time lookup table generation - Refactors column hashing in
ColumnVectorandColumnDecimalto use the optimized function - Provides comprehensive test coverage for the new function across various data types and edge cases
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| be/src/util/hash_util.hpp | Adds CRC32SliceBy4Table struct and zlib_crc32_fixed template function with Slicing-by-4 optimization; updates zlib_crc_hash_null to use the new function |
| be/src/vec/columns/column_vector.h | Declares _zlib_crc32_hash helper method |
| be/src/vec/columns/column_vector.cpp | Refactors update_crcs_with_value to use new _zlib_crc32_hash helper method; implements _zlib_crc32_hash with special handling for date/datetime types |
| be/src/vec/columns/column_decimal.cpp | Replaces direct zlib_crc_hash calls with zlib_crc32_fixed for non-DECIMALV2 types |
| be/test/util/crc32c_test.cpp | Adds comprehensive tests for zlib_crc32_fixed covering all data types, incremental hashing, and edge cases; includes commented-out tests for future crc32c_fixed improvements |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor
Author
|
run buildall |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
zlib_crc32: SplitBlockHashComputeTime: 269.15ms
crc32c_fixed: SplitBlockHashComputeTime: 33.258ms
zlib_crc32_fixed: SplitBlockHashComputeTime: 61.941ms
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)