Skip to content

HADOOP-19810. Replace FastByteComparisons with JDK Arrays.compareUnsigned#8239

Open
pan3793 wants to merge 2 commits intoapache:trunkfrom
pan3793:HADOOP-19810
Open

HADOOP-19810. Replace FastByteComparisons with JDK Arrays.compareUnsigned#8239
pan3793 wants to merge 2 commits intoapache:trunkfrom
pan3793:HADOOP-19810

Conversation

@pan3793
Copy link
Member

@pan3793 pan3793 commented Feb 9, 2026

Description of PR

Replace FastByteComparisons(derived from Guava) with JDK Arrays.compareUnsigned(added in JDK 9, and optimized in modern JDKs)

Guava v33.4.5 starts to use Arrays.compareUnsigned when it is available.
google/guava@b3bb29a

The benchmarks suggest that the old, Unsafe-based implementation is faster up to 64 elements or so
but that it's a matter of only about a nanosecond. The new implementation pulls ahead for larger arrays,
including an advantage of about 5-10 ns for 1024 elements.

How was this patch tested?

Create a JMH benchmark https://github.com/pan3793/HADOOP-19810.

TL;DR - JDK Arrays.compareUnsigned has better performance (except for small bytes like 128) on the x86_64 platform, and the same performance on Apple Silicon

Benchmark results

  • Ubuntu 24.04 LTS x86_64
  • CPU: Intel i5-9500 (6) @ 4.400GHz
  • Kernel: 6.17.9-76061709-generic
  • OpenJDK Runtime Environment Temurin-25.0.2+10 (build 25.0.2+10-LTS)
Benchmark                                  (length)  Mode  Cnt  Score   Error  Units
BytesComparatorBenchmark.diff_last_hadoop       128    ss    3  0.319 ± 0.023   s/op
BytesComparatorBenchmark.diff_last_hadoop       256    ss    3  0.505 ± 0.008   s/op
BytesComparatorBenchmark.diff_last_hadoop       512    ss    3  0.879 ± 0.189   s/op
BytesComparatorBenchmark.diff_last_hadoop      1024    ss    3  1.684 ± 0.141   s/op
BytesComparatorBenchmark.diff_last_hadoop      2048    ss    3  3.233 ± 0.095   s/op
BytesComparatorBenchmark.diff_last_jdk          128    ss    3  0.379 ± 0.046   s/op
BytesComparatorBenchmark.diff_last_jdk          256    ss    3  0.491 ± 0.024   s/op
BytesComparatorBenchmark.diff_last_jdk          512    ss    3  0.685 ± 0.055   s/op
BytesComparatorBenchmark.diff_last_jdk         1024    ss    3  1.180 ± 0.052   s/op
BytesComparatorBenchmark.diff_last_jdk         2048    ss    3  1.929 ± 0.081   s/op
BytesComparatorBenchmark.equal_hadoop           128    ss    3  0.318 ± 0.052   s/op
BytesComparatorBenchmark.equal_hadoop           256    ss    3  0.502 ± 0.005   s/op
BytesComparatorBenchmark.equal_hadoop           512    ss    3  0.957 ± 0.086   s/op
BytesComparatorBenchmark.equal_hadoop          1024    ss    3  1.696 ± 0.174   s/op
BytesComparatorBenchmark.equal_hadoop          2048    ss    3  3.216 ± 0.219   s/op
BytesComparatorBenchmark.equal_jdk              128    ss    3  0.380 ± 0.155   s/op
BytesComparatorBenchmark.equal_jdk              256    ss    3  0.479 ± 0.053   s/op
BytesComparatorBenchmark.equal_jdk              512    ss    3  0.741 ± 0.094   s/op
BytesComparatorBenchmark.equal_jdk             1024    ss    3  1.421 ± 0.187   s/op
BytesComparatorBenchmark.equal_jdk             2048    ss    3  1.872 ± 0.209   s/op
  • CPU: Apple M1 Max
  • OpenJDK Runtime Environment Temurin-25.0.2+10 (build 25.0.2+10-LTS)
Benchmark                                  (length)  Mode  Cnt  Score   Error  Units
BytesComparatorBenchmark.diff_last_hadoop       128    ss    3  0.279 ± 0.123   s/op
BytesComparatorBenchmark.diff_last_hadoop       256    ss    3  0.441 ± 0.009   s/op
BytesComparatorBenchmark.diff_last_hadoop       512    ss    3  0.788 ± 0.037   s/op
BytesComparatorBenchmark.diff_last_hadoop      1024    ss    3  1.442 ± 0.201   s/op
BytesComparatorBenchmark.diff_last_hadoop      2048    ss    3  2.845 ± 0.177   s/op
BytesComparatorBenchmark.diff_last_jdk          128    ss    3  0.306 ± 0.002   s/op
BytesComparatorBenchmark.diff_last_jdk          256    ss    3  0.492 ± 0.204   s/op
BytesComparatorBenchmark.diff_last_jdk          512    ss    3  0.867 ± 0.315   s/op
BytesComparatorBenchmark.diff_last_jdk         1024    ss    3  1.467 ± 0.179   s/op
BytesComparatorBenchmark.diff_last_jdk         2048    ss    3  2.871 ± 0.129   s/op
BytesComparatorBenchmark.equal_hadoop           128    ss    3  0.253 ± 0.007   s/op
BytesComparatorBenchmark.equal_hadoop           256    ss    3  0.426 ± 0.018   s/op
BytesComparatorBenchmark.equal_hadoop           512    ss    3  0.772 ± 0.015   s/op
BytesComparatorBenchmark.equal_hadoop          1024    ss    3  1.428 ± 0.030   s/op
BytesComparatorBenchmark.equal_hadoop          2048    ss    3  2.842 ± 0.173   s/op
BytesComparatorBenchmark.equal_jdk              128    ss    3  0.294 ± 0.007   s/op
BytesComparatorBenchmark.equal_jdk              256    ss    3  0.474 ± 0.009   s/op
BytesComparatorBenchmark.equal_jdk              512    ss    3  0.842 ± 0.075   s/op
BytesComparatorBenchmark.equal_jdk             1024    ss    3  1.448 ± 0.029   s/op
BytesComparatorBenchmark.equal_jdk             2048    ss    3  2.854 ± 0.153   s/op

For code changes:

  • Does the title or this PR starts with the corresponding JIRA issue id (HADOOP-19810)?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

AI Tooling

No AI usage.

@pan3793
Copy link
Member Author

pan3793 commented Feb 9, 2026

cc @aajisaka since you reviewed and committed HADOOP-14313, also cc @cnauroth

Copy link
Member

@aajisaka aajisaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 pending Jenkins.

I'm happy to see sun.misc.Unsafe API usage is dropped.

@aajisaka
Copy link
Member

aajisaka commented Feb 9, 2026

@pan3793 Writable object is usually small, so would you also benchmark against smaller array size like https://issues.apache.org/jira/browse/HADOOP-14313? I want to check there's no regression in 128, 256, 512, 1028, 2048.

BytesComparatorBenchmark.hadoopDiffLast 64 ss 5 ≈ 10⁻⁴ s/op
BytesComparatorBenchmark.jdkDiffLast 64 ss 5 ≈ 10⁻⁴ s/op

The above numbers are very small. How can we compare these?

@pan3793
Copy link
Member Author

pan3793 commented Feb 9, 2026

@aajisaka, thanks for the information, let me tune the code to test small size, will update benchmark results later.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 39s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
-1 ❌ mvninstall 27m 20s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 9m 0s trunk passed with JDK Ubuntu-21.0.9+10-Ubuntu-124.04
+1 💚 compile 8m 52s trunk passed with JDK Ubuntu-17.0.17+10-Ubuntu-124.04
+1 💚 checkstyle 0m 47s trunk passed
+1 💚 mvnsite 1m 4s trunk passed
+1 💚 javadoc 0m 46s trunk passed with JDK Ubuntu-21.0.9+10-Ubuntu-124.04
+1 💚 javadoc 0m 48s trunk passed with JDK Ubuntu-17.0.17+10-Ubuntu-124.04
+1 💚 spotbugs 1m 41s trunk passed
-1 ❌ shadedclient 16m 27s branch has errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 41s the patch passed
+1 💚 compile 8m 13s the patch passed with JDK Ubuntu-21.0.9+10-Ubuntu-124.04
+1 💚 javac 8m 13s root-jdkUbuntu-21.0.9+10-Ubuntu-124.04 with JDK Ubuntu-21.0.9+10-Ubuntu-124.04 generated 0 new + 16 unchanged - 4 fixed = 16 total (was 20)
+1 💚 compile 8m 53s the patch passed with JDK Ubuntu-17.0.17+10-Ubuntu-124.04
+1 💚 javac 8m 53s root-jdkUbuntu-17.0.17+10-Ubuntu-124.04 with JDK Ubuntu-17.0.17+10-Ubuntu-124.04 generated 0 new + 16 unchanged - 4 fixed = 16 total (was 20)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 47s hadoop-common-project/hadoop-common: The patch generated 0 new + 6 unchanged - 8 fixed = 6 total (was 14)
+1 💚 mvnsite 1m 4s the patch passed
+1 💚 javadoc 0m 46s the patch passed with JDK Ubuntu-21.0.9+10-Ubuntu-124.04
+1 💚 javadoc 0m 46s the patch passed with JDK Ubuntu-17.0.17+10-Ubuntu-124.04
+1 💚 spotbugs 1m 51s the patch passed
-1 ❌ shadedclient 16m 29s patch has errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 17m 37s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch passed.
-1 ❌ asflicense 0m 42s /results-asflicense.txt The patch generated 1 ASF License warnings.
126m 33s
Reason Tests
Failed junit tests hadoop.io.file.tfile.TestTFileSeek
hadoop.util.TestIndexedSort
hadoop.io.TestSequenceFile
Subsystem Report/Notes
Docker ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8239/1/artifact/out/Dockerfile
GITHUB PR #8239
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 77035d4967dd 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 67333a8
Default Java Ubuntu-17.0.17+10-Ubuntu-124.04
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.9+10-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.17+10-Ubuntu-124.04
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8239/1/testReport/
Max. process+thread count 1518 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8239/1/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

@pan3793
Copy link
Member Author

pan3793 commented Feb 9, 2026

@aajisaka, I updated the benchmark result - on Linux, the result matches Guava's statement, the JDK impl is a little slower for 128 bytes, but faster for larger bytes.

Note: I'm using a quite old Intel CPU, given Arrays.compareUnsigned can leverage SIMD in modern JDKs, results might be different on different CPUs.

Copy link
Member

@aajisaka aajisaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sharing the benchmark result. Looks good to me.

I think the unit test failures are related. Would you fix them?

Maybe Arrays.compareUnsigned(b1, s1, s1+l1, b2, s2, s2+l2);?

@pan3793
Copy link
Member Author

pan3793 commented Feb 9, 2026

@aajisaka Oops, you are right, I fixed it and ensure org.apache.hadoop.util.TestIndexedSort passes locally now.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 40s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
-1 ❌ mvninstall 32m 33s /branch-mvninstall-root.txt root in trunk failed.
+1 💚 compile 8m 37s trunk passed with JDK Ubuntu-21.0.9+10-Ubuntu-124.04
+1 💚 compile 8m 53s trunk passed with JDK Ubuntu-17.0.17+10-Ubuntu-124.04
+1 💚 checkstyle 0m 49s trunk passed
+1 💚 mvnsite 1m 5s trunk passed
+1 💚 javadoc 0m 51s trunk passed with JDK Ubuntu-21.0.9+10-Ubuntu-124.04
+1 💚 javadoc 0m 47s trunk passed with JDK Ubuntu-17.0.17+10-Ubuntu-124.04
+1 💚 spotbugs 1m 42s trunk passed
-1 ❌ shadedclient 16m 9s branch has errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 42s the patch passed
+1 💚 compile 8m 27s the patch passed with JDK Ubuntu-21.0.9+10-Ubuntu-124.04
+1 💚 javac 8m 27s root-jdkUbuntu-21.0.9+10-Ubuntu-124.04 with JDK Ubuntu-21.0.9+10-Ubuntu-124.04 generated 0 new + 16 unchanged - 4 fixed = 16 total (was 20)
+1 💚 compile 8m 58s the patch passed with JDK Ubuntu-17.0.17+10-Ubuntu-124.04
+1 💚 javac 8m 58s root-jdkUbuntu-17.0.17+10-Ubuntu-124.04 with JDK Ubuntu-17.0.17+10-Ubuntu-124.04 generated 0 new + 16 unchanged - 4 fixed = 16 total (was 20)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 46s hadoop-common-project/hadoop-common: The patch generated 0 new + 6 unchanged - 8 fixed = 6 total (was 14)
+1 💚 mvnsite 1m 5s the patch passed
+1 💚 javadoc 0m 49s the patch passed with JDK Ubuntu-21.0.9+10-Ubuntu-124.04
+1 💚 javadoc 0m 49s the patch passed with JDK Ubuntu-17.0.17+10-Ubuntu-124.04
+1 💚 spotbugs 1m 46s the patch passed
-1 ❌ shadedclient 16m 34s patch has errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 46s hadoop-common in the patch passed.
-1 ❌ asflicense 0m 43s /results-asflicense.txt The patch generated 1 ASF License warnings.
132m 15s
Subsystem Report/Notes
Docker ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8239/2/artifact/out/Dockerfile
GITHUB PR #8239
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux d6399752827c 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / ba66c6c
Default Java Ubuntu-17.0.17+10-Ubuntu-124.04
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.9+10-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.17+10-Ubuntu-124.04
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8239/2/testReport/
Max. process+thread count 1254 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8239/2/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 41s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 27m 50s trunk passed
+1 💚 compile 8m 38s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 compile 9m 7s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 checkstyle 0m 48s trunk passed
+1 💚 mvnsite 1m 5s trunk passed
+1 💚 javadoc 0m 46s trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 0m 43s trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 1m 40s trunk passed
+1 💚 shadedclient 16m 18s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 40s the patch passed
+1 💚 compile 8m 14s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javac 8m 14s root-jdkUbuntu-21.0.10+7-Ubuntu-124.04 with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 generated 0 new + 16 unchanged - 4 fixed = 16 total (was 20)
+1 💚 compile 9m 1s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 javac 9m 1s root-jdkUbuntu-17.0.18+8-Ubuntu-124.04.1 with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 generated 0 new + 16 unchanged - 4 fixed = 16 total (was 20)
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 46s hadoop-common-project/hadoop-common: The patch generated 0 new + 6 unchanged - 8 fixed = 6 total (was 14)
+1 💚 mvnsite 1m 3s the patch passed
+1 💚 javadoc 0m 50s the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04
+1 💚 javadoc 0m 49s the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1
+1 💚 spotbugs 1m 44s the patch passed
+1 💚 shadedclient 16m 52s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 17m 33s hadoop-common in the patch passed.
+1 💚 asflicense 0m 40s The patch does not generate ASF License warnings.
127m 21s
Subsystem Report/Notes
Docker ClientAPI=1.53 ServerAPI=1.53 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8239/3/artifact/out/Dockerfile
GITHUB PR #8239
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux f0ff52598365 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / cf758f7
Default Java Ubuntu-17.0.18+8-Ubuntu-124.04.1
Multi-JDK versions /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8239/3/testReport/
Max. process+thread count 3151 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8239/3/console
versions git=2.43.0 maven=3.9.11 spotbugs=4.9.7
Powered by Apache Yetus 0.14.1 https://yetus.apache.org

This message was automatically generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants