perf: Improve Xor method performance by ~20% for big sets #1

romshark · 2025-02-16T16:16:19Z

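Processing larger bitsets in batches of 8 is more efficient on modern CPUs.
I assume this is due to instruction-level parallelism.
The same technique can be applied to most other bitset methods and functions.
In the benchmarks below, the `10k` and `1m` cases run about 35% and 40% faster, while the `empty` and `5` cases are unchanged; the ~20% in the title is the geomean across all four.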
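
For readers who have not seen the pattern, here is a minimal sketch of what an 8-batch XOR loop can look like. It is not the code from this PR: `xorWords` is a hypothetical helper, and the assumption that the bitset is backed by a `[]uint64` is mine.

```go
package main

import "fmt"

// xorWords XORs src into dst, handling 8 words per loop iteration.
// Hypothetical helper for illustration; assumes a []uint64 backing store.
func xorWords(dst, src []uint64) {
	n := len(dst)
	if len(src) < n {
		n = len(src)
	}
	i := 0
	// Main loop: 8 independent XORs per iteration. Re-slicing to a fixed
	// length of 8 gives the compiler a chance to drop per-element bounds checks.
	for ; i+8 <= n; i += 8 {
		d := dst[i : i+8 : i+8]
		s := src[i : i+8 : i+8]
		d[0] ^= s[0]
		d[1] ^= s[1]
		d[2] ^= s[2]
		d[3] ^= s[3]
		d[4] ^= s[4]
		d[5] ^= s[5]
		d[6] ^= s[6]
		d[7] ^= s[7]
	}
	// Tail loop: the remaining 0 to 7 words.
	for ; i < n; i++ {
		dst[i] ^= src[i]
	}
}

func main() {
	// 19 words: two full batches of 8 plus a tail of 3.
	a := make([]uint64, 19)
	b := make([]uint64, 19)
	want := make([]uint64, 19)
	for i := range a {
		a[i] = uint64(i) * 0x9E3779B97F4A7C15
		b[i] = uint64(i+1) * 0xC2B2AE3D27D4EB4F
		want[i] = a[i] ^ b[i] // reference result from the plain per-word XOR
	}
	xorWords(a, b)
	same := true
	for i := range a {
		if a[i] != want[i] {
			same = false
		}
	}
	fmt.Println("matches plain loop:", same)
}
```

The eight statements in the main loop do not depend on each other, which is what would let a superscalar core overlap them; whether that is the actual source of the speedup measured below is not something this sketch establishes.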

goos: darwin
goarch: arm64
pkg: github.com/KernelPryanic/bitmask
cpu: Apple M1 Max
                    │   old.txt   │              new.txt               │
                    │   sec/op    │   sec/op     vs base               │
BitSet_Xor/empty-10   2.498n ± 4%   2.493n ± 3%        ~ (p=0.372 n=6)
BitSet_Xor/5-10       2.491n ± 1%   2.492n ± 1%        ~ (p=0.729 n=6)
BitSet_Xor/10k-10     76.10n ± 1%   49.79n ± 1%  -34.57% (p=0.002 n=6)
BitSet_Xor/1m-10      8.453µ ± 0%   5.112µ ± 1%  -39.52% (p=0.002 n=6)
geomean               44.73n        35.46n       -20.72%

                    │   old.txt    │              new.txt               │
                    │     B/op     │    B/op     vs base                │
BitSet_Xor/empty-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/5-10       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/10k-10     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/1m-10      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                          ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                    │   old.txt    │              new.txt               │
                    │  allocs/op   │ allocs/op   vs base                │
BitSet_Xor/empty-10   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/5-10       0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/10k-10     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
BitSet_Xor/1m-10      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
geomean                          ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

KernelPryanic approved these changes Feb 16, 2025

KernelPryanic merged commit 63daa84 into KernelPryanic:main Feb 16, 2025
2 of 3 checks passed
