|
Still slower than GNU |
|
How about splice threshold + clap bypass? |
|
GNU testsuite comparison: |
|
Please run the benchmark with /usr/bin/cat too. |
|
Now running hyperfine with --show-output, because without it the perf comparison does not tell us much when it comes to writes (every command takes about the same time). The downside is that it seems to introduce significant noise:
hyperfine -w 10 -L coreutils "target/release/cat_16k","target/release/cat_64k","target/release/cat_bypass","target/release/cat_base","target/release/cat_bypass_8k","/usr/bin/cat" "{coreutils} /tmp/threshold_test/file_16K*" --show-output
Benchmark 1: target/release/cat_16k /tmp/threshold_test/file_16K*
Time (mean ± σ): 347.5 ms ± 209.5 ms [User: 1.3 ms, System: 2.0 ms]
Range (min … max): 123.6 ms … 713.3 ms 10 runs
Benchmark 2: target/release/cat_64k /tmp/threshold_test/file_16K*
Time (mean ± σ): 125.3 ms ± 134.1 ms [User: 0.3 ms, System: 1.8 ms]
Range (min … max): 68.3 ms … 704.3 ms 21 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 3: target/release/cat_bypass /tmp/threshold_test/file_16K*
Time (mean ± σ): 77.5 ms ± 16.7 ms [User: 0.4 ms, System: 0.9 ms]
Range (min … max): 44.6 ms … 101.3 ms 33 runs
Benchmark 4: target/release/cat_base /tmp/threshold_test/file_16K*
Time (mean ± σ): 109.1 ms ± 221.0 ms [User: 0.3 ms, System: 1.1 ms]
Range (min … max): 44.7 ms … 1710.4 ms 55 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Benchmark 5: target/release/cat_bypass_8k /tmp/threshold_test/file_16K*
Time (mean ± σ): 82.8 ms ± 23.6 ms [User: 0.3 ms, System: 1.3 ms]
Range (min … max): 52.9 ms … 154.0 ms 18 runs
Benchmark 6: /usr/bin/cat /tmp/threshold_test/file_16K*
Time (mean ± σ): 142.9 ms ± 212.9 ms [User: 0.3 ms, System: 1.2 ms]
Range (min … max): 54.0 ms … 1130.2 ms 27 runs
Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
Summary
target/release/cat_bypass /tmp/threshold_test/file_16K* ran
1.07 ± 0.38 times faster than target/release/cat_bypass_8k /tmp/threshold_test/file_16K*
1.41 ± 2.87 times faster than target/release/cat_base /tmp/threshold_test/file_16K*
1.62 ± 1.77 times faster than target/release/cat_64k /tmp/threshold_test/file_16K*
1.84 ± 2.78 times faster than /usr/bin/cat /tmp/threshold_test/file_16K*
4.48 ± 2.87 times faster than target/release/cat_16k /tmp/threshold_test/file_16K* |
|
Does this fix #9609 too? |
e7d7f5d to 04ad03d
|
GNU testsuite comparison: |
Merging this PR will not alter performance
|
As I understand it, this should fix the specific example presented, but I frankly don't know whether a similar case could arise in the future. All it takes is for stat() to wrongly report a size larger than the defined threshold. |
|
GNU testsuite comparison: |
Based on #10832
Avoids using splice on small files.
May result in a small perf improvement
Threshold set to 16KB, determined by trial and error:
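For illustration only, the size check described above could look roughly like the sketch below. This is not the code from this PR: the names `SPLICE_THRESHOLD`, `CopyStrategy`, `choose_strategy` and `strategy_for_file` are hypothetical, and treating sizes strictly below 16KB as "small" is an assumption. It also shows why the decision depends entirely on the size the file system reports.

```rust
use std::fs::File;
use std::io;

/// Assumed cutoff: the 16KB value mentioned in the PR description.
const SPLICE_THRESHOLD: u64 = 16 * 1024;

/// Hypothetical enum naming the two copy paths.
#[derive(Debug, PartialEq, Eq)]
enum CopyStrategy {
    /// Plain read()/write() loop through a userspace buffer.
    ReadWrite,
    /// Zero-copy path via splice(2).
    Splice,
}

/// Pick a strategy from the size reported for the input.
/// Assumption: sizes strictly below the threshold skip splice.
fn choose_strategy(reported_size: u64) -> CopyStrategy {
    if reported_size < SPLICE_THRESHOLD {
        CopyStrategy::ReadWrite
    } else {
        CopyStrategy::Splice
    }
}

/// Query the size via file metadata and choose a strategy.
/// The decision is only as reliable as the reported size: if the
/// reported size is wrong, the "wrong" path is taken.
fn strategy_for_file(file: &File) -> io::Result<CopyStrategy> {
    Ok(choose_strategy(file.metadata()?.len()))
}
```

With this shape, a file whose reported size is at or above the cutoff would still go through the splice path, which matches the concern raised earlier in the thread about stat() misreporting sizes.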