
ssh4net (Contributor) commented Dec 26, 2025

perf(IBA): Add Google Highway SIMD fast paths for core operations

Implement SIMD acceleration using Google Highway for ImageBufAlgo add, sub,
mul, pow, and resample operations. Provides 2-8x performance improvement for
contiguous pixel layouts with portable vectorization across x86 and ARM.

New file imagebufalgo_hwy_pvt.h provides reusable SIMD infrastructure with
automatic type promotion/demotion (uint8/uint16/int16/uint32/half/float/double),
generic operation kernels, and smart fallback to scalar code for strided layouts.
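
To make the promote/compute/demote idea and the strided fallback concrete, here is a minimal sketch in plain C++. It deliberately does not use Highway, so it compiles on its own. The names `promote_u8`, `demote_u8`, and `add_u8` are hypothetical and are not taken from imagebufalgo_hwy_pvt.h. The normalization by 255 and the clamp to [0, 1] are assumptions about how uint8 pixels map to float, not a statement of what the PR does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: promote a uint8 sample to normalized float (assumed 0..1 range).
inline float promote_u8(std::uint8_t v)
{
    return static_cast<float>(v) / 255.0f;
}

// Hypothetical helper: demote a float result back to uint8 with clamp and rounding.
inline std::uint8_t demote_u8(float v)
{
    const float clamped = std::min(1.0f, std::max(0.0f, v));
    return static_cast<std::uint8_t>(std::lround(clamped * 255.0f));
}

// Adds n samples of a and b into r. `stride` is the distance, in elements,
// between consecutive samples (1 means the data is contiguous).
void add_u8(std::uint8_t* r, const std::uint8_t* a, const std::uint8_t* b,
            std::size_t n, std::size_t stride)
{
    assert(stride >= 1);
    if (stride == 1) {
        // Contiguous layout: a tight loop over adjacent samples. This is the
        // shape of loop that a SIMD kernel would replace.
        for (std::size_t i = 0; i < n; ++i)
            r[i] = demote_u8(promote_u8(a[i]) + promote_u8(b[i]));
    } else {
        // Strided layout: scalar fallback that only touches every stride-th element.
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t k = i * stride;
            r[k] = demote_u8(promote_u8(a[k]) + promote_u8(b[k]));
        }
    }
}
```

Both branches compute the same result; the point of the split is only that the contiguous branch has a layout a vector load/store can consume directly, while the strided branch stays scalar.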

Operations use runtime vector width detection (ScalableTag), FMA instructions
where applicable, and handle partial vectors correctly. Code follows OIIO style
with modern C++ casts and comprehensive documentation.
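
The partial-vector handling can be pictured as a loop over full vector-width chunks followed by a remainder pass. The sketch below is plain C++ rather than Highway code: `kLanes` is a hypothetical fixed stand-in for the lane count that Highway would report for a `ScalableTag`, `madd_chunked` is an illustrative name, and `std::fma` stands in for a vector fused multiply-add.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the lane count of the active vector type.
constexpr std::size_t kLanes = 4;

// Computes r[i] = a[i] * b[i] + c[i] for i in [0, n).
// Returns the number of full kLanes-wide chunks that were processed.
std::size_t madd_chunked(float* r, const float* a, const float* b,
                         const float* c, std::size_t n)
{
    assert(n == 0 || (r && a && b && c));
    std::size_t i = 0;
    std::size_t full_chunks = 0;
    // Main loop: only runs while a whole chunk of kLanes elements remains.
    for (; i + kLanes <= n; i += kLanes, ++full_chunks) {
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            r[i + lane] = std::fma(a[i + lane], b[i + lane], c[i + lane]);
    }
    // Tail: the remaining n % kLanes elements, handled one at a time so that
    // nothing is read or written past the end of the buffers.
    for (; i < n; ++i)
        r[i] = std::fma(a[i], b[i], c[i]);
    return full_chunks;
}
```

With 10 elements and 4 lanes, the main loop covers elements 0 to 7 in two chunks and the tail covers elements 8 and 9.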

Requires: Google Highway library (Apache-2.0 / BSD-3-Clause dual license; links hwy_contrib)
Modified: imagebufalgo_{addsub,muldiv,pixelmath,xform}.cpp + new imagebufalgo_hwy_pvt.h

Checklist:

  • I have read the guidelines on contributions and code review procedures.
  • I have updated the documentation if my PR adds features or changes
    behavior.
  • I am sure that this PR's changes are tested somewhere in the
    testsuite.
  • I have run and passed the testsuite in CI before submitting the
    PR, by pushing the changes to my fork and seeing that the automated CI
    passed there. (Exceptions: If most tests pass and you can't figure out why
    the remaining ones fail, it's ok to submit the PR and ask for help. Or if
    any failures seem entirely unrelated to your change; sometimes things break
    on the GitHub runners.)
  • My code follows the prevailing code style of this project and I
    fixed any problems reported by the clang-format CI test.
  • If I added or modified a public C++ API call, I have also amended the
    corresponding Python bindings. If altering ImageBufAlgo functions, I also
    exposed the new functionality as oiiotool options.

Integrated Google Highway (hwy) as a required dependency and updated build scripts accordingly. Added a new SIMD-accelerated resample_hwy implementation in imagebufalgo_xform.cpp, which is used for resampling when both source and destination have local pixels and the image is not deep. The scalar fallback remains for other cases.
Introduces Highway (hwy) SIMD-accelerated implementations for add, sub, mul, and pow operations in imagebufalgo, using fast pointer-based code paths when localpixels are available. Also updates resample_hwy to support both float and double types, improving performance and type safety for SIMD image processing.
Deleted all CI, analysis, documentation, release, and related workflow YAML files from .github/workflows. This disables all automated GitHub Actions for the repository.
Introduces a new header, imagebufalgo_hwy_pvt.h, encapsulating SIMD type traits and vectorized load/store utilities using Highway. Refactors add, sub, and mul implementations in imagebufalgo_addsub.cpp and imagebufalgo_muldiv.cpp to use these utilities, improving code clarity and enabling more robust SIMD handling for various pixel types.
Refactors SIMD kernel runners and type promotion/demotion utilities in imagebufalgo_hwy_pvt.h for better extensibility and correctness, including support for int16_t and improved documentation. Updates all relevant imagebufalgo implementations to use reinterpret_cast and static_cast for type safety, and enhances pow_impl_hwy to use SIMD for scalar exponents. Also links hwy_contrib in CMake and replaces direct Highway includes with the new header where appropriate.

lgritz (Collaborator) commented Dec 27, 2025

It's gonna take me a few days to do a thorough review -- I'm under the weather this week and mostly sacked out in bed with some strong meds, so brain isn't what it should be.

But I like the direction of this. Stylistically, it's not very different from the places where I've "hard coded" our simd.h approaches into certain math IBA ops, but I only did that for a small number of common cases, for example the over_impl_rgbafloat special case of IBA::over() when all images are 4 channel float, in imagebufalgo_composite.cpp. You're handling many more cases, and in a more principled way using hwy.

This PR seems to have completely removed the entire directory of GHA workflows.

Benchmarking scripts for OIIO resample operations in Windows (PowerShell, BAT) and Linux (Bash), along with a C++ benchmark for HWY arithmetic operations. Includes Visual Studio project files for building and organizing the HWY benchmark.

Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>

ssh4net (Contributor, Author) commented Dec 27, 2025

Sure, no rush. I will also experiment with some other IBA functions that might be a good fit for SIMD.
