Description
Describe the enhancement requested
As part of:
The discussion about adding a sanitizers build for PyArrow popped up. I am creating this issue to track the discussion and raise it as a separate enhancement.
Summary of the discussion there so far:
I think the main difficulty for a PyArrow sanitizers build is that the sanitizer instrumentation should be enabled in CPython as well (and potentially NumPy?).
Originally posted by @pitrou in #36411
You may be interested in how numpy & scipy are doing this, in conjunction with CPython. That setup uses pixi as a kind of "light-weight conda-build" orchestrator that wraps the various rebuilds (independent of whether that's via CMake/meson/whatever):
Originally posted by @h-vetinari in #36411
That's an ideal setup, but I don't think it's required: you could point LD_PRELOAD to the sanitizer library to have it loaded correctly from a process that was not built with sanitizers enabled (i.e. Python). We used to do that in CI with pandas, although we abandoned it after a while because it became a maintenance burden.
Originally posted by @WillAyd in #36411
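For illustration, the LD_PRELOAD approach described above could look roughly like the sketch below. This is not the setup pandas used; the helper names (`find_libasan`, `asan_env`, `run_with_asan`) are hypothetical, the runtime lookup assumes a GCC-style toolchain on Linux, and disabling leak detection is an assumption made because an uninstrumented interpreter tends to produce many leak reports at exit.

```python
import os
import subprocess


def find_libasan(compiler="gcc"):
    """Ask the compiler where its ASan runtime lives (GCC-style flag)."""
    out = subprocess.run(
        [compiler, "-print-file-name=libasan.so"],
        check=True,
        capture_output=True,
        text=True,
    )
    return out.stdout.strip()


def asan_env(libasan_path, base_env=None):
    """Return a copy of the environment with the ASan runtime preloaded."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # The sanitizer runtime goes first so it is loaded before anything else.
    env["LD_PRELOAD"] = (
        libasan_path if not existing else f"{libasan_path}:{existing}"
    )
    # Assumption: turn off leak detection unless the caller already set options.
    env.setdefault("ASAN_OPTIONS", "detect_leaks=0")
    return env


def run_with_asan(cmd, libasan_path=None):
    """Run `cmd` (a list of arguments) with the ASan runtime preloaded."""
    path = libasan_path or find_libasan()
    return subprocess.call(cmd, env=asan_env(path))
```

Something like `run_with_asan([sys.executable, "-m", "pytest", "--pyargs", "pyarrow"])` would then run the tests in a child process with the runtime preloaded. Only libraries that were themselves compiled with ASan get instrumented memory accesses this way.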
Is that enough, though? Ideally, the code is instrumented at compile time (memory accesses etc.). For example, if PyArrow passes a bogus memory pointer to NumPy, we want ASan to notice and that might not happen if NumPy was not compiled with ASan enabled.
Originally posted by @pitrou in #36411
Yeah, for ASAN/TSAN, you need to instrument the other relevant libraries, which means rebuilding them, which is generally a huge pain, which is why the approach I referenced above provides a real benefit. Once all the pieces are in place, it comes down to
pixi run test-asan -t some_test
which rebuilds (& caches) instrumented cpython, numpy etc. as necessary. I haven't been very involved, but the scipy PR contains more details; and I'm pretty sure that Lucas wouldn't mind answering questions (not tagged here because it's already a bit OT).
Originally posted by @h-vetinari in #36411
Component(s)
Python