GPU: Replace OpenMP parallelization with TBB - WIP #13997
davidrohr merged 1 commit into AliceO2Group:dev from
Force-pushed from f6b40be to be91a01.

Error while checking build/O2/fullCI for be91a01 at 2025-02-21 21:57.
Force-pushed from be91a01 to 184803f.

Error while checking build/O2/fullCI_slc9 for 184803f at 2025-02-21 23:50.
Force-pushed from 184803f to 158afe1.

Error while checking build/O2/fullCI for 184803f at 2025-02-22 00:20.

Error while checking build/O2/fullCI_slc9 for 158afe1 at 2025-02-22 13:21.
Force-pushed from 158afe1 to 357e76e.

Error while checking build/O2/fullCI_slc9 for 357e76e at 2025-02-22 16:25.
Force-pushed from 357e76e to b21fbd8.
Work in progress, don't merge yet!
For now, I want to test it in CI.
The background for this PR is that OpenMP does not have a proper scheduling approach for nested loops, or for loops with reduced parallelism. Instead of using a thread pool, it just keeps spawning and terminating threads.
For me, that is on the order of 10k thread spawns on an EPN node for 1 TF.
I could fix that by hacking GCC's libgomp, but the GCC developers are not interested in taking that patch, since libgomp's behavior is compliant with the OpenMP spec, which is IMHO broken.
So I decided to just do away with OpenMP.
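The alternative described above (nested loops served by worker threads that are created once and reused, instead of threads spawned and terminated per parallel region) can be sketched in plain standard C++. This is a minimal, hypothetical illustration, not the code in this PR and not how TBB is implemented: TBB schedules tasks on a persistent worker pool using work stealing, while this sketch swaps in a simpler technique, a single mutex-protected queue in which a thread waiting on a loop helps execute pending tasks. `ThreadPool`, `parallelFor` and `nestedSumDemo` are invented names.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool (illustration only, not O2 or TBB code).
// Worker threads are created once in the constructor and reused for every
// parallel loop, including loops nested inside other parallel loops.
class ThreadPool {
 public:
  explicit ThreadPool(unsigned nThreads) {
    for (unsigned i = 0; i < nThreads; ++i) {
      workers_.emplace_back([this] { workerLoop(); });
    }
  }

  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  std::size_t threadCount() const { return workers_.size(); }

  // Runs body(i) for every i in [0, n) on the pool. The calling thread runs
  // queued tasks while it waits, so a nested parallelFor issued from inside
  // a task keeps making progress instead of blocking a worker.
  void parallelFor(std::size_t n, const std::function<void(std::size_t)>& body) {
    std::atomic<std::size_t> remaining{n};
    {
      std::lock_guard<std::mutex> lock(mutex_);
      for (std::size_t i = 0; i < n; ++i) {
        tasks_.push([&body, &remaining, i] {
          body(i);
          remaining.fetch_sub(1);
        });
      }
    }
    cv_.notify_all();
    while (remaining.load() > 0) {
      std::function<void()> task;
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!tasks_.empty()) {
          task = std::move(tasks_.front());
          tasks_.pop();
        }
      }
      if (task) {
        task();
      } else {
        std::this_thread::yield();  // our remaining tasks run on other threads
      }
    }
  }

 private:
  void workerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // only reached once stop_ is set
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_ = false;
};

// Hypothetical demo: an outer parallel loop of 8 iterations, each running an
// inner parallel loop of 100 iterations, all on the same nThreads workers.
long nestedSumDemo(unsigned nThreads) {
  ThreadPool pool(nThreads);
  std::atomic<long> sum{0};
  pool.parallelFor(8, [&](std::size_t outer) {
    pool.parallelFor(100, [&](std::size_t inner) {
      sum.fetch_add(static_cast<long>(outer * 100 + inner));
    });
  });
  return sum.load();  // sum of 0..799
}
```

Calling `nestedSumDemo(4)` executes 800 loop bodies (8 outer iterations with 100 inner iterations each) while creating only four worker threads in total. The `yield` loop keeps the sketch short but burns CPU while waiting; a production scheduler such as TBB's avoids that.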