Conversation

@palday commented Dec 31, 2025

There is a race condition fundamental to the current architecture for creating and writing dictionary encodings. The relevant lock is created on a worker thread, so there is a race both to create the lock and to initialize the associated data structure. This race has existed for a long time: it occurs consistently when testing on Julia 1.12, and I have occasionally seen it on Julia 1.10 as well.

Reworking this goes well beyond what I currently have time for, so I have simply disabled multithreaded writing as a stopgap. This may seem extreme, but:

  1. This is a correctness bug and correctness is far more important than speed.
  2. The test failures this race condition causes on 1.12 are blocking the release of 2.8.1, which includes "Avoid extending Type from Base on Julia 1.12" (#543) and addresses another source of potential correctness issues on Julia 1.12+.
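For context, a minimal sketch of the unsafe pattern described above, using hypothetical names (the actual Arrow.jl code differs): a lock that is lazily created on worker threads is itself unprotected, so two tasks can both observe the key as missing and both insert a lock, after which each task "synchronizes" on a different lock object.

```julia
# Hedged sketch with hypothetical names; not the real Arrow.jl internals.
const ENCODING_LOCKS = Dict{Symbol,ReentrantLock}()

function get_encoding_lock!(locks::Dict{Symbol,ReentrantLock}, key::Symbol)
    if !haskey(locks, key)            # check ...
        locks[key] = ReentrantLock()  # ... then act: not atomic across tasks
    end
    return locks[key]
end
```

In single-threaded use this is harmless, but two worker tasks hitting the `haskey` branch concurrently can each create and hold a distinct lock for the same key, defeating the mutual exclusion entirely.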

@codecov-commenter commented Dec 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.98%. Comparing base (3712291) to head (db35d62).
⚠️ Report is 42 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #582      +/-   ##
==========================================
- Coverage   87.43%   86.98%   -0.46%     
==========================================
  Files          26       27       +1     
  Lines        3288     3396     +108     
==========================================
+ Hits         2875     2954      +79     
- Misses        413      442      +29     


x = x.data
len = length(x)
validity = ValidityBitmap(x)
# XXX This is a race condition if two workers hit this block at the same time, then they'll create
@quinnj I think there is a race condition baked into the current architecture that can't be addressed without a very large refactoring. The current architecture creates the locks on a worker thread if they don't already exist, which means worker threads race to create the initial lock. The locks should be created before any tasks are spawned.
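A hedged sketch of the remedy described above, as a toy example rather than the Arrow.jl internals: the lock is created on the spawning task before any worker task runs, so no task ever races to initialize it.

```julia
# Toy example (hypothetical names), not Arrow.jl code.
function squares_threaded(n::Int)
    results = Dict{Int,Int}()
    lk = ReentrantLock()          # created up front, before any @spawn
    tasks = [Threads.@spawn begin
                 lock(lk) do      # every task contends on the same lock
                     results[i] = i^2
                 end
             end for i in 1:n]
    foreach(wait, tasks)          # wait rethrows errors from worker tasks
    return results
end
```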

using FilePathsBase
using DataFrames
import Random: randstring
using TestSetExtensions: ExtendedTestSet

Given how long the Arrow tests take, it's useful to have some indication of progress so that we can tell whether the tests have hung. ExtendedTestSet prints a `.` for each completed test.

(We also get colored diffs of arrays when tests fail, which is nice.)

@palday palday marked this pull request as ready for review January 6, 2026 22:47
@palday commented Jan 6, 2026

@kou @ericphanson Could I get a review on this, please? It's not a pretty solution, but it works.

@palday palday changed the title fix test failures on 1.12 fix test failures on 1.12, avoid race condition in multithreaded partitioned writes Jan 6, 2026

@ericphanson ericphanson left a comment


I think correct writing of multiply partitioned files is better than fast-but-wrong, so this seems like an improvement. In general it looks like there's a lot to clean up here with the threading; e.g. we shouldn't really be spawning tasks without holding onto them so we can fetch them and recover any errors they threw. We should prefer structured concurrency primitives like a threaded map, asyncmap, etc. But that's unrelated to this PR.
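To illustrate the structured pattern suggested above (a sketch, not the current Arrow.jl code): hold onto the spawned tasks and `fetch` them, so an error thrown on a worker propagates to the caller as a `TaskFailedException` instead of being silently dropped the way a fire-and-forget `Threads.@spawn` would drop it.

```julia
# Sketch of a minimal threaded map; `tmap` is a hypothetical helper name.
function tmap(f, xs)
    tasks = [Threads.@spawn f(x) for x in xs]  # keep the task handles
    return fetch.(tasks)                       # fetch rethrows worker errors
end
```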

@kou commented Jan 9, 2026

I'll merge this in a few days if nobody has more comments.

We can try 2.8.4 RC2 release with this, right?

See also: The 2.8.4 RC1 vote: https://lists.apache.org/thread/7g3wr39wlbs8dj08hpb87mf9z2mlqrft

@palday commented Jan 9, 2026

> We can try 2.8.4 RC2 release with this, right?

@kou Yes, exactly. 😄
