
Improve performance symmetry of Set.intersect#19292

Draft
aw0lid wants to merge 2 commits intodotnet:mainfrom
aw0lid:fix/set-intersect-perf-final

aw0lid commented Feb 14, 2026

Optimize Set.intersect for Symmetric Performance & Identity Preservation

Problem (#19139)

Set.intersect in F# currently shows significant performance asymmetry when intersecting sets of very different sizes:

  • The current implementation always iterates over the first set (a) and checks membership in the second set (b).
  • As a result, Huge(a) ∩ Tiny(b) is orders of magnitude slower than Tiny(a) ∩ Huge(b).
  • F# semantics require resulting elements to preserve the identity of elements from the first set (a).

Solution

This PR introduces a size-based heuristic and optimized traversal:

  1. Size Heuristic: Compare the sizes of both sets.
  2. Optimized Traversal: Iterate over the smaller set, but retrieve original elements from the first set (a) to preserve identity.
  3. Identity Preservation: Guarantees resulting elements come from the first set.
  4. No API Changes: All changes are internal; method signatures remain unchanged.
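
The idea behind steps 1 to 3 can be sketched on a toy binary search tree. This is only an illustration under stated assumptions, not the code in this PR: `Tree`, `insert`, `tryGetStored`, `size`, `intersect`, and `Tagged` are hypothetical names, the tree is unbalanced, and the result is returned as a list rather than a set.

```fsharp
// Hypothetical element type: ordering and equality look only at Id,
// so Tag reveals which input an element instance came from.
[<CustomEquality; CustomComparison>]
type Tagged =
    { Id: int; Tag: string }
    override x.Equals(o) =
        match o with
        | :? Tagged as t -> x.Id = t.Id
        | _ -> false
    override x.GetHashCode() = hash x.Id
    interface System.IComparable with
        member x.CompareTo(o) =
            match o with
            | :? Tagged as t -> compare x.Id t.Id
            | _ -> invalidArg "o" "not a Tagged"

// Toy unbalanced binary search tree (a stand-in, not FSharp.Core's tree).
type Tree<'T> =
    | Leaf
    | Node of Tree<'T> * 'T * Tree<'T>

let rec insert x t =
    match t with
    | Leaf -> Node (Leaf, x, Leaf)
    | Node (l, v, r) ->
        let c = compare x v
        if c < 0 then Node (insert x l, v, r)
        elif c > 0 then Node (l, v, insert x r)
        else t

let ofList xs = List.fold (fun t x -> insert x t) Leaf xs

// Returns the element *stored in the tree* that compares equal to key.
let rec tryGetStored key t =
    match t with
    | Leaf -> None
    | Node (l, v, r) ->
        let c = compare key v
        if c < 0 then tryGetStored key l
        elif c > 0 then tryGetStored key r
        else Some v

// In-order traversal: elements in ascending order.
let rec toList t =
    match t with
    | Leaf -> []
    | Node (l, v, r) -> toList l @ (v :: toList r)

// This walk is linear in the tree size; the sketch ignores that cost.
let rec size t =
    match t with
    | Leaf -> 0
    | Node (l, _, r) -> size l + 1 + size r

// Walk the smaller tree, but always emit the element instance stored in a.
let intersect a b =
    if size a <= size b then
        // a is smaller: walk a, probe b, keep a's own elements.
        toList a |> List.filter (fun x -> Option.isSome (tryGetStored x b))
    else
        // b is smaller: walk b, but fetch the matching element from a.
        toList b |> List.choose (fun y -> tryGetStored y a)

let big = ofList [ for i in 1 .. 100 -> { Id = i; Tag = "A" } ]
let small = ofList [ for i in 98 .. 103 -> { Id = i; Tag = "B" } ]

let r1 = intersect big small   // walks `small`, returns elements tagged "A"
let r2 = intersect small big   // walks `small`, returns elements tagged "B"
```

In both calls only the six-element tree is traversed, yet each result carries the tag of whichever tree was passed first, which is the identity guarantee described above.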

Benchmark Results

Tested on Linux x64, .NET 10.0.1 (Release build, BenchmarkDotNet).

| Scenario | Mean Time (Release) | Complexity | Identity Preserved |
|---|---|---|---|
| Tiny ∩ Tiny | 2.67 μs | O(N) | |
| Huge(a) ∩ Tiny(b) | 3,633 μs | O(M log N) | |
| Tiny(a) ∩ Huge(b) | 3,559 μs | O(M log N) | |
| Medium ∩ Medium | 13,083 μs | O(N) | |
| Huge ∩ Huge | 1,946,891 μs | O(N) | |
| Partial Overlap Huge | 1,264,918 μs | O(N) | |
| Disjoint Huge ∩ Tiny | 3,572 μs | O(M log N) | |

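Note: Performance is now symmetric with respect to argument order; for sets of very different sizes, execution time follows O(M log N), where M is the size of the smaller set and N is the size of the larger one.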

Verification Code

This snippet checks both performance and identity preservation; it can be pasted directly into F# Interactive or a test project.

open System
open Microsoft.FSharp.Collections

[<CustomEquality; CustomComparison>]
type Element = { Id: int; Data: string }
    interface IComparable with
        member x.CompareTo(obj) =
            match obj with
            | :? Element as e -> compare x.Id e.Id
            | _ -> invalidArg "obj" "Cannot compare an Element with a value of another type"
    override x.Equals(obj) =
        match obj with
        | :? Element as e -> x.Id = e.Id
        | _ -> false
    override x.GetHashCode() = hash x.Id

let runBench name (a:Set<Element>) (b:Set<Element>) =
    // Warmup
    for _ in 1..3 do ignore (Set.intersect a b)
    GC.Collect(); GC.WaitForPendingFinalizers()
    let sw = Diagnostics.Stopwatch.StartNew()
    for _ in 1..5 do ignore (Set.intersect a b)
    sw.Stop()
    let result = Set.intersect a b
    // Every element of the result must be the instance stored in the first set (Data = "A").
    let identityOk = result |> Set.forall (fun e -> e.Data = "A")
    printfn "%-35s | Time: %10.4f ms | Identity: %b" name (sw.Elapsed.TotalMilliseconds/5.0) identityOk

let hugeA = Set.ofSeq [ for i in 1..1_000_000 -> { Id=i; Data="A" } ]
let tinyB = Set.ofSeq [ for i in 1..10 -> { Id=i; Data="B" } ]

runBench "Huge(a) ∩ Tiny(b)" hugeA tinyB

Checklist

@github-actions
Contributor

❗ Release notes required


✅ Found changes and release notes in following paths:

Warning

No PR link found in some release notes, please consider adding it.

| Change path | Release notes path | Description |
|---|---|---|
| src/FSharp.Core | docs/release-notes/.FSharp.Core/10.0.300.md | No current pull request URL (#19292) found, please consider adding it |

aw0lid marked this pull request as draft February 14, 2026 17:45
@vzarytovskii
Member

vzarytovskii commented Feb 14, 2026

The benchmark will need to be a BenchmarkDotNet (bdn) benchmark, to see how it performs in jitted code, with proper preheat, etc.

Development

Successfully merging this pull request may close these issues.

Slow performance of Set.intersects when comparing two sets of different sizes
