In initial experiments with HOC4 trees, for k = 2, many arms were being computed exactly / falling back to naive.
In some sense, in the very beginning as you initialize medoids, your effective \sigma is huge. This makes good candidates hard to distinguish, and due to symmetry of clusters, makes a lot of candidates look good. The larger number of candidates are hard to distinguish because of the larger effective \sigma.
As you add more medoids, the effective \sigma decreases dramatically, until the elbow of decreasing returns. Then it's easier for the UCB algorithm to shine.
If we increase k in the experiment, the results do indeed look better than naive.