Skip to content

Latest commit

 

History

History
46 lines (38 loc) · 1.75 KB

README.md

File metadata and controls

46 lines (38 loc) · 1.75 KB

Sampling without replacement

The problem of sampling without replacement: given integers n, k such that 0 ≤ kn, return k distinct nonnegative integers < n, choosing at random such that every possible result is equally likely. One way to get a precise understanding of the problem is to look at the simplest algorithm for solving it, via these Python or C++ implementations.

Here, I propose an algorithm I call "Cardchoose" for this problem which as far as I know is novel, and which in C++ outperforms all other solutions to either problem for most (n, k) inputs. The only data structure used is the output vector, so no extra memory is needed; it runs in O(k log k) time. I've implemented this and other algorithms in C++ and Python to compare their performance.

In my C++ tests, cardchoose outperforms rejectionsample, floydf2, and hsel for all values of n and k. quadraticreject is best where k < 100 or so, and select and iterativechoose perform well when k/n is large. cardchoose always performs within an acceptable factor of other algorithms.

def sorted_choose(n, k):
    "k distinct integers 0 <= x < n, sorted"
    t = n - k + 1
    d = [None] * k
    for i in range(k):
        r = random.randrange(t + i)
        if r < t:
            d[i] = r
        else:
            d[i] = d[r - t]
    d.sort()
    for i in range(k):
        d[i] += i
    return d

graph

This is not an officially supported Google product.