<algorithm>: nth_element does not comply with worst case running time requirements #856
Comments
It doesn't mention the worst-case time for the overloads without an ExecutionPolicy. And I checked LLVM's code; the worst case there is also O(n^2).
And do you think this is an LWG issue?
Our implementations of parallel algorithms delegate to the serial implementations when the …
I consistently argued that a big-O worst case that is worse for the parallel algorithms than for the serial ones was nonsense (how is a user supposed to aim for the right size of N where the parallel one actually wins?), but it seemed like not enough people cared about it.
But if you want an O(N lg N) worst-case algorithm, the average complexity might be worse, because the original algorithm was O(N) on average and degrades to O(N^2) only in a few cases. So I think we need a clearer definition in the standard here.
I think the existing wording in the standard is very clear: the serial version must be linear on average. They said O(n lg n) for the parallel version because the divide-and-conquer version of the algorithm is indeed O(n lg n), but there's no reason the same trick that makes introselect O(n) can't be applied to the decision to do more parallelism.
I don't get your point, because there is no algorithm that can find the nth element in O(n).
Median-of-medians is O(n). https://en.wikipedia.org/wiki/Median_of_medians#Proof_of_linear_running_time
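For reference, here is a compact sketch of median-of-medians selection (my code, not from the thread; it copies into side vectors for clarity, where a library implementation would work in place):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of median-of-medians selection: returns the k-th smallest element
// (0-based) of v in O(n) worst-case time.
int mom_select(std::vector<int> v, std::size_t k) {
    while (v.size() > 5) {
        // 1. Take the median of each group of (up to) 5 elements.
        std::vector<int> medians;
        for (std::size_t i = 0; i < v.size(); i += 5) {
            std::size_t hi = std::min(i + 5, v.size());
            std::sort(v.begin() + i, v.begin() + hi);
            medians.push_back(v[i + (hi - i - 1) / 2]);
        }
        // 2. The pivot is the median of those medians, found recursively.
        int pivot = mom_select(medians, (medians.size() - 1) / 2);
        // 3. Three-way partition around the pivot; the pivot is guaranteed
        //    to be near the middle, so each side holds at most ~7n/10 elements.
        std::vector<int> below, equal, above;
        for (int x : v) {
            if (x < pivot) below.push_back(x);
            else if (x > pivot) above.push_back(x);
            else equal.push_back(x);
        }
        if (k < below.size()) v = std::move(below);
        else if (k < below.size() + equal.size()) return pivot;
        else { k -= below.size() + equal.size(); v = std::move(above); }
    }
    std::sort(v.begin(), v.end());
    return v[k];
}
```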
I wrote a big discussion on the isocpp mailing lists about this that I think might be useful to reference, so I'm morally reposting it here:

The intended implementation of std::nth_element is introselect. Selection has a 'fast in practice but slow in the worst case' algorithm pair, just like sorting does. In the same way that std::sort must implement something like introsort, falling back to heap sort or similar if the quicksort partitions become awful, nth_element is supposed to choose between quickselect and median-of-medians.

Quickselect is like quicksort except that after putting the pivot into position, rather than recursively sorting both sides, you only select on the side containing the element you want. In Master theorem terms, if the pivot is good, we on average divide the work into 2 subproblems (b is 2), we only solve one of them (a is 1), and we did a partition op f(n) = O(n). This makes c_crit 0, putting us in Case 3, for overall O(n). Just like quicksort, if the chosen pivot is awful, it can degrade to an O(N^2) worst case. Median-of-medians takes linear big-O time but has poor real-world constant factors.

Based on the complexity listed for the parallel version, I believe SG1 intends the same quickselect algorithm to be used, but with quickselect's partition operation replaced with a parallelized partition. I apologize for the confusion; further discussion forces us to use 'partition' both ways:
The straightforward way to parallelize partition is to do what stable_partition does. At each step, divide the input into 2 parts and recursively partition each part, giving a sequence like TTTTTTFFFFFFTTTTTTFFFF, then rotate* the middle FFFFFFTTTTTT section to put all the Ts at the front and all the Fs at the end. In Master theorem terms, f(n) is the rotate, which is O(n), and a and b are both 2. That gives c_crit of 1, putting us in Case 2, for an overall result of O(n lg n), matching the standard.

Putting this back into Master theorem terms for quickselect, f(n) is now O(n lg n), b is still 2, a is still 1, and c_crit is still 0, still putting us in Case 3, for an overall result of O(n lg n), the current requirement for parallelized std::nth_element.

* partition can use reverse rather than the rotate that stable_partition needs, but I don't think that was meaningful to explaining how it works.

P.S.: The reason I believe the complexity requirement for parallelized partition is wrong is that you can stop dividing into subproblems and recursing once you have divided the work into enough parts to saturate the available parallelism. We don't need N subproblems, we only need T subproblems. We can recurse to depth lg(T) to generate T subproblems and run the classic linear partition operation on those subproblems. That gives a tree of depth lg(T), and at each layer of the tree we do O(n) work to combine the subproblems' results, giving overall time O(N lg T). But T, for purposes of complexity in N, is a constant factor, so the requirement in the standard should be O(n), not O(n lg n).
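Putting the two halves of this comment together, a sketch might look like the following (my names, not from any implementation; it runs sequentially where a real implementation would execute the two recursive partition calls in parallel):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// The stable_partition-style divide and conquer described above: partition
// each half recursively (concurrently, in a real implementation), then
// rotate the middle F...T run so all Ts precede all Fs. Master theorem:
// a = b = 2, f(n) = O(n) for the rotate, giving O(n lg n) overall.
template <class It, class Pred>
It divide_and_rotate_partition(It first, It last, Pred pred) {
    if (last - first < 2)
        return std::partition(first, last, pred);  // trivial leaf case
    It mid       = first + (last - first) / 2;
    It left_end  = divide_and_rotate_partition(first, mid, pred);
    It right_end = divide_and_rotate_partition(mid, last, pred);
    // Left half is TT..FF, right half is TT..FF: rotate the middle FT run.
    return std::rotate(left_end, mid, right_end);
}

// Quickselect as described: partition around a pivot, then iterate only
// into the side containing the index we want.
void quickselect(std::vector<int>& v, std::size_t nth) {
    std::size_t first = 0, last = v.size();
    while (last - first > 1) {
        int pivot = v[first + (last - first) / 2];  // copy, not a reference
        auto lo = divide_and_rotate_partition(
            v.begin() + first, v.begin() + last,
            [&](int x) { return x < pivot; });
        auto hi = std::partition(lo, v.begin() + last,
                                 [&](int x) { return x == pivot; });
        std::size_t i = lo - v.begin(), j = hi - v.begin();
        if (nth < i)      last = i;   // wanted index is left of the pivot run
        else if (nth < j) return;     // pivot run already covers index nth
        else              first = j;  // wanted index is right of the pivot run
    }
}
```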
Could you add me to the mailing list? [email protected]
I've filed LWG-4162 for this. But per @BillyONeal's thoughts, I now think the currently proposed resolution is less than ideal, and perhaps O(N) complexity should be required.
It's permitted to implement a faster algorithm even though the LWG issue hasn't been resolved. IIUC there's no real or theoretical obstruction.
Unfortunately that stuff is controlled by ISO rules; I am not the list owner.
I agree, but I already had that argument with LEWG and SG1 and don't think an LWG issue is the right place to litigate such a thing. It needs a paper if someone is willing to write one.
Describe the bug

The standard requires nth_element (both the std and std::ranges overloads) to perform only O(n lg n) swaps. However, we currently implement only quickselect:

STL/stl/inc/algorithm, Line 4470 in ff94756

This has worst-case running time O(n^2) if we always choose bad pivots, just like quicksort. We need some form of 'ideal' checking similar to how std::sort works, and to fall back to a median-of-medians or similar implementation.

Additional context
This was previously recorded in a Microsoft-internal bug database but I closed the issue there as I had filed it against myself.
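The fallback the report asks for might look roughly like this sketch (hypothetical, not the STL's actual code; for brevity it falls back to sorting the remaining subrange, which also meets the O(n lg n) swap bound, where a production version would use median-of-medians):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Introselect-style nth element sketch: quickselect guarded by a depth
// budget, like std::sort's "ideal" check. When too many partition rounds
// have happened, we give up and sort the remaining subrange: that costs
// O(m log m) <= O(n lg n) swaps, so quickselect's O(n^2) worst case can
// no longer occur.
void intro_nth(std::vector<int>& v, std::size_t nth) {
    std::size_t first = 0, last = v.size();
    int budget = v.empty() ? 0 : 2 * (static_cast<int>(std::log2(v.size())) + 1);
    while (last - first > 1) {
        if (budget-- == 0) {  // too many levels: abandon quickselect
            std::sort(v.begin() + first, v.begin() + last);
            return;
        }
        int pivot = v[first + (last - first) / 2];  // copy before partitioning
        auto lo = std::partition(v.begin() + first, v.begin() + last,
                                 [&](int x) { return x < pivot; });
        auto hi = std::partition(lo, v.begin() + last,
                                 [&](int x) { return x == pivot; });
        std::size_t i = lo - v.begin(), j = hi - v.begin();
        if (nth < i)      last = i;   // target is left of the pivot run
        else if (nth < j) return;     // pivot run already covers position nth
        else              first = j;  // target is right of the pivot run
    }
}
```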