Update documentation of select_nth_unstable and select_nth_unstable_by to state O(n^2) complexity #106933
|
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cuviper (or someone else) soon. Please see the contribution instructions for more information. |
library/core/src/slice/mod.rs
Are you sure the current implementation is actually O(n²) worst-case? It looks to me like it's doing a bunch of work in pivot selection to avoid being quadratic: EDIT: I'm wrong, and should have read the issue.
rust/library/core/src/slice/sort.rs
Lines 692 to 716 in 4817259
        if len >= SHORTEST_MEDIAN_OF_MEDIANS {
            // Finds the median of `v[a - 1], v[a], v[a + 1]` and stores the index into `a`.
            let mut sort_adjacent = |a: &mut usize| {
                let tmp = *a;
                sort3(&mut (tmp - 1), a, &mut (tmp + 1));
            };
            // Find medians in the neighborhoods of `a`, `b`, and `c`.
            sort_adjacent(&mut a);
            sort_adjacent(&mut b);
            sort_adjacent(&mut c);
        }
        // Find the median among `a`, `b`, and `c`.
        sort3(&mut a, &mut b, &mut c);
    }
    if swaps < MAX_SWAPS {
        (b, swaps == 0)
    } else {
        // The maximum number of swaps was performed. Chances are the slice is descending or mostly
        // descending, so reversing will probably help sort it faster.
        v.reverse();
        (len - 1 - b, true)
    }
At least I think that any documentation update here should emphasize that this is average-case O(n), since that's the point of the function existing.
I'm relying on the statement that this is O(n²) worst-case from here: #102451
Oh, just saw the discussion in the issue. And sadly it's not only O(n²) for malicious Ord either :(
Yeah, guess this needs to change to talk about the average and the worst-case separately.
I noted it. I’m not sure if select_nth_unstable_by_key is also affected.
Perhaps a better option would be to rewrite the algorithm based on one of the methods linked in the issue.
Maybe as a first step here do the simple change to update the documentation just to change "worst-case" to "average"? Since the issue shows it's currently incorrect.
Then a follow-up PR could do the same fallback to heapsort that sort_unstable does to avoid being worst-case O(n²).
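The fallback idea in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in `library/core`: `partition` and `select_with_fallback` are hypothetical names, the budget of about 2·log2(n) partition steps is an arbitrary choice, and `sort_unstable` stands in for the heapsort fallback.

```rust
use std::cmp::Ordering;

/// Lomuto partition around a middle element; returns the pivot's final index.
fn partition<T: Ord>(v: &mut [T]) -> usize {
    let last = v.len() - 1;
    // Use a middle element as the pivot and park it at the end.
    v.swap(last / 2, last);
    let mut store = 0;
    for i in 0..last {
        if v[i] < v[last] {
            v.swap(i, store);
            store += 1;
        }
    }
    v.swap(store, last);
    store
}

/// Hypothetical helper: quickselect that gives up after a fixed number of
/// partition steps and sorts what is left (assumed stand-in for heapsort).
fn select_with_fallback<T: Ord>(mut v: &mut [T], mut k: usize) {
    assert!(k < v.len());
    // Assumed budget: about 2 * log2(n) partition steps.
    let mut budget = 2 * (usize::BITS - v.len().leading_zeros());
    while v.len() > 1 {
        if budget == 0 {
            // Sorting the remaining range puts index `k` in its final place,
            // which bounds the worst case at O(n log n).
            v.sort_unstable();
            return;
        }
        budget -= 1;
        let p = partition(v);
        match k.cmp(&p) {
            Ordering::Equal => return,
            Ordering::Less => v = &mut v[..p],
            Ordering::Greater => {
                k -= p + 1;
                v = &mut v[p + 1..];
            }
        }
    }
}
```

Calling `select_with_fallback(&mut v, k)` leaves the element a full sort would place at index `k` in `v[k]`. An all-equal input makes this simple partition shrink the slice by one element per step, so it is the kind of input where the fallback branch is taken.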
7fceb42 to
292a8d5
Compare
|
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
292a8d5 to
6652405
Compare
|
Some changes occurred in src/tools/cargo cc @ehuss |
|
oh f***. I ran x.py, but that introduced more changes… |
e8d1b07 to
85da540
Compare
cuviper
left a comment
> I’m not sure if select_nth_unstable_by_key is also affected.
I think it must be -- e.g. you could use T::clone to get the exact same complexity as select_nth_unstable (plus cloning overhead).
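To make the `T::clone` argument concrete, here is a small sketch. The two wrapper functions are hypothetical names introduced for illustration; only `select_nth_unstable` and `select_nth_unstable_by_key` are the standard slice methods. With `T::clone` as the key function, each key compares exactly like the element it was cloned from, so the selection does the same comparison work plus the clones.

```rust
/// Hypothetical helper: k-th smallest element via the plain method.
fn kth_plain<T: Ord + Clone>(mut v: Vec<T>, k: usize) -> T {
    let (_, kth, _) = v.select_nth_unstable(k);
    kth.clone()
}

/// Hypothetical helper: the same selection through the `_by_key` variant,
/// with `T::clone` as the key function.
fn kth_by_cloned_key<T: Ord + Clone>(mut v: Vec<T>, k: usize) -> T {
    let (_, kth, _) = v.select_nth_unstable_by_key(k, T::clone);
    kth.clone()
}
```

Both helpers return the same element for the same input, for example the value `5` for `vec![5, 3, 9, 1, 7]` and `k = 2`.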
|
OK, so I guess this is resolved now. |
|
The documentation is still incorrect -- it's now average O(n) and worst-case O(n log n). |
3ab2f78 to
672aa43
Compare
77bfbb2 to
60c3b6a
Compare
|
Technically, this breaks an API promise, but one we never fulfilled and really could not fulfill as it is not truly possible to use a "comparison sort" as implied by the |
|
It is definitely possible to get better than O(n log n) for comparison-based selection, using something like median of medians, which runs in O(n) worst case. Although it's slow enough that it's not worth it to always use it, we could definitely realistically use it as a fallback algorithm for our introselect, so we get the fast average case of quickselect but still guaranteed O(n).
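For readers unfamiliar with median of medians, a minimal safe-Rust sketch of the idea follows. It is an illustration only, not a proposed implementation: `partition_at` and `select_mom` are hypothetical names, the cutoff of 10 elements is arbitrary, and the linear bound assumes mostly distinct elements (this two-way partition degrades on heavily duplicated input, where a three-way partition would be needed).

```rust
/// Partition `v` around the element at index `pivot`; returns its final index.
fn partition_at<T: Ord>(v: &mut [T], pivot: usize) -> usize {
    let last = v.len() - 1;
    v.swap(pivot, last);
    let mut store = 0;
    for i in 0..last {
        if v[i] < v[last] {
            v.swap(i, store);
            store += 1;
        }
    }
    v.swap(store, last);
    store
}

/// Hypothetical helper: after the call, `v[k]` holds the element that a full
/// sort would place at index `k`.
fn select_mom<T: Ord>(mut v: &mut [T], mut k: usize) {
    assert!(k < v.len());
    loop {
        if v.len() <= 10 {
            // Small slices are simply sorted.
            v.sort_unstable();
            return;
        }
        // Sort each complete group of five and move its median to the front.
        let mut groups = 0;
        for start in (0..v.len() - 4).step_by(5) {
            v[start..start + 5].sort_unstable();
            v.swap(start + 2, groups);
            groups += 1;
        }
        // Recursively place the median of those medians at `groups / 2`.
        select_mom(&mut v[..groups], groups / 2);
        // Use that element as the pivot for the whole slice.
        let p = partition_at(v, groups / 2);
        if p == k {
            return;
        } else if p > k {
            v = &mut v[..p];
        } else {
            k -= p + 1;
            v = &mut v[p + 1..];
        }
    }
}
```

Used as the fallback of an introselect, a routine like this would replace the "sort the remaining range" step, trading a slower constant factor for a linear worst case.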
|
r? libs-api |
Oh, fair enough! I actually thought of a few different other counterexamples but most of them had sufficient additional space complexity that it didn't seem worth mentioning, and after a few tens of minutes surveying things I was about ready to write them off.
Interesting. This seems like it'd be worth actually comparing the real performance on the options in some practical expected cases at various sizes, because it's not worth having O(n) if the O(n log n) beats it every time, but maybe the median-of-medians fallback would actually win out here. |
|
I wrote a (very unoptimized) median-of-medians implementation a few days ago, and IIRC, using an input that caused quadratic behavior before #106997, it was slower to use as the introselect fallback than heapsort on a slice of length |
|
So I wrote this pretty simple median of medians implementation:
// the indices must all be in bounds and must not overlap.
// swaps elements around so the median of the 5 elements ends up in `v[c]`
unsafe fn median_of_five<T, F: FnMut(&T, &T) -> bool>(
    v: &mut [T],
    is_less: &mut F,
    a: usize,
    b: usize,
    c: usize,
    d: usize,
    e: usize,
) {
    let [a, b, c, d, e] = unsafe { v.get_many_unchecked_mut([a, b, c, d, e]) };
    let sort = |a: &mut T, b: &mut T, is_less: &mut F| {
        if is_less(b, a) {
            mem::swap(a, b);
        }
    };
    sort(a, c, is_less);
    sort(b, d, is_less);
    if is_less(c, d) {
        mem::swap(c, d);
        mem::swap(a, b);
    }
    sort(b, e, is_less);
    if is_less(c, e) {
        mem::swap(c, e);
        sort(a, c, is_less);
    } else {
        sort(b, c, is_less);
    }
}
pub fn select_linear<T, F: FnMut(&T, &T) -> bool>(mut v: &mut [T], is_less: &mut F, mut k: usize) {
    fn select_pivot<T, F: FnMut(&T, &T) -> bool>(v: &mut [T], is_less: &mut F) -> usize {
        debug_assert!(v.len() >= 5);
        let mut j = 0;
        let mut i = 0;
        while i + 4 < v.len() {
            unsafe { median_of_five(v, is_less, i, i + 1, i + 2, i + 3, i + 4) };
            unsafe { v.swap_unchecked(i + 2, j) };
            i += 5;
            j += 1;
        }
        select_linear(unsafe { v.get_unchecked_mut(..j) }, is_less, j / 2);
        partition(v, j / 2, is_less).0
    }
    loop {
        if v.len() <= 10 {
            insertion_sort(v, is_less);
            return;
        }
        let p = select_pivot(v, is_less);
        if p == k {
            return;
        } else if p > k {
            v = unsafe { v.get_unchecked_mut(..p) };
        } else {
            k -= p + 1;
            v = unsafe { v.get_unchecked_mut(p + 1..) };
        }
    }
}
and on my machine it becomes faster at computing the median than the stdlib heapsort at roughly
|
I also now implemented the fast deterministic selection algorithm here. The implementation is entirely based on this paper (and this repo) except that I couldn't be bothered to implement In its current form it is a bit slower than |
|
Is this even the right thread for this comment? Should I be posting it in the original issue thread instead? |
|
I just wanted to update the documentation. It’s probably better to discuss this on the original issue. |
Amanieu
left a comment
Some wording nits, but otherwise this doc change looks good to me.
…y and select_nth_unstable_by_key to state O(n log n) worst case complexity Also remove erroneous / in doc comment
60c3b6a to
f1e649b
Compare
|
Updated the wording. |
|
@bors r+ rollup |
See #102451