Skip to content
Merged
Prev Previous commit
Next Next commit
Add sort_by_cached_key method
  • Loading branch information
varkor committed Mar 16, 2018
commit f41a26f2040dfa752825a7d1fdfbd5a8ae3310cf
61 changes: 50 additions & 11 deletions src/liballoc/slice.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1301,33 +1301,76 @@ impl<T> [T] {
merge_sort(self, |a, b| compare(a, b) == Less);
}

/// Sorts the slice with a key extraction function.
///
/// This sort is stable (i.e. does not reorder equal elements) and `O(m n log m n)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write it as O(m n log(m n)) to avoid ambiguity.

/// worst-case, where the key function is `O(m)`.
///
/// For expensive key functions (e.g. functions that are not simple property accesses or
/// basic operations), [`sort_by_cached_key`](#method.sort_by_cached_key) is likely to be
/// significantly faster, as it does not recompute element keys.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good pointer to make but I think we are normally cautious and omit this; otherwise we have the docs for a stable method recommending an experimental method (for the next few releases).

This is how we handled sort_unstable_by; the mention of it in sort_by's doc first showed up in Rust 1.20 when it went stable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that's a reasonable decision. I'll get rid of those comments for now.

///
/// When applicable, unstable sorting is preferred because it is generally faster than stable
/// sorting and it doesn't allocate auxiliary memory.
/// See [`sort_unstable_by_key`](#method.sort_unstable_by_key).
///
/// # Current implementation
///
/// The current algorithm is an adaptive, iterative merge sort inspired by
/// [timsort](https://en.wikipedia.org/wiki/Timsort).
/// It is designed to be very fast in cases where the slice is nearly sorted, or consists of
/// two or more sorted sequences concatenated one after another.
///
/// Also, it allocates temporary storage half the size of `self`, but for short slices a
/// non-allocating insertion sort is used instead.
///
/// # Examples
///
/// ```
/// let mut v = [-5i32, 4, 1, -3, 2];
///
/// v.sort_by_key(|k| k.abs());
/// assert!(v == [1, 2, -3, 4, -5]);
/// ```
#[stable(feature = "slice_sort_by_key", since = "1.7.0")]
#[inline]
pub fn sort_by_key<K, F>(&mut self, mut f: F)
where F: FnMut(&T) -> K, K: Ord
{
merge_sort(self, |a, b| f(a).lt(&f(b)));
}

/// Sorts the slice with a key extraction function.
///
/// During sorting, the key function is called only once per element.
///
/// This sort is stable (i.e. does not reorder equal elements) and `O(m n + n log n)`
/// worst-case, where the key function is `O(m)`.
///
/// For simple key functions (e.g. functions that are property accesses or
/// basic operations), [`sort_by_key`](#method.sort_by_key) is likely to be
/// faster.
///
/// # Current implementation
///
/// The current algorithm is an adaptive, iterative merge sort inspired by
Copy link

@ghost ghost Mar 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is incorrect. It's using pattern-defeating quicksort (see the comment above sort_unstable).

/// [timsort](https://en.wikipedia.org/wiki/Timsort).
/// It is designed to be very fast in cases where the slice is nearly sorted, or consists of
/// two or more sorted sequences concatenated one after another.
///
/// The algorithm allocates temporary storage in a `Vec<(K, usize)` the length of the slice.
/// The algorithm allocates temporary storage in a `Vec<(K, usize)>` the length of the slice.
Copy link

@ghost ghost Mar 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the recently added index type optimization, this is not 100% correct either. :) But I don't have a suggestion for how to rephrase this better right now... Maybe it's okay as is.

///
/// # Examples
///
/// ```
/// let mut v = [-5i32, 4, 1, -3, 2];
///
/// v.sort_by_key(|k| k.abs());
/// v.sort_by_cached_key(|k| k.abs());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use a more interesting example here, ideally one that illustrates the intended use cases of this function. Maybe sort numbers lexicographically?

/// assert!(v == [1, 2, -3, 4, -5]);
/// ```
#[stable(feature = "slice_sort_by_key", since = "1.7.0")]
#[unstable(feature = "slice_sort_by_uncached_key", issue = "34447")]
#[inline]
pub fn sort_by_key<K, F>(&mut self, f: F)
pub fn sort_by_cached_key<K, F>(&mut self, f: F)
where F: FnMut(&T) -> K, K: Ord
{
// Helper macro for indexing our vector by the smallest possible type, to reduce allocation.
Expand Down Expand Up @@ -1432,9 +1475,6 @@ impl<T> [T] {
/// Sorts the slice with a key extraction function, but may not preserve the order of equal
/// elements.
///
/// Note that, currently, the key function for [`sort_unstable_by_key`] is called multiple times
/// per element, unlike [`sort_by_key`].
///
/// This sort is unstable (i.e. may reorder equal elements), in-place (i.e. does not allocate),
/// and `O(m n log m n)` worst-case, where the key function is `O(m)`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

///
Expand All @@ -1446,8 +1486,9 @@ impl<T> [T] {
/// randomization to avoid degenerate cases, but with a fixed seed to always provide
/// deterministic behavior.
///
/// Due to its key calling strategy, [`sort_unstable_by_key`] is likely to be slower than
/// [`sort_by_key`] in cases where the key function is expensive.
/// Due to its key calling strategy, [`sort_unstable_by_key`](#method.sort_unstable_by_key)
/// is likely to be slower than [`sort_by_cached_key`](#method.sort_by_uncached_key) in
/// cases where the key function is expensive.
///
/// # Examples
///
Expand All @@ -1458,8 +1499,6 @@ impl<T> [T] {
/// assert!(v == [1, 2, -3, 4, -5]);
/// ```
///
/// [`sort_by_key`]: #method.sort_by_key
/// [`sort_unstable_by_key`]: #method.sort_unstable_by_key
/// [pdqsort]: https://github.com/orlp/pdqsort
#[stable(feature = "sort_unstable", since = "1.20.0")]
#[inline]
Expand Down
9 changes: 6 additions & 3 deletions src/liballoc/tests/slice.rs
Original file line number Diff line number Diff line change
Expand Up @@ -426,9 +426,12 @@ fn test_sort() {
assert!(v.windows(2).all(|w| w[0] >= w[1]));

// Sort in lexicographic order.
let mut v = orig.clone();
v.sort_by_key(|x| x.to_string());
assert!(v.windows(2).all(|w| w[0].to_string() <= w[1].to_string()));
let mut v1 = orig.clone();
let mut v2 = orig.clone();
v1.sort_by_key(|x| x.to_string());
v2.sort_by_cached_key(|x| x.to_string());
assert!(v1.windows(2).all(|w| w[0].to_string() <= w[1].to_string()));
assert!(v1 == v2);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since sort_by_cached_key is a stable sort, let's also verify its stability in the fn test_sort_stability() below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

// Sort with many pre-sorted runs.
let mut v = orig.clone();
Expand Down