Skip to content

Conversation

mattsu2020
Copy link
Contributor

Add automatic buffer-size heuristics (ported from commit a0e77d9). We now size external-sort chunks based on input file sizes and available memory, clamping to 512 KiB–128 MiB so we avoid both tiny buffers and risky overcommit on constrained systems.

Respect user-provided --buffer-size. Only automatically computed sizes are raised to the safety minimum; explicit values are left untouched, which keeps external sorting and --compress-program working even when users choose small buffers.

Performance Comparison (baseline vs. current)
Measurements come from hyperfine --warmup 3 --runs 10; values are means in milliseconds (lower is better).

Scenario Baseline Current Delta Speedup
ASCII 500k 17.04 14.59 -2.45 1.17×
Numeric 500k 36.89 37.29 +0.40 0.99×
ASCII 4M 112.47 107.52 -4.95 1.05×
ASCII 4M (-S 32M) 207.40 100.48 -106.92 2.06×
ASCII 16M 854.18 414.45 -439.73 2.06×
ASCII 16M (-S 512M) 838.05 426.85 -411.20 1.96×

Copy link

github-actions bot commented Oct 6, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch) 
Copy link

github-actions bot commented Oct 7, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch) 
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch) Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch) Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch) 
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch) 
@sylvestre
Copy link
Contributor

you have to refresh the fuzz/Cargo.lock file

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch) Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch) 
@sylvestre
Copy link
Contributor

please follow this documentation for the performance work:
https://github.com/uutils/coreutils/blob/main/docs/src/performance.md

We would like to see hyperfine results and codspeed benchmark :)
thanks

.filter(|s| matches!(s.settings.mode, SortMode::GeneralNumeric))
.count();

self.precomputed.fast_lexicographic = self.mode == SortMode::Default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it needs comments, and moved into a function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

let file_hint = file_size_hint(files);
let mem_hint = available_memory_hint();

match (file_hint, mem_hint) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cases need comments here
it is a bit cryptic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now pattern-matches file_size_hint and available_memory_hint, selecting the smaller of the two when both are present. If only one hint is available, it uses that; when neither is available, it falls back to FALLBACK_AUTOMATIC_BUF_SIZE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment in the code, not here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

quarter.max(max)
}

#[cfg(target_os = "linux")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was already not convinced in #8802
but
there is probably a better way than parsing /proc/meminfo

esp in the sort.rs code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented it using a different method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I said it in the other pr but it does not belong to sort but uucore
And maybe we already have such functions

Copy link
Contributor Author

@mattsu2020 mattsu2020 Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it total_physical_memory? Does it perform the same processing?
Is it acceptable to modify uucore to add available memory?

}
}

fn ascii_case_insensitive_cmp(a: &[u8], b: &[u8]) -> Ordering {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is that necessary ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/uu/sort/src/sort.rs:1931 adds ascii_case_insensitive_cmp in commit da5bd03 (2025-10-06 “fix”) so the new fast_ascii_insensitive path in compare_by can bypass the heavier general comparator and deliver a cheap ASCII fold when sort -f-style settings allow it.

src/uu/sort/src/sort.rs:1934 is tweaked one commit later (5fe2b48, also “fix”) to use to_ascii_lowercase, relying on the standard helper for clarity and correct handling of non-uppercase bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what your Ai wrote here, sorry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strcasecmp is affected by locale settings, so we implemented it separately.
Should we use strcasecmp?

path: Option<PathBuf>,
}

fn handler_state() -> Arc<Mutex<HandlerRegistration>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit b3b721f makes the temporary-directory Ctrl+C handler reusable by storing the global handler state in OnceLock/AtomicBool and cleaning it up in Drop.

Commit 5fe2b48 removes an unnecessary .into() because USimpleError::new already returns the boxed error type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be clear, i am still waiting for a unclear explanation in the code

.clone()
}

fn ensure_signal_handler_installed(state: Arc<Mutex<HandlerRegistration>>) -> UResult<()> {
Copy link
Contributor

@sylvestre sylvestre Oct 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, it needs some explanations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensures the SIGINT handler for sort is registered only once via an AtomicBool, shares the current temp-dir lock/path through a global HandlerRegistration so the handler can safely delete the directory and exit, and resets the flag plus surfaces a localized error if handler setup fails; commits b3b721f (introducing this stateful handler) and 5fe2b48 (cleanup of the returned error) capture the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, explain in the code why it is necessary

Copy link

codspeed-hq bot commented Oct 11, 2025

CodSpeed Performance Report

Merging #8833 will degrade performances by 2.74%

Comparing mattsu2020:sort_performan-ce (50e974e) with main (73af9e9)

Summary

⚡ 16 improvements
❌ 3 regressions
✅ 87 untouched
⏩ 73 skipped1

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
sort_accented_data[500000] 819.3 ms 352.5 ms ×2.3
sort_ascii_only[500000] 798 ms 343.6 ms ×2.3
sort_case_insensitive[500000] 417.1 ms 271.9 ms +53.44%
sort_case_sensitive[500000] 327.1 ms 162.9 ms ×2
sort_key_field[500000] 688.8 ms 707.9 ms -2.7%
sort_long_line[160000] 2 ms 2 ms -2.74%
sort_mixed_data[500000] 729.6 ms 317.7 ms ×2.3
sort_numeric[500000] 1.2 s 1.2 s -2.53%
sort_reverse_locale[500000] 822.3 ms 351.7 ms ×2.3
sort_unique_locale[500000] 1,142.4 ms 473.1 ms ×2.4
sort_ascii_c_locale 27.9 ms 21.3 ms +30.86%
sort_ascii_utf8_locale 55.8 ms 42.5 ms +31.29%
sort_german_c_locale 94.5 ms 38.1 ms ×2.5
sort_german_locale 94.5 ms 38.1 ms ×2.5
sort_mixed_c_locale 94.1 ms 38 ms ×2.5
sort_mixed_utf8_locale 94.1 ms 38 ms ×2.5
sort_random_strings 55.3 ms 30.9 ms +78.77%
sort_reverse_mixed 93.2 ms 37.3 ms ×2.5
sort_unique_mixed 87.6 ms 38.2 ms ×2.3

Footnotes

  1. 73 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@sylvestre
Copy link
Contributor

impressive wins :)

Replaced fixed 1GB buffer size with dynamic calculation based on input file sizes and available memory. Added heuristics to clamp buffer size within 512 KiB to 128 MiB range, preventing overcommitment on constrained systems while optimizing for typical workloads. Updated dependencies and imports to use libc directly for better portability.
- Add sysconf to codespell ignore list to prevent false positives - Remove "NOTE:" from buffer heuristics comment for cleaner style
…cations This change modifies the buffer size handling in the external sort functionality to halve requests exceeding 512MiB, reducing memory usage and avoiding potential allocation issues.
Removed the `nix::libc::` prefix from the `getrlimit` function call in `get_rlimit()`, likely to use a direct import from `libc` for cleaner code and reduced dependency specificity.
Added a comment and updated the match logic in `automatic_buffer_size` to prefer the minimum of file and memory hints when both are available, ensuring a more conservative buffer size estimate for better performance and resource management.
Copy link
Contributor Author

@mattsu2020 mattsu2020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments to the source code.

let file_hint = file_size_hint(files);
let mem_hint = available_memory_hint();

match (file_hint, mem_hint) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

quarter.max(max)
}

#[cfg(target_os = "linux")]
Copy link
Contributor Author

@mattsu2020 mattsu2020 Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it total_physical_memory? Does it perform the same processing?
Is it acceptable to modify uucore to add available memory?

}
}

fn ascii_case_insensitive_cmp(a: &[u8], b: &[u8]) -> Ordering {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strcasecmp is affected by locale settings, so we implemented it separately.
Should we use strcasecmp?

mattsu2020 and others added 3 commits October 13, 2025 20:17
Added `getrlimit` to the nix::libc imports in sort.rs to enable checking resource limits, likely for handling file descriptor limits on Linux systems.
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch) 
mattsu2020 and others added 2 commits October 14, 2025 14:36
- Moved `available_memory_bytes` function from `src/uu/sort/src/sort.rs` to `src/uucore/src/lib/features/parser/parse_size.rs` to centralize memory parsing logic in the core library. - Updated the function to read `/proc/meminfo` more robustly using a new helper `parse_meminfo_line`, improving accuracy and modularity for better code reuse across utilities.
@mattsu2020
Copy link
Contributor Author

move memory functions to uucore

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch) 
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch) Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch) 
/// Return the number of bytes of memory that appear to be currently available.
#[cfg(target_os = "linux")]
pub fn available_memory_bytes() -> Option<u128> {
let file = std::fs::File::open("/proc/meminfo").ok()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there is a crate for that, no ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://crates.io/crates/procfs
This is available, but the update frequency isn't that fast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update frequency of the crate ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be updated moderately.
It's low-level, but this seems like the best option.
I'll try rewriting it a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it isn't a blocker, it isn't like /proc is changing often :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the implementation of procfs

// Heuristically chosen: Dividing by 10 seems to keep our memory usage roughly
// around settings.buffer_size as a whole.
let buffer_size = settings.buffer_size / 10;
// Cap oversized buffer requests at 512MiB to avoid unnecessary allocations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please explain why 512 :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cap automatic_buffer_size at 1 GiB so the reader and sorter never hold more than about 1 GiB of data at once; without that cap they could grab several gigabytes and waste memory.
Since it becomes 1GB when allocated simultaneously, I set it to 512MB.

Added a comment to the `ensure_signal_handler_installed` function in `tmp_dir.rs` to clarify that the shared SIGINT handler ensures the active temp directory is deleted when the user aborts, improving code readability and maintainability.
Added explanatory comments to `handler_state()` and `ensure_signal_handler_installed()` to clarify lazy initialization of shared state and ensure proper SIGINT cleanup across TmpDirWrapper instances.
Improved the buffer size handling in ext_sort.rs by adding a detailed comment explaining the 512MiB cap and halving logic. This safeguards against excessive memory usage from oversized user-specified buffers, ensuring reasonable chunk sizes and preventing unnecessary allocations.
- Added procfs 0.16 as a Linux-specific dependency in Cargo.toml - Replaced manual /proc/meminfo parsing with procfs::Meminfo in parse_size.rs - Updated Cargo.lock with new dependencies and version pins for rustix and linux-raw-sys - Improves code reliability and maintainability by leveraging a dedicated crate for system info
Updated memory parsing to use the new procfs::meminfo() function instead of Meminfo::new(), improving compatibility with the latest procfs crate API and simplifying memory info retrieval.
Updated memory parsing to utilize Procfs struct instead of direct meminfo() calls, improving modularity and API consistency in total_physical_memory() and available_memory_bytes() functions.
… directly Removed Procfs instantiation in total_physical_memory() and available_memory_bytes() functions, switching to direct Meminfo::current() calls for more efficient memory info retrieval on Linux.
Updated the import statement to include `Current` from the `procfs` crate, enabling access to current process information for size parsing functionality on Linux systems.
Moved the Quotable import before the procfs import and added #[cfg(target_os = "linux")] to the procfs import to properly gate Linux-specific code and group standard library imports together for better code organization and portability.
Add #[cfg(not(target_os = "linux"))] to the NotFound variant in SystemError enum to ensure it's only available on non-Linux platforms, improving cross-platform compatibility for size parsing.
Updated procfs and procfs-core from 0.16.0 to 0.18.0, removed obsolete dependencies like linux-raw-sys 0.4.15, rustix 0.38.44, lazy_static, and hex, and adjusted rustix version constraints to use latest versions. This ensures compatibility and removes unused code as noted in developer comments.
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/tee (passes in this run but fails in the 'main' branch) Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch) 
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants