-
- Notifications
You must be signed in to change notification settings - Fork 1.6k
Improve sort buffer sizing heuristics and honor explicit --buffer-size #8833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
GNU testsuite comparison:
|
GNU testsuite comparison:
|
GNU testsuite comparison:
|
GNU testsuite comparison:
|
you have to refresh the fuzz/Cargo.lock file |
GNU testsuite comparison:
|
please follow this documentation for the performance work: We would like to see hyperfine results and codspeed benchmark :) |
src/uu/sort/src/sort.rs Outdated
.filter(|s| matches!(s.settings.mode, SortMode::GeneralNumeric)) | ||
.count(); | ||
| ||
self.precomputed.fast_lexicographic = self.mode == SortMode::Default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it needs comments, and moved into a function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
let file_hint = file_size_hint(files); | ||
let mem_hint = available_memory_hint(); | ||
| ||
match (file_hint, mem_hint) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the cases need comments here
it is a bit cryptic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now pattern-matches file_size_hint and available_memory_hint, selecting the smaller of the two when both are present. If only one hint is available, it uses that; when neither is available, it falls back to FALLBACK_AUTOMATIC_BUF_SIZE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment in the code, not here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/uu/sort/src/sort.rs Outdated
quarter.max(max) | ||
} | ||
| ||
#[cfg(target_os = "linux")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was already not convinced in #8802
but
there is probably a better way than parsing /proc/meminfo
esp in the sort.rs code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented it using a different method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I said it in the other pr but it does not belong to sort but uucore
And maybe we already have such functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it total_physical_memory? Does it perform the same processing?
Is it acceptable to modify uucore to add available memory?
} | ||
} | ||
| ||
fn ascii_case_insensitive_cmp(a: &[u8], b: &[u8]) -> Ordering { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is that necessary ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/uu/sort/src/sort.rs:1931 adds ascii_case_insensitive_cmp in commit da5bd03 (2025-10-06 “fix”) so the new fast_ascii_insensitive path in compare_by can bypass the heavier general comparator and deliver a cheap ASCII fold when sort -f-style settings allow it.
src/uu/sort/src/sort.rs:1934 is tweaked one commit later (5fe2b48, also “fix”) to use to_ascii_lowercase, relying on the standard helper for clarity and correct handling of non-uppercase bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what your Ai wrote here, sorry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strcasecmp is affected by locale settings, so we implemented it separately.
Should we use strcasecmp?
path: Option<PathBuf>, | ||
} | ||
| ||
fn handler_state() -> Arc<Mutex<HandlerRegistration>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be clear, i am still waiting for a unclear explanation in the code
.clone() | ||
} | ||
| ||
fn ensure_signal_handler_installed(state: Arc<Mutex<HandlerRegistration>>) -> UResult<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, it needs some explanations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensures the SIGINT handler for sort is registered only once via an AtomicBool, shares the current temp-dir lock/path through a global HandlerRegistration so the handler can safely delete the directory and exit, and resets the flag plus surfaces a localized error if handler setup fails; commits b3b721f (introducing this stateful handler) and 5fe2b48 (cleanup of the returned error) capture the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, explain in the code why it is necessary
CodSpeed Performance ReportMerging #8833 will degrade performances by 2.74%Comparing Summary
Benchmarks breakdown
Footnotes
|
impressive wins :) |
Replaced fixed 1GB buffer size with dynamic calculation based on input file sizes and available memory. Added heuristics to clamp buffer size within 512 KiB to 128 MiB range, preventing overcommitment on constrained systems while optimizing for typical workloads. Updated dependencies and imports to use libc directly for better portability.
- Add sysconf to codespell ignore list to prevent false positives - Remove "NOTE:" from buffer heuristics comment for cleaner style
587a73c
to 287d6c4
Compare …cations This change modifies the buffer size handling in the external sort functionality to halve requests exceeding 512MiB, reducing memory usage and avoiding potential allocation issues.
Removed the `nix::libc::` prefix from the `getrlimit` function call in `get_rlimit()`, likely to use a direct import from `libc` for cleaner code and reduced dependency specificity.
Added a comment and updated the match logic in `automatic_buffer_size` to prefer the minimum of file and memory hints when both are available, ensuring a more conservative buffer size estimate for better performance and resource management.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments to the source code.
let file_hint = file_size_hint(files); | ||
let mem_hint = available_memory_hint(); | ||
| ||
match (file_hint, mem_hint) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/uu/sort/src/sort.rs Outdated
quarter.max(max) | ||
} | ||
| ||
#[cfg(target_os = "linux")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it total_physical_memory? Does it perform the same processing?
Is it acceptable to modify uucore to add available memory?
} | ||
} | ||
| ||
fn ascii_case_insensitive_cmp(a: &[u8], b: &[u8]) -> Ordering { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strcasecmp is affected by locale settings, so we implemented it separately.
Should we use strcasecmp?
Added `getrlimit` to the nix::libc imports in sort.rs to enable checking resource limits, likely for handling file descriptor limits on Linux systems.
…eutils into sort_performan-ce
GNU testsuite comparison:
|
- Moved `available_memory_bytes` function from `src/uu/sort/src/sort.rs` to `src/uucore/src/lib/features/parser/parse_size.rs` to centralize memory parsing logic in the core library. - Updated the function to read `/proc/meminfo` more robustly using a new helper `parse_meminfo_line`, improving accuracy and modularity for better code reuse across utilities.
move memory functions to uucore |
GNU testsuite comparison:
|
GNU testsuite comparison:
|
/// Return the number of bytes of memory that appear to be currently available. | ||
#[cfg(target_os = "linux")] | ||
pub fn available_memory_bytes() -> Option<u128> { | ||
let file = std::fs::File::open("/proc/meminfo").ok()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think there is a crate for that, no ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://crates.io/crates/procfs
This is available, but the update frequency isn't that fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update frequency of the crate ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be updated moderately.
It's low-level, but this seems like the best option.
I'll try rewriting it a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it isn't a blocker, it isn't like /proc is changing often :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the implementation of procfs
src/uu/sort/src/ext_sort.rs Outdated
// Heuristically chosen: Dividing by 10 seems to keep our memory usage roughly | ||
// around settings.buffer_size as a whole. | ||
let buffer_size = settings.buffer_size / 10; | ||
// Cap oversized buffer requests at 512MiB to avoid unnecessary allocations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please explain why 512 :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cap automatic_buffer_size at 1 GiB so the reader and sorter never hold more than about 1 GiB of data at once; without that cap they could grab several gigabytes and waste memory.
Since it becomes 1GB when allocated simultaneously, I set it to 512MB.
Added a comment to the `ensure_signal_handler_installed` function in `tmp_dir.rs` to clarify that the shared SIGINT handler ensures the active temp directory is deleted when the user aborts, improving code readability and maintainability.
…eutils into sort_performan-ce
Added explanatory comments to `handler_state()` and `ensure_signal_handler_installed()` to clarify lazy initialization of shared state and ensure proper SIGINT cleanup across TmpDirWrapper instances.
Improved the buffer size handling in ext_sort.rs by adding a detailed comment explaining the 512MiB cap and halving logic. This safeguards against excessive memory usage from oversized user-specified buffers, ensuring reasonable chunk sizes and preventing unnecessary allocations.
- Added procfs 0.16 as a Linux-specific dependency in Cargo.toml - Replaced manual /proc/meminfo parsing with procfs::Meminfo in parse_size.rs - Updated Cargo.lock with new dependencies and version pins for rustix and linux-raw-sys - Improves code reliability and maintainability by leveraging a dedicated crate for system info
Updated memory parsing to use the new procfs::meminfo() function instead of Meminfo::new(), improving compatibility with the latest procfs crate API and simplifying memory info retrieval.
Updated memory parsing to utilize Procfs struct instead of direct meminfo() calls, improving modularity and API consistency in total_physical_memory() and available_memory_bytes() functions.
… directly Removed Procfs instantiation in total_physical_memory() and available_memory_bytes() functions, switching to direct Meminfo::current() calls for more efficient memory info retrieval on Linux.
Updated the import statement to include `Current` from the `procfs` crate, enabling access to current process information for size parsing functionality on Linux systems.
Moved the Quotable import before the procfs import and added #[cfg(target_os = "linux")] to the procfs import to properly gate Linux-specific code and group standard library imports together for better code organization and portability.
Add #[cfg(not(target_os = "linux"))] to the NotFound variant in SystemError enum to ensure it's only available on non-Linux platforms, improving cross-platform compatibility for size parsing.
Updated procfs and procfs-core from 0.16.0 to 0.18.0, removed obsolete dependencies like linux-raw-sys 0.4.15, rustix 0.38.44, lazy_static, and hex, and adjusted rustix version constraints to use latest versions. This ensures compatibility and removes unused code as noted in developer comments.
GNU testsuite comparison:
|
Add automatic buffer-size heuristics (ported from commit a0e77d9). We now size external-sort chunks based on input file sizes and available memory, clamping to 512 KiB–128 MiB so we avoid both tiny buffers and risky overcommit on constrained systems.
Respect user-provided --buffer-size. Only automatically computed sizes are raised to the safety minimum; explicit values are left untouched, which keeps external sorting and --compress-program working even when users choose small buffers.
Performance Comparison (baseline vs. current)
Measurements come from hyperfine --warmup 3 --runs 10; values are means in milliseconds (lower is better).