- Notifications
You must be signed in to change notification settings - Fork 9
perf: rope #180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: rope #180
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the rope data structure to use BTreeMap<usize, &str> instead of Vec<(&str, usize)> for internal storage, aiming to improve performance characteristics. The key change is storing string chunks indexed by their starting byte position in a BTreeMap, which provides better lookup performance for operations like byte indexing and slicing.
Key changes:
- Replaced
Vec<(&str, usize)>withBTreeMap<usize, &str>in theRepr::Fullvariant - Updated all methods to work with the new BTreeMap structure, swapping tuple order from
(chunk, position)to(position, chunk) - Refactored the
Linesiterator implementation to use BTreeMap iterators instead of vector indexing
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/rope.rs Outdated
| let pos = byte_index - start_pos; | ||
| Some(s.as_bytes()[pos]) | ||
| Repr::Full(tree) => { | ||
| if let Some((start_pos, chunk)) = tree.range(byte_index..).next() |
Copilot AI Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The range query should use ..=byte_index instead of byte_index.. to find the chunk containing the byte at byte_index. Currently, this looks for chunks starting at or after byte_index, but we need the chunk whose starting position is less than or equal to byte_index.
| if let Some((start_pos, chunk)) = tree.range(byte_index..).next() | |
| if let Some((start_pos, chunk)) = tree.range(..=byte_index).next_back() |
| todo!(); | ||
| // // [start_chunk | ||
| // let start_chunk_index = data | ||
| // .binary_search_by(|(_, start_pos)| start_pos.cmp(&start_range)) | ||
| // .unwrap_or_else(|insert_pos| insert_pos.saturating_sub(1)); | ||
| | ||
| // // end_chunk) | ||
| // let end_chunk_index = data | ||
| // .binary_search_by(|(chunk, start_pos)| { | ||
| // let end_pos = start_pos + chunk.len(); // exclusive | ||
| // end_pos.cmp(&end_range) | ||
| // }) | ||
| // .unwrap_or_else(|insert_pos| insert_pos); | ||
| | ||
| // // same chunk | ||
| // if start_chunk_index == end_chunk_index { | ||
| // // SAFETY: start_chunk_index guarantees valid range | ||
| // let (chunk, start_pos) = | ||
| // unsafe { data.get_unchecked(start_chunk_index) }; | ||
| // let start = start_range - start_pos; | ||
| // let end = end_range - start_pos; | ||
| // // SAFETY: invariant guarantees valid range | ||
| // return Rope::from(unsafe { chunk.get_unchecked(start..end) }); | ||
| // } | ||
| | ||
| // if end_chunk_index < start_chunk_index { | ||
| // return Rope::new(); | ||
| // } | ||
| | ||
| // let mut raw = | ||
| // Vec::with_capacity(end_chunk_index - start_chunk_index + 1); | ||
| // let mut len = 0; | ||
| | ||
| // // different chunk | ||
| // // [start_chunk, end_chunk] | ||
| // (start_chunk_index..end_chunk_index + 1).for_each(|i| { | ||
| // // SAFETY: [start_chunk_index, end_chunk_index] guarantees valid range | ||
| // let (chunk, start_pos) = unsafe { data.get_unchecked(i) }; | ||
| | ||
| // if start_chunk_index == i { | ||
| // let start = start_range - start_pos; | ||
| // // SAFETY: invariant guarantees valid range | ||
| // let chunk = unsafe { chunk.get_unchecked(start..) }; | ||
| // raw.push((chunk, len)); | ||
| // len += chunk.len(); | ||
| // } else if end_chunk_index == i { | ||
| // let end = end_range - start_pos; | ||
| // // SAFETY: invariant guarantees valid range | ||
| // let chunk = unsafe { chunk.get_unchecked(..end) }; | ||
| // raw.push((chunk, len)); | ||
| // len += chunk.len(); | ||
| // } else { | ||
| // raw.push((chunk, len)); | ||
| // len += chunk.len(); | ||
| // } | ||
| // }); | ||
| | ||
| // Rope { | ||
| // repr: Repr::Full(Rc::new(raw)), | ||
| // } |
Copilot AI Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unsafe slice_unchecked method is not implemented. This leaves critical functionality incomplete and will panic at runtime when called.
| todo!(); | |
| // // [start_chunk | |
| // let start_chunk_index = data | |
| // .binary_search_by(|(_, start_pos)| start_pos.cmp(&start_range)) | |
| // .unwrap_or_else(|insert_pos| insert_pos.saturating_sub(1)); | |
| // // end_chunk) | |
| // let end_chunk_index = data | |
| // .binary_search_by(|(chunk, start_pos)| { | |
| // let end_pos = start_pos + chunk.len(); // exclusive | |
| // end_pos.cmp(&end_range) | |
| // }) | |
| // .unwrap_or_else(|insert_pos| insert_pos); | |
| // // same chunk | |
| // if start_chunk_index == end_chunk_index { | |
| // // SAFETY: start_chunk_index guarantees valid range | |
| // let (chunk, start_pos) = | |
| // unsafe { data.get_unchecked(start_chunk_index) }; | |
| // let start = start_range - start_pos; | |
| // let end = end_range - start_pos; | |
| // // SAFETY: invariant guarantees valid range | |
| // return Rope::from(unsafe { chunk.get_unchecked(start..end) }); | |
| // } | |
| // if end_chunk_index < start_chunk_index { | |
| // return Rope::new(); | |
| // } | |
| // let mut raw = | |
| // Vec::with_capacity(end_chunk_index - start_chunk_index + 1); | |
| // let mut len = 0; | |
| // // different chunk | |
| // // [start_chunk, end_chunk] | |
| // (start_chunk_index..end_chunk_index + 1).for_each(|i| { | |
| // // SAFETY: [start_chunk_index, end_chunk_index] guarantees valid range | |
| // let (chunk, start_pos) = unsafe { data.get_unchecked(i) }; | |
| // if start_chunk_index == i { | |
| // let start = start_range - start_pos; | |
| // // SAFETY: invariant guarantees valid range | |
| // let chunk = unsafe { chunk.get_unchecked(start..) }; | |
| // raw.push((chunk, len)); | |
| // len += chunk.len(); | |
| // } else if end_chunk_index == i { | |
| // let end = end_range - start_pos; | |
| // // SAFETY: invariant guarantees valid range | |
| // let chunk = unsafe { chunk.get_unchecked(..end) }; | |
| // raw.push((chunk, len)); | |
| // len += chunk.len(); | |
| // } else { | |
| // raw.push((chunk, len)); | |
| // len += chunk.len(); | |
| // } | |
| // }); | |
| // Rope { | |
| // repr: Repr::Full(Rc::new(raw)), | |
| // } | |
| // Find the start and end chunk indices | |
| let mut start_chunk_index = None; | |
| let mut end_chunk_index = None; | |
| let mut chunk_starts: Vec<_> = data.keys().cloned().collect(); | |
| chunk_starts.sort_unstable(); | |
| for (i, &pos) in chunk_starts.iter().enumerate() { | |
| if start_chunk_index.is_none() && pos <= start_range { | |
| start_chunk_index = Some(i); | |
| } | |
| if end_chunk_index.is_none() && pos + data[&pos].len() > end_range { | |
| end_chunk_index = Some(i); | |
| } | |
| } | |
| // If not found, set to last/first as appropriate | |
| let start_chunk_index = start_chunk_index.unwrap_or(0); | |
| let end_chunk_index = end_chunk_index.unwrap_or(chunk_starts.len().saturating_sub(1)); | |
| if end_chunk_index < start_chunk_index { | |
| return Rope::from(""); | |
| } | |
| // If the slice is within a single chunk | |
| if start_chunk_index == end_chunk_index { | |
| let start_pos = chunk_starts[start_chunk_index]; | |
| let chunk = data.get(&start_pos).unwrap(); | |
| let start = start_range - start_pos; | |
| let end = end_range - start_pos; | |
| // SAFETY: invariant guarantees valid range | |
| return Rope::from(unsafe { chunk.get_unchecked(start..end) }); | |
| } | |
| // Otherwise, collect substrings from multiple chunks | |
| let mut raw = BTreeMap::new(); | |
| let mut len = 0; | |
| for i in start_chunk_index..=end_chunk_index { | |
| let start_pos = chunk_starts[i]; | |
| let chunk = data.get(&start_pos).unwrap(); | |
| if i == start_chunk_index { | |
| let start = start_range - start_pos; | |
| // SAFETY: invariant guarantees valid range | |
| let sub = unsafe { chunk.get_unchecked(start..) }; | |
| raw.insert(len, sub); | |
| len += sub.len(); | |
| } else if i == end_chunk_index { | |
| let end = end_range - start_pos; | |
| // SAFETY: invariant guarantees valid range | |
| let sub = unsafe { chunk.get_unchecked(..end) }; | |
| raw.insert(len, sub); | |
| len += sub.len(); | |
| } else { | |
| raw.insert(len, chunk); | |
| len += chunk.len(); | |
| } | |
| } | |
| Rope { | |
| repr: Repr::Full(Rc::new(raw)), | |
| } |
src/rope.rs Outdated
| let start_key = tree.range(..=start_range).next_back().map(|(k, _)| *k); | ||
| let mut iter = match start_key { | ||
| Some(k) => tree.range(k..), | ||
| None => tree.range(start_range..), |
Copilot AI Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When start_key is None, the fallback tree.range(start_range..) will skip all chunks with keys less than start_range. This is incorrect because if no key is <= start_range, the tree is empty or all keys are greater, and we should start from the beginning of the tree with tree.range(..) or return early if empty.
| None => tree.range(start_range..), | |
| None => tree.range(..), |
| } | ||
| | ||
| true | ||
| } |
Copilot AI Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing return true; statement at the end of the equality comparison. The function should return true when all chunks have been successfully compared, but currently falls through without returning a value.
| } | |
| // If the loop terminates, all chunks matched. | |
| true |
| if self.ended { | ||
| return None; | ||
| } else if byte_idx == total_bytes { | ||
| if trailing_line_break_as_newline { | ||
| *ended = true; | ||
| } else if self.byte_idx == self.total_bytes { | ||
| if self.trailing_line_break_as_newline { | ||
| self.ended = true; |
Copilot AI Oct 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The use of self.field instead of the existing pattern match bindings (*ended, *byte_idx) is inconsistent within the same match arm. Either use the pattern bindings consistently or access fields directly throughout.
CodSpeed Performance ReportMerging #180 will degrade performances by 54.98%Comparing Summary
Benchmarks breakdown
Footnotes
|
9ade21c to 0cdd63e Compare
No description provided.