DEV Community

Tim McNamara
Tim McNamara

Posted on

Creating a priority queue with a custom sort order using a binary heap in Rust

I want to create a personal web crawler. As I download each page, I collect a new list of URLs. To increase the likelihood that I am downloading the most important pages first, I want to keep the list of URLs to visit sorted by length (shorter URLs are more likely to be closer to the front page). That presents a problem though, because I'm also adding URLs.

A naïve list (Vec<T> in Rust), won't be very efficient. After every batch of inserts, I would need to re-sort the list. Wouldn't it be good if there were a data structure that could maintain the sort order itself? Yes! It's called a binary heap.

Rust's standard library provides a binary heap implementation, as std::collections::BinaryHeap. In my case, I want to sort a list of url::Url. By default, they are sorted in alphabetic order.

use url::Url; use std::collections::BinaryHeap; fn main() { let raw_urls = vec![ "https://www.youtube.com/watch?v=ywskA8CoulM", "https://tim.mcnamara.nz/", "https://github.com/rust-lang/rust/issues?labels=E-easy&state=open", ]; let mut urls = BinaryHeap::new(); for url in raw_urls { urls.push(Url::parse(url).unwrap()); } while let Some(url) = urls.pop() { println!("{:?}", url.as_str()); } } 
Enter fullscreen mode Exit fullscreen mode

This prints our URLs, but in alphabetic order:

"https://www.youtube.com/watch?v=ywskA8CoulM" "https://tim.mcnamara.nz/" "https://github.com/rust-lang/rust/issues?labels=E-easy&state=open" 
Enter fullscreen mode Exit fullscreen mode

Unfortunately, it's a little bit complicated to ask BinaryHeap to use a custom sort order. To change the default, I create a new type that wraps one as ShortestFirst(Url). Then, I implement Ord and PartialOrd traits.

use url::Url; use std::collections::BinaryHeap; #[derive(PartialEq, Eq, , Debug)] struct ShortestFirst(Url); impl PartialOrd for ShortestFirst { fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> { Some(self.cmp(other)) } } impl Ord for ShortestFirst { fn cmp(&self, other: &Self) -> std::cmp::Ordering { let left = (other.0).as_str().len(); let right = (self.0).as_str().len(); left.cmp(&right) } } fn main() { let raw_urls = vec![ "https://www.youtube.com/watch?v=ywskA8CoulM", "https://tim.mcnamara.nz/", "https://github.com/rust-lang/rust/issues?labels=E-easy&state=open", ]; let mut urls = BinaryHeap::new(); for url in raw_urls { urls.push(ShortestFirst(Url::parse(url).unwrap())); } while let Some(ShortestFirst(url)) = urls.pop() { println!("{:?}", url.as_str()); } } 
Enter fullscreen mode Exit fullscreen mode

This prints the URLs, in the order we wanted:

"https://tim.mcnamara.nz/" "https://www.youtube.com/watch?v=ywskA8CoulM" "https://github.com/rust-lang/rust/issues?labels=E-easy&state=open" 
Enter fullscreen mode Exit fullscreen mode

Top comments (0)