A simple crawler to calculate disk usage of a root directory.
This crawler is written in Rust and uses the `walkdir` crate to traverse the directory tree, with `rayon` to parallelize the traversal. Each file's and directory's metadata is read using the `std::fs` module (Unix specific). This information is then written to a Postgres database using `sqlx`.
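The traversal described above can be sketched with the standard library alone. This is a simplified, single-threaded version that sums apparent file sizes; the real crawler walks the tree with `walkdir`, parallelizes with `rayon`, and records much more metadata:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Recursively sum the apparent size of every regular file under `root`.
// Simplified sketch: the actual crawler uses `walkdir` + `rayon` and
// stores per-entry metadata instead of returning a single total.
fn disk_usage(root: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_dir() {
            // Descend into subdirectories; I/O errors propagate up.
            total += disk_usage(&entry.path())?;
        } else if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    let root = std::env::args().nth(1).unwrap_or_else(|| ".".into());
    println!("{} bytes", disk_usage(Path::new(&root))?);
    Ok(())
}
```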
## Features

- Crawls the directory tree and calculates the disk usage of each file.
- Uses `rayon` to parallelize the traversal.
- Uses `sqlx` to write the data to a Postgres database.
- Estimates the disk usage of a folder using a recursive query in the database.
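The folder estimate works by walking the stored parent/child relation with a recursive CTE. A hypothetical version of that query is sketched below; the table name `entries` and the columns `path`, `parent`, and `size` are assumptions for illustration, not the actual schema produced by `init_db`:

```rust
// Hypothetical recursive CTE for estimating a folder's size from the
// crawled rows. `entries(path, parent, size)` is an assumed schema.
fn folder_size_query() -> String {
    "WITH RECURSIVE tree AS (
         SELECT path, size FROM entries WHERE path = $1
         UNION ALL
         SELECT e.path, e.size
         FROM entries e
         JOIN tree t ON e.parent = t.path
     )
     SELECT SUM(size) FROM tree"
        .to_string()
}

fn main() {
    // In the real binaries this string would be executed via `sqlx`.
    println!("{}", folder_size_query());
}
```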
## Use cases

- Calculate the disk usage of an especially large directory.
  - If you are looking for something more lightweight, consider the `du` command or parallel-disk-usage instead.
- See the owner of the files and directories.
  - Most other programs only look at the size of files and directories.
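Ownership is available on Unix through the metadata extension trait. A minimal sketch of reading a path's numeric owner and group, the same mechanism the crawler relies on:

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

// Return the numeric owner (uid) and group (gid) of a path.
// Unix only: `uid()`/`gid()` come from the `MetadataExt` trait.
fn owner_of(path: &Path) -> io::Result<(u32, u32)> {
    let meta = fs::metadata(path)?;
    Ok((meta.uid(), meta.gid()))
}

fn main() -> io::Result<()> {
    let (uid, gid) = owner_of(Path::new("."))?;
    println!("uid={uid} gid={gid}");
    Ok(())
}
```

Mapping the uid to a user name requires a lookup (e.g. via `/etc/passwd` or the `users` crate), which is left out here.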
## Requirements

- A configured and accessible Postgres database.
- The crawler is Unix specific and uses the `std::os::unix::fs::MetadataExt` trait to read file metadata.
- The `sqlx` crate requires a valid schema to be present in the database during compilation. This schema can be generated with the `init_db` binary.
  - This might require partial compilation of the project, which can be done with `cargo build --bin init_db`.
## Usage

- Build the project:

  ```sh
  cargo build --release
  ```

- Initialize the database:

  ```sh
  export DATABASE_URL=postgres://<user>:<password>@<host>:<port>/<database>
  ./target/release/init_db
  ```

- Run the crawler:

  ```sh
  export DATABASE_URL=postgres://<user>:<password>@<host>:<port>/<database>
  ./target/release/disk_usage -r <root_directory>
  ```

- Get folder size:

  ```sh
  export DATABASE_URL=postgres://<user>:<password>@<host>:<port>/<database>
  ./target/release/estimate -p <path>
  ```

The database schema is visualized below:
