Here's an example that I've used to get all the pages from Paul Graham's website:
$ wget --recursive --level=inf --no-remove-listing --wait=6 --random-wait --adjust-extension --no-clobber --continue -e robots=off --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36" --domains=paulgraham.com https://paulgraham.com
| Parameter | Description |
| --- | --- |
| --recursive | Enables recursive downloading (following links) |
| --level=inf | Sets the recursion level to infinite |
| --no-remove-listing | Keeps the ".listing" files that wget creates to track directory listings |
| --wait=6 | Waits the given number of seconds between requests |
| --random-wait | Varies the wait between 0.5 and 1.5 times the --wait value for each request |
| --adjust-extension | Appends the appropriate extension (such as ".html") to downloaded files that lack it |
| --no-clobber | Does not redownload a file that already exists locally |
| --continue | Resumes downloading a partially downloaded file |
| -e robots=off | Ignores robots.txt instructions |
| --user-agent | Sends the given "User-Agent" header to the server |
| --domains | Comma-separated list of domains to be followed |
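For readability, here is the same command as above, reformatted with shell line continuations (identical flags and values; only the layout differs):

```bash
wget \
  --recursive \
  --level=inf \
  --no-remove-listing \
  --wait=6 \
  --random-wait \
  --adjust-extension \
  --no-clobber \
  --continue \
  -e robots=off \
  --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36" \
  --domains=paulgraham.com \
  https://paulgraham.com
```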
Other useful parameters:
| Parameter | Description |
| --- | --- |
| --page-requisites | Downloads page requisites such as inlined images, sounds, and referenced stylesheets |
| --span-hosts | Allows downloading files from links that point to different hosts (for example, subdomains) |
| --convert-links | Converts links to local links (allowing offline viewing) |
| --no-check-certificate | Bypasses SSL certificate verification |
| --directory-prefix=/my/directory | Sets the destination directory |
| --include-directories=posts | Comma-separated list of allowed directories to follow when crawling |
| --reject "*?*" | Rejects URLs that contain query strings |
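As a rough sketch (not from the original command), a few of these extra options can be combined with the earlier ones to produce an offline-browsable mirror; the /my/directory path is just a placeholder:

```bash
# Sketch: mirror a site for offline browsing, combining options from both tables.
# --page-requisites also fetches images/CSS, --convert-links rewrites links for
# local viewing, and --reject "*?*" skips URLs with query strings.
# /my/directory is a placeholder; adjust the domain and wait times to your needs.
wget \
  --recursive \
  --level=inf \
  --wait=6 \
  --random-wait \
  --adjust-extension \
  --page-requisites \
  --convert-links \
  --reject "*?*" \
  --directory-prefix=/my/directory \
  --domains=paulgraham.com \
  https://paulgraham.com
```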