Scrape files for sensitive information, and generate an interactive HTML report. Based on Rabin2.
This tool is only as good as your RegEx skills.
You can also style your own report.
Tested on Kali Linux v2024.2 (64-bit).
Made for educational purposes. I hope it will help!
On Kali Linux, run:

    apt-get -y install radare2

On Windows OS, download and unpack radareorg/radare2, then add the bin directory to the Windows PATH environment variable.
On macOS, run:

    brew install radare2

Install File Scraper from PyPI:

    pip3 install --upgrade file-scraper

Or build and install it from the source:

    git clone https://github.com/ivan-sincek/file-scraper && cd file-scraper
    python3 -m pip install --upgrade build
    python3 -m build
    python3 -m pip install dist/file_scraper-4.7-py3-none-any.whl

Prepare a template such as the default template:

    {
        "Auth.":{
            "query":"(?:basic|bearer)\\ ",
            "ignorecase":true,
            "search":true
        },
        "Variables":{
            "query":"(?:access|account|admin|auth|card|conf|cookie|cred|customer|email|history|ident|info|jwt|key|kyc|log|otp|pass|pin|priv|refresh|salt|secret|seed|session|setting|sign|token|transaction|transfer|user)[\\w\\d\\-\\_]*(?:\\\"\\ *\\:|\\ *\\=[^\\=]{1})",
            "ignorecase":true,
            "search":true
        },
        "Comments":{
            "query":"(?:(?<!\\:)\\/\\/|\\#).*(?:bug|compatibility|crash|deprecated|fix|issue|legacy|problem|review|security|todo|to do|to-do|to_do|vuln|warning)",
            "ignorecase":true,
            "search":true
        },
        "Abs. URL":{
            "query":"[\\w\\d\\+]*\\:\\/\\/[\\w\\d\\@\\-\\_\\.\\:\\/\\?\\&\\=\\%\\#]+",
            "unique":true,
            "collect":true
        },
        "IPv4":{
            "query":"(?:\\b25[0-5]|\\b2[0-4][0-9]|\\b[01]?[0-9][0-9]?)(?:\\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}",
            "unique":true,
            "collect":true
        },
        "Base64":{
            "query":"(?:[a-zA-Z0-9\\+\\/]{4})*(?:[a-zA-Z0-9\\+\\/]{4}|[a-zA-Z0-9\\+\\/]{3}\\=|[a-zA-Z0-9\\+\\/]{2}\\=\\=)",
            "minimum":8,
            "decode":"base64",
            "minimum_decode":6,
            "unique":true,
            "collect":true
        },
        "HEX":{
            "query":"(?:(?:0x|(?:\\\\)+x)[a-fA-F0-9]{2})+|(?:[a-fA-F0-9]{2})+",
            "minimum":12,
            "decode":"hex",
            "minimum_decode":6,
            "unique":true,
            "collect":true
        },
        "PEM":{
            "query":"-----BEGIN (?:CERTIFICATE|PRIVATE KEY)-----[\\s\\S]+?-----END (?:CERTIFICATE|PRIVATE KEY)-----",
            "decode":"pem",
            "unique":true,
            "collect":true
        }
    }

Make sure your regular expressions return only one capturing group, e.g., [1, 2, 3, 4]; and not a tuple, e.g., [(1, 2), (3, 4)].
Make sure to properly escape regular expression specific symbols in your template file, e.g., make sure to escape dot . as \\., and forward slash / as \\/, etc.
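Both notes can be illustrated with a short Python sketch. It assumes matches are collected with `re.findall`, which is a guess about the tool's internals based on the result shapes shown above; the `Version` template entry and the sample text are hypothetical and exist only for this example.

```python
import json
import re

# A template fragment as it would appear in a JSON file. Inside a JSON string
# every regex backslash is doubled, so \. is written as \\. and \d as \\d.
# "Version" is a made-up entry name used only for this illustration.
template_text = r'''
{
    "Version": {
        "query": "v(?:\\d+)\\.(?:\\d+)"
    }
}
'''

template = json.loads(template_text)
query = template["Version"]["query"]  # after JSON parsing: v(?:\d+)\.(?:\d+)

text = "Tested with v4.7 and v2024.2"

# Non-capturing groups only: findall returns a flat list of strings.
flat = re.findall(query, text)

# Two capturing groups: findall returns a list of tuples instead.
grouped = re.findall(r"v(\d+)\.(\d+)", text)

print(flat)
print(grouped)
```

Using non-capturing groups `(?:...)` wherever grouping is needed only for alternation or repetition keeps the result a flat list.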
| Name | Type | Required | Description |
|---|---|---|---|
| query | str | yes | Regular expression query. |
| search | bool | no | Highlight matches within the searched lines; otherwise, extract the matches. |
| ignorecase | bool | no | Case-insensitive search. |
| minimum | int | no | Only accept matches longer than int characters. |
| maximum | int | no | Only accept matches shorter than int characters. |
| decode | str | no | Decode the matches. Available decodings: url, base64, hex, pem. |
| minimum_decode | int | no | Only accept decodings longer than int characters. |
| maximum_decode | int | no | Only accept decodings shorter than int characters. |
| unique | bool | no | Filter out duplicates. |
| collect | bool | no | Collect all the matches in one place. |
minimum_decode and maximum_decode will check the length of the decoded string after bad characters are removed.
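To show how these options might fit together, here is a minimal sketch of applying a single template entry to a string. It is not the tool's actual code: `scrape` and `decode_base64` are hypothetical helpers, only the `base64` decoding is covered, "bad characters" are assumed to mean non-printable characters, and the length bounds are treated as strict ("longer than" / "shorter than") by following the table's wording literally.

```python
import base64
import binascii
import re
import string

PRINTABLE = set(string.printable)

def decode_base64(match):
    """Return the decoded text with non-printable characters removed, or None."""
    try:
        raw = base64.b64decode(match, validate=True)
    except (binascii.Error, ValueError):
        return None
    decoded = raw.decode("utf-8", errors="ignore")
    # Assumption: "bad characters" means anything outside string.printable.
    return "".join(ch for ch in decoded if ch in PRINTABLE)

def scrape(text, entry):
    """Apply one template entry to a string (hypothetical helper)."""
    flags = re.IGNORECASE if entry.get("ignorecase") else 0
    matches = re.findall(entry["query"], text, flags)

    # "minimum" and "maximum" bound the length of the raw match.
    if "minimum" in entry:
        matches = [m for m in matches if len(m) > entry["minimum"]]
    if "maximum" in entry:
        matches = [m for m in matches if len(m) < entry["maximum"]]

    results = []
    for m in matches:
        decoded = None
        if entry.get("decode") == "base64":
            decoded = decode_base64(m)
            if decoded is None:
                continue
            # "minimum_decode" and "maximum_decode" bound the cleaned decoding.
            if "minimum_decode" in entry and not len(decoded) > entry["minimum_decode"]:
                continue
            if "maximum_decode" in entry and not len(decoded) < entry["maximum_decode"]:
                continue
        results.append((m, decoded))

    # "unique" drops duplicates while keeping the first-seen order.
    if entry.get("unique"):
        results = list(dict.fromkeys(results))
    return results

# A simplified entry for the demo; the query is not the default Base64 one.
entry = {
    "query": "[A-Za-z0-9+/]{4,}={0,2}",
    "minimum": 8,
    "decode": "base64",
    "minimum_decode": 6,
    "unique": True,
}
encoded = base64.b64encode(b"secret-value").decode()
sample = f"a={encoded} b={encoded} c=QUJD"
print(scrape(sample, entry))
```

In this sample, the short `QUJD` match is rejected by `minimum` before it is ever decoded, and the repeated long match is reported once because of `unique`.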
How I typically run the tool:

    file-scraper -dir directory -o results.html -e default

Default (built-in) exclude file types:

car, css, gif, jpeg, jpg, mp3, mp4, nib, ogg, otf, eot, png, storyboard, strings, svg, ttf, webp, woff, woff2, xib, vtt

    File Scraper v4.7 ( github.com/ivan-sincek/file-scraper )

    Usage:   file-scraper -dir directory -o out          [-t template     ] [-th threads]
    Example: file-scraper -dir decoded   -o results.html [-t template.json] [-th 10     ]

    DESCRIPTION
        Scrape files for sensitive information
    DIRECTORY
        Directory containing files or a single file to scrape
        -dir, --directory = decoded | files | test.exe | etc.
    TEMPLATE
        File containing extraction details or a single RegEx to use
        Default: built-in JSON template file
        -t, --template = template.json | "secret\: [\w\d]+" | etc.
    EXCLUDES
        Exclude all files ending with the specified extension
        Specify 'default' to load the built-in list
        Use comma-separated values
        -e, --excludes = mp3 | default,jpg,png | etc.
    INCLUDES
        Include all files ending with the specified extension
        Overrides the excludes
        Use comma-separated values
        -i, --includes = java | json,xml,yaml | etc.
    BEAUTIFY
        Beautify [minified] JavaScript (.js) files
        -b, --beautify
    THREADS
        Number of parallel threads to run
        Default: 30
        -th, --threads = 10 | etc.
    OUT
        Output file
        -o, --out = results.html | etc.
    DEBUG
        Enable debug output
        -dbg, --debug

Figure 1 - Interactive Report (1)
Figure 2 - Interactive Report (2)
Figure 3 - Interactive Report (3)


