A VS Code extension that generates markdown documentation from web pages and GitHub repositories.
If you find Docs Miner useful, please consider leaving a star ⭐ on the GitHub repository or buying me a coffee ☕ to keep me motivated to work on this project.
- Generate markdown documentation from any web URL or GitHub repository
- Two scraping methods:
  - API Method (faster but may fail on some sites)
  - Browser Method (slower but more reliable)
- Smart crawling that follows:
  - Subdirectory structure from the initial URL for websites
  - Repository file structure for GitHub repositories
- Configurable crawling depth with precise level control
- Real-time progress tracking
- Stop crawling at any time
- Automatically saves the markdown file in your current workspace
- Opens the generated file for immediate viewing
 
- Open the Docs Miner sidebar (look for the Docs Miner icon in the Activity Bar)
- Enter the URL you want to generate documentation from:
  - For websites: any web URL (e.g., https://example.com)
  - For GitHub: repository URL (e.g., https://github.com/username/repo) or specific directory (e.g., https://github.com/username/repo/tree/main/docs)
- Adjust the crawling depth using the slider:

  Website depth levels
  - Depth 1: Only the entered page
  - Depth 2: The entered page and links at the same directory level
  - Depth 3: The entered page and links up to two directory levels
  - Depth 4: The entered page and links up to three directory levels
  - Depth 5: The entered page and links up to four directory levels

  GitHub repository depth levels
  - Depth 1: Root files only
  - Depth 2: Root + one directory level
  - Depth 3: Root + two directory levels
  - Depth 4: Root + three directory levels
  - Depth 5: Root + four directory levels
- Specify the file name for the generated documentation. If not specified, the URL will be used instead.
- Specify the output folder for the generated documentation. If not specified, the current workspace folder will be used.
- Alternatively, use the "Add to File" button to choose an existing markdown file to append the crawled content to.
- Click "Start Crawling" to begin
- Monitor the progress in real-time
- Use the "Stop Crawling" button if you want to end the process early
 
The markdown file will be automatically created in your specified output folder and opened for viewing.
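The depth levels listed above can be read as a rule that maps each link or file to the smallest slider value that would include it. The following is a minimal sketch of that reading, not the extension's actual code: the helper names `requiredWebsiteDepth` and `requiredRepoDepth` are hypothetical, and it assumes that a "directory level" is one path segment counted from the directory of the entered URL (or from the repository root or chosen directory for GitHub).

```typescript
/** Split a path into its non-empty segments. */
function segments(path: string): string[] {
  return path.split("/").filter((s) => s.length > 0);
}

/**
 * Smallest depth setting at which `link` would be included when crawling
 * from `start`, or null if the link lies outside the start URL's directory.
 * Hypothetical helper: the segment-counting rule is an assumption.
 */
function requiredWebsiteDepth(start: string, link: string): number | null {
  const s = new URL(start);
  const l = new URL(link);
  if (s.origin !== l.origin) return null;
  if (s.pathname === l.pathname) return 1; // Depth 1: only the entered page

  // Directory of the entered URL: everything up to and including the last "/".
  const baseDir = s.pathname.slice(0, s.pathname.lastIndexOf("/") + 1);
  if (!l.pathname.startsWith(baseDir)) return null;

  // One extra segment means "same directory level" (Depth 2), two mean Depth 3, ...
  const extra = segments(l.pathname.slice(baseDir.length)).length;
  return Math.max(1, extra) + 1;
}

/**
 * Smallest depth setting at which a repository file would be included.
 * `filePath` is assumed to be relative to the repository root (or to the
 * directory given in the URL): root files need Depth 1, files one
 * directory down need Depth 2, and so on.
 */
function requiredRepoDepth(filePath: string): number {
  return segments(filePath).length;
}

// Usage
console.log(requiredWebsiteDepth("https://example.com/docs/", "https://example.com/docs/intro"));       // 2
console.log(requiredWebsiteDepth("https://example.com/docs/", "https://example.com/docs/guide/setup")); // 3
console.log(requiredRepoDepth("docs/api/index.md"));                                                    // 3
```

Under this reading, a link is crawled when its required depth is not null and does not exceed the slider value.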
- VS Code 1.80.0 or higher
 - Active internet connection
 
Choose one of the following installation methods:
- Open VS Code
 - Go to the Extensions view (Ctrl+Shift+X)
 - Search for "Docs Miner"
 - Click Install
 
- Go to the latest release
- Download the latest `docs-miner-x.x.x.vsix` file
- In VS Code:
  - Go to Extensions view (Ctrl+Shift+X)
  - Click '...' menu (top-right)
  - Select 'Install from VSIX...'
  - Choose the downloaded file
 
 
- Clone the repository: `git clone https://github.com/3choff/docs-miner`
- Run `npm install` in the terminal
- Run `npm run compile` to build the extension
- To create a VSIX package:
  - Install vsce: `npm install -g @vscode/vsce`
  - Run `vsce package`
  - The .vsix file will be created in the root directory
- To install the VSIX:
  - Go to VS Code Extensions view
  - Click the '...' menu (top-right)
  - Select 'Install from VSIX...'
  - Choose the generated .vsix file
 
 
- The extension offers two methods for content extraction:
  - Jina AI Reader API: Fast but may fail on some websites
  - Browser-based scraping: More reliable but slower, handles JavaScript-heavy sites
- Crawling is restricted to subdirectories of the initial URL to ensure focused documentation
- Rate limiting: a 0.5-second delay between requests to avoid overloading servers
- May be affected by a website's robots.txt and rate limiting policies
- Skips non-documentation links (images, executables, etc.)
 
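The crawl restrictions above (stay under the initial URL's directory, skip non-documentation links, wait 0.5 seconds between requests) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the extension's implementation: `shouldCrawl`, `crawl`, and the `fetchPage` callback are hypothetical names, and the extension list is only an example of what "images, executables, etc." might cover.

```typescript
// Illustrative list only; the extension's real skip list is not documented here.
const SKIPPED_EXTENSIONS = [".png", ".jpg", ".jpeg", ".gif", ".svg", ".exe", ".zip"];

// 0.5 second delay between requests, as stated in the notes above.
const REQUEST_DELAY_MS = 500;

/** True if `link` is under the start URL's directory and not a skipped file type. */
function shouldCrawl(start: string, link: string): boolean {
  const s = new URL(start);
  const l = new URL(link);
  // Directory of the start URL: everything up to and including the last "/".
  const baseDir = s.pathname.slice(0, s.pathname.lastIndexOf("/") + 1);
  const inScope = l.origin === s.origin && l.pathname.startsWith(baseDir);
  const path = l.pathname.toLowerCase();
  const skipped = SKIPPED_EXTENSIONS.some((ext) => path.endsWith(ext));
  return inScope && !skipped;
}

/** Resolve after `ms` milliseconds. */
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

/**
 * Fetch every allowed link in order, pausing between requests.
 * `fetchPage` is a caller-supplied function (hypothetical) that returns
 * the page content as markdown.
 */
async function crawl(
  start: string,
  links: string[],
  fetchPage: (url: string) => Promise<string>,
  delayMs: number = REQUEST_DELAY_MS
): Promise<string[]> {
  const pages: string[] = [];
  let first = true;
  for (const url of links.filter((link) => shouldCrawl(start, link))) {
    if (!first) await sleep(delayMs); // no delay before the first request
    first = false;
    pages.push(await fetchPage(url));
  }
  return pages;
}
```

Passing `fetchPage` as a parameter keeps the filtering and pacing logic independent of whether the API method or the browser method retrieves the page.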
Feedback and contributions are welcome. If you encounter any issues or have suggestions for improvements, please create a new issue on the GitHub repository.
If you'd like to contribute to the development of the extension, feel free to submit a pull request with your changes.
This extension is licensed under the MIT License.
