# Web Scraping with Selenium WebDriver

This repository contains a web scraping tool that uses **Selenium WebDriver** with the latest version of Firefox to scrape data from the web. The tool supports proxy rotation and custom user agents for additional privacy and flexibility.
This is the initial build of the script, using proxy rotation, user-agent rotation, and other techniques to stay anonymous.
I know this is not a professional-grade script, but it should be useful for moderate scraping.

## Requirements

The following dependencies are required to run the web scraping tool:

- Python 3.x
- Selenium WebDriver
- geckodriver (for Firefox)
- Requests (for sending HTTP requests)
- Beautiful Soup (for parsing HTML)
- `random` and `time` (Python standard library, used for rotating proxies and pacing requests; no installation needed)

You can install the dependencies using pip:

    pip install selenium requests beautifulsoup4

## Usage

To use the web scraping tool, work in the main file however you like, but make sure you create the WebDriver object through the `Anonymous` class, which does the anonymity work on your behalf. The `main.py` file should contain the following information (a minimal sketch follows the list):

- `base_url`: the base URL of the website you want to scrape
- `search_query`: the search query used to fetch data

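As a rough illustration, a `main.py` along these lines would fit the description above; the exact `Anonymous` import path and method names shown here are assumptions, not taken verbatim from the repository:

```python
# main.py -- hypothetical sketch; the Anonymous import path and
# method names are assumptions based on the description above.
from Anonymous import Anonymous

base_url = "https://example.com/search"   # the site you want to scrape
search_query = "web scraping"             # the query used to fetch data

anon = Anonymous()                 # loads proxies and user agents
driver = anon.setup_webdriver()    # Firefox WebDriver behind a rotated proxy

try:
    driver.get(f"{base_url}?q={search_query}")
    print(driver.page_source[:500])  # process the page however you need
finally:
    driver.quit()
```
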
The `Anonymous.py` file should contain the following information (see the sketch after this list):

- `proxies`: a list of proxy servers to be used for scraping.
- `user_agents`: a list of user agents to be used for scraping.
- `setup_webdriver`: creates a WebDriver with the desired capabilities and options.

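One possible shape of `setup_webdriver`, shown as a sketch rather than the repository's actual code, and assuming proxies are stored as `host:port` strings:

```python
import random

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

def setup_webdriver(proxies, user_agents):
    # Pick a random proxy and user agent for this session.
    proxy = random.choice(proxies)          # e.g. "203.0.113.10:8080" (assumed format)
    user_agent = random.choice(user_agents)

    options = Options()
    # Override Firefox's user agent.
    options.set_preference("general.useragent.override", user_agent)

    # Route HTTP and HTTPS traffic through the chosen proxy.
    host, port = proxy.split(":")
    options.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
    options.set_preference("network.proxy.http", host)
    options.set_preference("network.proxy.http_port", int(port))
    options.set_preference("network.proxy.ssl", host)
    options.set_preference("network.proxy.ssl_port", int(port))

    # Assumes geckodriver is on PATH.
    return webdriver.Firefox(options=options)
```
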
All the data, such as proxies and user agents, is stored in text files (in the `Data` folder); you can edit them if you want. Before running the actual work, the program will ask whether to download the proxies again, to increase efficiency and speed.

- `proxies`: live HTTP proxies downloaded from `geonode.com`

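For illustration, loading those text files could look like the following; the exact file names under `Data` are assumptions:

```python
# Hypothetical loader; the file names are assumptions, not confirmed by the repo.
def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

proxies = load_lines("Data/proxies.txt")
user_agents = load_lines("Data/user_agents.txt")
```
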
To start the web scraping tool, run:

    python main.py

You can pass additional command-line arguments as needed.

## Contact

If you have any questions or issues, please contact the author at [hammadrafique029@gmail.com](mailto:hammadrafique029@gmail.com) or [codingmagician0@gmail.com](mailto:codingmagician0@gmail.com).