
Commit 5ce11be

README.md Updated
Updating the information in Readme.md file
1 parent bf9e0ec commit 5ce11be


3 files changed: +56 -103 lines changed


README.md

Lines changed: 48 additions & 45 deletions
@@ -1,45 +1,48 @@
-usage: git [-v | --version] [-h | --help] [-C <path>] [-c <name>=<value>]
-           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
-           [-p | --paginate | -P | --no-pager] [--no-replace-objects] [--bare]
-           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
-           [--super-prefix=<path>] [--config-env=<name>=<envvar>]
-           <command> [<args>]
-
-These are common Git commands used in various situations:
-
-start a working area (see also: git help tutorial)
-   clone     Clone a repository into a new directory
-   init      Create an empty Git repository or reinitialize an existing one
-
-work on the current change (see also: git help everyday)
-   add       Add file contents to the index
-   mv        Move or rename a file, a directory, or a symlink
-   restore   Restore working tree files
-   rm        Remove files from the working tree and from the index
-
-examine the history and state (see also: git help revisions)
-   bisect    Use binary search to find the commit that introduced a bug
-   diff      Show changes between commits, commit and working tree, etc
-   grep      Print lines matching a pattern
-   log       Show commit logs
-   show      Show various types of objects
-   status    Show the working tree status
-
-grow, mark and tweak your common history
-   branch    List, create, or delete branches
-   commit    Record changes to the repository
-   merge     Join two or more development histories together
-   rebase    Reapply commits on top of another base tip
-   reset     Reset current HEAD to the specified state
-   switch    Switch branches
-   tag       Create, list, delete or verify a tag object signed with GPG
-
-collaborate (see also: git help workflows)
-   fetch     Download objects and refs from another repository
-   pull      Fetch from and integrate with another repository or a local branch
-   push      Update remote refs along with associated objects
-
-'git help -a' and 'git help -g' list available subcommands and some
-concept guides. See 'git help <command>' or 'git help <concept>'
-to read about a specific subcommand or concept.
-See 'git help git' for an overview of the system.
+# Web Scraping with Selenium WebDriver
+
+This repository contains a web scraping tool that uses **Selenium WebDriver** with the latest version of Firefox to scrape data from the web. The tool supports proxy rotation and custom user agents for additional privacy and flexibility.
+
+This is the initial build of the script, using proxy rotation, user agents, and other techniques to stay anonymous.
+
+I know this is not a professional script, but it should be useful for moderate scraping.
+
+## Requirements
+
+The following dependencies are required to run the web scraping tool:
+
+- Python 3.x
+- Selenium WebDriver
+- geckodriver (for Firefox)
+- Requests (for sending HTTP requests)
+- random (standard library, for randomly choosing proxies and user agents)
+- time (standard library, for pausing while rotating proxies)
+- Beautiful Soup (for parsing HTML)
+
+You can install the third-party dependencies using pip: `pip install selenium requests beautifulsoup4`
+
+## Usage
+
+To use the web scraping tool, edit the main file however you like, but make sure you use the `Anonymous` class to create the WebDriver object; the `Anonymous` class does the anonymization work on your behalf. The `main.py` file should contain the following information:
+
+- `base_url`: the base URL of the website you want to scrape
+- `search_query`: the search query used to fetch data
+
+The `Anonymous.py` file should contain the following:
+
+- `proxies`: a list of proxy servers to be used for scraping
+- `user_agents`: a list of user agents to be used for scraping
+- `setup_webdriver`: creates a web driver with the desired capabilities and options
+
+All the data, such as proxies and user agents, is stored in text files in the `Data` folder. You can edit these files if you want; before starting the actual work, the program will ask whether to download the proxies again, to improve efficiency and speed.
+
+- `proxies`: working HTTP proxies downloaded from `geonode.com`
+
+To start the web scraping tool, run: `python main.py`
+
+You can add additional command line arguments as needed.
+
+## Contact
+
+If you have any questions or issues, please contact the author at [hammadrafique029@gmail.com](mailto:hammadrafique029@gmail.com) or [codingmagician0@gmail.com](mailto:codingmagician0@gmail.com).
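
To make the Usage section above concrete, here is a minimal sketch of the workflow it describes. The `Anonymous` class and `setup_webDriver` come from this commit's `Scripts/anonymous_techniques.py`; the `base_url`/`search_query` wiring, the search-URL pattern, and the BeautifulSoup step are illustrative assumptions, not code from the repository.

```python
import time

from bs4 import BeautifulSoup
from anonymous_techniques import *  # provides the Anonymous class

base_url = "https://www.example.com"  # placeholder target site
search_query = "web scraping"         # placeholder query

obj = Anonymous()               # loads proxies and user agents from the data files
driver = obj.setup_webDriver()  # returns a Firefox driver with anonymization applied
driver.get(f"{base_url}/search?q={search_query}")  # assumed URL pattern
time.sleep(5)                   # give the page time to load

# Parse whatever the driver rendered; Beautiful Soup is listed in Requirements.
soup = BeautifulSoup(driver.page_source, "html.parser")
print(soup.title.string if soup.title else "no <title> found")

driver.quit()
```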

Scripts/anonymous_techniques.py

Lines changed: 2 additions & 4 deletions
@@ -1,4 +1,3 @@
-from selenium import webdriver
 from proxies import *
 import random
 
@@ -43,12 +42,11 @@ def setup_agents(self):
         except Exception as e:
             print("\n\tGOT ERROR IN DEFINING USER AGENTS! ERROR BELOW:\n\t" + str(e))
 
-    def setup_webDriver(self, Url):
+    def setup_webDriver(self):
         try:
             self.setup_proxies()
             driver = webdriver.Firefox(desired_capabilities=self.setup_desired_capabilities(),
                                        options=self.setup_agents())
-            driver.get(Url)
-            driver.close()
+            return driver
         except Exception as e:
             print("\n\tGOT ERROR IN DEFINING WEB DRIVER! ERROR BELOW:\n\t" + str(e))

Scripts/main.py

Lines changed: 6 additions & 54 deletions
@@ -1,58 +1,10 @@
 import time
 
-from bs4 import BeautifulSoup
-from selenium import webdriver
-from selenium.webdriver.common.keys import Keys
-
-from proxies import *
-import random
 from anonymous_techniques import *
 
-
-urls = ["https://www.google.com", "https://www.bing.com"]
-driver = webdriver.Firefox()
-driver.get(urls[0])
-
-driver.execute_script(f"window.open('{urls[1]}', '_blank');")
-driver.switch_to.window(driver.window_handles[-1])
-
-
-
-
-
-
-
-
-# get_proxies = FreeProxies()
-# print(get_proxies.verify_proxies())
-
-
-# # Initialize the Selenium webdriver
-# driver = webdriver.Firefox()
-#
-# # Use the webdriver to open a website
-# driver.get("https://www.google.com")
-#
-# # Get the HTML content of the page
-# html_content = driver.page_source
-#
-# # Use Beautiful Soup to parse the HTML content
-# soup = BeautifulSoup(html_content, 'html.parser')
-# print(soup.prettify())
-#
-#
-#
-# # Find elements in the HTML content
-# elements = soup.find_all('div', class_='example-class')
-#
-# # Extract data from the elements
-# data = []
-# for element in elements:
-#     data.append(element.text)
-#
-# # Close the webdriver
-# driver.quit()
-#
-# # Print the extracted data
-# print(data)
+obj = Anonymous()
+url = "https://www.google.com"
+driver = obj.setup_webDriver()
+driver.get(url)
+time.sleep(5)
+driver.close()
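
One caveat with the new entry point: when driver creation fails, `setup_webDriver()` prints the error and implicitly returns `None`, so the bare `driver.get(url)` above would raise an `AttributeError`. A defensive variant of the same script, where the `None` guard and the `try`/`finally` are additions for illustration, not part of the commit:

```python
import time

from anonymous_techniques import *  # provides the Anonymous class

obj = Anonymous()
driver = obj.setup_webDriver()

# setup_webDriver() returns None after printing the error if Firefox
# could not be started, so guard before touching the driver.
if driver is not None:
    try:
        driver.get("https://www.google.com")
        time.sleep(5)  # give the page time to load
    finally:
        driver.quit()  # quit() also shuts down the geckodriver process
```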
