Web Scraping With Python :

Project Setup

Making the project as :
```
mkdir webscraping cd webscraping 
```

Web Scraping installation:

open command prompt type pip install virtualenv create virtualenv >>virtualenv web-scraping we need to activate virtualenv for use >>web-scraping\scripts\activate need libraries for Web Scraping : pip install requests pip install beautifulsoup4 or install bs4

Create WebsiteScrap.py for development

import requests from bs4 import BeautifulSoup url = "https://www.learnpython.org/" response = requests.get(url) htmlContent = response.content formatted_html_content = BeautifulSoup(htmlContent, 'html.parser') # print(formatted_html_content) # 1} Get the title of the HTML page title = formatted_html_content.title print(title) # if you want only tag content print(title.string) # 2} find All anchor tag on this website and print count list_anchors = formatted_html_content.find_all('a') # print all anchor tags print(list_anchors) # print count print("Number of anchor tags on this website : ", len(list_anchors)) # 3} Get first element in the HTML page print(formatted_html_content.find('head')) # 4} Get classes of any element in the HTML page print(formatted_html_content.find('a')['class']) # 5} find all the elements by class name print(formatted_html_content.find_all("a", class_="navbar-brand")) # 6} Get the text from the tags/soup print(formatted_html_content.find("p").get_text()) # 7} Get all the anchor tags from the page with iteration list_anchors = formatted_html_content.find_all('a') all_links = set() for link in list_anchors: print(link) # get all anchor tag with links print(link.get('href')) # get all links all_links.add(link.get('href')) # want to remove duplicate links print(all_links) print(len(all_links)) # 8} find duplicate links all_web_links_count=len(list_anchors) after_remove_duplicate_links_count=len(all_links) print('Number of duplicate links in this website are : ',all_web_links_count-after_remove_duplicate_links_count)

In order to run app:
```
 python WebsiteScrap.py 
```

create clone in you system just execute this file

1} create virtualenv and just type below command 2} pip install -r .\requirements.txt

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.idea		.idea
web-scraping		web-scraping
README.md		README.md
WebsiteScrap.py		WebsiteScrap.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Web Scraping With Python :

Project Setup

About

Uh oh!

Releases

Packages

Uh oh!

Languages

rahulmoundekar/webscraping-in-python

Folders and files

Latest commit

History

Repository files navigation

Web Scraping With Python :

Project Setup

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages