How can BeautifulSoup be used to extract ‘href’ links from a website?



BeautifulSoup is a third-party Python library for parsing HTML and XML documents. It is widely used in web scraping, the process of extracting and working with data from websites.

Web scraping can also be used to extract data for research purposes, understand/compare market trends, perform SEO monitoring, and so on.

The following command can be run to install BeautifulSoup −

pip install beautifulsoup4

Following is an example −

Example

from bs4 import BeautifulSoup
import requests

url = "https://en.wikipedia.org/wiki/Algorithm"
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")

print("The href links are :")
for link in soup.find_all('a'):
   print(link.get('href'))

Output

The href links are :
…
https://stats.wikimedia.org/#/en.wikipedia.org
https://foundation.wikimedia.org/wiki/Cookie_statement
https://wikimediafoundation.org/
https://www.mediawiki.org/

Explanation

  • The required packages are imported.

  • The URL of the webpage is defined.

  • The page is fetched with the ‘requests.get’ function, and its HTML is read from the response.

  • The ‘BeautifulSoup’ constructor parses this HTML into a searchable object.

  • The ‘find_all’ method collects every ‘a’ (anchor) tag in the document.

  • The value of each tag’s ‘href’ attribute is printed on the console.
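Note that some ‘a’ tags carry no ‘href’ attribute (so ‘link.get('href')’ prints None for them), and many of the links a page returns are relative paths rather than full URLs. A small sketch below shows how passing href=True to ‘find_all’ skips the former, and how ‘urllib.parse.urljoin’ resolves the latter; the HTML snippet is a hypothetical stand-in for a downloaded page so the example runs without a network request −

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a fetched page, so this runs offline.
html = """
<html><body>
<a href="https://www.mediawiki.org/">MediaWiki</a>
<a name="top">anchor with no href</a>
<a href="/wiki/Flowchart">Flowchart</a>
</body></html>
"""

base_url = "https://en.wikipedia.org/wiki/Algorithm"
soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually have an href attribute,
# so indexing with link['href'] is always safe; urljoin turns relative
# paths into absolute URLs against the page they came from.
links = [urljoin(base_url, link['href'])
         for link in soup.find_all('a', href=True)]
print(links)
# → ['https://www.mediawiki.org/', 'https://en.wikipedia.org/wiki/Flowchart']
```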

Updated on: 2021-01-18T12:53:53+05:30
