How to turn HTML to text in Python?

by scrapecrow Oct 31, 2022

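When web scraping, we might need to represent scraped HTML data as plain text. For this we can use BeautifulSoup's get_text() method, which collects the text of an element and all of its descendants and, most importantly, ignores invisible details such as the contents of <script> elements (as of BeautifulSoup 4.9.0, when the html.parser or lxml parser is used):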

from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<body>
  <article>
    <h1>Article title</h1>
    <p>first paragraph and a <a>link</a></p>
    <script>var invisible="javascript variable";</script>
  </article>
</body>
""", "html.parser")  # name the parser explicitly to avoid a parser-guessing warning
# if possible, it's best to restrict the HTML to a specific element
element = soup.find('article')
text = element.get_text()
print(text)
"""
Article title
first paragraph and a link
"""
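The raw get_text() result keeps the whitespace of the source HTML, and older BeautifulSoup versions or other parsers may still include script contents. The sketch below is one way to guard against both. The html_to_text helper and its tag_name parameter are hypothetical names for illustration, not part of BeautifulSoup, and the line-based cleanup assumes block elements sit on separate lines in the source HTML.

```python
from bs4 import BeautifulSoup

HTML = """
<body>
  <article>
    <h1>Article title</h1>
    <p>first paragraph and a <a>link</a></p>
    <script>var invisible="javascript variable";</script>
    <style>p { color: red; }</style>
  </article>
</body>
"""


def html_to_text(html: str, tag_name: str = "article") -> str:
    """Hypothetical helper: extract text from the first <tag_name> element."""
    soup = BeautifulSoup(html, "html.parser")
    # Fall back to the whole document if the element is not found.
    element = soup.find(tag_name) or soup
    # Explicitly remove non-visible elements so the result does not
    # depend on the BeautifulSoup version or the parser in use.
    for tag in element.find_all(["script", "style", "template"]):
        tag.decompose()
    # Strip each line and drop the ones left empty by the HTML indentation.
    lines = (line.strip() for line in element.get_text().splitlines())
    return "\n".join(line for line in lines if line)


text = html_to_text(HTML)
print(text)
```

If line structure does not matter, element.get_text(separator=" ", strip=True) is a shorter alternative: it strips every text fragment and joins the fragments with single spaces.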

