When web scraping, we might need to represent scraped HTML data as plain text. For this, we can use BeautifulSoup's get_text()
method, which extracts the text of an element and all of its descendants and, most importantly, ignores non-visible content such as the contents of <script>
elements:
```python
from bs4 import BeautifulSoup

# name the parser explicitly to avoid BeautifulSoup's "guessed parser" warning
soup = BeautifulSoup("""
<body>
<article>
<h1>Article title</h1>
<p>first paragraph and a <a>link</a></p>
<script>var invisible="javascript variable";</script>
</article>
</body>
""", "html.parser")

# if possible it's best to restrict html to a specific element
element = soup.find('article')
text = element.get_text()
print(text)
"""
Article title
first paragraph and a link
"""
```
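The raw get_text() output keeps the whitespace between tags, so it is often worth tidying it up. The following is a minimal sketch of one way to do that, assuming the same sample HTML as above: it removes <script> and <style> elements explicitly (a defensive step, in case the installed BeautifulSoup version treats them as text) and then uses the `separator` and `strip` parameters of get_text() to produce a single clean line. The variable names `html`, `article` and `clean_text` are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<body>
  <article>
    <h1>Article title</h1>
    <p>first paragraph and a <a>link</a></p>
    <script>var invisible="javascript variable";</script>
  </article>
</body>
"""

soup = BeautifulSoup(html, "html.parser")
article = soup.find("article")

# Defensive step: drop elements that never hold readable text
for tag in article.find_all(["script", "style"]):
    tag.decompose()

# strip=True trims every text node and skips whitespace-only ones;
# separator=" " joins the remaining nodes with a single space
clean_text = article.get_text(separator=" ", strip=True)
print(clean_text)
```

Using `separator="\n"` instead would keep each text node on its own line, which can be handier when the text is later split into paragraphs.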