Contents list - Python Beautifulsoup

Contents list - Python Beautifulsoup

In BeautifulSoup, the contents attribute of a tag returns a list of its children. Each item in the list is a BeautifulSoup object representing either a tag or a NavigableString (text). This can be useful when you want to navigate through different parts of an HTML or XML document.

Here��s a simple guide on how to use the contents attribute in BeautifulSoup:

Step 1: Install BeautifulSoup

If you haven't installed BeautifulSoup, you can do so using pip:

pip install beautifulsoup4 

Step 2: Import Libraries

Import BeautifulSoup and requests or another library for loading the HTML content.

from bs4 import BeautifulSoup import requests 

Step 3: Load HTML Content

You can load HTML content from a web page or use a string of HTML.

  • From a Web Page:

    url = 'http://example.com' response = requests.get(url) html_content = response.content 
  • From a String:

    html_content = ''' <html> <head><title>Test Page</title></head> <body> <p>Paragraph 1</p> <p>Paragraph 2</p> <ul> <li>Item 1</li> <li>Item 2</li> </ul> </body> </html> ''' 

Step 4: Parse HTML with BeautifulSoup

Create a BeautifulSoup object by parsing the HTML content.

soup = BeautifulSoup(html_content, 'html.parser') 

Step 5: Access Contents of a Tag

Use the contents attribute to access a tag��s children.

# Access the contents of the body tag body_tag = soup.body children = body_tag.contents print("Children of the <body> tag:") for child in children: print(child if child.name else str(child).strip()) 

This script will list the children of the <body> tag, including tags and text.

Step 6: Working with the Contents

You can iterate over the list, index into it, and use it just like any other list in Python.

  • Iterating Over Children:

    for child in body_tag.contents: print(child) 
  • Accessing Specific Children:

    first_child = body_tag.contents[0] print(first_child) 

Note

  • The contents list includes both tags and text nodes. Text nodes are of type NavigableString.
  • If you only want to iterate over tags, you might prefer using .children instead of .contents.
  • Remember that contents will return an empty list if a tag has no children.
  • Ensure that your web scraping activities comply with the terms and conditions of the website and relevant laws.

More Tags

local pdf powermockito melt sql-loader xcode9 reverse-engineering tls1.2 argmax put

More Programming Guides

Other Guides

More Programming Examples