Python Forum
BeautifulSoup4, How to get an HTML tag with specific class.
#1
I have HTML code like the following from a URL:
<img class="this" alt="this" src="this_source1.gif">
<img class="this" alt="this" src="this_source2.gif">
<img class="this" alt="this" src="this_source3.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">
<img class="this and that" alt="not this" src="this__and_that_source2.gif">
<img class="this and that" alt="not this" src="this__and_that_source3.gif">

I'm trying to get the alt value of just the img tags with only class="this"

import requests
from bs4 import BeautifulSoup

url = "https://someurl.com"
resp = requests.get(url)
txt = resp.text
soup = BeautifulSoup(txt, 'lxml')
imgThis = soup.find_all('img', class_='this')
# Iterate over the matched tags directly; each item is a Tag, not an index.
for img in imgThis:
    print(img['alt'])
The find_all method returns alts for both class_="this" and class_="this and that"

How do I get only the tags whose class is exactly "this"?
Reply
#2
I have HTML code like the following from a URL:
<img class="this" alt="this" src="this_source1.gif">
<img class="this" alt="this" src="this_source2.gif">
<img class="this" alt="this" src="this_source3.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">
<img class="this and that" alt="not this" src="this__and_that_source2.gif">
<img class="this and that" alt="not this" src="this__and_that_source3.gif">

I'm trying to get the alt strings of img tags with specifically class="this"

import requests
from bs4 import BeautifulSoup

url = 'https://someurl.com'
resp = requests.get(url)
txt = resp.text
soup = BeautifulSoup(txt, 'lxml')
imgThis = soup.find_all('img', class_='this')
# Iterate over the matched tags directly; each item is a Tag, not an index.
for img in imgThis:
    print(img['alt'])
The find_all method returns matches for both class_="this" and class_="this and that"

Output:
this
this
this
this and that
this and that
this and that
How do I get only the tags whose class is exactly "this"?
Reply
#3
for example,
<img class="this" alt="this" src="this_source1.gif">
use:
 source1 = soup.find('img', {'class': 'this'})
Reply
#4
Thank you Larz.

I did try:

test = soup.find('img', {'class': 'this'})
But that returned only the first tag matching <img class="this",
which happened to be an <img class="this and that">.

and

test = soup.find_all('img', {'class': 'this'})
returns all img tags with class="this" and class="this and that"
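
The behaviour above follows from how bs4 treats class: it is a multi-valued attribute, so class="this and that" is stored as a list of three class names, and a search for 'this' matches any tag whose list contains it. A minimal sketch on a cut-down copy of the HTML (the 'html.parser' backend is used here only to avoid the lxml dependency; that swap is an assumption, not what the original code used):

```python
from bs4 import BeautifulSoup

html = """
<img class="this" alt="this" src="this_source1.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">
"""

# Assumption: 'html.parser' instead of 'lxml', purely to keep the sketch dependency-free.
soup = BeautifulSoup(html, "html.parser")

# bs4 splits the class attribute on whitespace into a list of class names.
class_lists = [img["class"] for img in soup.find_all("img")]
print(class_lists)  # [['this'], ['this', 'and', 'that']]

# class_="this" matches any tag whose class list contains "this".
matched = soup.find_all("img", class_="this")
print(len(matched))  # 2
```

So both find() and find_all() with {'class': 'this'} are doing a "contains this class" test, not an "equals" test.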
Reply
#5
If you really must use bs4, I would use its CSS selector support and stay away from the weird find/find_all api.
This is one way to achieve what you want:
soup.select('img[class="this"]')
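
A slightly fuller sketch of the same idea, pulling out the alt values the question asked for. It also shows a second option for anyone who prefers to stay with find_all: passing a filter function that compares the class list exactly. The helper name has_only_this is hypothetical, and 'html.parser' is assumed here only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

html = """
<img class="this" alt="this" src="this_source1.gif">
<img class="this" alt="this" src="this_source2.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">
"""

# Assumption: 'html.parser' instead of 'lxml', purely to keep the sketch dependency-free.
soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: the whole class attribute must equal "this".
via_select = [img["alt"] for img in soup.select('img[class="this"]')]


def has_only_this(tag):
    """Hypothetical helper: True for <img> tags whose class list is exactly ['this']."""
    return tag.name == "img" and tag.get("class") == ["this"]


# find_all also accepts a function that receives each tag and returns True/False.
via_filter = [img["alt"] for img in soup.find_all(has_only_this)]

print(via_select)  # ['this', 'this']
print(via_filter)  # ['this', 'this']
```

Both approaches skip the class="this and that" tags because they compare the full class value rather than testing for membership.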
In general, I'd recommend using lxml instead of bs4 for pretty much anything.
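
For comparison, a rough sketch of what that could look like with lxml and XPath. This assumes the lxml package is installed and wraps a cut-down copy of the images in a full document for simplicity:

```python
import lxml.html

html = """
<html><body>
<img class="this" alt="this" src="this_source1.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">
</body></html>
"""

tree = lxml.html.fromstring(html)

# XPath compares the raw attribute string, so class="this and that" is not matched.
alts = [str(alt) for alt in tree.xpath('//img[@class="this"]/@alt')]
print(alts)  # ['this']
```

Here @class is compared as a plain string, so the exact match comes for free.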
Reply
#6
Thanks stranac!

That seems to have done the trick.

It's a shame the BeautifulSoup documentation is less than optimal!
Reply
#7
Edit: this is a merge of threads, so my answer is the same as @stranac's.
-----
You can use a CSS selector to match the exact class attribute value.
from bs4 import BeautifulSoup

html = '''\
<img class="this" alt="this" src="this_source1.gif">
<img class="this" alt="this" src="this_source2.gif">
<img class="this" alt="this" src="this_source3.gif">
<img class="this and that" alt="not this" src="this__and_that_source1.gif">
<img class="this and that" alt="not this" src="this__and_that_source2.gif">
<img class="this and that" alt="not this" src="this__and_that_source3.gif">'''

soup = BeautifulSoup(html, 'lxml')
only_this = soup.select('img[class="this"]')
Test:
>>> only_this
[<img alt="this" class="this" src="this_source1.gif"/>,
 <img alt="this" class="this" src="this_source2.gif"/>,
 <img alt="this" class="this" src="this_source3.gif"/>]
>>> [i.get('src') for i in only_this]
['this_source1.gif', 'this_source2.gif', 'this_source3.gif']
Reply

