Contents: intro, imports, what will be scraped, process, code, links, outro.
Intro
Hello to everyone who came from the previous post in the Bing series! This blog post continues the Bing web scraping series and shows how to scrape Bing News results using Python. An alternative solution that uses the SerpApi Bing News Engine Results API is shown after the first block of code.
Imports
```python
import requests
import lxml
from bs4 import BeautifulSoup
from serpapi import GoogleSearch
```
What will be scraped
Process
The process is straightforward. The SelectorGadget Chrome extension was used to grab CSS selectors.
The following GIF illustrates how to get the CSS selectors for the title, URL, snippet, source website, and the time the news was posted.
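To make the selector step more concrete, here is a minimal sketch that applies the same kind of selectors to a small hand-written HTML fragment. The markup is an assumption made up for illustration (it only borrows the class names used in this post), so the real Bing page may be structured differently. It uses the built-in `html.parser` so that it runs without `lxml`.

```python
from bs4 import BeautifulSoup

# Hand-written fragment for illustration only (an assumption);
# the real Bing News markup may differ.
SAMPLE_HTML = """
<div id="algocore">
  <div class="card-with-cluster">
    <a class="title" href="https://example.com/story">Example headline</a>
    <div class="snippet">Example snippet text ...</div>
    <div class="source">
      <a href="https://example.com">Example Source</a>
      <span class="sep">|</span>
      <span>2h</span>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select_one() returns the first element matching a CSS selector
card = soup.select_one(".card-with-cluster")

title = card.select_one(".title").text        # text inside the element
link = card.select_one(".title")["href"]      # attribute access
snippet = card.select_one(".snippet").text
source = card.select_one(".source a").text    # descendant selector
# "span + span" matches a <span> that directly follows another <span>
date_posted = card.select_one("span + span").text

print(title, link, source, date_posted, snippet, sep="\n")
```

SelectorGadget gives you the selector string; `select()` and `select_one()` are what turn that string into elements.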
Code
```python
from bs4 import BeautifulSoup
import requests, lxml

# Browser-like User-Agent sent with the request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

html = requests.get('https://www.bing.com/news/search?q=faze+clan', headers=headers)
soup = BeautifulSoup(html.text, 'lxml')

# Each news result sits inside a .card-with-cluster container
for result in soup.select('.card-with-cluster'):
    title = result.select_one('.title').text
    link = result.select_one('.title')['href']
    snippet = result.select_one('.snippet').text
    source = result.select_one('.source a').text
    date_posted = result.select_one('#algocore span+ span').text
    print(f'{title}\n{link}\n{source}\n{date_posted}\n{snippet}\n')

# part of the output:
'''
FaZe Clan shows off new execute for Mirage against Furia Esports
https://win.gg/news/8521/faze-clan-shows-off-new-execute-for-mirage-against-furia-esports
WIN.gg
2h
During a match against Team Furia in the Gamers Without Borders Cup, the camera spotted an interesting interaction between ...
'''
```
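One thing to keep in mind with the loop above: `select_one()` returns `None` when nothing matches, so calling `.text` on it raises an `AttributeError` if a card is missing, say, a snippet. The sketch below shows one possible way to guard against that. The helper names (`text_or_none`, `attr_or_none`, `parse_cards`) and the sample HTML are hypothetical and not part of the original code.

```python
from bs4 import BeautifulSoup


def text_or_none(node, selector):
    """Return the stripped text of the first match under node, or None."""
    match = node.select_one(selector)
    return match.get_text(strip=True) if match is not None else None


def attr_or_none(node, selector, attr):
    """Return an attribute of the first match under node, or None."""
    match = node.select_one(selector)
    return match.get(attr) if match is not None else None


def parse_cards(html):
    """Collect one dict per news card, tolerating missing fields."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select(".card-with-cluster"):
        rows.append({
            "title": text_or_none(card, ".title"),
            "link": attr_or_none(card, ".title", "href"),
            "snippet": text_or_none(card, ".snippet"),
            "source": text_or_none(card, ".source a"),
        })
    return rows


# Made-up markup (an assumption): the second card has no snippet or source.
SAMPLE_HTML = """
<div class="card-with-cluster">
  <a class="title" href="https://example.com/a">Card with everything</a>
  <div class="snippet">Some snippet</div>
  <div class="source"><a href="https://example.com">Example Source</a></div>
</div>
<div class="card-with-cluster">
  <a class="title" href="https://example.com/b">Card without a snippet</a>
</div>
"""

rows = parse_cards(SAMPLE_HTML)
for row in rows:
    print(row)
```

Returning `None` for a missing field keeps the loop running, and you can decide later whether to skip incomplete cards or keep them.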
Using Bing News Engine Results API
SerpApi is a paid API with a free trial of 5,000 searches.
```python
from serpapi import GoogleSearch
import json

params = {
    "api_key": "YOUR_API_KEY",
    "engine": "bing_news",
    "q": "faze clan"
}

search = GoogleSearch(params)
results = search.get_dict()

# Print every news result as formatted JSON
for result in results['organic_results']:
    print(json.dumps(result, indent=2, ensure_ascii=False))

# part of the output:
'''
{
  "title": "FaZe Clan shows off new execute for Mirage against Furia Esports",
  "link": "https://win.gg/news/8521/faze-clan-shows-off-new-execute-for-mirage-against-furia-esports",
  "snippet": "During a match against Team Furia in the Gamers Without Borders Cup, the camera spotted an interesting interaction between ...",
  "source": "WIN.gg",
  "date": "2h",
  "thumbnail": "https://serpapi.com/searches/60d82f308ccee022b4ab7525/images/62e054f4209c882415dd75f5245f96d23bd4c1538d707fb513a0918671c831d7.jpeg"
}
'''
```
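Since the API already returns structured data, a natural next step is to store it somewhere. Here is a minimal sketch that turns the results into CSV text using only the standard library. To keep it runnable without an API key, it works on a hard-coded dictionary shaped like the output above. In a real run you would pass the dictionary returned by `search.get_dict()` instead. The `results_to_csv` function and the `FIELDS` column order are hypothetical choices, not something SerpApi provides.

```python
import csv
import io

# Sample shaped like the output above (hard-coded for illustration);
# a real run would use the dict returned by search.get_dict().
results = {
    "organic_results": [
        {
            "title": "FaZe Clan shows off new execute for Mirage against Furia Esports",
            "link": "https://win.gg/news/8521/faze-clan-shows-off-new-execute-for-mirage-against-furia-esports",
            "snippet": "During a match against Team Furia in the Gamers Without Borders Cup, the camera spotted an interesting interaction between ...",
            "source": "WIN.gg",
            "date": "2h",
        }
    ]
}

# Columns to keep, in output order (an arbitrary choice for this sketch)
FIELDS = ["title", "link", "source", "date", "snippet"]


def results_to_csv(results):
    """Return CSV text with one row per item in results['organic_results']."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    # .get() guards against a response that has no organic_results key
    for item in results.get("organic_results", []):
        # Missing fields become empty cells; extra keys are dropped
        writer.writerow({field: item.get(field, "") for field in FIELDS})
    return buffer.getvalue()


csv_text = results_to_csv(results)
print(csv_text)
```

To save to disk, write `csv_text` to a file or hand an open file object to `csv.DictWriter` instead of the `StringIO` buffer.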
Links
Code in the online IDE • Bing News Engine Results API
Outro
If you have any questions, if something isn't working correctly, or if you'd like to see something else covered, feel free to drop a comment in the comment section or reach out via Twitter at @serp_api.
Yours,
Dimitry, and the rest of the SerpApi Team.