Posts: 1 Threads: 1 Joined: Mar 2024

Hello. I have the following Python code that checks a website for changes. This script always gives me the error "Error checking website". What am I doing wrong?

import requests
import os
from bs4 import BeautifulSoup
import time
import logging
import smtplib as smtp

URL_TO_MONITOR = "https://www.yahoo.com/"  # change this to the URL you want to monitor
DELAY_TIME = 15  # seconds


def process_html(string):
    soup = BeautifulSoup(string, features="lxml")

    # make the html look good
    soup.prettify()

    # remove script tags
    for s in soup.select('script'):
        s.extract()

    # remove meta tags
    for s in soup.select('meta'):
        s.extract()

    # convert to a string, remove '\r', and return
    return str(soup).replace('\r', '')


def webpage_was_changed():
    """Returns true if the webpage was changed, otherwise false."""
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
               'Pragma': 'no-cache', 'Cache-Control': 'no-cache'}
    response = requests.get(URL_TO_MONITOR, headers=headers)

    # create the previous_content.txt if it doesn't exist
    if not os.path.exists("previous_content.txt"):
        open("previous_content.txt", 'w+').close()

    filehandle = open("previous_content.txt", 'r')
    previous_response_html = filehandle.read()
    filehandle.close()

    processed_response_html = process_html(response.text)

    if processed_response_html == previous_response_html:
        return False
    else:
        filehandle = open("previous_content.txt", 'w')
        filehandle.write(processed_response_html)
        filehandle.close()
        return True


def main():
    log = logging.getLogger(__name__)
    logging.basicConfig(level=os.environ.get("LOGLEVEL", "INFO"),
                        format='%(asctime)s %(message)s')
    log.info("Running Website Monitor")

    while True:
        try:
            if webpage_was_changed():
                log.info("WEBPAGE WAS CHANGED.")
                print("The website was changed")
            else:
                log.info("Webpage was not changed.")
        except:
            log.info("Error checking website.")
        time.sleep(DELAY_TIME)


if __name__ == "__main__":
    main()

Posts: 8,197 Threads: 162 Joined: Sep 2016

The indentation is a mess. Fix that in your post. Also, don't use a bare except. Remove the try/except to get a meaningful error message and debug properly.

Posts: 6,920 Threads: 22 Joined: Feb 2020

I suggest not using try/except at all. That will provide more information about the problem.
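If you do want to keep the loop running, at least catch Exception and log the full traceback instead of swallowing it. A minimal, self-contained sketch (webpage_was_changed here is just a stand-in that raises, to simulate a failure in your real check):

import logging
import time

log = logging.getLogger(__name__)
logging.basicConfig(level="INFO", format="%(asctime)s %(message)s")


def webpage_was_changed():
    # stand-in for the real check; raises to simulate a failure
    raise RuntimeError("simulated failure")


while True:
    try:
        if webpage_was_changed():
            log.info("WEBPAGE WAS CHANGED.")
        else:
            log.info("Webpage was not changed.")
    except Exception:
        # log.exception() records the message plus the full traceback,
        # so the real cause (network error, missing parser, ...) is visible
        log.exception("Error checking website.")
    time.sleep(15)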
Posts: 2,171 Threads: 12 Joined: May 2017

import requests
import os
from bs4 import BeautifulSoup
import time
import logging
import smtplib as smtp

try:
    import lxml
except ImportError:
    raise RuntimeError("Please install lxml")

URL_TO_MONITOR = "https://www.yahoo.com/"  # change this to the URL you want to monitor
DELAY_TIME = 15  # seconds


def process_html(string):
    soup = BeautifulSoup(string, features="lxml")

    # make the html look good
    soup.prettify()

    # remove script tags
    for s in soup.select("script"):
        s.extract()

    # remove meta tags
    for s in soup.select("meta"):
        s.extract()

    # convert to a string, remove '\r', and return
    return str(soup).replace("\r", "")


def webpage_was_changed():
    """Returns true if the webpage was changed, otherwise false."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
    }
    response = requests.get(URL_TO_MONITOR, headers=headers)

    # create the previous_content.txt if it doesn't exist
    if not os.path.exists("previous_content.txt"):
        open("previous_content.txt", "w+").close()

    filehandle = open("previous_content.txt", "r")
    previous_response_html = filehandle.read()
    filehandle.close()

    processed_response_html = process_html(response.text)

    if processed_response_html == previous_response_html:
        return False
    else:
        filehandle = open("previous_content.txt", "w")
        filehandle.write(processed_response_html)
        filehandle.close()
        return True


def main():
    log = logging.getLogger(__name__)
    logging.basicConfig(
        level=os.environ.get("LOGLEVEL", "INFO"), format="%(asctime)s %(message)s"
    )
    log.info("Running Website Monitor")

    while True:
        try:
            if webpage_was_changed():
                log.info("WEBPAGE WAS CHANGED.")
            else:
                log.info("Webpage was not changed.")
        except Exception as e:
            log.exception(e)
        time.sleep(DELAY_TIME)


if __name__ == "__main__":
    main()

Do not use a bare except. It suppresses programming errors. In this case lxml wasn't installed, which is required by bs4 (explicit features="lxml"). I didn't check the other stuff, just used a formatting tool (ruff format).

Posts: 8,197 Threads: 162 Joined: Sep 2016
Mar-10-2024, 07:28 PM (This post was last modified: Mar-10-2024, 07:28 PM by buran.)

(Mar-10-2024, 05:53 PM)DeaD_EyE Wrote: In this case lxml wasn't installed,

Well, I don't know how you know it is not installed on the OP's machine.

Posts: 2,171 Threads: 12 Joined: May 2017

(Mar-10-2024, 07:28 PM)buran Wrote: Well, I don't know how you know it is not installed on the OP's machine.

I tested his code and ran into this issue. lxml was not installed.

Posts: 8,197 Threads: 162 Joined: Sep 2016
Mar-10-2024, 10:11 PM (This post was last modified: Mar-10-2024, 10:11 PM by buran.)

(Mar-10-2024, 08:49 PM)DeaD_EyE Wrote: I tested his code and ran into this issue. lxml was not installed.

It was not installed on YOUR machine. There is no info about the OP's setup - it may or may not be installed, we just don't know.

Posts: 2,171 Threads: 12 Joined: May 2017

(Mar-10-2024, 10:11 PM)buran Wrote: It was not installed on YOUR machine. There is no info about the OP's setup - it may or may not be installed, we just don't know.

This is why I mentioned that bare excepts are bad. Furthermore, I had this issue with lxml, which is also eaten up by the bare except. The exception was raised by bs4. If his internet connection does not work, he will see it with the modified code:

except Exception as e:
    log.exception(e)
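If you want to check up front whether bs4 can actually use the lxml parser, something like this works (a quick sketch; as far as I know bs4 raises FeatureNotFound when the requested parser is not installed):

from bs4 import BeautifulSoup, FeatureNotFound

try:
    # bs4 raises FeatureNotFound if the lxml package is missing
    BeautifulSoup("<p>test</p>", features="lxml")
    print("lxml parser is available")
except FeatureNotFound:
    print("lxml is not installed - run `pip install lxml` "
          "or fall back to features='html.parser'")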
Posts: 7,398 Threads: 123 Joined: Sep 2016
Mar-11-2024, 11:02 AM (This post was last modified: Mar-11-2024, 11:02 AM by snippsat.)

Some improvements: time.sleep (blocking) is not the best way to schedule stuff, so use schedule for the scheduling and loguru (great) for the logging.

import requests
import os
from bs4 import BeautifulSoup
import time
from loguru import logger

logger.add("log_file.log", rotation="2 days")
import schedule

try:
    from lxml import etree
except ImportError:
    raise RuntimeError("Please install lxml with `pip install lxml`")

URL_TO_MONITOR = "https://hckrnews.com/"
CHECK_INTERVAL = 15


def process_html(site_content):
    soup = BeautifulSoup(site_content, features="lxml")
    # Combining tag selections
    for s in soup(["script", "meta"]):
        s.extract()
    return str(soup).replace("\r", "")


def webpage_was_changed():
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
    }
    response = requests.get(URL_TO_MONITOR, headers=headers)
    if not os.path.exists("previous_content.html"):
        open("previous_content.html", "w+").close()
    with open("previous_content.html", "r+") as filehandle:
        previous_response_html = filehandle.read()
        processed_response_html = process_html(response.content)
        if processed_response_html != previous_response_html:
            filehandle.seek(0)
            filehandle.write(processed_response_html)
            filehandle.truncate()
            return True
        return False


def check_webpage():
    try:
        if webpage_was_changed():
            logger.info("WEBPAGE WAS CHANGED.")
        else:
            logger.info("Webpage was not changed.")
    except Exception as e:
        logger.exception(e)


def main():
    schedule.every(CHECK_INTERVAL).seconds.do(check_webpage)
    logger.info("Running Website Monitor")
    while True:
        schedule.run_pending()
        time.sleep(1)


if __name__ == "__main__":
    main()

Also a tip: I would say soup.prettify() is broken; it puts new lines inside tags, so the output does not look like standard HTML at all. Prettier has a command line tool, so just run prettier --write . in the folder and you get correctly formatted HTML.
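To see what I mean about prettify(), a small sketch:

from bs4 import BeautifulSoup

html = "<div><p>Hello <b>world</b>!</p></div>"
soup = BeautifulSoup(html, "lxml")
# prettify() returns a re-indented string with one tag or text node per line;
# the extra newlines and indentation inside inline tags like <b> mean the
# output no longer matches the original markup.
print(soup.prettify())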