Introduction:
Keeping a website up and running smoothly is crucial for any business or service. Website health check alerts help by automatically notifying you when something goes wrong, for example when your site goes down, slows down, or becomes inaccessible. These alerts let teams respond quickly, often before users notice an issue. The basic idea is simple: regularly check the website's status (such as uptime and response time), and if anything looks off, send an alert. With AWS, this can be fully automated: AWS Lambda runs the health checks, Amazon EventBridge schedules them, and Amazon SNS (Simple Notification Service) sends alerts via email or text message. Together, these services form a reliable, serverless system that keeps your website healthy and your users happy.
Why Lambda?
We chose AWS Lambda for website health check alerts because it is serverless, scalable, and cost-effective, which suits this kind of lightweight, periodic task. The main reasons are:
1- Lambda runs your code without requiring you to provision or manage servers. You just write the health check logic, and AWS handles the rest.
2- Whether you’re monitoring one website or hundreds, Lambda automatically scales to handle as many checks as needed.
3- You only pay for the compute time your function actually uses. Since health checks are usually quick and infrequent (e.g., every 5 minutes), this makes Lambda a very cost-effective option.
4- Lambda integrates seamlessly with services like:
- Amazon EventBridge to schedule regular health checks
- Amazon SNS to send alerts via email or SMS
- Amazon CloudWatch for logging and monitoring the function’s performance
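To put the cost point in numbers, the invocation count per month follows directly from the schedule interval. The sketch below is plain arithmetic: the 30-day month is a simplifying assumption, and the request allowance of 1 million per month is an assumed free-tier figure that should be checked against current AWS pricing.

```python
def invocations_per_month(interval_minutes: int, days: int = 30) -> int:
    """Number of scheduled runs in a month, assuming a fixed interval."""
    minutes_per_month = days * 24 * 60
    return minutes_per_month // interval_minutes


# Assumed free-tier request allowance; verify against current AWS pricing.
FREE_TIER_REQUESTS = 1_000_000

if __name__ == "__main__":
    for interval in (3, 5):
        runs = invocations_per_month(interval)
        share = runs / FREE_TIER_REQUESTS
        print(f"every {interval} min: {runs} runs/month ({share:.2%} of the assumed allowance)")
```

Even at a 3-minute interval, the function runs only a few thousand times a month, which is why the pay-per-use model fits this workload.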
Task Definition:
Create an AWS Lambda function that performs periodic health checks on multiple live websites. The function should:
- Perform Health Checks: Ping each website and verify that it is accessible (e.g., by checking for a successful HTTP response status code such as 200 OK).
- Trigger Every 3 Minutes: Use Amazon EventBridge to schedule the Lambda function to run automatically every 3 minutes.
- Send Alerts for Failures: If any website fails the health check (e.g., is unreachable, slow, or returns an error status code), the function should publish a detailed alert to an Amazon SNS topic. The SNS topic should then send a notification email to all subscribed recipients with information about the failed website(s).
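The three failure cases named in the task (unreachable, error status, slow) can be sketched as a single check function. This is a minimal illustration, not the function used later in this post: `CheckResult`, `check_site`, the injected `fetch` callable, and the 5-second `slow_after` threshold are all hypothetical names and values chosen for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    url: str
    healthy: bool
    reason: str
    status: Optional[int] = None
    elapsed: Optional[float] = None


def check_site(url: str, fetch: Callable[[str], int], slow_after: float = 5.0) -> CheckResult:
    """Run one health check. `fetch` returns the HTTP status code or raises."""
    start = time.monotonic()
    try:
        status = fetch(url)
    except Exception as exc:  # timeout, DNS failure, refused connection, ...
        return CheckResult(url, False, f"unreachable: {exc}")
    elapsed = time.monotonic() - start
    if status != 200:
        return CheckResult(url, False, f"status {status}", status, elapsed)
    if elapsed > slow_after:
        return CheckResult(url, False, f"slow: {elapsed:.1f}s", status, elapsed)
    return CheckResult(url, True, "ok", status, elapsed)


def urllib3_fetch(url: str) -> int:
    """Real fetcher. urllib3 is imported lazily so the logic above can be tested offline."""
    import urllib3

    # A module-level PoolManager would normally be reused across calls.
    http = urllib3.PoolManager()
    return http.request("GET", url, timeout=10.0, retries=False).status
```

Passing the fetcher in as an argument keeps the classification logic testable without network access; in Lambda you would call `check_site(url, urllib3_fetch)`.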
Configuration Steps:
1. Creating Lambda Function:
- Log in to your AWS Console.
- Go to AWS Lambda and create a function.
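- When you create the function, the console also creates an IAM execution role with basic Lambda permissions (writing logs to CloudWatch). To publish alerts, this role additionally needs the sns:Publish permission on your SNS topic.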
2. Set up a CloudWatch Alarm:
- Create a CloudWatch alarm.
- Select the metric: Lambda -> By Function Name -> Your_lambda_function_name -> Invocations.
- On the next step, select or create an SNS topic. If you create one, add the email address where you want to receive the alerts, then confirm the subscription from the email SNS sends you.
- Add a Lambda action and select the Lambda function you created.
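If you prefer to create the SNS topic and email subscription from code rather than the console, a sketch along these lines should work with boto3. The function name `ensure_topic_with_email`, the topic name, and the email address are placeholders invented for this example; running it for real requires AWS credentials and a configured region.

```python
def ensure_topic_with_email(sns, topic_name: str, email: str) -> str:
    """Create (or reuse) an SNS topic and subscribe an email address to it."""
    # create_topic returns the existing ARN if a topic with this name already exists.
    topic_arn = sns.create_topic(Name=topic_name)["TopicArn"]
    # The recipient must confirm via the email SNS sends before alerts are delivered.
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)
    return topic_arn


# Example usage (placeholders, not values from this post):
#   import boto3
#   arn = ensure_topic_with_email(boto3.client("sns"), "website-health-alerts", "ops@example.com")
```

The returned ARN is the value the Lambda function later reads from its `SNS_TOPIC_ARN` environment variable.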
3. Create your Lambda Function Code and upload it to the Lambda Console:
- Create and deploy the function, then run a test invocation. You can verify the result in the function's CloudWatch Logs.
    import urllib3
    import boto3
    import os

    # Create a PoolManager with default SSL verification
    http = urllib3.PoolManager()

    # Create SNS client
    sns = boto3.client('sns')

    # Website list
    URLs = [
        "https://www.abc1.com",
        "https://www.abc2.com",
        "https://www.abc3.com",
        # ... remaining URLs
    ]

    # Get SNS topic ARN from environment variable
    snsTopicArn = os.environ['SNS_TOPIC_ARN']

    # Standard headers to avoid being blocked
    HEADERS = {
        'User-Agent': 'Mozilla/5.0 (compatible; WebsiteHealthBot/1.0)',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5'
    }

    def lambda_handler(event, context):
        for URL in URLs:
            try:
                # Set higher timeout for known slow-loading domains
                if "abc1.com" in URL or "abc2.com" in URL:
                    timeout_value = 20.0
                else:
                    timeout_value = 10.0

                # Make the GET request
                response = http.request(
                    'GET',
                    URL,
                    headers=HEADERS,
                    timeout=timeout_value,
                    retries=False  # prevent auto-retries, handle manually if needed
                )

                # Only alert if the site is not returning HTTP 200 OK
                if response.status != 200:
                    message = f"ALERT: Website {URL} is down. Status code: {response.status}"
                    sns.publish(
                        TopicArn=snsTopicArn,
                        Message=message,
                        Subject="Website Health Alert"
                    )

            except Exception as e:
                # If the request fails (e.g., timeout, DNS, connection error), alert
                message = f"ALERT: Website {URL} check failed with error: {str(e)}"
                sns.publish(
                    TopicArn=snsTopicArn,
                    Message=message,
                    Subject="Website Health Alert"
                )
Code Explanation:
`import urllib3`, `import boto3`, `import os`
- `urllib3`: Used to make HTTP requests to websites.
- `boto3`: AWS SDK for Python, used here to send alerts via SNS.
- `os`: Used to access environment variables (like the SNS topic ARN).
http = urllib3.PoolManager()
- Initializes an HTTP connection pool manager with SSL verification enabled.
- Manages connections efficiently (reuse, persistent sessions, etc.).
sns = boto3.client('sns')
- Creates an SNS client used to publish alerts when a website is down.
snsTopicArn = os.environ['SNS_TOPIC_ARN']
- Retrieves the SNS topic ARN from environment variables configured in Lambda settings.
- Helps avoid hardcoding sensitive resource identifiers in code.
HEADERS = { 'User-Agent': 'Mozilla/5.0 ...', 'Accept': ..., 'Accept-Language': ... }
- Prevents some websites from blocking or misidentifying the health check as a bot or attack.
- Makes the request look like it's coming from a normal browser.
    def lambda_handler(event, context):
        for URL in URLs:
- This is the entry point for AWS Lambda.
- Loops through every website URL in the list.
    if "abc1.com" in URL or "abc2.com" in URL:
        timeout_value = 20.0
    else:
        timeout_value = 10.0
- Sets a longer, 20-second timeout for specific slow websites (`abc1.com`, `abc2.com`).
- All other websites get a 10-second timeout.
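The substring test works, but it also matches any URL that merely contains one of those names, for instance in a query string. One possible alternative, sketched here under the assumption that you key the overrides by exact host name, is a small lookup table. `SLOW_HOST_TIMEOUTS`, `DEFAULT_TIMEOUT`, and `timeout_for` are hypothetical names for this example.

```python
from urllib.parse import urlsplit

# Hypothetical per-host overrides; the host names mirror the placeholders used in this post.
SLOW_HOST_TIMEOUTS = {"www.abc1.com": 20.0, "www.abc2.com": 20.0}
DEFAULT_TIMEOUT = 10.0


def timeout_for(url: str) -> float:
    """Return the request timeout for a URL based on its exact host name."""
    host = urlsplit(url).hostname or ""
    return SLOW_HOST_TIMEOUTS.get(host, DEFAULT_TIMEOUT)
```

Inside the handler, `timeout_value = timeout_for(URL)` would then replace the `if`/`else` block.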
    response = http.request(
        'GET',
        URL,
        headers=HEADERS,
        timeout=timeout_value,
        retries=False
    )
- Sends an HTTP GET request to the current website.
- No retries (`retries=False`): if the request fails, execution goes straight to the `except` block.
- Uses the custom headers and the specified timeout.
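Since automatic retries are switched off, a single transient network error triggers an alert. If that proves too noisy, retries can be handled manually. The helper below is one possible sketch, not part of the original function: `with_retries`, its default of three attempts, and the linear backoff are assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` up to `attempts` times and re-raise the last error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: let the caller's except block send the alert
            sleep(delay * attempt)  # simple linear backoff between tries
    raise AssertionError("unreachable")
```

In the handler this could wrap the request, for example `with_retries(lambda: http.request('GET', URL, headers=HEADERS, timeout=timeout_value, retries=False))`. Keep in mind that retries lengthen the run time, so the Lambda function's configured timeout (3 seconds by default) must be long enough to cover all attempts for all sites.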
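    if response.status != 200:
        message = f"ALERT: Website {URL} is down..."
        sns.publish(...)
- If the site returns anything other than `200 OK`, it's considered "down". Because `retries=False` also stops urllib3 from following redirects, a 301 or 302 response counts as a failure here.
- Sends an SNS alert with a clear subject and message.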
    except Exception as e:
        message = f"ALERT: Website {URL} check failed with error: {str(e)}"
        sns.publish(...)
- If the request raises an error (e.g., timeout, DNS issue), the exception is caught.
- Publishes a different SNS alert indicating the reason for the failure.
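The function above publishes one SNS message per failing site, so an outage affecting several sites produces several emails. If a single summary email is preferred, the failures can be collected first and published once. The following is a sketch under that assumption; `build_alert` and `publish_failures` are hypothetical helpers, and the subject is truncated because SNS requires subjects shorter than 100 characters.

```python
from typing import Iterable, List, Tuple

MAX_SUBJECT_LENGTH = 99  # SNS requires subjects shorter than 100 characters


def build_alert(failures: Iterable[Tuple[str, str]]) -> Tuple[str, str]:
    """Turn (url, reason) pairs into a (subject, message) pair."""
    items: List[Tuple[str, str]] = list(failures)
    subject = f"Website Health Alert: {len(items)} site(s) failing"[:MAX_SUBJECT_LENGTH]
    lines = [f"- {url}: {reason}" for url, reason in items]
    message = "The following health checks failed:\n" + "\n".join(lines)
    return subject, message


def publish_failures(sns, topic_arn: str, failures: Iterable[Tuple[str, str]]) -> bool:
    """Publish one combined alert; return False when there is nothing to report."""
    items = list(failures)
    if not items:
        return False
    subject, message = build_alert(items)
    sns.publish(TopicArn=topic_arn, Subject=subject, Message=message)
    return True
```

In the handler, each `sns.publish(...)` call would be replaced by appending `(URL, reason)` to a list, followed by one `publish_failures(sns, snsTopicArn, failures)` call after the loop.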
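4. Create an environment variable for the SNS topic:
- In the function's configuration, add an environment variable named SNS_TOPIC_ARN and set its value to the ARN of your SNS topic. The code reads it via os.environ['SNS_TOPIC_ARN'].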
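The same variable can also be set from code. The sketch below assumes boto3's Lambda client and uses a hypothetical helper name, `set_topic_arn`. It merges the new value into the existing variables because updating the configuration replaces the whole variable map.

```python
ENV_NAME = "SNS_TOPIC_ARN"


def set_topic_arn(lambda_client, function_name: str, topic_arn: str) -> dict:
    """Set SNS_TOPIC_ARN on a function while keeping its other variables."""
    config = lambda_client.get_function_configuration(FunctionName=function_name)
    # "Environment" is absent when the function has no variables yet.
    variables = dict(config.get("Environment", {}).get("Variables", {}))
    variables[ENV_NAME] = topic_arn
    # update_function_configuration replaces the whole variable map, hence the merge.
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Environment={"Variables": variables},
    )
    return variables


# Example usage (placeholder names, requires AWS credentials):
#   import boto3
#   set_topic_arn(boto3.client("lambda"), "website-health-check", "<your topic ARN>")
```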
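5. Add EventBridge trigger and SNS Destination:
- Add an EventBridge schedule trigger to the function with the schedule expression rate(3 minutes).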
Now your function will be triggered every 3 minutes. If the health check of any website fails, Lambda publishes an alert to the SNS topic, and SNS sends a notification to the subscribed email addresses.
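For reference, the 3-minute schedule can also be created with boto3 instead of the console. This is a sketch under stated assumptions: the helper name `schedule_every_3_minutes`, the rule name, and the target and statement IDs are placeholders, and the clients are passed in so the logic can be exercised without AWS access.

```python
def schedule_every_3_minutes(
    events,
    lambda_client,
    function_name: str,
    function_arn: str,
    rule_name: str = "website-health-check",
) -> str:
    """Create an EventBridge rule that invokes the function every 3 minutes."""
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(3 minutes)",
        State="ENABLED",
    )["RuleArn"]
    # Allow EventBridge to invoke the function, restricted to this rule.
    # Re-running with the same StatementId raises a conflict error in AWS.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "health-check-lambda", "Arn": function_arn}],
    )
    return rule_arn


# Example usage (placeholder values, requires AWS credentials):
#   import boto3
#   schedule_every_3_minutes(boto3.client("events"), boto3.client("lambda"),
#                            "website-health-check", "<your function ARN>")
```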