Python Program to Check if a URL is valid or not using Regular Expression

Here's a tutorial on how to check if a URL is valid using regular expressions in Python.

1. Introduction:

Validating URLs can be somewhat complex due to the various parts a URL can contain, such as protocol, subdomain, domain name, TLD (Top Level Domain), port, path, query parameters, and fragment.
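To make those parts concrete, here is a small sketch that splits a URL into its components using the standard library's urllib.parse.urlsplit. The example URL is made up for illustration, and this is only a way to inspect the parts, not the validation approach used in the rest of this tutorial.

```python
from urllib.parse import urlsplit

# Hypothetical URL chosen to contain every part named above.
parts = urlsplit("https://www.example.com:8080/path/to/page?query=value#section")

print(parts.scheme)    # https
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /path/to/page
print(parts.query)     # query=value
print(parts.fragment)  # section
```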

However, for the sake of this tutorial, we'll use a basic approach to validate common URLs. This method won't be exhaustive, but it will cover a majority of common URL structures.

2. Using Python's re Module:

Python's built-in re module provides functions for working with regular expressions. We'll compile a pattern with re.compile and use re.match to test whether a given URL matches it from the start of the string.
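As a quick warm-up, the sketch below uses a deliberately tiny pattern (just the scheme prefix, not the full URL pattern from this tutorial) to show how a compiled pattern behaves. The variable name scheme and the sample strings are illustrative assumptions.

```python
import re

# "http", an optional "s", then "://". IGNORECASE makes "HTTPS" match too.
scheme = re.compile(r'https?://', re.IGNORECASE)

# match() only succeeds if the pattern matches at the start of the string.
print(scheme.match("HTTPS://example.com") is not None)       # True
print(scheme.match("ftp://example.com") is not None)         # False
print(scheme.match("see https://example.com") is not None)   # False

# search() scans the whole string instead.
print(scheme.search("see https://example.com") is not None)  # True
```

A match returns a Match object on success and None on failure, which is why the validation function later compares the result against None.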

3. Regular Expression for URL Validation:

    pattern = re.compile(
        r'http[s]?://'  # http:// or https://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain
        r'localhost|'  # localhost
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

4. URL Validation Function:

    import re

    def is_valid_url(url):
        pattern = re.compile(
            r'http[s]?://'
            r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
            r'localhost|'
            r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
            r'(?::\d+)?'
            r'(?:/?|[/?]\S+)$', re.IGNORECASE)
        return re.match(pattern, url) is not None
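One subtlety worth knowing: in Python, $ also matches just before a trailing newline, so a match-based check accepts a URL with a newline appended. The sketch below shows one possible stricter variant that uses fullmatch instead. The names URL_PATTERN and is_valid_url_strict are hypothetical, and the pattern itself is copied unchanged from above.

```python
import re

# Same pattern as in the tutorial, compiled once at module level.
URL_PATTERN = re.compile(
    r'http[s]?://'
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'(?::\d+)?'
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_valid_url_strict(url):
    """Hypothetical variant: the pattern must consume the entire string."""
    return URL_PATTERN.fullmatch(url) is not None


# match() tolerates the trailing newline because "$" matches before it.
print(URL_PATTERN.match("https://www.google.com\n") is not None)  # True
# fullmatch() rejects it, since the newline is never consumed.
print(is_valid_url_strict("https://www.google.com\n"))            # False
print(is_valid_url_strict("https://www.google.com"))              # True
```

Whether this matters depends on where the input comes from. For lines read from a file or a form field, stripping whitespace first is another reasonable option.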

5. Testing the Function:

Let's test the function with some sample URLs:

    urls = [
        "https://www.google.com",
        "http://localhost:8000/path/to/resource?query=value#fragment",
        "ftp://files.server.com",
        "not-a-url"
    ]

    for u in urls:
        print(f"'{u}' is a valid URL: {is_valid_url(u)}")

6. Notes:

  • The regular expression checks for common parts of a URL, but it's not exhaustive. For instance, it accepts only the http:// and https:// schemes, so it rejects the ftp:// URL in the test above. If you need to support more URL types or specific structures, you will need to adjust the regex pattern.

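  • The above pattern accepts localhost and dotted IPv4 addresses as valid hosts. Note that \d{1,3} does not range-check each octet, so a host such as 999.999.999.999 also passes. You can adjust this based on your requirements.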
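Both notes can be addressed with small adjustments. The following is a minimal sketch, not a definitive implementation: the function name is_valid_url_for_schemes, its schemes parameter, and the helper names _REST and _DOTTED_QUAD are all hypothetical. It keeps the tutorial's pattern for everything after the scheme, makes the accepted schemes configurable, and uses the standard library's ipaddress module to reject dotted hosts with out-of-range octets.

```python
import ipaddress
import re

# Everything after the scheme, copied unchanged from the tutorial's pattern.
_REST = (
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'(?::\d+)?'
    r'(?:/?|[/?]\S+)$'
)

# Hypothetical helper: captures a dotted-quad host that directly follows the scheme.
_DOTTED_QUAD = re.compile(
    r'[a-z][a-z0-9+.-]*://(\d{1,3}(?:\.\d{1,3}){3})(?=[:/?#]|$)',
    re.IGNORECASE)


def is_valid_url_for_schemes(url, schemes=("http", "https")):
    """Validate url against the tutorial's pattern for the given schemes."""
    scheme_part = "|".join(re.escape(s) for s in schemes)
    pattern = re.compile(r'(?:' + scheme_part + r')://' + _REST, re.IGNORECASE)
    if pattern.match(url) is None:
        return False

    # Extra check: if the host is a dotted quad, every octet must be 0-255.
    quad = _DOTTED_QUAD.match(url)
    if quad is None:
        return True  # host is a name, not a dotted quad
    try:
        ipaddress.IPv4Address(quad.group(1))
    except ValueError:
        return False
    return True


print(is_valid_url_for_schemes("ftp://files.server.com"))            # False
print(is_valid_url_for_schemes("ftp://files.server.com",
                               schemes=("http", "https", "ftp")))    # True
print(is_valid_url_for_schemes("http://999.999.999.999"))            # False
print(is_valid_url_for_schemes("http://192.168.0.1:8000/"))          # True
```

Keeping the default at http and https preserves the original behaviour, so callers opt in to extra schemes explicitly.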

Summary:

In this tutorial, we demonstrated how to validate URLs using regular expressions in Python. While this method works for most common URLs, always remember that validating URLs can be complex, and a more comprehensive approach may be needed based on specific requirements.

