DEV Community

Prince

Web Scraping with Python: getting the table of countries and country codes from countrycode.org

Ever needed data that sits on a webpage you cannot easily copy and paste from?
I'm going to show you how to get data from webpages (web scraping, essentially) with Python. Specifically, we'll get the countries, ISO codes and phone country codes from the table on countrycode.org.

Before we get into the code proper, we need to install two Python packages, requests and beautifulsoup4:
pip install requests beautifulsoup4

Then follow these steps.

Import required packages

from bs4 import BeautifulSoup
import json
import requests

Get the webpage's content using the requests package

url = "https://countrycode.org/"
r = requests.get(url)
r.raise_for_status()

The last line raises an exception (requests.HTTPError) if the response status code is a 4xx or 5xx error, which stops the program.
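
To see which status codes trigger that exception without touching the network, here is a small sketch. It builds an empty requests.Response by hand and sets its status code, which is only a testing trick and not something the scraper itself needs; the helper name status_is_ok is made up for this illustration.

```python
import requests


def status_is_ok(status_code):
    """Return True if raise_for_status() accepts the code, False if it raises."""
    response = requests.Response()       # empty response object, no request is sent
    response.status_code = status_code   # pretend the server answered with this code
    try:
        response.raise_for_status()
    except requests.HTTPError:
        return False
    return True


for code in (200, 301, 404, 503):
    print(code, status_is_ok(code))
```

Only the 4xx and 5xx codes make raise_for_status() raise; success and redirect codes pass through silently.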

Create the 'soup' and select all the rows found in the table's body. This 'soup' object helps us to get particular elements from the HTML page's content.

soup = BeautifulSoup(r.content, 'html.parser')
rows = soup.select("tbody > tr")  # select all the rows (tr elements) that are direct children of a tbody element
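
If you want to check what a CSS selector matches before running it against the live page, you can try it on a small piece of HTML first. The sketch below uses a hand-written table with made-up country names (not the real markup of countrycode.org) to show the difference between selecting rows inside tbody and selecting every tr.

```python
from bs4 import BeautifulSoup

# A tiny hand-written table standing in for the real page (hypothetical markup).
html = """
<table>
  <thead><tr><th>Country</th><th>Code</th></tr></thead>
  <tbody>
    <tr><td><a href="/f">Freedonia</a></td><td>999</td></tr>
    <tr><td><a href="/b">Borduria</a></td><td>998</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

body_rows = soup.select("tbody > tr")  # only rows whose parent is <tbody>
all_rows = soup.select("tr")           # also picks up the header row in <thead>

print(len(body_rows), len(all_rows))
print(body_rows[0].find("a").text)
```

Restricting the selector to tbody is what keeps the header row out of the results.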

Get the countries from the table

keys = ["name", "country_code", "iso_codes", "population", "area/km2", "gdp $USD"]  # the different columns in the table

list_of_countries = []
for row in rows:
    country_object = {}
    for key in keys:
        country_object[key] = ''  # creating a dictionary for the row
    for index, cell in enumerate(row.find_all('td')):  # looping through the different td elements found in this row
        if index < len(keys):
            if index == 0:
                # get the text found in the hyperlink in the cell
                country_object[keys[index]] = cell.find('a').text
            else:
                # get the text found in the cell
                country_object[keys[index]] = cell.text
    list_of_countries.append(country_object)
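
The same row-to-dictionary idea can be written more compactly with zip. This is an alternative sketch, not the code used above: the helper row_to_dict and the sample row are made up for illustration, and it assumes the first cell contains nothing but the link, so get_text(strip=True) on the whole cell returns the same name as reading the hyperlink directly.

```python
from bs4 import BeautifulSoup

KEYS = ["name", "country_code", "iso_codes", "population", "area/km2", "gdp $USD"]


def row_to_dict(row):
    """Map the <td> cells of one <tr> onto KEYS, leaving missing columns as ''."""
    texts = [cell.get_text(strip=True) for cell in row.find_all("td")]
    country = dict.fromkeys(KEYS, "")   # start with every column empty
    country.update(zip(KEYS, texts))    # zip stops at the shorter sequence
    return country


# Hypothetical row with only three of the six columns filled in.
html = """
<table><tbody>
  <tr><td> <a href="/f">Freedonia</a> </td><td>999</td><td>FD / FDN</td></tr>
</tbody></table>
"""
rows = BeautifulSoup(html, "html.parser").select("tbody > tr")
countries = [row_to_dict(row) for row in rows]
print(countries[0])
```

Because zip stops at the shorter of its inputs, extra cells are ignored and missing cells keep the empty-string default, which mirrors the index check in the loop.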

Save the list to a json file

with open("countries.json", "w") as f:  # replace countries.json with whatever you want
    json.dump(list_of_countries, f)
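
If you want the file to be easier to read by eye, json.dump accepts a few optional arguments. The sketch below is one possible variation: it writes a made-up sample row to a temporary folder (so it does not overwrite your real countries.json) and reads it back to confirm nothing was lost.

```python
import json
import os
import tempfile

# Hypothetical sample row in the same shape the scraping loop produces.
list_of_countries = [
    {"name": "Freedonia", "country_code": "999", "iso_codes": "FD / FDN",
     "population": "", "area/km2": "", "gdp $USD": ""},
]

path = os.path.join(tempfile.mkdtemp(), "countries.json")

# ensure_ascii=False keeps accented names readable; indent=2 pretty-prints the file.
with open(path, "w", encoding="utf-8") as f:
    json.dump(list_of_countries, f, ensure_ascii=False, indent=2)

# Read the file back to confirm it round-trips.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == list_of_countries)
```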

VOILA! You now have the list of countries with their country (calling) codes, ISO codes, populations, surface areas and GDP.
