DEV Community

Saumitra Jagdale


Top Universities Dataset

Introduction

This is a dataset of the World University Rankings for the past 8 years, along with ranking parameters such as Quality of Staff, Alumni Employment, Publications and Citations. I used the Beautiful Soup parser to extract the data from the HTML source code, which I obtained using the requests package in Python.

The dataset folder is available on Kaggle: https://www.kaggle.com/saumitrajagdale/university-rankings

Scraping Data

I scraped this data from the official rankings page provided by CWUR.
URL: https://cwur.org/

The dependency packages I used were:

  • Pandas: For storing the data in a DataFrame.
  • Beautiful Soup (bs4): For parsing HTML source code.
  • Requests: For obtaining the source code of a given url.
  • Numpy: For basic array operations.

Code Snippet For Scraping:

```python
# Dependencies
import bs4
import pandas as pd
import requests

# Obtaining source code from the url
url = "https://cwur.org/2012.php"
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early if the server returned an error page

# Parsing the HTML source code
soup = bs4.BeautifulSoup(response.content, "html.parser")

# Extracting the data according to the HTML tags:
# skip the first <tr> (the table header); each remaining <tr> is one university
# and each of its <td> cells is one column value (tags inside a cell are stripped).
rows = []
for tr in soup.find_all("tr")[1:]:
    row = [td.get_text(strip=True) for td in tr.find_all("td")]
    rows.append(row)

# Converting data into dataframe using pandas
df = pd.DataFrame(rows, columns=["World Rank", "University", "Location", "National Rank",
                                 "Quality of Education", "Alumni Employment", "Quality of Faculty",
                                 "Publications", "Influence", "Citations", "Patents", "Score"])
print(df)

# Creating csv file from the dataframe
df.to_csv("University_Ranks_2012.csv")
```
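Since the same table has to be parsed once for each of the 8 years, the extraction step can be factored into a function that works on an HTML string. This is a minimal sketch, assuming every yearly page uses the same 12-column table as the 2012 page; the name `parse_ranking_table` is hypothetical, and rows with an unexpected number of cells are skipped rather than raising an error.

```python
import bs4
import pandas as pd

# Column names copied from the 2012 scraping snippet.
COLUMNS = ["World Rank", "University", "Location", "National Rank",
           "Quality of Education", "Alumni Employment", "Quality of Faculty",
           "Publications", "Influence", "Citations", "Patents", "Score"]


def parse_ranking_table(html, columns=COLUMNS):
    """Turn the HTML of one rankings page into a DataFrame of strings.

    Assumes the first <tr> is the header row and every following <tr>
    holds one <td> per column (an assumption about the page layout).
    """
    soup = bs4.BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr")[1:]:
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # Skip malformed rows instead of letting pd.DataFrame raise on them.
        if len(cells) == len(columns):
            rows.append(cells)
    return pd.DataFrame(rows, columns=columns)
```

Passing it the downloaded HTML of each year's page and saving each result with `to_csv` would produce one CSV per year. The post only gives the 2012 URL, so the other years' page addresses would need to be checked on cwur.org.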

Scope of Analysis

This dataset can be used for the following analyses:

  1. To find the parameter that carries the most weight in determining universities' ranks.
  2. To find trends in the rankings over the past 8 years, based on the parameters provided as columns in the dataset.
  3. To visualise the rise and fall of a particular university's ranking, with rank on the y-axis and year on the x-axis. [Line Graphs]
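All three analyses could start from a single table that holds every year. The sketch below is one possible starting point, assuming each year's data is a DataFrame with the column names used in the scraping snippet and that the numeric cells parse as numbers (anything that does not becomes NaN). The helper names `combine_years`, `parameter_correlations` and `rank_history` are hypothetical. Using Spearman (rank) correlation against Score is a choice of this sketch, made because the parameter columns appear to hold ranks; with that reading, a strongly negative value means a better (lower) parameter rank tends to go with a higher Score.

```python
import pandas as pd

# Parameter columns as named in the scraping snippet's DataFrame.
PARAMETERS = ["Quality of Education", "Alumni Employment", "Quality of Faculty",
              "Publications", "Influence", "Citations", "Patents"]


def combine_years(frames_by_year):
    """Stack one DataFrame per year (dict: year -> frame) and add a Year column."""
    combined = pd.concat(
        [frame.assign(Year=year) for year, frame in frames_by_year.items()],
        ignore_index=True,
    )
    # Scraped cells are strings; convert numeric columns, unparseable cells -> NaN.
    for col in ["World Rank", "National Rank", "Score"] + PARAMETERS:
        combined[col] = pd.to_numeric(combined[col], errors="coerce")
    return combined


def parameter_correlations(combined):
    """Spearman correlation of each parameter with Score, strongest first."""
    corr = combined[PARAMETERS + ["Score"]].corr(method="spearman")["Score"]
    return corr.drop("Score").sort_values(key=abs, ascending=False)


def rank_history(combined, university):
    """World Rank of one university indexed by year, ready for a line graph."""
    rows = combined[combined["University"] == university]
    return rows.set_index("Year")["World Rank"].sort_index()
```

For the line graph in item 3, `rank_history(combined, name).plot(marker="o")` draws rank against year (this needs matplotlib installed); calling `invert_yaxis()` on the returned Axes keeps rank 1 at the top.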
