Introduction
This is a dataset of World University Rankings for the past 8 years, along with parameters such as Quality of Staff, Alumni Employment, Publications, and Citations. I obtained the HTML source code with the requests package in Python and extracted the data from it with the Beautiful Soup parser.
You can find the dataset folder on Kaggle: https://www.kaggle.com/saumitrajagdale/university-rankings
Scraping Data
I scraped this data from the official rankings page provided by CWUR.
URL: https://cwur.org/
The packages I depended on were:
- Pandas: For keeping data in dataframe format.
- Beautiful Soup (bs4): For parsing HTML source code.
- Requests: For obtaining the source code of a given URL. (The snippet below uses the standard library's urllib.request for this step.)
- Numpy: For basic array operations.
Code Snippet For Scraping:
```python
# Dependencies
import pandas as pd
import bs4
import urllib.request
import numpy as np

# Obtaining source code from the url
url = "https://cwur.org/2012.php"
url_contents = urllib.request.urlopen(url).read()

# Parsing the HTML source code
soup = bs4.BeautifulSoup(url_contents, "html.parser")

# Extracting the data according to the HTML tags
rows = []
r = soup.find_all("tr")
for i in range(1, len(r)):  # start at 1 to skip the first <tr>
    temp = r[i].find_all("td")
    row = []
    for cell in temp:
        # Strip the leading "<td>" and the trailing "</td>" from the cell markup
        s = str(cell)[4:-5]
        row.append(s)
    print(row)
    rows.append(row)

# Converting data into a dataframe using pandas
df = pd.DataFrame(rows, columns=["World Rank", "University", "Location", "National Rank",
                                 "Quality of Education", "Alumni Employment",
                                 "Quality of Faculty", "Publications", "Influence",
                                 "Citations", "Patents", "Score"])
print(df)

# Creating csv file from the dataframe
df.to_csv("University_Ranks_2012.csv")
```
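Slicing fixed character counts off each cell works only while every cell is a bare `<td>` with no attributes or nested tags. As an alternative, the sketch below wraps the row extraction in a function that relies on Beautiful Soup's `get_text` instead. The function name `parse_ranking_rows` and the sample markup are made up for illustration; the sample is not copied from cwur.org.

```python
import bs4


def parse_ranking_rows(html):
    """Return one list of cell strings for every table row that has <td> cells."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        if not cells:
            # Rows made only of <th> cells (headers) have no <td>, so skip them
            continue
        # get_text drops nested tags such as links and trims whitespace
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


# Made-up sample markup for illustration only (NOT taken from cwur.org)
sample_html = """
<table>
  <tr><th>World Rank</th><th>University</th><th>Score</th></tr>
  <tr><td>1</td><td><a href="#">Example University</a></td><td>100.00</td></tr>
  <tr><td class="rank">2</td><td>Sample Institute</td><td>98.50</td></tr>
</table>
"""

rows = parse_ranking_rows(sample_html)
print(rows)
```

Because the function takes an HTML string, it can be tried on a saved page or a small sample like this one without hitting the website on every run.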
Scope of Analysis
This dataset can be used for the following analyses:
- Finding the most significant (most heavily weighted) parameter affecting university ranks.
- Finding the trend in rankings over the past 8 years, based on the parameters provided as columns in the dataset.
- Visualising the rise and fall of a particular university's ranking as a line graph, with rank on the y-axis and year on the x-axis.
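As a rough starting point for these analyses, here is a minimal pandas sketch. It assumes the per-year tables have already been combined into one dataframe with an added `Year` column, which the scraping snippet does not create. The values are made-up placeholders, not real CWUR data, and only two parameter columns are shown.

```python
import pandas as pd

# Made-up placeholder values for illustration only (NOT real CWUR data).
# "Year" is an assumed column added when combining the per-year tables.
data = pd.DataFrame({
    "Year": [2012, 2012, 2012, 2013, 2013, 2013],
    "University": ["Univ A", "Univ B", "Univ C"] * 2,
    "World Rank": ["1", "2", "3", "2", "1", "3"],
    "Publications": ["1", "3", "2", "2", "1", "3"],
    "Citations": ["3", "1", "2", "1", "3", "2"],
})

# Scraped cells are strings, so convert the numeric columns first;
# anything that cannot be parsed becomes NaN
numeric_cols = ["World Rank", "Publications", "Citations"]
data[numeric_cols] = data[numeric_cols].apply(pd.to_numeric, errors="coerce")

# Correlation (Pearson, the pandas default) of each parameter with the world rank
corr = data[numeric_cols].corr()["World Rank"].drop("World Rank")
strongest = corr.abs().idxmax()
print(corr)
print("Most strongly correlated parameter:", strongest)

# Trend table: one row per year, one column per university
trend = data.pivot(index="Year", columns="University", values="World Rank")
print(trend)
```

Correlation is only one possible proxy for how much a parameter matters; passing `method="spearman"` to `corr` may suit rank data better. With matplotlib installed, `trend.plot()` would draw one line per university with years on the x-axis.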