The project aims to build a search engine for EncyclEarthpedia by retrieving and processing content from Wikipedia articles, despite the unavailability of their database and API. Key tasks include retrieving Wikipedia content, cleaning and processing the text, tokenizing it, counting token frequencies, and visualizing the most frequent tokens.
data-visualization text-processing regular-expressions data-cleaning frequency-analysis tokenization json-data-handling wikipedia-api-usage pandas-dataframe-manipulation seaborn-plotting
Updated Aug 7, 2023 - Jupyter Notebook
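The clean, tokenize, and count steps described above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code. It assumes article text arrives as JSON loosely shaped like a MediaWiki "extracts" response (`query` → `pages` → `extract`). `SAMPLE_JSON`, `extract_texts`, `clean`, `tokenize`, and `top_tokens` are hypothetical names. Retrieval over the network and the pandas/seaborn visualization are left out.

```python
import json
import re
from collections import Counter

# Hypothetical sample payload, loosely shaped like a MediaWiki "extracts" response.
# The notebook's real retrieval step and data format may differ.
SAMPLE_JSON = '{"query": {"pages": {"1": {"title": "Earth", "extract": "<p>Earth is the third planet from the Sun. Earth is the only known planet with life.</p>"}}}}'

TAG_RE = re.compile(r"<[^>]+>")                # crude HTML tag stripper (assumption)
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")   # lowercase words, optional apostrophe suffix


def extract_texts(raw_json: str) -> list[str]:
    """Pull the article text out of each page in the (assumed) JSON structure."""
    data = json.loads(raw_json)
    pages = data.get("query", {}).get("pages", {})
    return [page.get("extract", "") for page in pages.values()]


def clean(text: str) -> str:
    """Replace HTML tags with spaces and lowercase the result."""
    return TAG_RE.sub(" ", text).lower()


def tokenize(text: str) -> list[str]:
    """Split cleaned text into word tokens using a regular expression."""
    return TOKEN_RE.findall(text)


def top_tokens(texts: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Count token frequencies across all texts and return the n most common."""
    counts: Counter[str] = Counter()
    for text in texts:
        counts.update(tokenize(clean(text)))
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_tokens(extract_texts(SAMPLE_JSON), n=3))
```

On the sample text, the top tokens are function words like "the" and "is". This suggests that a stop-word filter, which is not mentioned in the project description and is only an assumption here, would make the frequency ranking more useful for search. The resulting `(token, count)` pairs could then be loaded into a pandas DataFrame and plotted with seaborn.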