web-data-extraction

Here are 3 public repositories matching this topic...

SaurabhSSB / BookMiner

A pipeline that scrapes, extracts, and analyzes book data from web pages to produce insights.

python books jupyter-notebook eda data-visualization web-scraping data-analysis html-parsing beautifulsoup csv-export data-pipeline web-data-extraction data-science-project project-portfolio book-dataset

Updated Sep 30, 2025
HTML

wbsg-uni-mannheim / wdc-page

This repository contains the source files of the Web Data Commons website and is used to maintain the site. The Web Data Commons project extracts structured data from the Common Crawl.

web-data-extraction

Updated Jan 13, 2025
HTML

dariga-sm / Word-Frequency-in-Moby-Dick

Scrape the novel Moby Dick from Project Gutenberg using the Python package requests, extract the words from this web data using BeautifulSoup, and analyze the distribution of words using the Natural Language Toolkit (nltk).

python requests beautifulsoup nlp-machine-learning case-study web-data-extraction

Updated Oct 21, 2019
HTML
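
The extract-then-count workflow described for Word-Frequency-in-Moby-Dick can be sketched roughly as follows. This is not the repository's code: to stay self-contained it swaps in the standard library (`html.parser`, `re`, `collections.Counter`) for requests, BeautifulSoup, and nltk, and it works on a small inline HTML string instead of downloading the book. The names `TextExtractor`, `word_frequencies`, and `SAMPLE_HTML` are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []      # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def word_frequencies(html, stopwords=frozenset()):
    """Return a Counter of lowercase words found in the HTML text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # A word is a run of letters, optionally followed by an apostrophe part.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return Counter(w for w in words if w not in stopwords)


# Hypothetical stand-in for a downloaded page.
SAMPLE_HTML = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Moby Dick</h1>"
    "<p>Call me Ishmael. The whale, the whale!</p></body></html>"
)
```

Calling `word_frequencies(SAMPLE_HTML, stopwords={"the", "me"})` returns a `Counter` whose `most_common()` method lists words by descending count. For a real page, the HTML string would come from an HTTP response body, and a fuller stopword list (such as the one nltk provides) would replace the two-word set used here.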
