Jupyter Notebook Pyspark

Open-source Jupyter Notebook projects categorized as Pyspark

Top 14 Jupyter Notebook Pyspark Projects

  1. pyspark-tutorial

    PySpark-Tutorial provides basic algorithms using PySpark

  2. WallStreetBets_BigDataAnalysis

    Research project that aims to classify the best stock research posts from r/WallStreetBets for you. 😏

  3. pyspark-tutorial

    PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformations and Actions, Spark DataFrame, Spark SQL, and more. It is completely free on YouTube and is beginner-friendly without any prerequisites. (by coder2j)

  4. anovos

    Anovos - An open source library for scalable feature engineering using Apache Spark

  5. lasagna

    A Docker Compose template that builds an interactive development environment for PySpark with Jupyter Lab, MinIO as object storage, Hive Metastore, Trino and Kafka

  6. ESG-AI-investment-by-streamlit

    ESG-investment AI

  7. reddit-streaming

    Streaming eight subreddits from the Reddit API using a Kafka producer and Spark Structured Streaming.

  8. pyspark_nlp_workshop

    Instructions and code for the workshop "From Big Data to NLP Insights: Unlocking the Power of PySpark and Spark NLP"

  9. project-atlas-sao-paulo

    A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models.

  10. workshop-introduction-to-machine-learning

    Come ready to discover the goals and approaches of machine learning, and how to build effective algorithms and solutions!

  11. project

    Predict how many points a European football team will end the season with, according to the characteristics of its players. Project for the Big Data Computing course at Sapienza University of Rome (2021-22) (by Big-Data-FC)

  12. synapse-azure-data-explorer-101

    Getting started with Azure Synapse and Azure Data Explorer

  13. file-format-benchmark

    Benchmark script comparing key operations across different file formats

  14. dracula

    A brief analysis of the most common words in Dracula, by Bram Stoker (by geazi-anc)

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Jupyter Notebook Pyspark related posts

  • [P] ESG scoring with Node2Vec and web-site with streamlit!

    2 projects | /r/MachineLearning | 10 Mar 2023
  • PySpark: A brief analysis to the most common words in Dracula, by Bram Stoker

    2 projects | dev.to | 11 Jan 2023
  • I built an AI to classify good DD and bad DD, also shows the growth percentage of a stock associated with a post.

    1 project | /r/wallstreetbets | 16 May 2021
  • Release John Snow Labs Spark-NLP 2.7.0: New T5 and MarianMT seq2seq transformers, detect up to 375 languages, word segmentation, over 720+ models and pipelines, support for 192+ languages, and many more! · JohnSnowLabs/spark-nlp

    5 projects | /r/scala | 4 Jan 2021

Index

What are some of the best open-source Pyspark projects in Jupyter Notebook? This list will help you:

# Project Stars
1 pyspark-tutorial 1,264
2 WallStreetBets_BigDataAnalysis 181
3 pyspark-tutorial 134
4 anovos 74
5 lasagna 47
6 ESG-AI-investment-by-streamlit 31
7 reddit-streaming 18
8 pyspark_nlp_workshop 12
9 project-atlas-sao-paulo 11
10 workshop-introduction-to-machine-learning 8
11 project 6
12 synapse-azure-data-explorer-101 4
13 file-format-benchmark 3
14 dracula 0
