This project was a joint effort by Lucas De Oliveira, Chandrish Ambati, and Anish Mukherjee to create song and playlist embeddings for recommendations in a distributed fashion using a 1M playlist dataset by Spotify.
Toolkit for Apache Spark ML covering feature clean-up, a feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟
End-to-end data engineer project
Projects and studies regarding Data Engineering Area
Big Data workshop in Spanish
This project provides end-to-end data processing and visualization of visa numbers in Japan using PySpark and Plotly. The Spark clusters are set up within a Docker container on Azure.
Twitter Spark Streaming using PySpark
Parallel implementation of a hierarchical clustering algorithm based on PySpark
Movie Recommendation Engine using PySpark
Samples for Azure Databricks Orientation
A comprehensive implementation of Bitcoin address clustering using multiple heuristic conditions for blockchain analysis and chain analysis applications.
Transformation of Akamai logs with Spark ETL, and discovery of values and similarities in the logs using SparkML and H2O ML
This is the final project for the Data Scientist Nanodegree, where our goal is to predict churn for a fictional streaming service called Sparkify.
Churn Prediction using PySpark
This is a group project I worked on with my classmates. It uses PySpark ML and creates a SparkSession object.
The goal was to perform predictive maintenance on a commercial turbofan engine. The approach is data-driven: data collected from the operational jet engine is used for predictive maintenance modeling. Specifically, the aim is to build a predictive model to estimate the Remaining Useful Life (RUL) of a jet engine ba…