data-transformation

Reusable Python classes that extend open source PySpark capabilities. Example implementations are available in the notebooks of the repo https://github.com/bennyaustin/synapse-dataplatform

apache-spark data-transformation pyspark azure-databricks azure-synapse-analytics synapse-spark azure-synapse-sparkpool

Updated Nov 1, 2024
Python

VishanthSurresh / Spotify-Capstone-Project---Data-Engineering

This repository is a working ETL framework that uses user data from the Spotify API, with Python for extraction and transformation, SQL for data loading and staging, Airflow for data orchestration and monitoring, and Power BI for reporting.

scheduling data-transformation data-visualization orchestration api-call etl-pipeline data-loading data-modelling

Updated Apr 16, 2023
Python

HKUSTDial / megatran

[VLDB'25] Official repo for the paper "Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation"

data-transformation code-generation data-cleaning self-reflection large-language-models weak-to-strong lazy-rag

Updated Aug 21, 2025
Python

cybersader / jsonaut

GUI and library for flattening huge JSON files. A library and utility for exploring, analyzing, and flattening JSON files of any size (including multi-gigabyte files) into CSVs, with CSV transformations and dynamic CSV filtering, all with low memory utilization.

python json gui csv etl data-transformation pandas data-engineering awesome-list data-integration data-pipelines csv-export json-to-csv json-flattener huge-data-files pyqt5-gui

Updated Jun 7, 2023
Python

bagher / fast-resource

fast-resource is a data transformation layer that sits between the database and the application's users, enabling quick data retrieval. It further enhances performance by caching data using Redis and Memcached.

python redis flask memcached django cache data-transformation fastapi

Updated May 15, 2023
Python

quantumudit / Insurance-Portfolio-Analysis

This project uses Python and Power BI to analyze and visualize the insurance portfolio of an anonymous company that implemented an aggressive growth plan across the counties of Florida in 2021.

python etl jupyter-notebook data-transformation power-bi data-visualization data-analytics geospatial-analysis

Updated Dec 29, 2021
Python

CoDS-GCS / KGFarm

A Holistic Platform for Automating Data Preparation

data-transformation feature-selection feature-engineering data-cleaning datapreparation

Updated Apr 19, 2024
Python

ezvezdov / Dataset-Wrapper

Parser for the nuScenes, Lyft, Waymo, and A2D2 datasets.

python data-transformation dataset lyft lidar self-driving-car unification nuscenes waymo a2d2

Updated Aug 16, 2022
Python

pillowTree3 / YouTube-Data-Harvesting-and-Warehousing-using-SQL-MongoDB-and-Streamlit

This project is a Streamlit application that gives users access to, and analysis of, data from multiple YouTube channels. It uses the Google API to retrieve channel details, video statistics, and viewer engagement metrics.

mysql python mongodb data-transformation streamlit