☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
Updated Jan 12, 2025 · Python
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
Python package implementing ML feature engineering and pre-processing for polars or pandas dataframes.
A tool to read CSV files with CSVW metadata and transform them into other formats.
Easy ETL
DEPRECATED: YAML-based data transformations
Reusable Python classes that extend open source PySpark capabilities. Implementation examples are available in the notebooks of the repo https://github.com/bennyaustin/synapse-dataplatform
A working ETL framework that uses user data from the Spotify API: ➲ Python for extraction and transformation ➲ SQL for data loading and staging ➲ Airflow for data orchestration and monitoring ➲ Power BI for reporting
[VLDB'25] Official repo for the paper "Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation"
GUI and library made to flatten HUGE JSON files. A library and utility for exploring, analyzing, and flattening JSON files of any size (including multi-GB files) into CSVs, with CSV transformations and dynamic CSV filtering, all with low memory utilization.
fast-resource is a data transformation layer that sits between the database and the application's users, enabling quick data retrieval. It further enhances performance by caching data using Redis and Memcached.
This project uses Python and Power BI to analyze and visualize the insurance portfolio of an anonymous company that implemented an aggressive growth plan in 2021 across the counties of Florida.
A Holistic Platform for Automating Data Preparation
Parser for the NuScenes, Lyft, Waymo and a2d2 datasets.
A Streamlit application for accessing and analyzing data from multiple YouTube channels. It uses the Google API to retrieve channel details, video statistics, and viewer engagement metrics.
Python scripts to process and analyze log files using PySpark.
Web scraper for ICE (Intercontinental Exchange) Markit Settlement Prices. Transforms and inserts data into an MSSQL database.