Drop-in replacement for Apache Spark UI
Updated Dec 1, 2025 - TypeScript
DataMate is an enterprise-level data processing platform designed for model fine-tuning and RAG retrieval.
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
🔍 Data pipeline for crawling PDFs from the web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript
Using LLMs and AI browser automation to robustly extract web data
⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first
📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core
Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.
Aqueduct Core is responsible for the core functionality of Aqueduct, an experiment management system.
Sync your team's data to your LLM applications in real-time
An AI-powered, India-origin web platform that redefines how users read, analyze, and discuss news — built using Next.js, TypeScript, Tailwind CSS, and Framer Motion.
Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, indicator objective analysis, and quality management
An extensible pipelining tool to build data pipelines from your bank account to any destination.
Real-time data pipeline streaming sensor readings through Kafka and Spark to a live React dashboard. Built to learn distributed streaming systems, big data processing, and end-to-end data engineering.
Due Diligence Automation for Crypto Funds is an AI tool from ESPRIT University that automates digital asset evaluations using GPT, web scraping, and financial data. It quickly generates smart questions, gathers data, and creates dynamic reports to help investors make informed decisions.
Create database-agnostic aggregations based on data pipelines
IoT-powered smart factory simulation that combines Kafka, MongoDB, and Flask with machine learning-driven insights for Energy Optimization, Predictive Maintenance, and Operational Safety Monitoring.
The BridgeGate n8n node for EMR/EHR integration enables seamless connectivity with any supported EMR/EHR system, such as Athena, Cerner, eClinicalWorks, EPIC, NextGen, PointClickCare, or Practice Fusion.
Real-time data processing architecture using Apache Kafka, Flink, and Kubernetes. This project demonstrates how to build a scalable and resilient pipeline for streaming data, performing ETL with Flink, and storing the processed data in a Data Warehouse for analysis.