Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
- Updated Oct 15, 2025 - Java
Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.
OpenRefine is a free, open source power tool for working with messy data and improving it
Statistical Machine Intelligence & Learning Engine
Java dataframe and visualization library
Maestro: Netflix’s Workflow Orchestrator
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Hopsworks - Data-Intensive AI platform with a Feature Store
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
ELKI Data Mining Toolkit
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
The premier open source Data Quality solution
Categorical Query Language IDE
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Roadmap for Data Engineering
Quick start: pip install jsoniq ⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy datasets (JSON, text, CSV, Parquet, Delta...) | Data Lakehouse with Updates, Scripting, Declarative Machine Learning and more
An introduction to data analysis with R and R Studio
Blockchain2graph extracts blockchain data (Bitcoin) and inserts it into a graph database (Neo4j).
🔥 One of the most comprehensive open-source data annotation platforms.
A Java Toolbox for Scalable Probabilistic Machine Learning