Top 23 Java Machine Learning Projects
- Project mention: AWS EKS Deployment: Real-Time Data Streaming Platform - 50K Events/Sec for $1,250/Month | dev.to | 2025-10-26
Apache Flink: flink.apache.org
- Project mention: 15 AWS EMR Cost Optimization Tips to Slash Your EMR Spending (2025) | dev.to | 2025-12-16
AWS EMR (Elastic MapReduce) is a fully managed big data platform. It manages the setup, configuration, and tuning of open source frameworks like Apache Hadoop, Apache Spark, Apache Hive, Presto, and more at scale on AWS infrastructure. EMR handles cluster scaling, resource allocation, and lifecycle management. This allows you to work with large datasets for various use cases, from ETL pipelines to ML workloads. EMR uses a pay-as-you-go pricing model. Costs for compute, storage, and other AWS services can add up quickly as your data grows, clusters get bigger, and jobs become more complex. If you're not careful, costs can skyrocket due to inefficient resource use, poor instance choices, and misconfigured storage. That's why AWS EMR Cost Optimization is key. It helps you get the best performance per dollar while maintaining data processing speed, reliability, and scalability.
- Deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learn...
Kotlin can use any Java library, giving you access to powerful machine learning frameworks like DeepLearning4J, Smile, and Weka.
- mit-deep-learning-book-pdf
MIT Deep Learning Book in PDF format (complete and parts) by Ian Goodfellow, Yoshua Bengio and Aaron Courville
- Project mention: How I Improved Zero-Shot Classification in Deep Java Library (DJL) OSS | dev.to | 2025-06-15

      # install release version of djl-converter
      pip install https://publish.djl.ai/djl_converter/djl_converter-0.30.0-py3-none-any.whl

      # install from djl master branch
      pip install "git+https://github.com/deepjavalibrary/djl.git#subdirectory=extensions/tokenizers/src/main/python"

      # install djl-convert from local djl repo
      git clone https://github.com/deepjavalibrary/djl.git
      cd djl/extensions/tokenizers/src/main/python
      python3 -m pip install -e .

      # Add djl-convert to PATH (if installed locally or not globally available)
      export PATH="$HOME/.local/bin:$PATH"

      # install optimum if you want to convert to OnnxRuntime
      pip install optimum

      # convert a single model to TorchScript, Onnxruntime or Rust
      djl-convert --help

      # import models as DJL Model Zoo
      djl-import --help
- Project mention: Starting July 1, Academic Publishers Can't Paywall NIH-Funded Research | news.ycombinator.com | 2025-05-01
What do you mean exactly? I was surprised how easily grobid converts many papers, at least the arXiv ones, to XML for better processing than PDF.
Most of the papers are built from their LaTeX sources, so I guess there's an easy way to undo it.
https://github.com/kermitt2/grobid
- Project mention: Fahmatrix – A Lightweight, Pandas-Like DataFrame Library for Java (GitHub) | news.ycombinator.com | 2025-05-16
Always great to see efforts to make working with data frames easier. Here are some similar data frame libraries for Java:
https://github.com/jtablesaw/tablesaw
https://github.com/dflib/dflib
My preferred way is to just use the DuckDB Java API. I haven't seen anything better in performance or efficiency. Also, a SQL query is often easier to write.
Astra DB's vector indexing capabilities are a combination of Cassandra's storage-attached indexing (SAI) and JVector, a non-blocking, concurrent, graph-based vector index. This means that Astra DB doesn't need to rebuild or block access to its index while you are inserting vectors; the index is updated live.
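To make the "updated live" property concrete, here is a minimal Java sketch of an index that accepts inserts while searches are running. It is an illustration only: `LiveVectorIndex` is a hypothetical class, it does a brute-force cosine scan rather than a graph search, and it does not use or imitate JVector's or Astra DB's API. It assumes all vectors share the same dimension.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Hypothetical toy index: brute-force search, NOT JVector's graph index or API. */
public class LiveVectorIndex {
    // ConcurrentHashMap lets writers add entries while readers iterate,
    // without locking the whole map or throwing ConcurrentModificationException.
    private final Map<String, float[]> vectors = new ConcurrentHashMap<>();

    /** Insert or overwrite a vector; later searches can see it immediately. */
    public void insert(String id, float[] vector) {
        vectors.put(id, vector.clone()); // defensive copy: stored data is never mutated
    }

    /** Cosine similarity; returns 0 when either vector has zero norm. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Return the ids of the k stored vectors most similar to the query. */
    public List<String> search(float[] query, int k) {
        return vectors.entrySet().stream()
                // score each entry once, so sorting works on fixed values
                .map(e -> Map.entry(e.getKey(), cosine(query, e.getValue())))
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws InterruptedException {
        LiveVectorIndex index = new LiveVectorIndex();
        index.insert("a", new float[] {1f, 0f});

        // A writer thread keeps inserting while the main thread searches.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                index.insert("w" + i, new float[] {0f, 1f});
            }
        });
        writer.start();
        List<String> top = index.search(new float[] {0.9f, 0.1f}, 1);
        writer.join();
        System.out.println(top);
    }
}
```

The search sees a weakly consistent snapshot: a vector inserted mid-scan may or may not appear in that particular result, but neither side waits on the other. A real graph-based index trades this linear scan for approximate neighbor traversal.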
- elasticsearch-learning-to-rank
Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch
- DatumBox
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
- Project mention: Show HN: Kreuzberg – Modern async Python library for document text extraction | news.ycombinator.com | 2025-02-15
Java Machine Learning related posts
- Apache Spark vs Apache Hadoop—10 Crucial Differences (2025)
- InnovaML: Driving Business Innovation Through Intelligent ML Solutions
- AWS EKS Deployment: Real-Time Data Streaming Platform - 50K Events/Sec for $1,250/Month
- 🔥 Simulating Course Schedules 600x Faster with Web Workers in CourseCast
- Timestone: A Lightweight Java Library for Testing Time-Based Logic
- JuiceFS 1.3 Beta 2 Integrates Apache Ranger for Fine-Grained Access Control
- The Grug Brained Developer
- A note from our sponsor - Stream getstream.io | 23 Dec 2025
Index
What are some of the best open-source Machine Learning projects in Java? This list will help you:
| # | Project | Stars |
|---|---|---|
| 1 | Apache Flink | 25,615 |
| 2 | Apache Hadoop | 15,417 |
| 3 | Deeplearning4j | 14,171 |
| 4 | mit-deep-learning-book-pdf | 13,630 |
| 5 | vespa | 6,680 |
| 6 | Smile | 6,316 |
| 7 | useful-java-links | 6,084 |
| 8 | Deep Java Library (DJL) | 4,723 |
| 9 | grobid | 4,510 |
| 10 | Tablesaw | 3,721 |
| 11 | maestro | 3,688 |
| 12 | modeldb | 1,740 |
| 13 | jvector | 1,662 |
| 14 | Siddhi | 1,570 |
| 15 | elasticsearch-learning-to-rank | 1,520 |
| 16 | Tribuo | 1,385 |
| 17 | hopsworks | 1,268 |
| 18 | DatumBox | 1,087 |
| 19 | JSAT | 799 |
| 20 | knime-core | 761 |
| 21 | jblas | 601 |
| 22 | CERMINE | 502 |
| 23 | oj! Algorithms | 492 |