Java Machine Learning

Open-source Java projects categorized as Machine Learning

Top 23 Java Machine Learning Projects

Machine Learning
  1. Apache Flink

  2. Apache Hadoop

    Apache Hadoop

    Project mention: 15 AWS EMR Cost Optimization Tips to Slash Your EMR Spending (2025) | dev.to | 2025-12-16

    AWS EMR (Elastic MapReduce) is a fully managed big data platform. It manages the setup, configuration, and tuning of open source frameworks like Apache Hadoop, Apache Spark, Apache Hive, Presto, and more at scale on AWS infrastructure. EMR handles cluster scaling, resource allocation, and lifecycle management. This allows you to work with large datasets for various use cases, from ETL pipelines to ML workloads. EMR uses a pay-as-you-go pricing model. Costs for compute, storage, and other AWS services can add up quickly as your data grows, clusters get bigger, and jobs become more complex. If you're not careful, costs can skyrocket due to inefficient resource use, poor instance choices, and misconfigured storage. That's why AWS EMR Cost Optimization is key. It helps you get the best performance per dollar while maintaining data processing speed, reliability, and scalability.
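
    Hadoop jobs of the kind EMR runs are typically written against the MapReduce model: a map step emits key/value pairs, the framework groups (shuffles) them by key, and a reduce step aggregates each group. The sketch below is not the Hadoop API, which has its own `Mapper` and `Reducer` classes. It is a minimal, dependency-free illustration of the same map, shuffle, and reduce steps as a word count, using only `java.util.stream`; the class name `WordCountSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class WordCountSketch {
    // Map: split each line into lowercase words.
    // Shuffle + reduce: group identical words and count them (TreeMap keeps keys sorted).
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
        System.out.println(wordCount(List.of("the quick fox", "The lazy dog")));
    }
}
```

    On a real cluster the map and reduce steps run on different machines and the shuffle moves data over the network; here all three happen in one in-memory stream.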

  3. Deeplearning4j

    Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learn...

    Project mention: Kotlin for AI-Powered App Development | dev.to | 2025-05-23

    Kotlin can use any Java library, giving you access to powerful machine learning frameworks like DeepLearning4J, Smile, and Weka.

  4. mit-deep-learning-book-pdf

    MIT Deep Learning Book in PDF format (complete and parts) by Ian Goodfellow, Yoshua Bengio and Aaron Courville

  5. vespa

    AI + Data, online. https://vespa.ai

  6. Smile

    Statistical Machine Intelligence & Learning Engine

    Project mention: Kotlin for AI-Powered App Development | dev.to | 2025-05-23

    Kotlin can use any Java library, giving you access to powerful machine learning frameworks like DeepLearning4J, Smile, and Weka.

  7. useful-java-links

  8. Deep Java Library (DJL)

    An Engine-Agnostic Deep Learning Framework in Java

    Project mention: How I Improved Zero-Shot Classification in Deep Java Library (DJL) OSS | dev.to | 2025-06-15

    # install release version of djl-converter
    pip install https://publish.djl.ai/djl_converter/djl_converter-0.30.0-py3-none-any.whl
    # install from djl master branch
    pip install "git+https://github.com/deepjavalibrary/djl.git#subdirectory=extensions/tokenizers/src/main/python"
    # install djl-convert from local djl repo
    git clone https://github.com/deepjavalibrary/djl.git
    cd djl/extensions/tokenizers/src/main/python
    python3 -m pip install -e .
    # Add djl-convert to PATH (if installed locally or not globally available)
    export PATH="$HOME/.local/bin:$PATH"
    # install optimum if you want to convert to OnnxRuntime
    pip install optimum
    # convert a single model to TorchScript, Onnxruntime or Rust
    djl-convert --help
    # import models as DJL Model Zoo
    djl-import --help

  9. grobid

    Machine learning software for extracting information from scholarly documents

    Project mention: Starting July 1, Academic Publishers Can't Paywall NIH-Funded Research | news.ycombinator.com | 2025-05-01

    What do you mean exactly? I was surprised how easily grobid converts many papers, at least the arXiv ones, to XML, which is better for processing than PDF.

    Most of the papers are generated from their LaTeX sources, so there's an easy way to undo it, I guess.

    https://github.com/kermitt2/grobid
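
    GROBID is usually run as a web service, and a PDF can be submitted to its full-text endpoint to get TEI XML back. The sketch below assumes a local GROBID server on its default port 8070 and the `/api/processFulltextDocument` endpoint, with the PDF sent in the `input` form field; check these against the service documentation for your GROBID version. `GrobidClientSketch` is a hypothetical name, and the multipart body is assembled by hand because `java.net.http` has no built-in multipart body publisher.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

final class GrobidClientSketch {
    // Assumption: a GROBID service running locally on its default port (8070).
    static final URI FULLTEXT_ENDPOINT =
            URI.create("http://localhost:8070/api/processFulltextDocument");

    // Builds a multipart/form-data POST with the PDF as the single "input" part.
    static HttpRequest buildRequest(byte[] pdf, String fileName) {
        String boundary = "----grobid-" + UUID.randomUUID();
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"input\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        body.writeBytes(pdf);
        body.writeBytes(tail.getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(FULLTEXT_ENDPOINT)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 1) {
            System.err.println("usage: GrobidClientSketch <paper.pdf>");
            return;
        }
        Path pdf = Path.of(args[0]);
        HttpRequest request = buildRequest(Files.readAllBytes(pdf), pdf.getFileName().toString());
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        // On success the body is TEI XML; print the status too so failures are visible.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```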

  10. Tablesaw

    Java dataframe and visualization library

    Project mention: Fahmatrix – A Lightweight, Pandas-Like DataFrame Library for Java (GitHub) | news.ycombinator.com | 2025-05-16

    Always great to see efforts to make working with data frames easier. Here are some similar data frame libraries for Java:

    https://github.com/jtablesaw/tablesaw

    https://github.com/dflib/dflib

    My preferred way is to just use the DuckDB Java API. I haven't seen anything better in performance/efficiency. Also, a SQL query is often easier to write.
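
    Dataframe libraries such as Tablesaw, and SQL engines such as DuckDB, wrap operations like grouping and aggregating rows. As a dependency-free point of comparison (not the Tablesaw or DuckDB API), the sketch below computes the equivalent of `SELECT region, AVG(amount) FROM sales GROUP BY region` with plain Java streams. The `Sale` record and its fields are hypothetical, and records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class GroupBySketch {
    // One row of a hypothetical "sales" table.
    record Sale(String region, double amount) {}

    // Equivalent of: SELECT region, AVG(amount) FROM sales GROUP BY region
    static Map<String, Double> averageByRegion(List<Sale> sales) {
        return sales.stream().collect(Collectors.groupingBy(
                Sale::region, TreeMap::new, Collectors.averagingDouble(Sale::amount)));
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(new Sale("east", 10), new Sale("west", 4), new Sale("east", 20));
        System.out.println(averageByRegion(sales)); // {east=15.0, west=4.0}
    }
}
```

    A dataframe library adds what this lacks: typed columns, CSV loading, and many built-in aggregations without writing a collector per query.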

  11. maestro

    Maestro: Netflix’s Workflow Orchestrator (by Netflix)

  12. modeldb

    Open Source ML Model Versioning, Metadata, and Experiment Management

  13. jvector

    JVector: the most advanced embedded vector search engine

    Project mention: 5 GenAI Things You Didn't Know About Astra DB | dev.to | 2025-03-06

    Astra DB's vector indexing capabilities are a combination of Cassandra's storage-attached indexing (SAI) and JVector, a non-blocking, concurrent, graph-based vector index. This means Astra DB doesn't need to rebuild or block access to its index when you insert vectors; the index is updated live.
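
    To make live, non-blocking updates concrete, here is a minimal in-memory vector store in which inserts and top-k cosine-similarity searches can run at the same time. It is a sketch under simplifying assumptions, not JVector: JVector builds a graph-based approximate index, whereas this hypothetical `VectorStoreSketch` scores every stored vector on each query and relies on `ConcurrentHashMap` rather than a purpose-built lock-free structure.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

final class VectorStoreSketch {
    // ConcurrentHashMap lets writers insert while readers search, without a global lock.
    // Searches see a weakly consistent view: a vector inserted mid-query may or may not be scored.
    private final Map<String, float[]> vectors = new ConcurrentHashMap<>();

    void insert(String id, float[] vector) {
        vectors.put(id, vector.clone());
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Exhaustive top-k search: O(n) per query, unlike a graph index that visits a few nodes.
    List<String> search(float[] query, int k) {
        return vectors.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, float[]> e) -> cosine(query, e.getValue())).reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        VectorStoreSketch store = new VectorStoreSketch();
        store.insert("cat", new float[] {1f, 0f});
        store.insert("dog", new float[] {0.9f, 0.1f});
        store.insert("car", new float[] {0f, 1f});
        System.out.println(store.search(new float[] {1f, 0.05f}, 2)); // [cat, dog]
    }
}
```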

  14. Siddhi

    Stream Processing and Complex Event Processing Engine
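
    Complex event processing means matching patterns over a live stream of events rather than querying stored data. The sketch below is a hypothetical example, not Siddhi's API or query language: `SpikeDetectorSketch` raises an alert when at least `threshold` readings above `limit` arrive within a sliding time window, assuming events arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SpikeDetectorSketch {
    private final long windowMillis;
    private final int threshold;
    private final double limit;
    // Timestamps of readings above `limit` that are still inside the window.
    private final Deque<Long> hits = new ArrayDeque<>();

    SpikeDetectorSketch(long windowMillis, int threshold, double limit) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
        this.limit = limit;
    }

    // Feeds one event; returns true if the window (timestamp - windowMillis, timestamp]
    // now holds at least `threshold` readings above `limit`. Assumes non-decreasing timestamps.
    boolean onEvent(long timestamp, double value) {
        if (value > limit) {
            hits.addLast(timestamp);
        }
        while (!hits.isEmpty() && hits.peekFirst() <= timestamp - windowMillis) {
            hits.removeFirst();
        }
        return hits.size() >= threshold;
    }

    public static void main(String[] args) {
        SpikeDetectorSketch detector = new SpikeDetectorSketch(1_000, 3, 50.0);
        long[] times = {0, 400, 800, 1_500};
        double[] values = {60, 70, 80, 10};
        // Alerts only at t=800, when three readings above 50 fall within one second.
        for (int i = 0; i < times.length; i++) {
            System.out.println(times[i] + " -> " + detector.onEvent(times[i], values[i]));
        }
    }
}
```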

  15. elasticsearch-learning-to-rank

    Plugin to integrate Learning to Rank (aka machine learning for better relevance) with Elasticsearch
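
    In a Learning to Rank setup, the search engine first retrieves candidates with an ordinary query, and a trained model then re-scores the top results from per-document features. The sketch below illustrates only that reranking step, loosely mirroring Elasticsearch's rescore window: a hypothetical linear model (a weighted sum of features) re-sorts the first `window` candidates. It is not the plugin's API; the `Doc` record, weights, and feature values are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RerankSketch {
    // A retrieved document with hypothetical per-query feature values.
    record Doc(String id, double[] features) {}

    // Linear model: score = sum of weights[i] * features[i].
    static double score(double[] weights, double[] features) {
        double s = 0;
        for (int i = 0; i < weights.length; i++) {
            s += weights[i] * features[i];
        }
        return s;
    }

    // Re-sorts the first `window` documents by model score (highest first);
    // documents beyond the window keep their original order after them.
    static List<String> rerank(List<Doc> ranked, double[] weights, int window) {
        int n = Math.min(window, ranked.size());
        List<Doc> head = new ArrayList<>(ranked.subList(0, n));
        head.sort(Comparator.comparingDouble((Doc d) -> score(weights, d.features())).reversed());
        List<String> ids = new ArrayList<>();
        for (Doc d : head) ids.add(d.id());
        for (Doc d : ranked.subList(n, ranked.size())) ids.add(d.id());
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> ranked = List.of(
                new Doc("a", new double[] {0.0, 1.0}),
                new Doc("b", new double[] {1.0, 0.0}),
                new Doc("c", new double[] {0.5, 0.5}));
        // With weights {2, 1}: a scores 1.0, b scores 2.0; c is outside the window of 2.
        System.out.println(rerank(ranked, new double[] {2.0, 1.0}, 2)); // [b, a, c]
    }
}
```

    Limiting the model to a window keeps the expensive feature computation off the long tail of weak matches.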

  16. Tribuo

    Tribuo - A Java machine learning library

  17. hopsworks

    Hopsworks - Data-Intensive AI platform with a Feature Store

  18. DatumBox

    Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.

  19. JSAT

    Java Statistical Analysis Tool, a Java library for Machine Learning

  20. knime-core

    KNIME Analytics Platform

  21. jblas

    Linear Algebra for Java
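
    jblas delegates dense matrix operations to native BLAS and LAPACK routines. For reference, the sketch below is the naive plain-Java triple loop for the product C = AB that a BLAS matrix-multiply routine computes much faster; it is an illustrative baseline, not jblas code, and `MatMulSketch` is a hypothetical name. It uses the i-k-j loop order so the inner loop walks rows of `b` and `c` sequentially.

```java
import java.util.Arrays;

final class MatMulSketch {
    // C = A * B for an (n x m) matrix A and an (m x p) matrix B.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b.length, p = b[0].length;
        if (a[0].length != m) {
            throw new IllegalArgumentException("inner dimensions differ");
        }
        double[][] c = new double[n][p];
        // i-k-j order: the inner loop reads b[k][*] and writes c[i][*] sequentially.
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < m; k++) {
                double aik = a[i][k];
                for (int j = 0; j < p; j++) {
                    c[i][j] += aik * b[k][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b))); // [[19.0, 22.0], [43.0, 50.0]]
    }
}
```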

  22. CERMINE

    Content ExtRactor and MINEr

    Project mention: Show HN: Kreuzberg – Modern async Python library for document text extraction | news.ycombinator.com | 2025-02-15

  23. oj! Algorithms

    oj! Algorithms

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Java Machine Learning related posts

  • Apache Spark vs Apache Hadoop—10 Crucial Differences (2025)

    1 project | dev.to | 16 Nov 2025
  • InnovaML: Driving Business Innovation Through Intelligent ML Solutions

    2 projects | dev.to | 27 Oct 2025
  • AWS EKS Deployment: Real-Time Data Streaming Platform - 50K Events/Sec for $1,250/Month

    2 projects | dev.to | 26 Oct 2025
  • 🔥 Simulating Course Schedules 600x Faster with Web Workers in CourseCast

    2 projects | dev.to | 21 Aug 2025
  • Timestone: A Lightweight Java Library for Testing Time-Based Logic

    3 projects | dev.to | 24 Jul 2025
  • JuiceFS 1.3 Beta 2 Integrates Apache Ranger for Fine-Grained Access Control

    2 projects | dev.to | 19 Jun 2025
  • The Grug Brained Developer

    5 projects | news.ycombinator.com | 17 Jun 2025

Index

What are some of the best open-source Machine Learning projects in Java? This list will help you:

#   Project                           GitHub stars
1   Apache Flink                      25,615
2   Apache Hadoop                     15,417
3   Deeplearning4j                    14,171
4   mit-deep-learning-book-pdf        13,630
5   vespa                             6,680
6   Smile                             6,316
7   useful-java-links                 6,084
8   Deep Java Library (DJL)           4,723
9   grobid                            4,510
10  Tablesaw                          3,721
11  maestro                           3,688
12  modeldb                           1,740
13  jvector                           1,662
14  Siddhi                            1,570
15  elasticsearch-learning-to-rank    1,520
16  Tribuo                            1,385
17  hopsworks                         1,268
18  DatumBox                          1,087
19  JSAT                              799
20  knime-core                        761
21  jblas                             601
22  CERMINE                           502
23  oj! Algorithms                    492

Did you know that Java is the 9th most popular programming language based on number of references?