Scala MapReduce

Open-source Scala projects categorized as MapReduce

Scala MapReduce Projects

  1. Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

    Project mention: 15 AWS EMR Cost Optimization Tips to Slash Your EMR Spending (2025) | dev.to | 2025-12-16

    AWS EMR (Elastic MapReduce) is a fully managed big data platform. It manages the setup, configuration, and tuning of open source frameworks like Apache Hadoop, Apache Spark, Apache Hive, Presto, and more at scale on AWS infrastructure. EMR handles cluster scaling, resource allocation, and lifecycle management. This allows you to work with large datasets for various use cases, from ETL pipelines to ML workloads. EMR uses a pay-as-you-go pricing model. Costs for compute, storage, and other AWS services can add up quickly as your data grows, clusters get bigger, and jobs become more complex. If you're not careful, costs can skyrocket due to inefficient resource use, poor instance choices, and misconfigured storage. That's why AWS EMR Cost Optimization is key. It helps you get the best performance per dollar while maintaining data processing speed, reliability, and scalability.
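The entry above only names the programming model behind this category. As a rough illustration, here is a minimal, single-machine sketch of the classic MapReduce word-count job in plain Scala. It uses only the Scala 2.13 standard library. The object and method names (`WordCount`, `mapPhase`, `reducePhase`) are hypothetical and chosen for illustration. The sketch does not use Spark. In Spark's RDD API the same shape is commonly written with `flatMap`, `map`, and `reduceByKey` over a distributed dataset, and the shuffle runs across the cluster rather than in memory.

```scala
// Minimal single-machine sketch of the MapReduce pattern (word count).
// Assumes Scala 2.13+ (for groupMapReduce); no Spark or Hadoop involved.
object WordCount {
  // Map phase: split each line into lowercase words and emit (word, 1) pairs.
  def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.toLowerCase.split("\\W+"))
      .filter(_.nonEmpty) // drop empty tokens left by leading punctuation
      .map(word => (word, 1))

  // Shuffle + reduce phase: group the pairs by key and sum the counts per word.
  def reducePhase(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupMapReduce(_._1)(_._2)(_ + _)

  def wordCount(lines: Seq[String]): Map[String, Int] =
    reducePhase(mapPhase(lines))

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("to be or not to be", "to see"))
    // Print the counts sorted alphabetically by word.
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
  }
}
```

Running `main` prints each word with its count in alphabetical order. The split into a map step and a keyed reduce step mirrors the way a distributed engine parallelizes the job: map tasks run independently per input split, and reducers only need all values for a given key.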

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


Scala MapReduce related posts

  • From Pandas to Upstream Control: The Evolution PyData Needs Next

    1 project | dev.to | 11 Nov 2025
  • Build a Self-Hosted Apache Iceberg Lakehouse in Minutes with RisingWave

    4 projects | dev.to | 9 Oct 2025
  • How to Reduce Big Data Analytics Costs by 90% with Karpenter and Spark

    3 projects | dev.to | 21 Apr 2025
  • Unveiling the Apache License 2.0: A Deep Dive into Open Source Freedom

    3 projects | dev.to | 11 Mar 2025
  • The Application of Java Programming In Data Analysis and Artificial Intelligence

    1 project | dev.to | 10 Mar 2025
  • Apache Spark: Revolutionizing Big Data with Sustainable Open Source Funding

    1 project | dev.to | 6 Mar 2025
  • Run PySpark Local Python Windows Notebook

    2 projects | dev.to | 21 Jan 2025

Index

#  Project       Stars
1  Apache Spark  42,518

