Hadoop Single Node Setup using Docker on Ubuntu

This repo sets up a fully functional Apache Hadoop single-node cluster using Docker on Ubuntu. It allows beginners to explore HDFS, run MapReduce jobs, and understand core Big Data concepts in a simplified and containerized environment — perfect for learning and testing.

This guide helps you install and run Apache Hadoop (Single Node) using Docker on Ubuntu.


Prerequisites

  • Ubuntu 20.04 or 22.04
  • Docker (installed in Step 1 below)
  • Internet connection
  • Basic familiarity with the terminal

Step 1: Install Docker

sudo apt update
sudo apt install ca-certificates curl gnupg

# Add Docker's GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add Docker's repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
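
To confirm the installation worked, you can check the Docker version and run the hello-world test image (a quick sanity check, not part of the original setup steps):

docker --version
sudo docker run --rm hello-world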

Step 2: (Optional) Run Docker Without sudo

sudo usermod -aG docker $USER
newgrp docker

Alternatively, log out and log back in (or restart your system) so the group change applies to all new sessions.
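
To verify that sudo-less access works, list containers as your normal user; an empty table (rather than a permission error) means the group change is active:

docker ps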


Step 3: Create a Docker Network

docker network create hadoop 
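
If you want to confirm the network was created, a quick check (my suggestion, not in the original guide):

docker network inspect hadoop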

Step 4: Pull the Hadoop Docker Image

We are using the widely used Big Data Europe (BDE) Hadoop image:

docker pull bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8 
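
To verify the download, you can list local images filtered by repository name:

docker images bde2020/hadoop-namenode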

Step 5: Run the Hadoop NameNode Container

docker run -itd \
  --net hadoop \
  --name hadoop-master \
  -p 9870:9870 -p 9000:9000 \
  -e CLUSTER_NAME=HadoopCluster \
  bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
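
To confirm the NameNode is up, check the container status and skim its recent logs:

docker ps --filter name=hadoop-master
docker logs --tail 20 hadoop-master

Note that this image runs only the NameNode. For HDFS writes (the -put in Step 7) to succeed, at least one DataNode should also be running. Here is a minimal sketch using the matching BDE DataNode image, assuming the CORE_CONF_* environment convention from the big-data-europe/docker-hadoop project (the container name hadoop-worker1 is just an example):

# Hypothetical companion container; adjust the name and env to your setup
docker run -itd \
  --net hadoop \
  --name hadoop-worker1 \
  -e CORE_CONF_fs_defaultFS=hdfs://hadoop-master:9000 \
  bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8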

Step 6: Access Hadoop Web Interface

Open your browser:

http://localhost:9870 
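
If you are on a headless machine, you can check the UI from the terminal instead; a 200 status code means the NameNode web server is responding:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870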

Step 7: Interact with HDFS (Inside Container)

docker exec -it hadoop-master bash 

Example HDFS Commands:

# Check if directory exists
hdfs dfs -ls /

# Create directory (only if it doesn't exist)
hdfs dfs -mkdir /test

# Upload file
hdfs dfs -put /etc/hosts /test

# List files
hdfs dfs -ls /test

# Download file back to container FS
hdfs dfs -get /test/hosts /tmp/
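
You can also run one-off HDFS commands from the host without opening an interactive shell, for example:

docker exec hadoop-master hdfs dfs -ls /test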

Step 8: Run a WordCount MapReduce Job

cd $HADOOP_HOME

# Create an input directory in HDFS and upload Hadoop's XML config files as sample text
hdfs dfs -mkdir /input
hdfs dfs -put etc/hadoop/*.xml /input

# Run the bundled WordCount example
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output

# View the result
hdfs dfs -cat /output/part-r-00000
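
The job writes a _SUCCESS marker and one or more part files to /output. To peek at the most frequent words, you can sort numerically on the count column (a small convenience pipeline, run inside the container):

hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000 | sort -k2,2nr | head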

Step 9: Stop and Remove Container/Image

docker stop hadoop-master
docker rm hadoop-master
docker rmi bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
docker network rm hadoop
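
If you re-run this tutorial often, a forgiving variant that ignores already-removed resources may be handy (my sketch, not part of the original guide):

docker rm -f hadoop-master 2>/dev/null || true
docker rmi bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8 2>/dev/null || true
docker network rm hadoop 2>/dev/null || true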

Optional: Clean HDFS

hdfs dfs -rm -r /test
hdfs dfs -rm -r /input
hdfs dfs -rm -r /output
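
Equivalently, a single command with -f to suppress errors for paths that do not exist:

hdfs dfs -rm -r -f /test /input /output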

Further Learning

  • Explore Hive (SQL on Hadoop)
  • Add Spark to the cluster
  • Build real-time pipelines with Kafka + Hadoop
  • Use Hadoop with Jupyter + PySpark


Author

Made by a beginner learning Big Data with Docker and Hadoop.
Tested on Ubuntu 22.04 with Docker 24+.
