This guide helps you install and run Apache Hadoop (single node) using Docker on Ubuntu.

Prerequisites:

- Ubuntu 20.04 or 22.04
- Docker (installation steps below if you don't have it yet)
- Internet connection
- Basic terminal usage
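To check the Ubuntu prerequisite, here is a minimal sketch that reads `/etc/os-release` (the standard file describing the installed distribution) and accepts only Ubuntu 20.04 or 22.04. The function name `is_supported_ubuntu` is a hypothetical helper, not part of any tool.

```bash
#!/usr/bin/env bash
# Return 0 if the given os-release file describes Ubuntu 20.04 or 22.04.
# $1 = path to the os-release file (default: /etc/os-release)
is_supported_ubuntu() {
  local file=${1:-/etc/os-release}
  # Source the file in a subshell so ID/VERSION_ID don't leak into the caller.
  (
    . "$file"
    [ "$ID" = "ubuntu" ] && { [ "$VERSION_ID" = "20.04" ] || [ "$VERSION_ID" = "22.04" ]; }
  )
}
```

Usage: `is_supported_ubuntu && echo "supported release" || echo "not a listed release"`.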
Install Docker (skip this step if Docker is already installed):

```bash
sudo apt update
sudo apt install ca-certificates curl gnupg

# Add Docker's GPG key
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add Docker's repository
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Install Docker
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Allow your user to run docker without sudo
sudo usermod -aG docker "$USER"
newgrp docker
```

`newgrp docker` applies the group change only to the current shell. To apply it everywhere, log out and back in, or restart your system.
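Before pulling any Hadoop images, it helps to confirm that your user can reach the Docker daemon without `sudo`. A minimal sketch, assuming bash; `check_docker` is a hypothetical helper name, and it relies only on `command -v` and `docker info`, which exits with a non-zero status when the daemon cannot be reached.

```bash
#!/usr/bin/env bash
# Report whether Docker is installed and usable by the current user without sudo.
check_docker() {
  # Is the docker CLI available at all?
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker: not installed"
    return 1
  fi
  # `docker info` fails if the daemon is down or the user lacks permission
  # (for example, when the docker group change has not taken effect yet).
  if ! docker info >/dev/null 2>&1; then
    echo "docker: daemon not reachable"
    return 1
  fi
  echo "docker: ok"
}
```

Run `check_docker`; if it prints `docker: ok`, `docker run --rm hello-world` makes a good end-to-end test.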
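Create a dedicated Docker network so the containers can reach each other by name:

```bash
docker network create hadoop
```

We are using the community-maintained Big Data Europe (`bde2020`) Hadoop images: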
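```bash
docker pull bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8

# NameNode: HDFS metadata, web UI on 9870, RPC on 9000
docker run -itd \
  --net hadoop \
  --name hadoop-master \
  -p 9870:9870 -p 9000:9000 \
  -e CLUSTER_NAME=HadoopCluster \
  -e CORE_CONF_fs_defaultFS=hdfs://hadoop-master:9000 \
  bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8

# DataNode: stores the file blocks (without one, `hdfs dfs -put` fails)
docker run -itd --net hadoop --name hadoop-datanode \
  -e CORE_CONF_fs_defaultFS=hdfs://hadoop-master:9000 \
  -e SERVICE_PRECONDITION=hadoop-master:9870 \
  bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
```

Open the NameNode web UI in your browser (the DataNode should appear under Datanodes after a few seconds):

http://localhost:9870

Open a shell inside the NameNode container:

```bash
docker exec -it hadoop-master bash
```

Basic HDFS operations (run inside the container):

```bash
# List the HDFS root directory
hdfs dfs -ls /

# Create a directory (-p: no error if it already exists)
hdfs dfs -mkdir -p /test

# Upload a local file
hdfs dfs -put /etc/hosts /test

# List files
hdfs dfs -ls /test

# Download the file back to the container's local filesystem
hdfs dfs -get /test/hosts /tmp/
```

Run the bundled WordCount MapReduce example (still inside the container):

```bash
cd $HADOOP_HOME
hdfs dfs -mkdir -p /input
hdfs dfs -put etc/hadoop/*.xml /input
# The output directory must not exist yet, or the job fails
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000
```

Clean up the HDFS data before leaving the container:

```bash
hdfs dfs -rm -r /test
hdfs dfs -rm -r /input
hdfs dfs -rm -r /output
exit
```

Then, on the host, stop and remove the containers, images, and network:

```bash
docker stop hadoop-master hadoop-datanode
# -v also removes any anonymous volumes attached to the containers
docker rm -v hadoop-master hadoop-datanode
docker rmi bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8 bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
docker network rm hadoop
```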
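HDFS writes (`hdfs dfs -put`, the WordCount job) only succeed once at least one DataNode has registered with the NameNode, which can take a few seconds after the containers start. A minimal host-side sketch for waiting on that, assuming the container name `hadoop-master` used in this guide and the `Live datanodes (N):` line that `hdfs dfsadmin -report` prints in Hadoop 3.x; `live_datanodes` and `wait_for_datanodes` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Parse the "Live datanodes (N):" line of `hdfs dfsadmin -report` from stdin
# and print N (0 if the line is missing).
live_datanodes() {
  local n
  n=$(sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p' | head -n 1)
  echo "${n:-0}"
}

# Poll the NameNode container until at least one DataNode is live.
# $1 = container name (default: hadoop-master)
# $2 = maximum number of attempts, 2 seconds apart (default: 30)
wait_for_datanodes() {
  local container=${1:-hadoop-master} attempts=${2:-30} i n
  for ((i = 1; i <= attempts; i++)); do
    n=$(docker exec "$container" hdfs dfsadmin -report 2>/dev/null | live_datanodes)
    if [ "$n" -ge 1 ]; then
      echo "live DataNodes: $n"
      return 0
    fi
    # Don't sleep after the final attempt
    if ((i < attempts)); then
      sleep 2
    fi
  done
  echo "no live DataNodes after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_datanodes && docker exec -it hadoop-master bash` opens the container shell only once HDFS can accept writes.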
Next steps:

- Explore Hive (SQL on Hadoop)
- Add Spark to the cluster
- Build real-time pipelines with Kafka + Hadoop
- Use Hadoop with Jupyter + PySpark
Made by a beginner learning Big Data with Docker and Hadoop.
Tested on Ubuntu 22.04 with Docker 24+.