DEV Community

Tomer Ben David


Local hadoop on laptop for practice

Introduction

Here is what I learned last week about hadoop installation:

Hadoop sounds like a really big thing: a complex installation, a cluster, hundreds of machines, terabytes if not petabytes of data. But actually, you can download a single tar.gz and run Hadoop with HDFS on your laptop for practice. It's very easy!

Our plan

  1. Set up JAVA_HOME (Hadoop is built on Java).
  2. Download the Hadoop tar.gz.
  3. Extract the Hadoop tar.gz.
  4. Set up the Hadoop config.
  5. Format and start HDFS.
  6. Upload files to HDFS.
  7. Run a Hadoop job on the uploaded files.
  8. Get back and print the results!

Sounds like a plan!

Setup JAVA_HOME

As we said, Hadoop is built on Java, so we need JAVA_HOME set.

➜ hadoop$ ls /Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
➜ hadoop$ export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
➜ hadoop$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home
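If you are not sure your JAVA_HOME is right, a small check like the one below can help. This is only a sketch: `is_java_home` is a helper name I made up for illustration (it is not part of Hadoop), and `/usr/libexec/java_home` is a macOS-only tool, so on other systems that branch is simply skipped.

```shell
# Hypothetical helper (not part of Hadoop): succeeds only if the given
# directory looks like a Java home, i.e. it contains an executable bin/java.
is_java_home() {
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# If JAVA_HOME is missing or wrong, try the macOS java_home tool.
# Assumption: /usr/libexec/java_home exists only on macOS.
if ! is_java_home "${JAVA_HOME:-}" && [ -x /usr/libexec/java_home ]; then
  JAVA_HOME="$(/usr/libexec/java_home 2>/dev/null || true)"
  export JAVA_HOME
fi

# Report what we ended up with.
if is_java_home "${JAVA_HOME:-}"; then
  echo "JAVA_HOME looks usable: $JAVA_HOME"
else
  echo "JAVA_HOME is not set to a directory containing bin/java" >&2
fi
```

The check only looks for `bin/java`; it does not verify the Java version.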

Download Hadoop tar.gz

Next we download hadoop, nice :)

➜ hadoop$ curl http://apache.spd.co.il/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz --output hadoop.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  1  310M    1 3581k    0     0   484k      0  0:10:57  0:00:07  0:10:50  580k

Extract hadoop tar.gz

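Now that we have the tar.gz on our laptop, let's extract it. Point tar at wherever you saved the archive; mine was in ~/Downloads, while the curl command above writes hadoop.tar.gz to the current directory.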

➜ hadoop$ tar xvfz ~/Downloads/hadoop-3.1.0.tar.gz 
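If you want to know which directory the archive will create before you `cd` into it, you can peek at its listing. A minimal sketch, assuming the archive has a single top-level directory (as the Hadoop tarball does here); `archive_top_dir` is a name I picked for illustration.

```shell
# Hypothetical helper: print the top-level directory inside a .tar.gz
# without extracting it, by reading only the first entry of the listing.
archive_top_dir() {
  tar tzf "$1" | head -n 1 | cut -d/ -f1
}

# Example, assuming the archive was saved as hadoop.tar.gz:
#   tar xzf hadoop.tar.gz
#   cd "$(archive_top_dir hadoop.tar.gz)"
```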

Setup HDFS

Now let's configure HDFS on our laptop:

➜ hadoop$ cd hadoop-3.1.0
➜ hadoop-3.1.0$ vi etc/hadoop/core-site.xml

Configuration should be:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

So we configured the HDFS port. Next let's configure how many replicas we need: we are on a laptop, so we want only one replica of our data.

➜ hadoop-3.1.0$ vi etc/hadoop/hdfs-site.xml

hdfs-site.xml is where the replication factor is configured. Below is the configuration it should have (hint: 1):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
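If you prefer not to edit the two files by hand, they can also be written from the shell. This is just a sketch of the same two settings shown above; `write_single_node_conf` is a hypothetical helper name, and it overwrites any existing core-site.xml and hdfs-site.xml in the directory you give it.

```shell
# Hypothetical helper: write the two single-node config files shown above
# into the directory given as $1 (default: etc/hadoop, relative to the
# extracted hadoop-3.1.0 directory). Existing files are overwritten.
write_single_node_conf() {
  conf_dir="${1:-etc/hadoop}"
  mkdir -p "$conf_dir" || return 1

  # NameNode address used by HDFS clients.
  cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

  # One replica per block, since there is only one DataNode.
  cat > "$conf_dir/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
}

# Example, from the hadoop-3.1.0 directory:
#   write_single_node_conf etc/hadoop
```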

Enable SSHD

Hadoop connects to nodes over ssh, so let's enable it on our Mac laptop (System Preferences > Sharing > Remote Login):

http://cdn.osxdaily.com/wp-content/uploads/2011/09/enable-sftp-server-mac-os-x-lion.jpg

You should be able to ssh without a password:

➜ hadoop-3.1.0 ssh localhost
Last login: Wed May 9 17:15:28 2018
➜ ~

If you can't, generate a key and add it to your authorized keys:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
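Running those three commands twice would prompt to overwrite the key and append a duplicate line to authorized_keys. Here is a re-runnable variant as a sketch; `ensure_local_ssh_key` is a hypothetical helper name, and taking the ssh directory as a parameter is my own addition so it can be tried on a scratch directory first.

```shell
# Hypothetical helper: the same key setup as above, but safe to run again.
# $1 is the ssh directory (default: ~/.ssh).
ensure_local_ssh_key() {
  ssh_dir="${1:-$HOME/.ssh}"
  key="$ssh_dir/id_rsa"
  auth="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1

  # Generate a key only if none exists yet.
  if [ ! -f "$key" ]; then
    ssh-keygen -q -t rsa -P '' -f "$key" || return 1
  fi

  # Append the public key only if it is not already authorized.
  touch "$auth"
  grep -qxF "$(cat "$key.pub")" "$auth" || cat "$key.pub" >> "$auth"
  chmod 0600 "$auth"
}

# Example:
#   ensure_local_ssh_key
#   ssh -o BatchMode=yes localhost true && echo "passwordless ssh works"
```

With `BatchMode=yes`, ssh fails instead of asking for a password, which makes it a handy yes/no check.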

Start HDFS

Next we format and start HDFS on our laptop:

➜ hadoop-3.1.0$ bin/hdfs namenode -format
WARNING: /Users/tomer.bendavid/tmp/hadoop/hadoop-3.1.0/logs does not exist. Creating.
2018-05-10 22:12:02,493 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Tomers-MacBook-Pro.local/192.168.1.104
➜ hadoop-3.1.0$ sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
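start-dfs.sh returns before the daemons are necessarily ready, so it can be useful to wait until the NameNode port answers. A rough sketch, assuming your `nc` supports the `-z` (scan only) flag; `wait_for_port` is a name I made up. Running `jps` from the JDK is another quick way to see whether the NameNode and DataNode processes are up.

```shell
# Hypothetical helper: poll until host:port accepts TCP connections,
# giving up after roughly $3 attempts, one second apart (default 30).
# Assumption: nc is installed and supports -z.
wait_for_port() {
  host="$1"
  port="$2"
  tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    nc -z "$host" "$port" >/dev/null 2>&1 && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Example: 9000 is the port we set in fs.defaultFS.
#   sbin/start-dfs.sh && wait_for_port localhost 9000 && echo "NameNode is up"
```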

Create folders on hdfs

Next we create a folder for our sample input on HDFS:

➜ hadoop-3.1.0$ bin/hdfs dfs -mkdir /user
2018-05-10 22:13:16,982 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜ hadoop-3.1.0$ bin/hdfs dfs -mkdir /user/tomer
2018-05-10 22:13:22,474 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜ hadoop-3.1.0$

Upload testdata to HDFS

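Now that we have HDFS up and running on our laptop, let's upload some files. Notice that the first attempt below fails: a relative path like input is resolved under /user/tomer.bendavid (my username), which we never created, so the second attempt uses the absolute path /user/tomer/input: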

➜ hadoop-3.1.0$ bin/hdfs dfs -put etc/hadoop input
2018-05-10 22:14:28,802 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
put: `input': No such file or directory: `hdfs://localhost:9000/user/tomer.bendavid/input'
➜ hadoop-3.1.0$ bin/hdfs dfs -put etc/hadoop /user/tomer/input
2018-05-10 22:14:37,526 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
➜ hadoop-3.1.0$ bin/hdfs dfs -ls /user/tomer/input
2018-05-10 22:16:09,325 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
drwxr-xr-x   - tomer.bendavid supergroup          0 2018-05-10 22:14 /user/tomer/input/hadoop
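The "No such file or directory" error in the first put comes from how HDFS resolves relative paths: they go under /user/<your OS username>. A small sketch of how to create that home directory so relative paths work; it assumes the default setup used in this post (no Kerberos, default home directory prefix), and `hdfs_home` is just a variable name I chose.

```shell
# HDFS resolves relative paths against /user/<user>, where <user> is the
# OS account running the command (assuming default, non-secure settings).
hdfs_home="/user/$(id -un)"
echo "relative HDFS paths resolve under: $hdfs_home"

# With HDFS running, from the hadoop-3.1.0 directory:
#   bin/hdfs dfs -mkdir -p "$hdfs_home"   # -p also creates /user if missing
#   bin/hdfs dfs -put etc/hadoop input    # now lands in $hdfs_home/input
```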

Run hadoop job

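So we have HDFS with files on our laptop. Let's run a job on them, what do you think? We'll use the grep example bundled with Hadoop, which counts the strings matching a regular expression in the input files: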

➜ hadoop-3.1.0$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar grep /user/tomer/input/hadoop/*.xml /user/tomer/output1 'dfs[a-z.]+'
➜ hadoop-3.1.0$ bin/hdfs dfs -cat /user/tomer/output1/part-r-00000
2018-05-10 22:22:29,118 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1 dfsadmin
1 dfs.replication
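To get a feel for what the job computes, here is a rough local equivalent using plain Unix tools: extract every match of the regex, count identical matches, and sort by count. It is only a sketch for comparison, not how Hadoop implements the example, and `local_grep_count` is a hypothetical name.

```shell
# Hypothetical helper: count regex matches across local files and print
# "count<TAB>match", most frequent first.
local_grep_count() {
  pattern="$1"
  shift
  # -o prints only the matched text, -h omits file names, -E enables
  # extended regular expressions (needed for '+').
  grep -ohE -- "$pattern" "$@" |
    sort | uniq -c | sort -rn |
    awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c "\t" $0 }'
}

# Example, from the hadoop-3.1.0 directory:
#   local_grep_count 'dfs[a-z.]+' etc/hadoop/*.xml
```

On a laptop-sized input this is of course faster than a MapReduce job; the point of the Hadoop version is that the same counting scales out across a cluster.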

We managed to get a local Hadoop installation with HDFS for tests, and to run a test job! That is so cool!

Summary

We managed to download Hadoop, start up HDFS, upload files to it, run a Hadoop job, and get the results back from HDFS, all on our laptop in a single directory! That is cool!

There is nothing new here: I just followed the straightforward guidance in the Hadoop installation docs, with a few minor modifications and some updated explanations for myself, so it's clearer for me when I look at it in the future for reference.

If you want to see more of what I learned last week, I'm always at https://tomer-ben-david.github.io
