helloworld

Hadoop MapReduce wordcount example in Java. Introduction to Hadoop job.

In this article we are going to review the classic Hadoop word count example, customizing it a little bit. As usual, I suggest using Eclipse with Maven to create a project that can be modified, compiled and easily executed on the cluster. First of all, download the Maven boilerplate project from here: https://github.com/H4ml3t/maven-hadoop-java-wordcount-template

$ git clone git@github.com:H4ml3t/maven-hadoop-java-wordcount-template.git

If you want to compile it directly, you can run:

$ cd maven-hadoop-java-wordcount-template
$ mvn package

The resulting fat jar will be in the target folder, named “maven-hadoop-java-wordcount-template-0.0.1-SNAPSHOT-jar-with-dependencies.jar”.

Alternatively, if you want to modify the code (as we are about to do now), open Eclipse and go to [File] -> [Import] -> [Existing maven project] -> Browse for the directory …Continue reading →

HelloWorld Spark? Smart (selective) wordcount Scala example!

In the previous post I showed how to build a Spark Scala jar and submit a job using spark-submit. Now let’s customize our main Scala Spark object a little bit. You can find the project for the following example here on GitHub.

Let’s imagine we’ve collected a series of messages about football (tweets or whatever) and we want to count words, but not every word: only those that are of interest. Say we have a “dictionary” of football players’ names, and we want to see which of them appear most often in those messages.

Example time!

Imagine we have a file (called names) with a list of names (one per line):

Kane
Sirigu
Neymar
Totti

And in another file (called messages) we have a list of messages:

I’m obviously with Harry Kane (Hurricane) today… Let’s go Tottenham!!!
Kid Kane with Arsenal jersey, lol
Another top save from Sirigu to keep Toulouse out. PSG seconds away from top spot.
Why Sirigu and Verratti are laughing so much?
Francesco Totti Scores With Flying Kung Fu Kick, Celebrates With Selfie
What would Roma do without Totti? See his great goal & celebration HERE! #Totti #Goals
Wenger wasn’t the Arsenal coach when Totti started to play in Serie A

The result of analyzing these lines has to be (Kane, 2), (Sirigu, 2), (Totti, 3). To achieve this, …continue reading →
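To make the expected counts concrete, here is a minimal plain-Java sketch of the selective count on the sample data above. It is not the Spark/Scala code from the full post, just an illustration. The class name SelectiveWordCount and its methods are made up for this example, and the tokenization rule is an assumption: messages are split on whitespace, surrounding punctuation is stripped, and hashtags such as #Totti are kept as separate tokens. That is one rule that reproduces the counts stated above; the original project may clean the text differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of the original project.
class SelectiveWordCount {

    // Counts how many times each dictionary name appears as a token in the messages.
    static Map<String, Integer> count(Set<String> names, List<String> messages) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String message : messages) {
            for (String raw : message.split("\\s+")) {
                // Assumption: strip leading/trailing characters that are not
                // letters, digits or '#', so "Totti?" becomes "Totti"
                // while "#Totti" stays a different token.
                String token = raw.replaceAll("^[^\\p{L}\\p{N}#]+|[^\\p{L}\\p{N}#]+$", "");
                if (names.contains(token)) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // The sample messages from the text (plain ASCII quotes and dots used here).
    static List<String> sampleMessages() {
        return List.of(
            "I'm obviously with Harry Kane (Hurricane) today... Let's go Tottenham!!!",
            "Kid Kane with Arsenal jersey, lol",
            "Another top save from Sirigu to keep Toulouse out. PSG seconds away from top spot.",
            "Why Sirigu and Verratti are laughing so much?",
            "Francesco Totti Scores With Flying Kung Fu Kick, Celebrates With Selfie",
            "What would Roma do without Totti? See his great goal & celebration HERE! #Totti #Goals",
            "Wenger wasn't the Arsenal coach when Totti started to play in Serie A");
    }

    public static void main(String[] args) {
        Set<String> names = Set.of("Kane", "Sirigu", "Neymar", "Totti");
        // Prints {Kane=2, Sirigu=2, Totti=3}
        System.out.println(count(names, sampleMessages()));
    }
}
```

Neymar is in the dictionary but never appears in the messages, so it gets no entry at all. Looking names up in a set keeps the check cheap for every token, which is the same idea you would want when the dictionary is shared across the workers of a cluster.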

o/ …ehi!

Titling the first post “Hello World” is too predictable and boring, I suppose. Well, maybe I should have named it so, since the whole blog will be predictable and boring (I’ll try to put some effort into being original :P). So let me first apologize to you, both for the predictable-boring issue and for my non-mother-tongue English writing skills. What is this blog all about? Yet another programming blog in which I will drop some snippets of code, projects, tests, experiences, ideas… together with everything else you don’t care about, like stuff concerning my professional life and thoughts (I’ll introduce myself in the “About” page).

Leaving the latter aside, the first part, the one you do care about (or at least surely much more than the second), will involve big-data topics. Now, I hate two things in the world:

  1. anchovies on the pizza
  2. the word big-data

Number 1 because (OK, if the horrible English didn’t give it away, this will tell you I’m from Italy) when you eat pizza with anchovies on it, you really can’t tell what that flavor in your mouth is. It is so strong and misleading that you could be chewing a sock with an anchovy on top and you wouldn’t notice the difference. “Big-data” for computer science is …continue reading →