Hadoop Notes

Hadoop Handbook

Uploaded by

Vijay Vishwanath Thombare

Intro to Hadoop and MapReduce

Lesson 1 Notes

Introduction

Hi! Welcome to Fundamentals of Hadoop and MapReduce. My name's Sarah
Sproehnle, and I'm the Vice President of Educational Services at Cloudera, a
company which helps develop, support, and manage Hadoop.

And I'm Ian Wrigley, Cloudera's Senior Curriculum Manager. Between us, Sarah
and I have been responsible for bringing Hadoop training to over 20,000 people,
and we're excited to reach a much bigger audience here at Udacity. During this
course we're going to discuss what big data is, what Hadoop is, why it's useful,
and how to write MapReduce code.

By the end of the course, you'll be able to describe the kinds of problems Hadoop addresses,
and you'll have written MapReduce programs to efficiently analyze very large Web server log
files. In fact, you'll have had hands-on experience running a Hadoop job by the end of lesson two.

So, let's start. In this lesson, we're going to define 'big data', the sort of problems it introduces,
and how to address those problems.

Sources of Data

Organizations have been generating data since way back, but as time goes on, more and more
data is being generated. IBM estimates that as much as 90% of the data in the world today
has been created in the last two years alone.

Just as a simple example, think about your cell phone. Whenever it's
turned on, it's connecting to cell towers to get reception. As you move
around, it will connect to different towers, and at different signal
strengths depending on how far away from them you are. All of that
connection data is collected by the phone company, and it's logged.

Copyright 2014 Udacity, Inc. All Rights Reserved.

They can use it to find dead spots in their coverage, to work out which towers are the busiest
and need increased capacity... they can even trace you if you make an emergency call but don't
give your exact location. That's an enormous amount of data right there.

Another example is when you visit a Web site like Amazon or Netflix.
Everything you do there is logged: what pages you viewed, what products you
looked at, how long you spent on each page... even things like what Web
browser you were using and what sort of computer you were connecting from.
Again, huge amounts of data.

And that's just in the corporate world. In medicine, for example, each X-ray creates huge
amounts of potentially incredibly valuable information, and comparing large numbers of them can
help us to detect similarities in tumors.

This increase in the amount of data we're generating opens up huge possibilities. But it comes
with problems too. We have to store all that data, and we have to be able to process it in a
sensible amount of time.

Quiz: What is a Big Data problem?

This course is about Hadoop, and how it helps to deal with Big Data. But not everything is
actually a big data problem. There are lots of cases where you can use traditional systems to
store, manage, and process your data. So the first thing you need to do is decide if what you
have really does fall under the heading of big data in the first place. And to make that call, we
have to create some kind of definition for what big data is.

Let's start with a quick question. Which of these would you consider to be big data? You are not
going to be graded on this answer, but give it your best guess.

[] order details for a purchase at a store
[] all orders across hundreds of branches nationwide
[] information about a person's stock portfolio
[] all stock transactions made on the New York Stock Exchange during the year

Answer:
For most people, the answers are going to be 2 and 4. A list of purchases at a single store is
almost certainly small enough to be easily handled by a traditional relational database system
or even just a spreadsheet. Orders from hundreds of stores nationwide, though, could start to
overwhelm traditional systems. Likewise, information about a single person's stock portfolio is a
small and easily managed chunk of data. But data on trades across the entire NYSE for a year
will run into tens or hundreds of terabytes, and that's where traditional systems really do start to
struggle.

Definition of Big Data

There's no one definition for big data; it's a very subjective term. Most people would consider a
data set of terabytes or more to be big data, but there are certainly people using Hadoop with
great success on smaller chunks of data than that. One reasonable definition is that it's data
which can't comfortably be processed on a single machine.

Quiz: Challenges
But Big Data is about more than just the size of the data. What additional problems can you see
in this field?

[] most data is worthless and it's hard to find the useful parts
[] it's hard to gather data
[] data is created very fast
[] data from different sources is in different formats

Answer:
A potential challenge with big data is that it is created very fast, and that it comes from different
sources, which may use a variety of formats. In my experience, most data is not worthless
but actually does have a lot of value.

The 3 Vs of Big Data:

When you read or talk about Big Data, you'll often hear people refer to the three Vs. Volume
refers to the size of data that you're dealing with, Variety refers to the fact that the data is often
coming from lots of different sources and in many different formats, and Velocity refers to the
speed at which the data is being generated, and the speed at which it needs to be made
available for processing. So let's look in more detail at each of them.

Volume
The price to store data has dropped incredibly over the last 60 years. In 1980, the cost per
gigabyte was several hundred thousand dollars. In 2013, it's well under 10 cents.

Although it's worth saying that if you actually want to store the data reliably, you're going to end
up paying rather more than that: probably several dollars per gigabyte, maybe even more.

That's particularly the case with more traditional data storage devices such as
storage area networks, or SANs, which can be extremely expensive. The high
cost of reliable storage puts a cap on the amount of data companies can
practically store. At some point, they'd say, "OK, it's too expensive to store all
that data that I'm not doing anything with. Let's just store the critical stuff: my
actual sales, for example, rather than all that stuff about how long people spent on
each page of my Web site." But it turns out, as we'll see, that the data they're
currently throwing away can be incredibly useful. What we need is a cheaper way
to store it reliably.

And of course storing the data is only one part of the equation: you also need to be able to read
and process it efficiently. Storing a terabyte of data on a SAN isn't so hard, but streaming the
data from the SAN across the network to some central processor can take a long time, and
processing it can be extremely slow.
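
To get a feel for why streaming a terabyte to one central processor is slow, here is a rough back-of-the-envelope sketch in Python. The 100 MB/s sustained throughput and the 100-machine comparison are illustrative assumptions of ours, not figures from the lesson; real links and disks vary widely.

```python
# Back-of-the-envelope estimate: time to stream a dataset at a fixed rate.
# The throughput and machine count below are assumptions for illustration.

def transfer_seconds(size_bytes: float, throughput_bytes_per_sec: float) -> float:
    """Return the seconds needed to move size_bytes at a sustained throughput."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / throughput_bytes_per_sec

TERABYTE = 10**12             # decimal terabyte, in bytes
ASSUMED_RATE = 100 * 10**6    # assumed sustained rate: 100 MB/s

# Everything funnels through one link to one central processor.
single_link = transfer_seconds(TERABYTE, ASSUMED_RATE)
print(f"1 TB over one link: {single_link / 3600:.1f} hours")

# The same terabyte spread evenly over many machines, each reading only
# its own share at the same assumed rate, with all reads overlapping.
machines = 100
parallel = transfer_seconds(TERABYTE / machines, ASSUMED_RATE)
print(f"1 TB read in parallel on {machines} machines: {parallel / 60:.1f} minutes")
```

Under these assumptions the single link needs hours while the parallel read needs minutes, which is the intuition behind spreading both storage and processing across many machines.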

QUIZ: Volume

Which of the following data do you think is worth storing and analyzing?

[] transactions (financial, government related)
[] logs (records of activity, location)
[] business data (product catalogs, prices, customers)
[] user data (images, documents, video)
[] sensor data (temperature, pollution)
[] medical data (X-rays, brain activity records)
[] social (email, Twitter, etc.)

Answer
And the answer is that all of these can provide useful information. But in order to store it, you'll
need a way to scale your storage capacity up to massive volume. Hadoop, which stores data in
a distributed way across multiple machines, does that. You'll see just how in the next lesson.

Variety

The second V is data variety. For a long time, people have used databases to store and
process their data: either smaller databases like MySQL, or big data warehouses based on
software from companies like Oracle and IBM. But for a data warehouse to effectively process
information, all that information has to fit nicely into a predefined set of tables. The problem is
that these days, lots of the data you want to store is what we tend to call unstructured data, or
semi-structured data. Sarah can give us some examples.

By unstructured, we mean the data arrives in lots of different formats. For example,
a bank might have a list of your credit card and account transactions, but
they may also have scans of your checks, records of your interactions with
customer service representatives on the Web and over the phone,
perhaps even recordings of those phone calls. All of that data is in a variety of
different formats, and it can be hard to store and reconcile it all using traditional
systems.

And this also ties back to volume. You want to store that data in its original format so you're not
throwing any information away. That way you can then process the data later in different ways
you might not even have thought of originally.

For instance, if we just transcribe call center conversations into text, we have what
people said to our customer service representatives. But if we have the actual
recordings, then later on we might develop software which can interpret the tone of
voice the customer uses, and that might lead to a very different interpretation of the
data. And the nice thing about Hadoop is that it doesn't care what format the data
comes in. Unlike a traditional database, you can just store the data in its raw
format, and manipulate and reformat it later.

Quiz: Data Variety

Sometimes the most unlikely data can be extremely useful and lead to savings due to better
planning. For example, a conventional system for coordinating logistics might send the
closest truck to the warehouse to pick up the package. However, it might be that the closest
truck is not the best solution: perhaps there are traffic jams, or the most direct route is on small
roads that would take longer to drive. Maybe the truck doesn't have enough free space for the
new load. So what kind of data would be helpful in making a better plan that could save money
and time for the company?

[] Current GPS location from all trucks
[] Current itineraries for all trucks
[] Current traffic speed in related areas as reported by services such as Waze
[] Current load of trucks by volume and weight
[] Fuel efficiency of the different vehicles

Answer:
And again all of these answers are correct. You can save a lot of money, and time, by making
better decisions, driven by more varied data. The world we live in is extremely complex, and
there are a lot of variables to consider that you can tweak to get large benefits.

Velocity

Velocity, the third V, is about the speed at which the data arrives, ready to be processed. We
need to be able to accept and store that data even when it's coming in at a rate of terabytes or
more a day, which is often the case. If we can't store it as it arrives, we'll end up discarding
some of it, and that's what we absolutely want to avoid.
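
As a rough illustration of what "terabytes a day" means for the storage layer, this small Python sketch converts a daily volume into the sustained write rate needed to keep up. The one-terabyte figure is just an example input, and the helper name is ours.

```python
# Convert a daily data volume into the average write rate needed to keep up.
# A decimal terabyte (10**12 bytes) is assumed; arrival is treated as steady,
# which real traffic rarely is, so peaks would be higher than this average.

SECONDS_PER_DAY = 24 * 60 * 60

def sustained_rate_mb_per_sec(bytes_per_day: float) -> float:
    """Average megabytes per second needed to absorb bytes_per_day."""
    return bytes_per_day / SECONDS_PER_DAY / 10**6

one_tb_per_day = sustained_rate_mb_per_sec(10**12)
print(f"1 TB/day needs about {one_tb_per_day:.1f} MB/s, around the clock")
```

If the system cannot sustain at least that average rate, some of the incoming data has to be dropped, which is exactly the situation described above.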

What problems can we solve?

Think about an e-commerce Web site. If we know what products you've looked at in the past, we
could recommend similar products the next time you visit our site. If you spent five minutes
looking at a particular item, we could maybe send you an email informing you when that item is
on sale. If we know that you typically browse our site using a first-generation iPad, we could
suggest the latest model.

This is a huge difference to what we would do before, when we only stored records of actual
purchases. If we can store and process all of our Web server log files, along with the purchase
data that's in our traditional data warehouse, we can give the customer a much better shopping
experience, which should directly translate into bigger profits.

Yet another example is a movie site like Netflix. Based on what
they know about your viewing habits, they can recommend
movies to you. As you can see here, because of what Ian's
rated highly before, the movie on the left is recommended for
him, and they can even predict what rating he'll give the
movie.

History of solving data problems

So there are plenty of things we can do with big data. But first we have to solve a couple of
problems. We need to be able to store the data in a cost-effective way, and we need to be able
to process it efficiently. And it turns out that these are not easy problems to solve when we're
talking about massive amounts of data. Fortunately, though, some extremely smart people at
Google were working on them in the late 1990s and released the results of their work as
research papers in 2003 and 2004. Let's see what Doug Cutting, one of the founders of Hadoop,
has to say.

DOUG CUTTING about History of Hadoop:

So, let me tell you how Hadoop came to be. About ten years ago, in around
2003, I was working on an Open Source web search engine called Nutch, and
we knew it needed to be something very scalable, because the Web was, you
know, billions of pages, terabytes, petabytes of data that we needed to be able
to process, and we set about doing the best job we could, and it was tough. We
got things up and running on four or five machines, not very well, and around
that time Google published some papers about how they were doing things internally.
They published a paper about their distributed file system, GFS, and about their processing
framework, MapReduce. So my partner in this project at the time, Mike Cafarella, and I
set about trying to reimplement these in Open Source, so that more people could use
them than just folks at Google. It took us a couple of years, and we had Nutch up and
running on, instead of four or five machines, 20 to 40 machines. It wasn't perfect, it
wasn't totally reliable, but it worked. And we realized that to get it to the point where it
scaled to thousands of machines, and was as bulletproof as it needed to be, would take
more than just the two of us working part time.

Around that time, Yahoo approached me and said they were interested in investing in
this. So I went to work for Yahoo in January of 2006. The first thing we did there was take
the parts of Nutch that were a distributed computing platform and put them into a
separate project, a new project christened Hadoop. Over the next couple of years, with
Yahoo's help, and the help of others, we took Hadoop and really got it to the point where
it did scale to petabytes, running on thousands of processors, and doing so quite
reliably.

It spread to lots of companies, mostly in the Internet sector, and became quite a
success. After that, we started to see a bunch of other projects grow up around it.
And Hadoop's grown to be the kernel of pretty much an operating system for big
data. We've got tools that allow you to more easily do MapReduce programming, so
you can develop using SQL or a data flow language called Pig. And
we've also got the beginnings of higher-level tools. We've got interactive SQL with
Impala. We've got Search. And so we're really seeing this develop into a general-purpose
platform for data processing that scales much better, and that is much more
flexible, than anything else that's out there.

That's the story of the genesis of Hadoop: it's based on work done by the folks at Google, and it's
grown from small beginnings to the point now where hundreds of people contribute to the
project, and where it's being used by thousands and thousands of companies worldwide. The
Hadoop logo is actually a little yellow elephant, but do you know where the name came from?
There's a funny story attached to that. Here's Doug again.

DOUG about Name of Hadoop

So the name Hadoop comes from my son's toy elephant. When he was about
two, a friend gave him a little stuffed elephant which he played with
incessantly. And we overheard him calling it something, this strange word that
he invented, and he said "Hadoop". So I immediately wrote it down, because I was
in the software business, and we're always looking for good names. And this
one came with a mascot, even. And a few years later, when I needed a project
name, I pulled it out. Now, I wrote it down as H-A-D-O-O-P, and figured that everyone
would say "HAD-oop". Now it turns out everyone says "ha-DOOP" instead, but I persist in
saying "HAD-oop". Now my son, of course, is 13, and expects royalties for the name. He wants
more credit. He also accuses me of stealing the toy. At some point, he was using it in
some kind of rocket ship experiment, and I had to rescue it. And now it lives in my sock
drawer, for safety.

Hadoop Cluster
The core Hadoop project consists of a way to store data, known as the Hadoop
Distributed File System, or HDFS, and a way to process the data, called
MapReduce. The key concept is that we split the data up and store it across a
collection of machines, known as a cluster. Then, when we want to process the data,
we process it where it's actually stored. Rather than retrieving the data from a
central server, instead it's already on the cluster, and we can process it in place.
You can add more machines to the cluster (make the cluster bigger) as the amount of
data you're storing grows and, indeed, many people start with just a few machines
and add more as they're needed. The machines in the cluster don't need to
be particularly high-end: although most clusters are built using rack-mount servers, they are
typically mid-range servers rather than top-of-the-range equipment.
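
The "split it up, store it across the cluster, process it where it lives" idea can be sketched as a toy simulation in plain Python. This is not HDFS or MapReduce code: the node names, the tiny block size, the sample records, and the round-robin placement are all hypothetical stand-ins chosen to keep the example short.

```python
from collections import Counter

def split_into_blocks(records, block_size):
    """Cut a list of records into consecutive blocks of at most block_size."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def place_blocks(blocks, node_names):
    """Assign blocks to nodes round-robin (a stand-in for real block placement)."""
    cluster = {name: [] for name in node_names}
    for index, block in enumerate(blocks):
        cluster[node_names[index % len(node_names)]].append(block)
    return cluster

def count_locally(local_blocks):
    """Work done on one node: count page hits using only the blocks it holds."""
    counts = Counter()
    for block in local_blocks:
        counts.update(block)
    return counts

# Hypothetical data: one requested page per log record.
records = ["/home", "/cart", "/home", "/item/7", "/home", "/cart", "/item/7", "/home"]
blocks = split_into_blocks(records, block_size=2)
cluster = place_blocks(blocks, ["node1", "node2", "node3"])

# Each node processes its own blocks; only the small per-node counts are combined.
partials = [count_locally(local_blocks) for local_blocks in cluster.values()]
total = sum(partials, Counter())
print(total)
```

The point of the sketch is that the raw records never move: each node works on what it already stores, and only the compact partial results travel to be merged. Adding a node name to the list spreads the same blocks over more machines.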

Hadoop Ecosystem

Core Hadoop consists of HDFS and MapReduce.

But since the project was first started, an awful lot of other software has grown up around it. And
that's what we call the Hadoop Ecosystem. Some of the software is intended to make it easy to
load data into the Hadoop cluster, while lots of it is designed to make Hadoop easier to use. For
example, as you'll see in the next lesson, writing MapReduce code isn't completely simple. You
need to know a programming language like Java, or Python, or Ruby, or Perl. But there are lots
of folks out there who aren't programmers but who can write SQL queries to access data in a
traditional relational database like SQL Server. And of course a lot of business intelligence tools
also want to hook into Hadoop.

For that reason, other open source projects have been created to make it easier for
people to query their data without knowing how to code. Two key ones are Hive and
Pig. Instead of having to write Mappers and Reducers, in Hive you just write
statements which look very much like standard SQL. The Hive interpreter turns that
SQL into MapReduce code, which it then runs on the cluster. And an alternative is Pig,
which allows you to write code to analyze your data in a fairly simple scripting language
rather than MapReduce. Again, the code is turned into actual Java MapReduce and run
on the cluster.
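
To make "SQL turned into MapReduce" a little more concrete, here is a hand-written Python sketch of how a simple GROUP BY count can be expressed as a map step, a grouping step, and a reduce step. It is only an illustration of the idea, not what Hive actually generates; the weblog table, its columns, and the function names are hypothetical.

```python
from collections import defaultdict

# The query being illustrated (hypothetical table and column names):
#   SELECT page, COUNT(*) FROM weblog GROUP BY page;

def mapper(row):
    """Emit one (key, value) pair per row: the GROUP BY column and a count of 1."""
    yield row["page"], 1

def shuffle(pairs):
    """Group values by key, standing in for what the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Collapse all values for one key into the aggregate, here COUNT(*)."""
    return key, sum(values)

weblog = [
    {"page": "/home", "browser": "Firefox"},
    {"page": "/cart", "browser": "Chrome"},
    {"page": "/home", "browser": "Chrome"},
]

pairs = [pair for row in weblog for pair in mapper(row)]
result = dict(reducer(key, values) for key, values in shuffle(pairs).items())
print(result)
```

Someone who knows SQL only has to write the one-line query in the comment; the three functions are the kind of work a tool like Hive or Pig takes off their hands.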

Hive and Pig are great, but they're still running MapReduce jobs, which means they will take a
reasonable amount of time, especially when running on really large amounts of data. So another
open source project called Impala was developed, which again allows you to query your data
using SQL but which directly accesses that data, rather than accessing it via MapReduce.
Impala is optimized for low-latency queries (in other words, Impala queries run very quickly,
typically many times faster than Hive queries), while Hive is optimized for long-running batch
processing jobs.

Another project used by many people is Sqoop. That takes data from a traditional
relational database server such as Microsoft SQL Server and puts it in HDFS
as delimited files so it can be processed along with the other data on the cluster.
Then there's Flume, which ingests data as it's generated by external systems. HBase
is a real-time database built on top of HDFS. Hue is a graphical front end to the cluster. Oozie is
a workflow management tool. Mahout is a machine learning library.

In fact, there are so many different ecosystem projects that making them all talk to each other,
and work well with each other, can be tricky. To make installing and maintaining a cluster easier,
Cloudera, the company we work for, has put together a distribution of Hadoop called CDH. This
takes all the key ecosystem projects, along with Hadoop itself, and packages them together so
that installation is a really simple process. And the components are all tested together, so you
can be sure that there are no incompatibilities between them. Of course it's completely free and
open source, just like Hadoop itself. You could install everything from scratch yourself, but it's far
easier to use CDH, and that's certainly what we'd recommend. In the next lesson, in fact, you'll
be downloading and running a virtual machine which has CDH installed.

Conclusion
So in this lesson you learned what big data is, and how Hadoop can help with big data
problems. In the next lesson, we'll take a deeper look at the two key parts of Hadoop: that's
HDFS, the Hadoop Distributed File System, and MapReduce, the way you can process that
data.
