© 2011 IBM Corporation Hadoop architecture IBM Information Management – Cloud Computing Center of Competence IBM Canada Labs
© 2011 IBM Corporation Agenda • Terminology review • HDFS • MapReduce • Type of nodes • Topology awareness 2
© 2011 IBM Corporation Agenda • Terminology review • HDFS • MapReduce • Type of nodes • Topology awareness 3
© 2011 IBM Corporation 4 Node 1 Terminology review
© 2011 IBM Corporation 5 Node 2 Node 1 Terminology review
© 2011 IBM Corporation 6 Node 2 Node n … Node 1 Terminology review
© 2011 IBM Corporation 7 Rack 1 Node 2 Node n … Node 1 Terminology review
© 2011 IBM Corporation 8 Rack 1 Node 2 Node n … Node 1 Node 2 Node n … Rack 2 Node 1 Terminology review
© 2011 IBM Corporation 9 Rack 1 Node 2 Node n … Node 1 Node 2 Node n … Rack 2 Node 1 Node 2 Node n … Rack n Node 1 … Terminology review
© 2011 IBM Corporation 10 Hadoop cluster Rack 1 Node 2 Node n … Node 1 Node 2 Node n … Rack 2 Node 1 Node 2 Node n … Rack n Node 1 … Terminology review
© 2011 IBM Corporation Hadoop architecture 11 • Two main components: – Hadoop Distributed File System (HDFS) – MapReduce Engine
© 2011 IBM Corporation Agenda • Terminology review • HDFS • MapReduce • Type of nodes • Topology awareness 12
© 2011 IBM Corporation Hadoop distributed file system (HDFS) 13 • Hadoop file system that runs on top of the existing native file system • Designed to handle very large files with streaming data access patterns • Uses blocks to store a file or parts of a file
© 2011 IBM Corporation HDFS - Blocks 14 • File Blocks – 64 MB (default), 128 MB (recommended) – compared to 4 KB in UNIX – Behind the scenes, 1 HDFS block is backed by multiple operating system (OS) blocks (diagram: one 128 MB HDFS block made up of many smaller OS blocks)
© 2011 IBM Corporation HDFS - Blocks 15 • Fits well with replication to provide fault tolerance and availability • Advantages of blocks: – Fixed size – easy to calculate how many fit on a disk – A file can be larger than any single disk in the network – If a file or a chunk of the file is smaller than the block size, only the needed space is used. E.g., a 420 MB file is split as: 128 MB + 128 MB + 128 MB + 36 MB
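A quick way to see this split on a live cluster is the file system checker, which lists the blocks behind a file (a minimal sketch; the file path is only an example):
hadoop fsck /user/keith/myfile.txt -files -blocks -locations
The -locations flag also shows which DataNodes hold each replica, which connects to the replication slide that follows.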
© 2011 IBM Corporation HDFS - Replication 16 • Blocks with data are replicated to multiple nodes • Allows for node failure without data loss (diagram: blocks replicated across Node 1, Node 2 and Node 3)
© 2011 IBM Corporation Writing a file to HDFS 17–27 (a sequence of diagram-only slides stepping through the write path; the images are not reproduced here. In brief: the client asks the NameNode for target DataNodes for each block, streams the block to the first DataNode, which forwards it along a pipeline to the other replicas, and once the replicas are acknowledged the NameNode records the block locations.)
© 2011 IBM Corporation HDFS Command line interface 28 • File System Shell (fs) • Invoked as follows: hadoop fs <args> • Example: • Listing the current directory in HDFS: hadoop fs -ls .
© 2011 IBM Corporation HDFS Command line interface 29 • FS shell commands take path URIs as arguments • URI format: scheme://authority/path • Scheme: • For the local file system, the scheme is file • For HDFS, the scheme is hdfs hadoop fs -copyFromLocal file://myfile.txt hdfs://localhost/user/keith/myfile.txt • Scheme and authority are optional • Defaults are taken from the configuration file core-site.xml
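Because the defaults come from core-site.xml (the fs.default.name property in Hadoop 1.x-era configurations), the two commands below are equivalent when that property points at hdfs://localhost; the host name and path are only illustrative:
hadoop fs -ls hdfs://localhost/user/keith
hadoop fs -ls /user/keith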
© 2011 IBM Corporation HDFS Command line interface 30 • Many POSIX-like commands • cat, chgrp, chmod, chown, cp, du, ls, mkdir, mv, rm, stat, tail • Some HDFS-specific commands • copyFromLocal, copyToLocal, get, getmerge, put, setrep
© 2011 IBM Corporation HDFS – Specific commands 31 • copyFromLocal / put • Copy files from the local file system into fs hadoop fs -copyFromLocal <localsrc> ... <dst> or hadoop fs -put <localsrc> ... <dst>
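For example, with hypothetical paths, copying a local log file into a user's HDFS directory:
hadoop fs -put access.log /user/keith/logs/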
© 2011 IBM Corporation HDFS – Specific commands 32 • copyToLocal / get • Copy files from fs into the local file system hadoop fs -copyToLocal [-ignorecrc] [-crc] <src> <localdst> or hadoop fs -get [-ignorecrc] [-crc] <src> <localdst>
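For example, again with hypothetical paths, pulling that file back into the local working directory:
hadoop fs -get /user/keith/logs/access.log .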
© 2011 IBM Corporation HDFS – Specific commands 33 • getmerge • Get all the files in the directories that match the source file pattern • Merges and sorts them into a single file on the local file system • <src> is kept hadoop fs -getmerge <src> <localdst>
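A typical use, with an assumed output path, is collapsing the part-* files produced by a MapReduce job into one local file:
hadoop fs -getmerge /user/keith/output results.txt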
© 2011 IBM Corporation HDFS – Specific commands 34 • setrep • Set the replication level of a file. • The -R flag requests a recursive change of replication level for an entire tree. • If -w is specified, waits until the new replication level is achieved. hadoop fs -setrep [-R] [-w] <rep> <path/file>
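For example, with an illustrative path, raising a file's replication factor to 3 and waiting until the extra replicas exist:
hadoop fs -setrep -w 3 /user/keith/myfile.txt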
© 2011 IBM Corporation Agenda • Terminology review • HDFS • MapReduce • Type of nodes • Topology awareness 35
© 2011 IBM Corporation MapReduce engine 36 • Technology from Google • A MapReduce program consists of map and reduce functions • A MapReduce job is broken into tasks that run in parallel
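As a hedged sketch of how such a job is launched (the examples jar name and the input/output paths vary by distribution and are assumptions here), a MapReduce program such as word count is packaged in a jar and submitted to the cluster:
hadoop jar hadoop-examples.jar wordcount /user/keith/input /user/keith/output
The job is then split into map and reduce tasks that run in parallel on the worker nodes, as described in the next section.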
© 2011 IBM Corporation Agenda • Terminology review • HDFS • MapReduce • Type of nodes • Topology awareness 37
© 2011 IBM Corporation Types of nodes - Overview 38 • HDFS nodes – NameNode – DataNode • MapReduce nodes – JobTracker – TaskTracker • There are other nodes not discussed in this course
© 2011 IBM Corporation Types of nodes - Overview 39–42 (a sequence of diagram-only slides; the images are not reproduced here. In the usual layout, the NameNode and JobTracker act as master nodes, while each worker node runs both a DataNode and a TaskTracker so tasks can read blocks locally.)
© 2011 IBM Corporation Types of nodes - NameNode 43 • NameNode – Only one per Hadoop cluster – Manages the filesystem namespace and metadata – Single point of failure, mitigated by writing its state to multiple filesystems – Because it is a single point of failure with large memory requirements, don't use inexpensive commodity hardware for this node
© 2011 IBM Corporation Types of nodes - DataNode 44 • DataNode – Many per Hadoop cluster – Manages blocks with data and serves them to clients – Periodically reports to the NameNode the list of blocks it stores – Use inexpensive commodity hardware for this node
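A quick way to see which DataNodes the NameNode currently knows about, along with per-node capacity and usage, is the admin report command (Hadoop 1.x-era syntax):
hadoop dfsadmin -report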
© 2011 IBM Corporation Types of nodes - JobTracker 45 • JobTracker node – One per Hadoop cluster – Receives job requests submitted by client – Schedules and monitors MapReduce jobs on task trackers
© 2011 IBM Corporation Types of nodes - TaskTracker 46 • TaskTracker node – Many per Hadoop cluster – Executes MapReduce operations – Reads blocks from DataNodes
© 2011 IBM Corporation Agenda • Terminology review • HDFS • MapReduce • Type of nodes • Topology awareness 47
© 2011 IBM Corporation Topology awareness (or Rack awareness) 48 Bandwidth becomes progressively smaller in the following scenarios:
© 2011 IBM Corporation Topology awareness 49 Bandwidth becomes progressively smaller in the following scenarios: 1. Processes on the same node
© 2011 IBM Corporation Topology awareness 50 Bandwidth becomes progressively smaller in the following scenarios: 1. Processes on the same node 2. Different nodes on the same rack
© 2011 IBM Corporation Topology awareness 51 Bandwidth becomes progressively smaller in the following scenarios: 1. Processes on the same node 2. Different nodes on the same rack 3. Nodes on different racks in the same data center
© 2011 IBM Corporation Topology awareness 52 Bandwidth becomes progressively smaller in the following scenarios: 1. Processes on the same node 2. Different nodes on the same rack 3. Nodes on different racks in the same data center 4. Nodes in different data centers
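Hadoop learns this topology from a user-supplied rack-mapping script registered in core-site.xml (topology.script.file.name in Hadoop 1.x, renamed net.topology.script.file.name in later releases). A minimal sketch follows; the subnets and rack names are assumptions. Hadoop passes host names or IP addresses as arguments and expects one rack path per line on standard output:
#!/bin/bash
# Hypothetical rack-mapping script: map each argument (host name or IP) to a rack path.
# The subnet-to-rack assignments below are purely illustrative.
for host in "$@"; do
  case "$host" in
    10.1.1.*) echo "/datacenter1/rack1" ;;
    10.1.2.*) echo "/datacenter1/rack2" ;;
    *)        echo "/default-rack" ;;
  esac
done
With this mapping in place, HDFS spreads block replicas across racks and MapReduce prefers scheduling tasks on nodes, or at least racks, that already hold the data.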
© 2011 IBM Corporation Thank you!
