Course Outline:
Introduction to Big Data and Hadoop
· What is Data, Data Storage and Data Analysis
· Challenges in processing Big Data
· Moore’s Law
· Current Computing techniques for Processing Big Data and their drawbacks
· A brief History of Hadoop
· Advantages of using Hadoop
· Hadoop ecosystem
· Overview of HDFS (Hadoop Distributed File System)
· Map Reduce (Programming Model)
Hadoop Distributed File System (HDFS)
· Distributed File System basics
· HDFS Concepts
· Name Nodes and Data Nodes
· Configuring HDFS
· Data Flow (File Read and File Write)
· Parallel Copying with distcp
· Hadoop Archives
· HDFS Permissions and Security
· Health of File System (fsck command)
· Rack Awareness
· The 5 Daemon Processes in a Hadoop System
Map Reduce
· Map Reduce Basics
· Word Count Example
· Word Count Flow and Solution
· Map Reduce Data Flow
· Algorithms for simple and complex problems
· Hadoop Streaming
Developing a Map Reduce Application
· Setting up Development Environment
· Custom Data Types (Writable and Custom Key Types)
· Input and Output File Formats
· Driver, Mapper and Reducer code explained
· Configuring Development Environment – Eclipse
· Writing Unit Tests
· Running locally
· Map Reduce Web UI
· Hands-on Exercises
How Map Reduce Works
· Anatomy of Map Reduce Job run
· Classic Map Reduce (Map Reduce 1)
· YARN (Map Reduce 2)
· Job Scheduling
· Shuffle and Sort
· Failures
· Oozie Workflows
· Hands-on Exercises
Map Reduce Types and Formats
· Map Reduce Types
· Input Formats: input splits & records, text input, binary input, multiple inputs & database input
· Output Formats: text output, binary output, multiple outputs, lazy output and database output
· Hands-on Exercises
Map Reduce Features
· Counters
· Sorting
· Joins: Map Side and Reduce Side
· Side Data Distribution
· Map Reduce Combiner
· Map Reduce Partitioner
· Map Reduce Distributed Cache
· Hands-on Exercises
Administering Hadoop
PIG
· Overview of Pig
· Installing and Running Pig
· Pig Latin
· Loading and Storing Data
· Hands-on Exercises
HIVE
· Overview of Hive
· Installing and Running Hive
· HiveQL
· Tables
· Hands-on Exercises
HBASE
· Overview of HBase
· Installation
· Clients (Avro, REST, Thrift)
· Hands-on Exercises
SQOOP
· Overview of Sqoop
Case Studies
· Last.fm
· Facebook
· Nutch Search Engine
· RackSpace Log Processing