Module 1
Hadoop Architecture
What is Big Data
Hadoop Architecture
Hadoop ecosystem components
Hadoop Storage: HDFS
Hadoop Processing: MapReduce Framework
Hadoop Server Roles: NameNode, Secondary NameNode and DataNode
Anatomy of File Write and Read
Module 2
Hadoop Cluster Configuration and Data Loading
Hadoop Cluster Architecture
Hadoop Cluster Configuration files
Hadoop Cluster Modes
Multi-Node Hadoop Cluster
A Typical Production Hadoop Cluster
MapReduce Job execution
Common Hadoop Shell commands
Data Loading Techniques: Flume, Sqoop, Hadoop Copy Commands
Module 3
Hadoop MapReduce framework
Hadoop Data Types
Hadoop MapReduce paradigm
Map and Reduce tasks
MapReduce Execution Framework
Partitioners and Combiners
Input Formats (Input Splits and Records, Text Input, Binary Input, Multiple Inputs)
Output Formats (Text Output, Binary Output, Multiple Outputs)
Module 4
Advanced MapReduce
Counters
Custom Writables
Unit Testing: JUnit and MRUnit testing framework
Error Handling
Tuning
Module 5
Pig and Pig Latin
Installing and Running Pig
Grunt
Pig's Data Model
Pig Latin
Developing & Testing Pig Latin Scripts
Writing Evaluation, Filter, Load & Store Functions
Module 6
Hive and HiveQL
Hive Architecture and Installation
Comparison with Traditional Database
HiveQL: Data Types
Operators and Functions
Hive Tables (Managed Tables and External Tables, Partitions and Buckets, Storage Formats, Importing Data, Altering Tables, Dropping Tables)
Querying Data (Sorting and Aggregating, MapReduce Scripts, Joins & Subqueries, Views, Map-Side and Reduce-Side Joins to Optimize Queries)
Module 7
Advanced Hive, NoSQL Databases and HBase
Hive: Data manipulation with Hive
User Defined Functions
Appending Data into existing Hive Table
Custom Map/Reduce in Hive
Hadoop Project: Hive Scripting
HBase: Introduction to HBase
Client APIs and their features
Available Clients
HBase Architecture
MapReduce Integration
Module 8
Advanced HBase and ZooKeeper
HBase: Advanced Usage
Schema Design
Advanced Indexing
Coprocessors
Module 9
Hadoop 2.0, MRv2 and YARN
Schedulers: Fair and Capacity
Hadoop 2.0 New Features: NameNode High Availability
HDFS Federation
MRv2
YARN
Running MRv1 in YARN
Upgrading existing MRv1 code to MRv2
Programming in the YARN framework