Course Outline

Section 1: Introduction to Hadoop

  • Hadoop history and core concepts
  • Ecosystem overview
  • Distributions
  • High-level architecture
  • Common Hadoop myths
  • Hadoop challenges
  • Hardware and software requirements
  • Lab: First look at Hadoop

Section 2: HDFS

  • Design and architecture
  • Key concepts (horizontal scaling, replication, data locality, rack awareness)
  • Daemons: NameNode, Secondary NameNode, DataNode
  • Communications and heartbeats
  • Data integrity
  • Read and write paths
  • NameNode High Availability (HA) and Federation
  • Lab: Interacting with HDFS

Section 3: MapReduce

  • Core concepts and architecture
  • Daemons (MRv1): JobTracker and TaskTracker
  • Phases: Driver, Mapper, Shuffle/Sort, Reducer
  • MapReduce Version 1 and Version 2 (YARN)
  • Internal workings of MapReduce
  • Introduction to Java MapReduce programming
  • Lab: Running a sample MapReduce program

Section 4: Pig

  • Pig versus Java MapReduce
  • Pig job flow
  • Pig Latin language
  • ETL with Pig
  • Transformations and joins
  • User-defined functions (UDFs)
  • Lab: Writing Pig scripts to analyse data

Section 5: Hive

  • Architecture and design
  • Data types
  • SQL support in Hive
  • Creating and querying Hive tables
  • Partitions
  • Joins
  • Text processing
  • Labs: Processing data with Hive

Section 6: HBase

  • Core concepts and architecture
  • HBase versus RDBMS versus Cassandra
  • HBase Java API
  • Time-series data in HBase
  • Schema design
  • Lab: Interacting with HBase using the shell; programming with the HBase Java API; schema design exercise

Requirements

  • Proficiency in the Java programming language (most programming exercises are in Java)
  • Comfort working in a Linux environment (navigating the command line and editing files with vi or nano)

Lab environment

Zero install: students do not need to install Hadoop on their own machines. A fully functional Hadoop cluster is provided.

Students will require the following:

  • An SSH client (Linux and Mac already include SSH clients; for Windows, PuTTY is recommended)
  • A web browser to access the cluster (Firefox is recommended)

Duration: 28 hours
