Course Outline
Section 1: Introduction to Hadoop
- Hadoop history and core concepts
- Ecosystem overview
- Distributions
- High-level architecture
- Common Hadoop myths
- Hadoop challenges
- Hardware and software requirements
- Lab: First look at Hadoop
Section 2: HDFS
- Design and architecture
- Key concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons: NameNode, Secondary NameNode, DataNode
- Communications and heartbeats
- Data integrity
- Read and write paths
- NameNode High Availability (HA) and Federation
- Lab: Interacting with HDFS
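Two of the key concepts above, block-based storage and rack-aware replication, can be illustrated with a toy sketch in plain Java. This is a simulation of the default placement policy, not code that talks to a real cluster (block placement is decided by the NameNode); the rack and node names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of how HDFS splits a file into fixed-size blocks and
// places replicas with rack awareness. All names are illustrative; in a
// real cluster the NameNode makes these decisions.
public class HdfsPlacementSketch {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // default 128 MB

    // Number of blocks a file of the given size occupies (ceiling division).
    static long blockCount(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    // Default policy with replication factor 3: first replica on the
    // writer's rack, the other two together on one different rack.
    static List<String> placeReplicas(String localRack, List<String> otherRacks) {
        List<String> placement = new ArrayList<>();
        placement.add(localRack + "/node-a");
        String remoteRack = otherRacks.get(0);
        placement.add(remoteRack + "/node-b");
        placement.add(remoteRack + "/node-c");
        return placement;
    }

    public static void main(String[] args) {
        long fileSize = 300L * 1024 * 1024; // a 300 MB file
        System.out.println("blocks: " + blockCount(fileSize)); // 3 blocks
        System.out.println(placeReplicas("/rack1", List.of("/rack2", "/rack3")));
    }
}
```

Losing one whole rack therefore costs at most two of the three replicas, which is the trade-off the placement policy is designed around.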
Section 3: MapReduce
- Core concepts and architecture
- Daemons (MRv1): JobTracker and TaskTracker
- Phases: Driver, Mapper, Shuffle/Sort, Reducer
- MapReduce Version 1 and Version 2 (YARN)
- Internal workings of MapReduce
- Introduction to Java MapReduce programming
- Lab: Running a sample MapReduce program
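The Mapper, Shuffle/Sort, and Reducer phases listed above can be sketched as a word count in plain Java, with no Hadoop APIs: the map step emits (word, 1) pairs, the shuffle groups and sorts them by key, and the reduce step sums each group. This only mimics the data flow; it says nothing about job scheduling or the daemons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy word count showing the MapReduce data flow in plain Java.
public class WordCountSketch {
    // Map phase: emit one (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Shuffle/sort + reduce: group pairs by key (TreeMap keeps keys
    // sorted, like the shuffle does), then sum each group's values.
    static Map<String, Integer> shuffleAndReduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = map("the quick fox the");
        System.out.println(shuffleAndReduce(pairs)); // {fox=1, quick=1, the=2}
    }
}
```

In real Hadoop the same roles are played by `Mapper` and `Reducer` subclasses, with the framework performing the shuffle between them across the cluster.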
Section 4: Pig
- Pig versus Java MapReduce
- Pig job flow
- Pig Latin language
- ETL with Pig
- Transformations and joins
- User-defined functions (UDFs)
- Lab: Writing Pig scripts to analyse data
Section 5: Hive
- Architecture and design
- Data types
- SQL support in Hive
- Creating and querying Hive tables
- Partitions
- Joins
- Text processing
- Labs: Processing data with Hive
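The partitioning topic above comes down to directory layout: Hive stores each partition-column/value pair as a directory level under the table, which is what lets the planner skip partitions a WHERE clause rules out. The sketch below builds such a path in plain Java; the warehouse path, table, and column names are illustrative, not from the course.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of how a partitioned Hive table is laid out on HDFS: one
// directory level per partition column/value pair. Names are made up.
public class HivePartitionSketch {
    // Build the HDFS directory for one partition of a table.
    static String partitionPath(String warehouse, String table,
                                Map<String, String> partitionSpec) {
        StringBuilder path = new StringBuilder(warehouse).append("/").append(table);
        for (Map.Entry<String, String> col : partitionSpec.entrySet()) {
            path.append("/").append(col.getKey()).append("=").append(col.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Map<String, String> spec = new LinkedHashMap<>(); // preserves column order
        spec.put("dt", "2024-01-01");
        spec.put("country", "US");
        System.out.println(partitionPath("/user/hive/warehouse", "page_views", spec));
        // /user/hive/warehouse/page_views/dt=2024-01-01/country=US
    }
}
```

A query filtering on `dt = '2024-01-01'` only has to read files under that one directory, which is why choosing partition columns is a core part of Hive table design.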
Section 6: HBase
- Core concepts and architecture
- HBase versus RDBMS versus Cassandra
- HBase Java API
- Time-series data in HBase
- Schema design
- Lab: Interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
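The time-series and schema-design topics above usually center on row-key design, because HBase sorts rows lexicographically by key. A common pattern, sketched below in plain Java, combines a salt prefix (to spread writes across regions and avoid hotspotting) with a reversed timestamp (so the newest readings sort first within a series). The key format here is an assumption for illustration, not taken from the course.

```java
// Sketch of a salted, reverse-timestamp HBase row key for time-series
// data. HBase itself is not required; this only builds the key string.
public class HBaseRowKeySketch {
    // Salt = bucketed hash of the series id, so concurrent writers for
    // different series do not all hit the same region.
    static String rowKey(String seriesId, long epochMillis, int buckets) {
        int salt = Math.floorMod(seriesId.hashCode(), buckets);
        long reversed = Long.MAX_VALUE - epochMillis; // newest sorts first
        // Zero-padding keeps numeric order consistent with byte order.
        return String.format("%02d|%s|%019d", salt, seriesId, reversed);
    }

    public static void main(String[] args) {
        String older = rowKey("sensor-42", 1_700_000_000_000L, 16);
        String newer = rowKey("sensor-42", 1_700_000_001_000L, 16);
        // Within one series, the newer reading sorts before the older one,
        // so a scan from the series prefix returns latest data first.
        System.out.println(newer.compareTo(older) < 0); // true
    }
}
```

The trade-off is that a salted table needs one scan per salt bucket to read a full time range, which is the kind of decision the schema design exercise explores.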
Requirements
- Proficiency in the Java programming language (most programming exercises are in Java)
- Comfortable working in a Linux environment (able to navigate the Linux command line and edit files using vi or nano)
Lab environment
Zero install: there is no need to install Hadoop on students' machines; a fully functional Hadoop cluster is provided.
Students will require the following:
- An SSH client (Linux and Mac already include SSH clients; for Windows, PuTTY is recommended)
- A web browser to access the cluster (Firefox is recommended)
28 Hours
Testimonials (1)
Hands-on exercises. Class should have been 5 days, but the 3 days helped to clear up a lot of questions that I had from working with NiFi already