Course Outline
Introduction:
- Apache Spark within the Hadoop ecosystem
- Brief overview of Python and Scala
Basics (theory):
- Architecture
- RDDs
- Transformations and Actions
- Stages, Tasks, and Dependencies
Hands-on workshop: Using the Databricks environment to grasp the basics:
- Exercises using the RDD API
- Basic transformation and action functions
- PairRDDs
- Joins
- Caching strategies
- Exercises using the DataFrame API
- Spark SQL
- DataFrame operations: select, filter, group, sort
- UDFs (User Defined Functions)
- Exploring the Dataset API
- Streaming
Hands-on workshop: Using the AWS environment to understand deployment:
- Introduction to AWS Glue
- Understanding the differences between AWS EMR and AWS Glue
- Example jobs on both platforms
- Review of pros and cons
Extra:
- Introduction to Apache Airflow orchestration
Requirements
Programming skills (preferably Python or Scala)
Foundational SQL knowledge
21 Hours
Testimonials (3)
Having hands-on sessions / assignments
Poornima Chenthamarakshan - Intelligent Medical Objects
Course - Apache Spark in the Cloud
1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise
Steven Wu - Intelligent Medical Objects
Course - Apache Spark in the Cloud
Get to learn Spark Streaming, Databricks and AWS Redshift