Course Outline
Introduction
- Apache Spark versus Hadoop MapReduce
Overview of Apache Spark Features and Architecture
Choosing a Programming Language
Setting up Apache Spark
Creating a Sample Application
Choosing the Data Set
Running Data Analysis on the Data
Processing Structured Data with Spark SQL
Processing Streaming Data with Spark Streaming
Integrating Apache Spark with Third-Party Machine Learning Tools
Using Apache Spark for Graph Processing
Optimising Apache Spark
Troubleshooting
Summary and Conclusion
Requirements
- Experience with the Linux command line
- A general understanding of data processing
- Programming experience in Java, Scala, Python, or R
Audience
- Developers
Testimonials (3)
I liked that it was practical. I loved applying the theoretical knowledge to practical examples.
Aurelia-Adriana - Allianz Services Romania
Course - Python and Spark for Big Data (PySpark)
The fact that we were able to take with us most of the information, course materials, presentations, and exercises, so that we can look over them and perhaps redo what we didn't understand the first time or improve what we already did.
Raul Mihail Rat - Accenture Industrial SS
Course - Python, Spark, and Hadoop for Big Data
Very interactive...