Course Outline
Introduction:
- Apache Spark within the Hadoop Ecosystem
- Quick overview of Python and Scala
Core Concepts (Theory):
- System Architecture
- Resilient Distributed Datasets (RDDs)
- Transformations and Actions
- Stages, Tasks, and Dependencies
Foundational Concepts via Databricks (Hands-on Workshop):
- RDD API exercises
- Essential action and transformation functions
- PairRDDs
- Join operations
- Caching strategies
- DataFrame API exercises
- Spark SQL
- DataFrame operations: select, filter, group, sort
- User-Defined Functions (UDFs)
- Introduction to the Dataset API
- Streaming
Deployment on AWS Environment (Hands-on Workshop):
- AWS Glue basics
- Differences between AWS EMR and AWS Glue
- Sample jobs on both platforms
- Advantages and disadvantages of each
Additional Topics:
- Introduction to Apache Airflow for orchestration
Requirements
Programming proficiency (preferably in Python or Scala)
Basic SQL knowledge
21 Hours
Testimonials (3)
Having hands-on sessions / assignments
Poornima Chenthamarakshan - Intelligent Medical Objects
Course - Apache Spark in the Cloud
1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise
Steven Wu - Intelligent Medical Objects
Course - Apache Spark in the Cloud
Get to learn Spark Streaming, Databricks, and AWS Redshift