Apache Spark in the Cloud Training Course

The initial learning curve for Apache Spark can be steep, requiring considerable effort before seeing tangible results. This course is designed to help you surmount that early hurdle. Upon completion, participants will grasp the fundamentals of Apache Spark, clearly distinguish between RDDs and DataFrames, gain proficiency in both Python and Scala APIs, and comprehend the roles of executors and tasks. Furthermore, adhering to industry best practices, the course places strong emphasis on cloud deployment, with specific focus on Databricks and AWS. Students will also learn to differentiate between AWS EMR and AWS Glue, one of AWS's most recent Spark-related services.

AUDIENCE:

Data Engineers, DevOps Professionals, Data Scientists

This course is available as onsite live training in Botswana or online live training.

Course Outline

Introduction:

Apache Spark within the Hadoop Ecosystem
Quick overview of Python and Scala

Core Concepts (Theory):

System Architecture
Resilient Distributed Datasets (RDDs)
Transformations and Actions
Stages, Tasks, and Dependencies
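
As a rough illustration of the transformations-versus-actions distinction listed above, the following is a toy model in plain Python, not Spark itself. The class name LazyDataset and its methods are hypothetical. They only mimic the idea that transformations are recorded lazily while an action triggers the actual computation.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, source, steps=()):
        self._source = list(source)
        self._steps = tuple(steps)  # recorded (kind, function) pairs, not yet executed

    # Transformations: return a new dataset and compute nothing.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Shared helper: replay the recorded steps over every source item.
    def _run(self):
        for item in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                yield item

    # Actions: execute the recorded steps and return a result.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records each time the mapped function actually runs


def square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # nothing has executed at this point
result = evens_squared.collect()  # the action runs the whole pipeline
```

In this sketch, building evens_squared does no work. The square function is first called when collect() runs, which is the behaviour the course covers for real Spark transformations and actions.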

Foundational Concepts via Databricks (Hands-on Workshop):

RDD API exercises
Essential action and transformation functions
PairRDDs
Join operations
Caching strategies
DataFrame API exercises
SparkSQL
DataFrame operations: select, filter, group, sort
User-Defined Functions (UDFs)
Introduction to the Dataset API
Streaming
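
The key-value topics in the workshop list above (PairRDDs and join operations) can be previewed with a small plain-Python sketch. It is not Spark code: reduce_by_key and inner_join are hypothetical helpers that imitate, on ordinary lists, what reduceByKey and join do on pair RDDs, and the sample data is invented for illustration.

```python
from collections import defaultdict


def reduce_by_key(pairs, fn):
    """Combine all values that share a key using fn (hypothetical helper)."""
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return sorted(acc.items())  # sorted only to make the output deterministic


def inner_join(left, right):
    """Pair up values from both sides for every key present in both (hypothetical helper)."""
    right_index = defaultdict(list)
    for key, value in right:
        right_index[key].append(value)
    return [
        (key, (left_value, right_value))
        for key, left_value in left
        for right_value in right_index.get(key, [])
    ]


# Invented sample data: (country code, amount) and (country code, name) pairs.
sales = [("za", 10), ("bw", 5), ("za", 7)]
names = [("za", "South Africa"), ("bw", "Botswana"), ("na", "Namibia")]

totals = reduce_by_key(sales, lambda a, b: a + b)
joined = inner_join(totals, names)
```

Keys that appear on only one side, such as "na" here, are dropped by the inner join. In Spark the same operations run in parallel across partitions, which is where the caching and shuffle considerations from the workshop come in.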

Deployment on AWS Environment (Hands-on Workshop):

AWS Glue basics
Differences between AWS EMR and AWS Glue
Sample jobs on both platforms
Advantages and disadvantages of each

Additional Topics:

Introduction to Apache Airflow for orchestration

Requirements

Programming proficiency (preferably in Python or Scala)

Basic SQL knowledge

Duration: 21 Hours

Need help picking the right course?
southafrica@nobleprog.co.za or +27 (0)10 005 5793

Testimonials (3)

Having hands on session / assignments

Poornima Chenthamarakshan - Intelligent Medical Objects

Course - Apache Spark in the Cloud

1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise

Steven Wu - Intelligent Medical Objects

Course - Apache Spark in the Cloud

Lim Meng Tee - Jobstreet.com Shared Services Sdn. Bhd.

Course - Apache Spark in the Cloud

Related Courses

Big Data Analytics with Google Colab and Apache Spark

PySpark and Machine Learning

Apache Spark Fundamentals

Administration of Apache Spark

Python and Spark for Big Data (PySpark)

Python, Spark, and Hadoop for Big Data

Stratio: Rocket and Intelligence Modules with PySpark

Related Categories

Apache Spark
