Course Outline

Introduction to the Stratio Platform

  • Overview of Stratio architecture and core modules.
  • The role of Rocket and Intelligence in the data lifecycle.
  • Logging in and navigating the Stratio user interface.

Working with the Rocket Module

  • Data ingestion and pipeline creation.
  • Connecting data sources and configuring transformations.
  • Using PySpark for preprocessing tasks in Rocket.

PySpark Essentials for Stratio Users

  • PySpark data structures and operations.
  • Control flow: for and while loops, and if/else conditionals.
  • Writing custom functions with 'def' and applying them.

Advanced Usage of Rocket with PySpark

  • Streaming ingestion and transformations.
  • Using loops and functions in batch and real-time scenarios.
  • Best practices for performance in PySpark pipelines.

Exploring the Intelligence Module

  • Overview of data modelling and analysis features.
  • Feature selection, transformation, and exploration.
  • The role of PySpark in custom analytics and insights.

Building Advanced Analytics Workflows

  • Creating user-defined functions (UDFs) in Intelligence.
  • Applying conditionals and loops for data logic.
  • Use cases: segmentation, aggregation, and prediction.

Deployment and Collaboration

  • Saving, exporting, and reusing workflows.
  • Collaborating with other team members on Stratio.
  • Reviewing output and integrating with downstream tools.

Summary and Next Steps

Requirements

  • Experience with Python programming.
  • Understanding of data analytics or big data processing concepts.
  • Foundational knowledge of Apache Spark and distributed computing.

Audience

  • Data engineers working on Stratio-based platforms.
  • Analysts or developers using Rocket and Intelligence modules.
  • Technical teams transitioning to PySpark workflows within Stratio.

Duration: 14 Hours
