Apache Airflow for Data Science: Automating Machine Learning Pipelines Training Course
Apache Airflow is an open-source platform designed for orchestrating workflows and automating complex data pipelines.
This instructor-led live training, available online or onsite, is tailored for intermediate-level participants who want to use Apache Airflow to automate and manage machine learning workflows, including model training, validation, and deployment.
By the conclusion of this training, participants will be able to:
- Configure Apache Airflow for orchestrating machine learning workflows.
- Automate tasks such as data preprocessing, model training, and validation.
- Integrate Airflow with various machine learning frameworks and tools.
- Deploy machine learning models through automated pipelines.
- Monitor and optimise machine learning workflows in production environments.
Course Format
- Interactive lectures and discussions.
- Numerous exercises and practice sessions.
- Hands-on implementation within a live-lab environment.
Course Customisation Options
- To request customised training for this course, please contact us to make arrangements.
Course Outline
Introduction to Apache Airflow for Machine Learning
- Overview of Apache Airflow and its relevance to data science
- Key features for automating machine learning workflows
- Setting up Airflow for data science projects
Building Machine Learning Pipelines with Airflow
- Designing DAGs for end-to-end ML workflows
- Using operators for data ingestion, preprocessing, and feature engineering
- Scheduling and managing pipeline dependencies
Model Training and Validation
- Automating model training tasks with Airflow
- Integrating Airflow with ML frameworks (e.g., TensorFlow, PyTorch)
- Validating models and storing evaluation metrics
Model Deployment and Monitoring
- Deploying machine learning models using automated pipelines
- Monitoring deployed models with Airflow tasks
- Handling retraining and model updates
Advanced Customisation and Integration
- Developing custom operators for ML-specific tasks
- Integrating Airflow with cloud platforms and ML services
- Extending Airflow workflows with plugins and sensors
Optimising and Scaling ML Pipelines
- Improving workflow performance for large-scale data
- Scaling Airflow deployments with Celery and Kubernetes
- Best practices for production-grade ML workflows
Case Studies and Practical Applications
- Real-world examples of ML automation using Airflow
- Hands-on exercise: Building an end-to-end ML pipeline
- Discussion of challenges and solutions in ML workflow management
Summary and Next Steps
Requirements
- Familiarity with machine learning workflows and concepts
- Basic understanding of Apache Airflow, including DAGs and operators
- Proficiency in Python programming
Audience
- Data scientists
- Machine learning engineers
- AI developers
Need help picking the right course?
southafrica@nobleprog.co.za or +27 (0)10 005 5793
Related Courses
AdaBoost Python for Machine Learning
14 Hours
This instructor-led, live training in Botswana (online or onsite) is designed for data scientists and software engineers who wish to utilise AdaBoost to build boosting algorithms for machine learning with Python.
By the end of this training, participants will be able to:
- Set up the necessary development environment to start building machine learning models with AdaBoost.
- Understand the ensemble learning approach and how to implement adaptive boosting.
- Build AdaBoost models to boost machine learning algorithms in Python.
- Use hyperparameter tuning to increase the accuracy and performance of AdaBoost models.
AlphaFold: AI-Driven Protein Structure Prediction and Interpretation
7 Hours
This instructor-led, live training in Botswana (online or onsite) is aimed at biologists who wish to understand how AlphaFold works and use AlphaFold models as guides in their experimental studies.
By the end of this training, participants will be able to:
- Understand the basic principles of AlphaFold.
- Explain how AlphaFold works.
- Interpret AlphaFold predictions and results.
Anaconda Ecosystem for Data Scientists
14 Hours
This live training, facilitated by an instructor and available in Botswana (online or at your premises), is designed for data scientists who intend to leverage the Anaconda ecosystem to capture, manage, and deploy packages alongside data analysis workflows within a unified platform.
Upon completing this training, participants will be equipped to:
- Install and set up Anaconda components and libraries.
- Grasp the fundamental concepts, features, and advantages of Anaconda.
- Oversee packages, environments, and channels via Anaconda Navigator.
- Utilise Conda, R, and Python packages for data science and machine learning applications.
- Explore practical use cases and techniques for managing multiple data environments.
Creating Custom Chatbots with Google AutoML
14 Hours
This instructor-led live training in Botswana (online or onsite) is targeted at participants with varying levels of expertise who aim to utilise Google's AutoML platform to construct tailored chatbots for a wide array of applications.
Upon completion of this training, participants will be capable of:
- Grasping the fundamentals of chatbot development.
- Navigating the Google Cloud Platform and accessing AutoML.
- Preparing data for training chatbot models.
- Training and assessing custom chatbot models using AutoML.
- Deploying and integrating chatbots into various platforms and channels.
- Monitoring and optimising chatbot performance over time.
Pattern Recognition
21 Hours
This instructor-led, live training in Botswana (online or onsite) provides an introduction to the field of pattern recognition and machine learning. It touches on practical applications in statistics, computer science, signal processing, computer vision, data mining, and bioinformatics.
By the end of this training, participants will be able to:
- Apply core statistical methods to pattern recognition.
- Use key models like neural networks and kernel methods for data analysis.
- Implement advanced techniques for complex problem-solving.
- Improve prediction accuracy by combining different models.
DataRobot
7 Hours
This instructor-led live training in Botswana (online or on-site) is targeted at data scientists and analysts seeking to automate, evaluate, and manage predictive models leveraging DataRobot's machine learning features.
Upon completion of this training, participants will be capable of:
- Loading datasets into DataRobot to analyse, assess, and perform quality checks on data.
- Constructing and training models to pinpoint key variables and achieve prediction objectives.
- Interpreting models to derive valuable insights that aid in business decision-making.
- Monitoring and managing models to sustain optimal prediction performance.
Edge AI with TensorFlow Lite
14 Hours
This instructor-led, live training in Botswana (online or onsite) is targeted at intermediate-level developers, data scientists, and AI practitioners who wish to leverage TensorFlow Lite for Edge AI applications.
By the end of this training, participants will be able to:
- Understand the fundamentals of TensorFlow Lite and its role in Edge AI.
- Develop and optimize AI models using TensorFlow Lite.
- Deploy TensorFlow Lite models on various edge devices.
- Utilize tools and techniques for model conversion and optimization.
- Implement practical Edge AI applications using TensorFlow Lite.
Google Cloud AutoML
7 Hours
This instructor-led, live training in Botswana (online or onsite) is designed for data scientists, data analysts, and developers who wish to explore AutoML products and features to create and deploy custom ML models with minimal effort.
Upon completion of this training, participants will be equipped to:
- Explore the AutoML product suite to implement diverse services for various data types.
- Prepare and label datasets to generate custom ML models.
- Train and manage models to ensure the production of accurate and fair machine learning outcomes.
- Utilize trained models to make predictions that align with business objectives and needs.
Kaggle
14 Hours
This instructor-led live training in Botswana (available online or onsite) is designed for data scientists and developers who wish to learn and build their careers in Data Science using Kaggle.
By the end of this training, participants will be able to:
- Understand the fundamentals of data science and machine learning.
- Explore data analytics.
- Understand what Kaggle is and how it works.
Kubeflow Essentials: Build, Train & Serve with Kubernetes
14 Hours
Kubeflow is an open-source platform designed to streamline building, training, and deploying machine learning workloads on Kubernetes.
This instructor-led, live training (online or onsite) is aimed at beginner-level to intermediate-level professionals who wish to build reliable ML workflows using Kubeflow.
Upon completion of this training, attendees will gain the skills to:
- Navigate the Kubeflow ecosystem and core components.
- Build reproducible workflows with Kubeflow Pipelines.
- Run scalable training jobs on Kubernetes.
- Serve machine learning models efficiently using Kubeflow Serving.
Format of the Course
- Guided presentations and collaborative discussions.
- Hands-on labs with real Kubeflow components.
- Practical exercises to build end-to-end ML workflows.
Course Customization Options
- Customized versions of this training can be arranged to align with your team’s technology stack and project requirements.
Kubeflow Fundamentals
28 Hours
This instructor-led, live training in Botswana (online or onsite) is aimed at developers and data scientists who wish to build, deploy, and manage machine learning workflows on Kubernetes.
By the end of this training, participants will be able to:
- Install and configure Kubeflow on premises and in the cloud.
- Build, deploy, and manage ML workflows based on Docker containers and Kubernetes.
- Run entire machine learning pipelines on diverse architectures and cloud environments.
- Use Kubeflow to spawn and manage Jupyter notebooks.
- Build ML training, hyperparameter tuning, and serving workloads across multiple platforms.
Machine Learning for Mobile Apps using Google’s ML Kit
14 Hours
This instructor-led live training, accessible online or onsite, is intended for developers who wish to utilise Google’s ML Kit to build machine learning models optimised for mobile device processing.
By the conclusion of this training, participants will be able to:
- Set up the necessary development environment to commence the development of machine learning features for mobile apps.
- Integrate new machine learning technologies into Android and iOS apps using the ML Kit APIs.
- Enhance and optimise existing apps by employing the ML Kit SDK for on-device processing and deployment.
Machine Learning with Random Forest
14 Hours
This instructor-led live training in Botswana (online or onsite) is designed for data scientists and software engineers who wish to utilise Random Forest to construct machine learning algorithms for large datasets.
By the conclusion of this training, participants will be able to:
- Set up the necessary development environment to begin building machine learning models with Random Forest.
- Understand the advantages of Random Forest and how to implement it to resolve classification and regression problems.
- Handle large datasets and interpret multiple decision trees in Random Forest.
- Evaluate and optimise machine learning model performance by tuning the hyperparameters.
Advanced Analytics with RapidMiner
14 Hours
This instructor-led live training in Botswana (online or onsite) is aimed at intermediate-level data analysts who wish to learn how to use RapidMiner to estimate and project values and utilize analytical tools for time series forecasting.
By the end of this training, participants will be able to:
- Apply the CRISP-DM methodology, select appropriate machine learning algorithms, and enhance model construction and performance.
- Use RapidMiner to estimate and project values, and utilize analytical tools for time series forecasting.
GPU Data Science with NVIDIA RAPIDS
14 Hours
This instructor-led, live training in Botswana (online or onsite) is designed for data scientists and developers who wish to use RAPIDS to build GPU-accelerated data pipelines, workflows, and visualizations, and to apply machine learning algorithms such as those in XGBoost and cuML.
By the end of this training, participants will be able to:
- Set up the necessary development environment to build data models with NVIDIA RAPIDS.
- Understand the features, components, and advantages of RAPIDS.
- Leverage GPUs to accelerate end-to-end data and analytics pipelines.
- Implement GPU-accelerated data preparation and ETL with cuDF and Apache Arrow.
- Perform machine learning tasks with XGBoost and cuML algorithms.
- Build data visualizations and execute graph analysis with cuXfilter and cuGraph.