NVIDIA GPU Programming - Extended Training Course
This instructor-led, live training course provides guidance on programming GPUs for parallel computing, utilizing different platforms, working with the CUDA platform and its features, and executing various optimization techniques via CUDA. Applications encompass deep learning, analytics, image processing, and engineering solutions.
This course is available as onsite live training in Botswana or online live training.
Course Outline
Introduction
Grasping the Fundamentals of Heterogeneous Computing Methodology
Why Parallel Computing? Understanding the Need for Parallel Computing
Multi-Core Processors - Architecture and Design
Introduction to Threads, Thread Basics and Basic Concepts of Parallel Programming
Grasping the Fundamentals of GPU Software Optimization Processes
OpenMP - A Standard for Directive-Based Parallel Programming
Practical / Demonstration of Various Programs on Multicore Machines
Introduction to GPU Computing
GPUs for Parallel Computing
GPU Programming Model
Practical / Demonstration of Various Programs on GPU
SDK, Toolkit and Installation of Environment for GPU
Working with Various Libraries
Demonstration of GPU and Tools with Sample Programs and OpenACC
Understanding the CUDA Programming Model
Learning the CUDA Architecture
Exploring and Setting Up the CUDA Development Environments
Working with the CUDA Runtime API
Understanding the CUDA Memory Model
Exploring Additional CUDA API Features
Accessing Global Memory Efficiently in CUDA: Global Memory Optimization
Optimizing Data Transfers in CUDA Using CUDA Streams
Using Shared Memory in CUDA
Understanding and Using Atomic Operations and Instructions in CUDA
Case Study: Basic Digital Image Processing with CUDA
Working with Multi-GPU Programming
Advanced Hardware Profiling and Sampling on NVIDIA / CUDA
Using CUDA Dynamic Parallelism API for Dynamic Kernel Launch
Summary and Conclusion
Requirements
- C Programming
- Linux GCC
Need help picking the right course?
southafrica@nobleprog.co.za or +27 (0)10 005 5793
Testimonials (1)
Trainer's energy and humor.
Tadeusz Kaluba - Nokia Solutions and Networks Sp. z o.o.
Course - NVIDIA GPU Programming - Extended
Related Courses
Developing AI Applications with Huawei Ascend and CANN
21 Hours
Huawei Ascend comprises a series of AI processors engineered for high-performance inference and training tasks.
This instructor-led live training, available online or on-site, targets intermediate-level AI engineers and data scientists keen on developing and optimizing neural network models via Huawei's Ascend platform and the CANN toolkit.
Upon completion of this training, participants will be capable of:
- Setting up and configuring the CANN development environment.
- Creating AI applications employing MindSpore and CloudMatrix workflows.
- Enhancing performance on Ascend NPUs through custom operators and tiling techniques.
- Deploying models to either edge or cloud environments.
Course Format
- Interactive lectures and discussions.
- Practical application of Huawei Ascend and the CANN toolkit within sample applications.
- Guided exercises centred on model construction, training, and deployment.
Course Customization Options
- For customized training tailored to your specific infrastructure or datasets, please reach out to us to make arrangements.
Deploying AI Models with CANN and Ascend AI Processors
14 Hours
CANN (Compute Architecture for Neural Networks) serves as Huawei's AI compute stack, designed to deploy and optimize AI models on Ascend AI processors.
This live training session, led by an instructor and available either online or onsite, targets intermediate-level AI developers and engineers. It focuses on efficiently deploying trained AI models onto Huawei Ascend hardware by leveraging the CANN toolkit alongside tools such as MindSpore, TensorFlow, or PyTorch.
Upon completing this training, participants will be able to:
- Grasp the CANN architecture and its critical role within the AI deployment pipeline.
- Convert and adapt models from widely used frameworks into formats compatible with Ascend.
- Utilise tools like ATC, OM model conversion, and MindSpore for both edge and cloud inference tasks.
- Troubleshoot deployment challenges and optimise performance on Ascend hardware.
Course Format
- Interactive lectures combined with practical demonstrations.
- Hands-on laboratory exercises using CANN tools and Ascend simulators or physical devices.
- Practical deployment scenarios grounded in real-world AI models.
Course Customisation Options
- To request a customised training programme for this course, please contact us to make arrangements.
AI Inference and Deployment with CloudMatrix
21 Hours
CloudMatrix serves as Huawei's unified platform for AI development and deployment, specifically designed to facilitate scalable, production-ready inference pipelines.
This live training session, which can be conducted online or at your premises under the guidance of an instructor, targets beginner to intermediate AI professionals. The objective is to equip participants with the skills to deploy and monitor AI models using the CloudMatrix platform, leveraging its integration with CANN and MindSpore.
Upon completing this training, participants will be capable of:
- Utilising CloudMatrix for the packaging, deployment, and serving of models.
- Converting and optimising models specifically for Ascend chipsets.
- Establishing pipelines for both real-time and batch inference tasks.
- Monitoring deployments and tuning performance within production environments.
Course Format
- Interactive lectures and group discussions.
- Practical application of CloudMatrix through real-world deployment scenarios.
- Guided exercises focusing on conversion, optimisation, and scaling.
Customization Options
- If you require a tailored version of this course based on your specific AI infrastructure or cloud environment, please contact us to make arrangements.
GPU Programming on Biren AI Accelerators
21 Hours
Biren AI Accelerators are high-performance GPUs designed for AI and HPC workloads with support for large-scale training and inference.
This instructor-led, live training (online or onsite) is aimed at intermediate-level to advanced-level developers who wish to program and optimize applications using Biren’s proprietary GPU stack, with practical comparisons to CUDA-based environments.
By the end of this training, participants will be able to:
- Understand Biren GPU architecture and memory hierarchy.
- Set up the development environment and use Biren’s programming model.
- Translate and optimize CUDA-style code for Biren platforms.
- Apply performance tuning and debugging techniques.
Format of the Course
- Interactive lecture and discussion.
- Hands-on use of Biren SDK in sample GPU workloads.
- Guided exercises focused on porting and performance tuning.
Course Customization Options
- To request a customized training for this course based on your application stack or integration needs, please contact us to arrange.
Cambricon MLU Development with BANGPy and Neuware
21 Hours
Cambricon MLUs (Machine Learning Units) are purpose-built AI chips engineered for optimal inference and training performance in both edge computing and data centre environments.
This instructor-led, live training session (available online or at your premises) is designed for intermediate developers who want to build and deploy AI models leveraging the BANGPy framework and Neuware SDK on Cambricon MLU hardware.
Upon completing this training, participants will be equipped to:
- Configure and set up the BANGPy and Neuware development environments.
- Develop and optimise Python- and C++-based models tailored for Cambricon MLUs.
- Deploy models onto edge and data centre devices operating with the Neuware runtime.
- Integrate machine learning workflows with hardware-specific acceleration capabilities.
Course Format
- Interactive lectures and discussions.
- Practical, hands-on experience using BANGPy and Neuware for development and deployment.
- Guided exercises concentrating on optimization, integration, and testing.
Customisation Options
- For a training programme tailored to your specific Cambricon device model or use case, please contact us to make arrangements.
Introduction to CANN for AI Framework Developers
7 Hours
CANN (Compute Architecture for Neural Networks) is Huawei’s AI computing toolkit designed to compile, optimise, and deploy AI models on Ascend AI processors.
This instructor-led, live training (available online or onsite) is tailored for beginner-level AI developers who wish to understand how CANN fits into the model lifecycle from training to deployment, and how it works with frameworks like MindSpore, TensorFlow, and PyTorch.
By the end of this training, participants will be able to:
- Understand the purpose and architecture of the CANN toolkit.
- Set up a development environment with CANN and MindSpore.
- Convert and deploy a simple AI model to Ascend hardware.
- Gain foundational knowledge for future CANN optimisation or integration projects.
Format of the Course
- Interactive lecture and discussion.
- Hands-on labs with simple model deployment.
- Step-by-step walkthrough of the CANN toolchain and integration points.
Course Customisation Options
- To request a customised training for this course, please contact us to arrange.
CANN for Edge AI Deployment
14 Hours
Huawei's Ascend CANN toolkit facilitates robust AI inference on edge devices like the Ascend 310. It offers critical tools for compiling, optimising, and deploying models in environments with limited compute power and memory.
This instructor-led, live training (available online or onsite) targets intermediate-level AI developers and integrators who wish to deploy and optimise models on Ascend edge devices using the CANN toolchain.
Upon completing this training, participants will be able to:
- Prepare and convert AI models for the Ascend 310 using CANN tools.
- Build lightweight inference pipelines using MindSpore Lite and AscendCL.
- Optimise model performance for environments with constrained compute and memory.
- Deploy and monitor AI applications in real-world edge use cases.
Course Format
- Interactive lectures and demonstrations.
- Hands-on laboratory work involving edge-specific models and scenarios.
- Live deployment examples on virtual or physical edge hardware.
Customisation Options
- To request tailored training for this course, please contact us to arrange.
Understanding Huawei’s AI Compute Stack: From CANN to MindSpore
14 Hours
Huawei’s AI stack, spanning from the low-level CANN SDK to the high-level MindSpore framework, delivers a closely integrated environment for AI development and deployment, specifically optimised for Ascend hardware.
This instructor-led, live training (available online or on-site) targets beginner to intermediate-level technical professionals who wish to grasp how the CANN and MindSpore components collaborate to facilitate AI lifecycle management and infrastructure decision-making.
Upon completing this training, participants will be equipped to:
- Comprehend the layered architecture of Huawei’s AI compute stack.
- Identify how CANN facilitates model optimisation and hardware-level deployment.
- Assess the MindSpore framework and toolchain in comparison with industry alternatives.
- Position Huawei's AI stack within enterprise or cloud/on-premises environments.
Format of the Course
- Interactive lectures and discussions.
- Live system demonstrations and case-based walkthroughs.
- Optional guided labs focusing on the model flow from MindSpore to CANN.
Course Customisation Options
- To request bespoke training for this course, please contact us to make arrangements.
Optimizing Neural Network Performance with CANN SDK
14 Hours
The CANN SDK (Compute Architecture for Neural Networks) serves as Huawei’s foundational AI computing platform, enabling developers to fine-tune and maximise the performance of neural networks deployed on Ascend AI processors.
This instructor-led live training session, available online or onsite, is designed for advanced AI developers and system engineers who aim to optimise inference performance through CANN’s sophisticated toolkit, including the Graph Engine, TIK, and custom operator development.
Upon completion of this training, participants will be able to:
- Comprehend CANN’s runtime architecture and performance lifecycle.
- Utilise profiling tools and the Graph Engine for detailed performance analysis and optimisation.
- Develop and optimise custom operators using TIK and TVM.
- Address memory bottlenecks and enhance model throughput.
Course Format
- Interactive lectures and discussions.
- Practical hands-on labs featuring real-time profiling and operator tuning.
- Optimisation exercises based on edge-case deployment scenarios.
Course Customisation Options
- To request a tailored version of this course, please contact us to make arrangements.
CANN SDK for Computer Vision and NLP Pipelines
14 Hours
The CANN SDK (Compute Architecture for Neural Networks) offers robust deployment and optimization tools for real-time AI applications in computer vision and NLP, particularly on Huawei Ascend hardware.
This instructor-led, live training (available online or onsite) is designed for intermediate-level AI practitioners who want to build, deploy, and optimize vision and language models for production environments using the CANN SDK.
By the conclusion of this training, participants will be able to:
- Deploy and optimize CV and NLP models using CANN and AscendCL.
- Utilize CANN tools to convert models and integrate them into live pipelines.
- Optimize inference performance for tasks such as detection, classification, and sentiment analysis.
- Construct real-time CV/NLP pipelines for both edge and cloud-based deployment scenarios.
Course Format
- Interactive lectures and demonstrations.
- Hands-on labs involving model deployment and performance profiling.
- Live pipeline design using real-world CV and NLP use cases.
Course Customization Options
- To request a customized training session for this course, please contact us to make arrangements.
Building Custom AI Operators with CANN TIK and TVM
14 Hours
CANN TIK (Tensor Iterator Kernel) and Apache TVM facilitate the advanced optimisation and customisation of AI model operators for Huawei Ascend hardware.
This instructor-led, live training session (available online or onsite) is designed for advanced-level system developers who wish to construct, deploy, and fine-tune custom operators for AI models using CANN’s TIK programming model and TVM compiler integration.
Upon completion of this training, participants will be capable of:
- Writing and testing custom AI operators using the TIK DSL for Ascend processors.
- Integrating custom operations into the CANN runtime and execution graph.
- Leveraging TVM for operator scheduling, auto-tuning, and benchmarking.
- Debugging and optimizing instruction-level performance for custom computation patterns.
Course Format
- Interactive lectures and demonstrations.
- Hands-on coding of operators utilizing TIK and TVM pipelines.
- Testing and tuning on Ascend hardware or simulators.
Course Customization Options
- To request a customized training session for this course, please contact us to arrange.
Migrating CUDA Applications to Chinese GPU Architectures
21 Hours
Chinese GPU architectures, including Huawei Ascend, Biren, and Cambricon MLUs, present viable alternatives to CUDA, specifically designed for local AI and HPC markets.
This instructor-led, live training (available online or onsite) is intended for advanced GPU programmers and infrastructure specialists seeking to migrate and optimize existing CUDA applications for deployment on Chinese hardware platforms.
Upon completing this training, participants will be able to:
- Evaluate the compatibility of existing CUDA workloads with Chinese chip alternatives.
- Port CUDA codebases to Huawei CANN, Biren SDK, and Cambricon BANGPy environments.
- Compare performance metrics and identify optimization opportunities across different platforms.
- Address practical challenges related to cross-architecture support and deployment.
Course Format
- Interactive lectures and discussions.
- Hands-on labs focusing on code translation and performance comparisons.
- Guided exercises centred on multi-GPU adaptation strategies.
Customization Options
- To request customized training for this course tailored to your specific platform or CUDA project, please contact us to arrange.
Performance Optimization on Ascend, Biren, and Cambricon
21 Hours
Ascend, Biren, and Cambricon represent the forefront of AI hardware development in China, providing distinct acceleration and profiling capabilities tailored for large-scale AI operations.
This instructor-led training, available online or onsite, is designed for advanced AI infrastructure and performance engineers seeking to enhance model inference and training processes across various Chinese AI chip architectures.
Upon completion, participants will be equipped to:
- Evaluate model performance on Ascend, Biren, and Cambricon systems.
- Pinpoint system bottlenecks and inefficiencies in memory and computation.
- Implement optimizations at the graph, kernel, and operator levels.
- Refine deployment pipelines to maximise throughput and reduce latency.
Course Format
- Engaging lectures combined with interactive discussions.
- Practical application of profiling and optimization tools across each platform.
- Structured exercises focused on real-world tuning scenarios.
Customization Options
- For tailored training based on your specific performance environment or model requirements, please reach out to us to arrange.