GPU Programming with CUDA and Python Training Course

CUDA (Compute Unified Device Architecture) is a parallel computing platform and API developed by NVIDIA.

This instructor-led, live training (available online or onsite) is designed for intermediate-level developers who want to leverage CUDA to build Python applications that execute in parallel on NVIDIA GPUs.

Upon completion of this training, participants will be able to:

Utilize the Numba compiler to accelerate Python applications running on NVIDIA GPUs.
Construct, compile, and launch custom CUDA kernels.
Manage GPU memory effectively.
Transform a CPU-based application into a GPU-accelerated one.

Course Format

Interactive lectures and discussions.
Extensive exercises and practical activities.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request a customized version of this course, please contact us.

This course is available as onsite live training in Botswana or online live training.

Course Outline

Introduction

What is GPU programming?
Why use CUDA with Python?
Key concepts: threads, blocks, and grids

Overview of CUDA Features and Architecture

GPU versus CPU architecture
Understanding SIMT (Single Instruction, Multiple Threads)
CUDA programming model

Setting up the Development Environment

Installing the CUDA Toolkit and drivers
Installing Python and Numba
Setting up and verifying the environment

Parallel Programming Fundamentals

Introduction to parallel execution
Understanding threads and thread hierarchies
Working with warps and synchronization
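
As a rough illustration of the warp concept listed above: on NVIDIA GPUs to date, the threads of a block are grouped into warps of 32 threads that execute together. The CPU-only Python sketch below shows only the index arithmetic. The function names `warp_and_lane` and `warps_per_block` are hypothetical and are not part of any CUDA or Numba API. No GPU is needed to run it.

```python
WARP_SIZE = 32  # warp width on NVIDIA GPUs to date (assumed constant here)


def warp_and_lane(thread_idx):
    """Return (warp index, lane within the warp) for a 1D thread index in a block."""
    return thread_idx // WARP_SIZE, thread_idx % WARP_SIZE


def warps_per_block(threads_per_block):
    """Number of warps needed to hold a block; the last warp may be partly idle."""
    return (threads_per_block + WARP_SIZE - 1) // WARP_SIZE
```

This is one reason block sizes are commonly chosen as multiples of 32: a block of 100 threads still occupies 4 warps, with part of the last warp unused.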

Working with the Numba Compiler

Introduction to Numba
Writing CUDA kernels with Numba
Understanding the @cuda.jit decorator

Building a Custom CUDA Kernel

Writing and launching a basic kernel
Using threads for element-wise operations
Managing grid and block dimensions
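
The launch arithmetic behind the topics above can be sketched on the CPU. In Numba, a kernel is launched as `kernel[blocks_per_grid, threads_per_block](...)`, and inside the kernel `cuda.grid(1)` returns the global index, which equals `cuda.blockIdx.x * cuda.blockDim.x + cuda.threadIdx.x`. The example below only emulates that indexing with plain Python loops so it can run without a GPU. `blocks_needed` and `emulate_add_kernel` are hypothetical names chosen for illustration.

```python
def blocks_needed(n, threads_per_block):
    """Ceiling division: enough blocks to cover n elements."""
    return (n + threads_per_block - 1) // threads_per_block


def emulate_add_kernel(x, y, threads_per_block=4):
    """CPU emulation of an element-wise add kernel over a 1D grid."""
    n = len(x)
    out = [0] * n
    blocks_per_grid = blocks_needed(n, threads_per_block)
    for block_idx in range(blocks_per_grid):          # stands in for blockIdx.x
        for thread_idx in range(threads_per_block):   # stands in for threadIdx.x
            i = block_idx * threads_per_block + thread_idx  # global index
            if i < n:  # bounds guard: the last block may overshoot the array
                out[i] = x[i] + y[i]
    return out
```

The bounds guard matters because the grid is rounded up to a whole number of blocks, so some threads in the last block usually have no element to process.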

Memory Management

Types of GPU memory (global, shared, local, constant)
Memory transfer between host and device
Optimizing memory usage and avoiding bottlenecks

Advanced Topics in GPU Acceleration

Shared memory and synchronization
Using streams for asynchronous execution
Multi-GPU programming basics

Converting CPU-based Applications to GPU

Profiling CPU code
Identifying parallelizable sections
Porting logic to CUDA kernels
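
One possible way to approach the profiling step above is Python's standard-library `cProfile` module. This is a minimal sketch, not necessarily the tooling used in the course. `slow_sum_of_squares` and `profile_call` are hypothetical names, and the loop stands in for whatever hot spot a real application might have.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """A deliberately loop-heavy function, standing in for a CPU hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
    return result, buffer.getvalue()


result, report = profile_call(slow_sum_of_squares, 100_000)
```

Functions that dominate the cumulative time and apply the same operation independently to many elements are typical candidates for porting to a kernel.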

Troubleshooting

Debugging CUDA applications
Common errors and how to resolve them
Tools and techniques for testing and validation

Summary and Next Steps

Review of key concepts
Best practices in GPU programming
Resources for continued learning

Requirements

Experience with Python programming
Familiarity with NumPy (ndarrays, ufuncs, etc.)

Audience

Developers

Duration

14 hours

Need help picking the right course?
southafrica@nobleprog.co.za or +27 (0)10 005 5793

Testimonials

Very interactive with various examples, with a good progression in complexity between the start and the end of the training.

Jenny - Andheo

Related Courses

Advanced Python: Best Practices and Design Patterns

Agentic AI Engineering with Python — Build Autonomous Agents

Introduction to Data Science and AI using Python

Artificial Intelligence with Python (Intermediate Level)

Algorithmic Trading with Python and R

Applied AI from Scratch in Python

AWS Cloud9 and Python: A Practical Guide

Administration of CUDA

Bespoke Applied Artificial Intelligence and LLM Engineering with Python

Scaling Data Analysis with Python and Dask

Data Analysis with Python, Pandas and Numpy

FARM (FastAPI, React, and MongoDB) Full Stack Development

Developing APIs with Python and FastAPI

Fraud Detection with Python and TensorFlow

Machine Learning with Python – 2 Days

Related Categories

Python

CUDA (Compute Unified Device Architecture)
