Performance Optimization on Ascend, Biren, and Cambricon Training Course

Ascend, Biren, and Cambricon are leading AI hardware platforms in China, each offering its own acceleration and profiling tools for large-scale AI workloads.

This instructor-led training, available online or onsite, is designed for advanced AI infrastructure and performance engineers seeking to enhance model inference and training processes across various Chinese AI chip architectures.

Upon completion, participants will be equipped to:

Evaluate model performance on Ascend, Biren, and Cambricon systems.
Pinpoint system bottlenecks and inefficiencies in memory and computation.
Implement optimizations at the graph, kernel, and operator levels.
Refine deployment pipelines to maximise throughput and reduce latency.

Course Format

Engaging lectures combined with interactive discussions.
Practical application of profiling and optimization tools across each platform.
Structured exercises focused on real-world tuning scenarios.

Customization Options

To request training tailored to your specific performance environment or model requirements, please contact us to arrange it.

This course is available as onsite live training in Botswana or online live training.

Course Outline

Performance Concepts and Metrics

Latency, throughput, power consumption, and resource utilisation
Distinguishing system-level versus model-level bottlenecks
Profiling strategies for inference versus training phases
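As an illustration of the latency and throughput metrics listed above, here is a minimal, vendor-neutral sketch in Python. The `benchmark` helper and the `dummy_model` stand-in are hypothetical names used only for this example and are not part of any Ascend, Biren, or Cambricon SDK. On a real accelerator, execution is often asynchronous, so a device synchronization call would normally be needed before the timer is stopped.

```python
import statistics
import time


def benchmark(fn, batch, batch_size, warmup=3, iters=20):
    """Time fn(batch) and report latency percentiles and throughput."""
    for _ in range(warmup):  # warm-up runs are excluded from the statistics
        fn(batch)
    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(batch)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    mean = statistics.fmean(latencies)
    # Nearest-rank style index for the 95th percentile, clamped to the list.
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    return {
        "mean_latency_s": mean,
        "p50_latency_s": statistics.median(latencies),
        "p95_latency_s": p95,
        "throughput_samples_per_s": batch_size / mean,
    }


def dummy_model(batch):
    # Hypothetical stand-in for a real inference call on an accelerator.
    return [sum(x) for x in batch]


if __name__ == "__main__":
    batch = [[float(i)] * 256 for i in range(32)]
    print(benchmark(dummy_model, batch, batch_size=len(batch)))
```

Reporting a tail percentile alongside the mean helps separate steady-state model cost from occasional system-level stalls.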

Profiling on Huawei Ascend

Leveraging CANN Profiler and MindInsight
Diagnostics for kernels and operators
Understanding offload patterns and memory mapping

Profiling on Biren GPU

Exploring Biren SDK performance monitoring features
Optimising kernel fusion, memory alignment, and execution queues
Power- and temperature-aware profiling

Profiling on Cambricon MLU

Utilising BANGPy and Neuware performance tools
Gaining kernel-level visibility and interpreting logs
Integrating the MLU profiler with deployment frameworks

Graph and Model-Level Optimization

Strategies for graph pruning and quantization
Operator fusion and computational graph restructuring
Standardising input sizes and tuning batch parameters
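To give a flavour of the quantization topic above, the following is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is one common scheme chosen for illustration, not the method used by any particular vendor toolchain, and the function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to int8 codes."""
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [-1.0, -0.25, 0.0, 0.1, 0.8]
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(codes, scale, worst)
```

The worst-case round-trip error is roughly half the scale, which is why tensors with a few large outliers tend to quantize poorly under a single per-tensor scale.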

Memory and Kernel Optimization

Enhancing memory layout and reuse efficiency
Effective buffer management across different chipsets
Platform-specific kernel tuning techniques

Cross-Platform Best Practices

Achieving performance portability through abstraction strategies
Developing shared tuning pipelines for multi-chip environments
Case Study: Tuning an object detection model across Ascend, Biren, and Cambricon MLU

Summary and Next Steps

Requirements

Hands-on experience with AI model training or deployment pipelines
Familiarity with GPU/MLU computing principles and model optimization techniques
Basic knowledge of performance profiling tools and metrics

Target Audience

Performance engineers
Machine learning infrastructure teams
AI system architects

Duration: 21 Hours

Need help picking the right course?
southafrica@nobleprog.co.za or +27 (0)10 005 5793

Related Courses

Developing AI Applications with Huawei Ascend and CANN

Deploying AI Models with CANN and Ascend AI Processors

AI Inference and Deployment with CloudMatrix

GPU Programming on Biren AI Accelerators

Cambricon MLU Development with BANGPy and Neuware

Introduction to CANN for AI Framework Developers

CANN for Edge AI Deployment

Understanding Huawei’s AI Compute Stack: From CANN to MindSpore

Optimizing Neural Network Performance with CANN SDK

CANN SDK for Computer Vision and NLP Pipelines

Building Custom AI Operators with CANN TIK and TVM

Migrating CUDA Applications to Chinese GPU Architectures

Related Categories

Huawei Ascend

Biren (GPU)

Cambricon (MLU)
