Course Outline

Performance Concepts and Metrics

  • Latency, throughput, power consumption, and resource utilisation
  • Distinguishing system-level versus model-level bottlenecks
  • Profiling strategies for inference versus training phases

Profiling on Huawei Ascend

  • Leveraging CANN Profiler and MindInsight
  • Diagnostics for kernels and operators
  • Understanding offload patterns and memory mapping

Profiling on Biren GPU

  • Exploring Biren SDK performance monitoring features
  • Optimising kernel fusion, memory alignment, and execution queues
  • Power- and temperature-aware profiling

Profiling on Cambricon MLU

  • Utilising BANGPy and Neuware performance tools
  • Gaining kernel-level visibility and interpreting logs
  • Integrating the MLU profiler with deployment frameworks

Graph and Model-Level Optimisation

  • Strategies for graph pruning and quantisation
  • Operator fusion and computational graph restructuring
  • Standardising input sizes and tuning batch parameters

Memory and Kernel Optimisation

  • Enhancing memory layout and reuse efficiency
  • Effective buffer management across different chipsets
  • Platform-specific kernel tuning techniques

Cross-Platform Best Practices

  • Achieving performance portability through abstraction strategies
  • Developing shared tuning pipelines for multi-chip environments
  • Case Study: Tuning an object detection model across Ascend, Biren, and MLU

Summary and Next Steps

Requirements

  • Hands-on experience with AI model training or deployment pipelines
  • Familiarity with GPU/MLU computing principles and model optimisation techniques
  • Basic knowledge of performance profiling tools and metrics

Target Audience

  • Performance engineers
  • Machine learning infrastructure teams
  • AI system architects

Duration: 21 Hours
