Course Outline

Introduction

  • What is ROCm?
  • What is HIP?
  • ROCm vs CUDA vs OpenCL
  • Overview of ROCm and HIP features and architecture
  • Setting up the development environment

Getting Started

  • Creating a new ROCm project using Visual Studio Code
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying output using printf and fprintf

ROCm API

  • Understanding the role of the ROCm API in the host program
  • Using the ROCm API to query device information and capabilities
  • Using the ROCm API to allocate and deallocate device memory
  • Using the ROCm API to copy data between host and device
  • Using the ROCm API to launch kernels and synchronise the host with the device
  • Using the ROCm API to check return codes and handle errors

HIP Language

  • Understanding the role of the HIP language in the device program
  • Using the HIP language to write kernels that execute on the GPU and manipulate data
  • Using HIP data types, qualifiers, operators, and expressions
  • Utilising HIP built-in functions, variables, and libraries to perform common tasks and operations

ROCm and HIP Memory Model

  • Understanding the differences between host and device memory models
  • Using ROCm and HIP memory spaces, such as global, shared, constant, and local
  • Using ROCm and HIP memory objects, such as pointers, arrays, textures, and surfaces
  • Using ROCm and HIP memory access modes, such as read-only, write-only, read-write, and others
  • Understanding the ROCm and HIP memory consistency model and synchronisation mechanisms

ROCm and HIP Execution Model

  • Understanding the differences between host and device execution models
  • Using ROCm and HIP threads, blocks, and grids to define parallelism
  • Using ROCm and HIP built-in thread-index variables, such as threadIdx.x, blockIdx.x, blockDim.x (legacy aliases hipThreadIdx_x, hipBlockIdx_x, hipBlockDim_x), and others
  • Using ROCm and HIP block functions, such as __syncthreads, __threadfence_block, and others
  • Using ROCm and HIP grid-level features, such as gridDim.x (legacy alias hipGridDim_x), grid-wide synchronisation through cooperative groups, and others

Debugging

  • Understanding common errors and bugs in ROCm and HIP programs
  • Using the Visual Studio Code debugger to inspect variables, breakpoints, call stacks, and more
  • Using the ROCm Debugger (rocgdb) to debug ROCm and HIP programs on AMD devices
  • Using the ROCm Profiler (rocprof) to analyse ROCm and HIP programs on AMD devices

Optimisation

  • Understanding the factors that affect the performance of ROCm and HIP programs
  • Using ROCm and HIP coalescing techniques to improve memory throughput
  • Using ROCm and HIP caching and prefetching techniques to reduce memory latency
  • Using ROCm and HIP shared and local memory techniques to optimise memory accesses and bandwidth
  • Using ROCm and HIP profiling tools to measure and improve execution time and resource utilisation

Summary and Next Steps

Requirements

  • A solid understanding of the C/C++ language and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors

Audience

  • Developers who wish to learn how to use ROCm and HIP to program AMD GPUs and harness their parallel capabilities
  • Developers aiming to write high-performance, scalable code that runs across different AMD devices
  • Programmers interested in exploring the low-level aspects of GPU programming and optimising their code performance

Duration

  • 28 Hours
