Course Outline
Introduction
- What is ROCm?
- What is HIP?
- ROCm vs CUDA vs OpenCL
- Overview of ROCm and HIP features and architecture
- Setting up the development environment
Getting Started
- Creating a new ROCm project using Visual Studio Code
- Exploring the project structure and files
- Compiling and running the program
- Displaying output using printf and fprintf
ROCm API
- Understanding the role of the ROCm API in the host program
- Using the ROCm API to query device information and capabilities
- Using the ROCm API to allocate and deallocate device memory
- Using the ROCm API to copy data between host and device
- Using the ROCm API to launch kernels and synchronise threads
- Using the ROCm API to handle errors and exceptions
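The host-side steps listed above (allocate, copy, launch, check for errors, free) can be sketched as a vector addition. This is a minimal sketch under stated assumptions, not a definitive implementation: `add_vectors`, `vector_add`, and the `HIP_CHECK` macro are hypothetical names chosen for illustration, and the calls shown come from the HIP runtime API that ships with ROCm. The GPU path is compiled only under `hipcc`; otherwise a plain CPU loop stands in so the example still builds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef __HIPCC__
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper macro (not part of HIP): stop on the first failed call.
#define HIP_CHECK(call)                                                 \
    do {                                                                \
        hipError_t err_ = (call);                                       \
        if (err_ != hipSuccess) {                                       \
            std::fprintf(stderr, "HIP error: %s (%s:%d)\n",             \
                         hipGetErrorString(err_), __FILE__, __LINE__);  \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Device code: each thread adds one pair of elements.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard threads past the end of the data
}

std::vector<float> add_vectors(const std::vector<float>& a,
                               const std::vector<float>& b) {
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    std::vector<float> c(a.size());
    if (n == 0) return c;  // an empty grid is not a valid launch
    const std::size_t bytes = a.size() * sizeof(float);

    // Allocate device memory.
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    HIP_CHECK(hipMalloc(reinterpret_cast<void**>(&d_a), bytes));
    HIP_CHECK(hipMalloc(reinterpret_cast<void**>(&d_b), bytes));
    HIP_CHECK(hipMalloc(reinterpret_cast<void**>(&d_c), bytes));

    // Copy inputs from host to device.
    HIP_CHECK(hipMemcpy(d_a, a.data(), bytes, hipMemcpyHostToDevice));
    HIP_CHECK(hipMemcpy(d_b, b.data(), bytes, hipMemcpyHostToDevice));

    // Launch enough 256-thread blocks to cover n elements, then wait.
    const int block = 256;
    const int grid = (n + block - 1) / block;
    hipLaunchKernelGGL(vector_add, dim3(grid), dim3(block), 0, 0,
                       d_a, d_b, d_c, n);
    HIP_CHECK(hipGetLastError());
    HIP_CHECK(hipDeviceSynchronize());

    // Copy the result back and release device memory.
    HIP_CHECK(hipMemcpy(c.data(), d_c, bytes, hipMemcpyDeviceToHost));
    HIP_CHECK(hipFree(d_a));
    HIP_CHECK(hipFree(d_b));
    HIP_CHECK(hipFree(d_c));
    return c;
}
#else
// CPU stand-in used when the file is not compiled with hipcc.
std::vector<float> add_vectors(const std::vector<float>& a,
                               const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) c[i] = a[i] + b[i];
    return c;
}
#endif
```

Calling `add_vectors` with two equally sized vectors returns their element-wise sum. Wrapping every runtime call in a checking macro is one common convention, since HIP reports failures through returned `hipError_t` codes rather than C++ exceptions.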
HIP Language
- Understanding the role of the HIP language in the device program
- Using the HIP language to write kernels that execute on the GPU and manipulate data
- Using HIP data types, qualifiers, operators, and expressions
- Utilising HIP built-in functions, variables, and libraries to perform common tasks and operations
ROCm and HIP Memory Model
- Understanding the differences between host and device memory models
- Using ROCm and HIP memory spaces, such as global, shared, constant, and local
- Using ROCm and HIP memory objects, such as pointers, arrays, textures, and surfaces
- Using ROCm and HIP memory access modes, such as read-only, write-only, read-write, and others
- Understanding the ROCm and HIP memory consistency model and synchronisation mechanisms
ROCm and HIP Execution Model
- Understanding the differences between host and device execution models
- Using ROCm and HIP threads, blocks, and grids to define parallelism
- Using ROCm and HIP thread index built-ins, such as hipThreadIdx_x, hipBlockIdx_x, hipBlockDim_x, and others
- Using ROCm and HIP block functions, such as __syncthreads, __threadfence_block, and others
- Using ROCm and HIP grid-level features, such as hipGridDim_x, grid-wide synchronisation with cooperative groups, and others
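The relationship between threads, blocks, and grids in a one-dimensional launch can be sketched on the CPU. This is a minimal sketch, assuming a 1D launch: `blocks_for`, `global_index`, and `emulate_launch` are hypothetical helpers, with `global_index` mirroring the usual kernel expression `blockIdx.x * blockDim.x + threadIdx.x` (the `hipBlockIdx_x`-style names are, as far as I know, older aliases for the same built-ins). No GPU is involved; the nested loops only stand in for the blocks and threads of a launch.

```cpp
#include <cassert>
#include <vector>

// Number of blocks needed so that blocks * block_size >= n (ceiling division).
int blocks_for(int n, int block_size) {
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Global index of one thread in a 1D launch.
int global_index(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// CPU emulation of a 1D launch over n elements.
// Returns how many times each element was visited.
std::vector<int> emulate_launch(int n, int block_size) {
    const int grid = blocks_for(n, block_size);
    std::vector<int> visits(n, 0);
    for (int b = 0; b < grid; ++b) {            // one iteration per block
        for (int t = 0; t < block_size; ++t) {  // one iteration per thread
            const int i = global_index(b, block_size, t);
            if (i < n) ++visits[i];  // bounds guard, as in a real kernel
        }
    }
    return visits;
}
```

For 1000 elements and 256 threads per block, `blocks_for` gives 4 blocks, so 1024 threads are launched and the last 24 are filtered out by the bounds guard. Every element is therefore visited exactly once.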
Debugging
- Understanding common errors and bugs in ROCm and HIP programs
- Using the Visual Studio Code debugger to set breakpoints and inspect variables, call stacks, and more
- Using the ROCm Debugger to debug ROCm and HIP programs on AMD devices
- Using the ROCm Profiler to analyse ROCm and HIP programs on AMD devices
Optimisation
- Understanding the factors that affect the performance of ROCm and HIP programs
- Using ROCm and HIP coalescing techniques to improve memory throughput
- Using ROCm and HIP caching and prefetching techniques to reduce memory latency
- Using ROCm and HIP shared and local memory techniques to optimise memory accesses and bandwidth
- Using ROCm and HIP profiling tools to measure and improve execution time and resource utilisation
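Why coalescing matters can be illustrated with a little arithmetic: the more contiguous the addresses read by neighbouring threads, the fewer memory segments one group of threads touches. The following is a minimal sketch under illustrative assumptions, not a model of any specific GPU: `segments_touched` is a hypothetical helper, and the defaults (64 threads per group, 4-byte elements, 64-byte segments) are assumed values that vary between devices.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts the distinct memory segments touched when thread t of a group
// reads the element at index t * stride_elems.
// All default parameter values are illustrative assumptions.
std::size_t segments_touched(std::size_t stride_elems,
                             std::size_t threads = 64,
                             std::size_t elem_bytes = 4,
                             std::size_t segment_bytes = 64) {
    assert(elem_bytes > 0 && segment_bytes % elem_bytes == 0);
    std::set<std::size_t> segments;
    for (std::size_t t = 0; t < threads; ++t) {
        const std::size_t address = t * stride_elems * elem_bytes;
        segments.insert(address / segment_bytes);  // segment holding this element
    }
    return segments.size();
}
```

Under these assumptions a unit stride touches 4 segments, a stride of 2 touches 8, and a stride of 16 touches 64, that is, one segment per thread. Fewer segments per group generally means fewer memory transactions, which is the motivation for keeping neighbouring threads on neighbouring addresses.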
Summary and Next Steps
Requirements
- A solid understanding of the C/C++ language and parallel programming concepts
- Basic knowledge of computer architecture and memory hierarchy
- Experience with command-line tools and code editors
Audience
- Developers who wish to learn how to use ROCm and HIP to program AMD GPUs and harness their parallel capabilities
- Developers aiming to write high-performance, scalable code that runs across different AMD devices
- Programmers interested in exploring the low-level aspects of GPU programming and optimising their code performance
28 Hours