Course Outline
Introduction
- What is GPU programming?
- Why utilise GPU programming?
- What are the challenges and trade-offs associated with GPU programming?
- What frameworks are available for GPU programming?
- Selecting the appropriate framework for your application.
OpenCL
- What is OpenCL?
- What are the advantages and disadvantages of OpenCL?
- Setting up the development environment for OpenCL.
- Creating a basic OpenCL program that performs vector addition.
- Using the OpenCL API to query device information, allocate and deallocate device memory, transfer data between host and device, launch kernels, and synchronise threads.
- Using OpenCL C language to write kernels that execute on the device and manipulate data.
- Using OpenCL built-in functions, variables, and libraries to perform common tasks and operations.
- Using OpenCL memory spaces, such as global, local, constant, and private, to optimise data transfers and memory accesses.
- Using the OpenCL execution model to manage work-items, work-groups, and ND-ranges that define parallelism.
- Debugging and testing OpenCL programs using tools such as CodeXL.
- Optimising OpenCL programs through techniques including coalescing, caching, prefetching, and profiling.
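The OpenCL topics above come together in the course's first hands-on exercise, a basic vector-addition program. The following is a minimal illustrative sketch (kernel name, buffer names, and problem size are arbitrary, and error checking is omitted for brevity), showing the typical host-side flow: select a device, create a context and queue, allocate buffers, build the kernel from source, and launch it over a 1-D ND-range.

```c
/* Minimal OpenCL vector-addition sketch (error checking omitted).
   Compile with e.g.: gcc vecadd.c -lOpenCL */
#include <CL/cl.h>
#include <stdio.h>

static const char *kernel_src =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *c) {\n"
    "    int i = get_global_id(0);   /* one work-item per element */\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void) {
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Select the first platform/device; create a context and command queue. */
    cl_platform_id platform; cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueueWithProperties(ctx, device, NULL, NULL);

    /* Allocate device buffers; inputs are copied from host memory on creation. */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    /* Build the kernel from source at run time and set its arguments. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vec_add", NULL);
    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    /* Launch N work-items over a 1-D ND-range; a blocking read synchronises. */
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);
    printf("c[1] = %f\n", c[1]);

    clReleaseMemObject(da); clReleaseMemObject(db); clReleaseMemObject(dc);
    clReleaseKernel(k); clReleaseProgram(prog);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
    return 0;
}
```

Passing NULL as the local work size lets the runtime choose the work-group size; the exercise later revisits this choice when discussing work-groups and occupancy.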
CUDA
- What is CUDA?
- What are the advantages and disadvantages of CUDA?
- Setting up the development environment for CUDA.
- Creating a basic CUDA program that performs vector addition.
- Using the CUDA API to query device information, allocate and deallocate device memory, transfer data between host and device, launch kernels, and synchronise threads.
- Using CUDA C/C++ language to write kernels that execute on the device and manipulate data.
- Using CUDA built-in functions, variables, and libraries to perform common tasks and operations.
- Using CUDA memory spaces, such as global, shared, constant, and local, to optimise data transfers and memory accesses.
- Using the CUDA execution model to manage threads, blocks, and grids that define parallelism.
- Debugging and testing CUDA programs using tools such as CUDA-GDB, Compute Sanitizer (the successor to CUDA-MEMCHECK), and NVIDIA Nsight.
- Optimising CUDA programs through techniques including coalescing, caching, prefetching, and profiling.
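The CUDA module's hands-on exercise is the same vector addition expressed in CUDA C++. A minimal illustrative sketch (names and sizes are arbitrary, error checking omitted) shows the pattern covered above: allocate device memory, copy inputs, launch a grid of thread blocks, and copy the result back.

```cpp
// Minimal CUDA vector-addition sketch (error checking omitted).
// Compile with e.g.: nvcc vecadd.cu
#include <cstdio>

__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
}

int main() {
    const int N = 1024;
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = float(i); b[i] = 2.0f * i; }

    // Allocate device memory and copy inputs host -> device.
    float *da, *db, *dc;
    cudaMalloc(&da, sizeof a);
    cudaMalloc(&db, sizeof b);
    cudaMalloc(&dc, sizeof c);
    cudaMemcpy(da, a, sizeof a, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b, sizeof b, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all N elements.
    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(da, db, dc, N);

    // The blocking device-to-host copy synchronises with the kernel.
    cudaMemcpy(c, dc, sizeof c, cudaMemcpyDeviceToHost);
    printf("c[1] = %f\n", c[1]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```

The `blocks`/`threads` launch configuration is the grids-and-blocks execution model listed above; the rounding-up expression ensures the grid covers N even when N is not a multiple of the block size.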
ROCm
- What is ROCm?
- What are the advantages and disadvantages of ROCm?
- Setting up the development environment for ROCm.
- Creating a basic ROCm program that performs vector addition.
- Using the ROCm API to query device information, allocate and deallocate device memory, transfer data between host and device, launch kernels, and synchronise threads.
- Using HIP C++ (ROCm's CUDA-like kernel language) to write kernels that execute on the device and manipulate data.
- Using ROCm built-in functions, variables, and libraries to perform common tasks and operations.
- Using ROCm memory spaces, such as global, shared, constant, and local, to optimise data transfers and memory accesses.
- Using the ROCm execution model to manage threads, blocks, and grids that define parallelism.
- Debugging and testing ROCm programs using tools such as ROCgdb and ROCProfiler.
- Optimising ROCm programs through techniques including coalescing, caching, prefetching, and profiling.
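The ROCm module repeats the vector-addition exercise in HIP, which deliberately mirrors the CUDA API (hipMalloc for cudaMalloc, hipMemcpy for cudaMemcpy, and so on). A minimal illustrative sketch (names and sizes arbitrary, error checking omitted):

```cpp
// Minimal HIP (ROCm) vector-addition sketch (error checking omitted).
// Compile with e.g.: hipcc vecadd.cpp
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int N = 1024;
    float a[N], b[N], c[N];
    for (int i = 0; i < N; ++i) { a[i] = float(i); b[i] = 2.0f * i; }

    // Allocate device memory and copy inputs host -> device.
    float *da, *db, *dc;
    hipMalloc(&da, sizeof a);
    hipMalloc(&db, sizeof b);
    hipMalloc(&dc, sizeof c);
    hipMemcpy(da, a, sizeof a, hipMemcpyHostToDevice);
    hipMemcpy(db, b, sizeof b, hipMemcpyHostToDevice);

    // hipcc accepts the CUDA-style triple-chevron launch syntax.
    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(da, db, dc, N);

    // The blocking device-to-host copy synchronises with the kernel.
    hipMemcpy(c, dc, sizeof c, hipMemcpyDeviceToHost);
    printf("c[1] = %f\n", c[1]);

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```

Comparing this sketch with the CUDA version makes the portability story concrete: the kernel body is identical and only the runtime-API prefix changes, which is why HIP code can target both AMD and NVIDIA devices.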
Comparison
- Comparing the features, performance, and compatibility of OpenCL, CUDA, and ROCm.
- Evaluating GPU programs using benchmarks and metrics.
- Learning best practices and tips for GPU programming.
- Exploring current and future trends and challenges in GPU programming.
Summary and Next Steps
Requirements
- A solid understanding of the C/C++ programming language and parallel programming concepts.
- Basic knowledge of computer architecture and memory hierarchy.
- Practical experience with command-line tools and code editors.
Audience
- Developers seeking to learn how to utilise different GPU programming frameworks and compare their features, performance, and compatibility.
- Developers aiming to write portable, scalable code capable of running across diverse platforms and devices.
- Programmers interested in exploring the trade-offs and challenges inherent in GPU programming and optimisation.
28 Hours