Course Outline

Introduction

  • What is GPU programming?
  • Why utilise GPU programming?
  • What are the challenges and trade-offs associated with GPU programming?
  • What frameworks are available for GPU programming?
  • Selecting the appropriate framework for your application.

OpenCL

  • What is OpenCL?
  • What are the advantages and disadvantages of OpenCL?
  • Setting up the development environment for OpenCL.
  • Creating a basic OpenCL program that performs vector addition.
  • Using the OpenCL API to query device information, allocate and deallocate device memory, transfer data between host and device, launch kernels, and synchronise threads.
  • Using the OpenCL C language to write kernels that execute on the device and manipulate data.
  • Using OpenCL built-in functions, variables, and libraries to perform common tasks and operations.
  • Using OpenCL memory spaces, such as global, local, constant, and private, to optimise data transfers and memory accesses.
  • Using the OpenCL execution model to manage work-items, work-groups, and ND-ranges that define parallelism.
  • Debugging and testing OpenCL programs using tools such as CodeXL.
  • Optimising OpenCL programs through techniques including coalescing, caching, prefetching, and profiling.
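
To preview the kernel-writing topics above, here is a minimal vector-addition kernel in OpenCL C; the kernel and argument names are illustrative, and the host-side setup (context, command queue, buffers) is assumed to exist:

```c
// vadd.cl -- each work-item adds one pair of elements.
// The host enqueues this kernel over an ND-range of at least n work-items.
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *c,
                   const unsigned int n)
{
    size_t i = get_global_id(0);   // this work-item's index in the ND-range
    if (i < n)                     // guard: the ND-range may be padded past n
        c[i] = a[i] + b[i];
}
```

On the host, such a kernel is compiled with clBuildProgram, its arguments bound with clSetKernelArg, and it is launched with clEnqueueNDRangeKernel over a chosen global and local work size.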

CUDA

  • What is CUDA?
  • What are the advantages and disadvantages of CUDA?
  • Setting up the development environment for CUDA.
  • Creating a basic CUDA program that performs vector addition.
  • Using the CUDA API to query device information, allocate and deallocate device memory, transfer data between host and device, launch kernels, and synchronise threads.
  • Using the CUDA C/C++ language to write kernels that execute on the device and manipulate data.
  • Using CUDA built-in functions, variables, and libraries to perform common tasks and operations.
  • Using CUDA memory spaces, such as global, shared, constant, and local, to optimise data transfers and memory accesses.
  • Using the CUDA execution model to manage threads, blocks, and grids that define parallelism.
  • Debugging and testing CUDA programs using tools such as CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight.
  • Optimising CUDA programs through techniques including coalescing, caching, prefetching, and profiling.
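
The same vector-addition example in CUDA C/C++ shows the thread/block/grid model described above; the names and launch parameters are illustrative, and device memory is assumed to be allocated and populated already:

```cuda
// Minimal CUDA vector addition: one thread per element.
__global__ void vadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // the last block may be partial
        c[i] = a[i] + b[i];
}

// Host-side launch (after cudaMalloc/cudaMemcpy of d_a, d_b, d_c):
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;     // round up to cover all n
//   vadd<<<blocks, threads>>>(d_a, d_b, d_c, n);
//   cudaDeviceSynchronize();
```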

ROCm

  • What is ROCm?
  • What are the advantages and disadvantages of ROCm?
  • Setting up the development environment for ROCm.
  • Creating a basic ROCm program that performs vector addition.
  • Using the ROCm API to query device information, allocate and deallocate device memory, transfer data between host and device, launch kernels, and synchronise threads.
  • Using the HIP C++ language (ROCm's CUDA-style dialect) to write kernels that execute on the device and manipulate data.
  • Using ROCm built-in functions, variables, and libraries to perform common tasks and operations.
  • Using ROCm (HIP) memory spaces, such as global, shared, constant, and local, to optimise data transfers and memory accesses.
  • Using the ROCm execution model to manage threads, blocks, and grids that define parallelism.
  • Debugging and testing ROCm programs using tools such as rocgdb and rocprof.
  • Optimising ROCm programs through techniques including coalescing, caching, prefetching, and profiling.
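
ROCm kernels are typically written in HIP, whose programming model deliberately mirrors CUDA's, so the vector-addition example carries over almost unchanged; as before, names and launch parameters are illustrative and device memory is assumed to be set up:

```cpp
// Minimal HIP vector addition: one thread per element.
#include <hip/hip_runtime.h>

__global__ void vadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        c[i] = a[i] + b[i];
}

// Host-side launch (after hipMalloc/hipMemcpy of d_a, d_b, d_c):
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;
//   vadd<<<blocks, threads>>>(d_a, d_b, d_c, n);
//   hipDeviceSynchronize();
```

This similarity is what makes porting between CUDA and ROCm largely mechanical, a point the comparison section below returns to.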

Comparison

  • Comparing the features, performance, and compatibility of OpenCL, CUDA, and ROCm.
  • Evaluating GPU programs using benchmarks and metrics.
  • Learning best practices and tips for GPU programming.
  • Exploring current and future trends and challenges in GPU programming.

Summary and Next Steps

Requirements

  • A solid understanding of the C/C++ programming language and parallel programming concepts.
  • Basic knowledge of computer architecture and memory hierarchy.
  • Practical experience with command-line tools and code editors.

Audience

  • Developers seeking to learn how to utilise different GPU programming frameworks and compare their features, performance, and compatibility.
  • Developers aiming to write portable, scalable code capable of running across diverse platforms and devices.
  • Programmers interested in exploring the trade-offs and challenges inherent in GPU programming and optimisation.

Duration

  • 28 Hours
