Course Outline

Introduction

  • What is GPU programming?
  • Why utilise GPU programming?
  • What are the challenges and trade-offs associated with GPU programming?
  • What frameworks are available for GPU programming?
  • Selecting the appropriate framework for your application.

OpenCL

  • What is OpenCL?
  • What are the advantages and disadvantages of OpenCL?
  • Setting up the development environment for OpenCL.
  • Creating a basic OpenCL program that performs vector addition.
  • Using the OpenCL API to query device information, allocate and deallocate device memory, transfer data between host and device, launch kernels, and synchronise threads.
  • Using the OpenCL C language to write kernels that execute on the device and manipulate data.
  • Using OpenCL built-in functions, variables, and libraries to perform common tasks and operations.
  • Using OpenCL memory spaces, such as global, local, constant, and private, to optimise data transfers and memory accesses.
  • Using the OpenCL execution model to manage work-items, work-groups, and ND-ranges that define parallelism.
  • Debugging and testing OpenCL programs using tools such as CodeXL.
  • Optimising OpenCL programs through techniques including coalescing, caching, prefetching, and profiling.
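
To preview the kernel-writing topics above, here is a minimal vector-addition kernel in OpenCL C; the kernel and argument names are illustrative, and the host-side setup (context, command queue, buffers) is assumed to exist:

```c
// vadd.cl -- each work-item adds one pair of elements.
// The host enqueues this kernel over an ND-range of at least n work-items.
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *c,
                   const unsigned int n)
{
    size_t i = get_global_id(0);   // this work-item's index in the ND-range
    if (i < n)                     // guard: the ND-range may be padded past n
        c[i] = a[i] + b[i];
}
```

On the host, such a kernel is compiled with clBuildProgram, its arguments bound with clSetKernelArg, and it is launched with clEnqueueNDRangeKernel over a chosen global and local work size.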

CUDA

  • What is CUDA?
  • What are the advantages and disadvantages of CUDA?
  • Setting up the development environment for CUDA.
  • Creating a basic CUDA program that performs vector addition.
  • Using the CUDA API to query device information, allocate and deallocate device memory, transfer data between host and device, launch kernels, and synchronise threads.
  • Using the CUDA C/C++ language to write kernels that execute on the device and manipulate data.
  • Using CUDA built-in functions, variables, and libraries to perform common tasks and operations.
  • Using CUDA memory spaces, such as global, shared, constant, and local, to optimise data transfers and memory accesses.
  • Using the CUDA execution model to manage threads, blocks, and grids that define parallelism.
  • Debugging and testing CUDA programs using tools such as CUDA-GDB, CUDA-MEMCHECK, and NVIDIA Nsight.
  • Optimising CUDA programs through techniques including coalescing, caching, prefetching, and profiling.
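
The same vector-addition example in CUDA C/C++ shows the thread/block/grid model described above; the names and launch parameters are illustrative, and device memory is assumed to be allocated and populated already:

```cuda
// Minimal CUDA vector addition: one thread per element.
__global__ void vadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // the last block may be partial
        c[i] = a[i] + b[i];
}

// Host-side launch (after cudaMalloc/cudaMemcpy of d_a, d_b, d_c):
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;     // round up to cover all n
//   vadd<<<blocks, threads>>>(d_a, d_b, d_c, n);
//   cudaDeviceSynchronize();
```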

ROCm

  • What is ROCm?
  • What are the advantages and disadvantages of ROCm?
  • Setting up the development environment for ROCm.
  • Creating a basic ROCm program that performs vector addition.
  • Using the ROCm API to query device information, allocate and deallocate device memory, transfer data between host and device, launch kernels, and synchronise threads.
  • Using the HIP C++ language (ROCm's CUDA-style dialect) to write kernels that execute on the device and manipulate data.
  • Using ROCm built-in functions, variables, and libraries to perform common tasks and operations.
  • Using ROCm (HIP) memory spaces, such as global, shared, constant, and local, to optimise data transfers and memory accesses.
  • Using the ROCm execution model to manage threads, blocks, and grids that define parallelism.
  • Debugging and testing ROCm programs using tools such as rocgdb and rocprof.
  • Optimising ROCm programs through techniques including coalescing, caching, prefetching, and profiling.
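
ROCm kernels are typically written in HIP, whose programming model deliberately mirrors CUDA's, so the vector-addition example carries over almost unchanged; as before, names and launch parameters are illustrative and device memory is assumed to be set up:

```cpp
// Minimal HIP vector addition: one thread per element.
#include <hip/hip_runtime.h>

__global__ void vadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)
        c[i] = a[i] + b[i];
}

// Host-side launch (after hipMalloc/hipMemcpy of d_a, d_b, d_c):
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;
//   vadd<<<blocks, threads>>>(d_a, d_b, d_c, n);
//   hipDeviceSynchronize();
```

This similarity is what makes porting between CUDA and ROCm largely mechanical, a point the comparison section below returns to.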

Comparison

  • Comparing the features, performance, and compatibility of OpenCL, CUDA, and ROCm.
  • Evaluating GPU programs using benchmarks and metrics.
  • Learning best practices and tips for GPU programming.
  • Exploring current and future trends and challenges in GPU programming.

Summary and Next Steps

Requirements

  • A solid understanding of the C/C++ programming language and parallel programming concepts.
  • Basic knowledge of computer architecture and memory hierarchy.
  • Practical experience with command-line tools and code editors.

Audience

  • Developers seeking to learn how to utilise different GPU programming frameworks and compare their features, performance, and compatibility.
  • Developers aiming to write portable, scalable code capable of running across diverse platforms and devices.
  • Programmers interested in exploring the trade-offs and challenges inherent in GPU programming and optimisation.

Duration

  • 28 Hours
