Course Outline

Introduction

  • What is OpenCL?
  • OpenCL vs CUDA vs SYCL
  • Overview of OpenCL features and architecture
  • Setting up the Development Environment

Getting Started

  • Creating a new OpenCL project using Visual Studio Code
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying the output using printf and fprintf

OpenCL API

  • Understanding the role of the OpenCL API in the host program
  • Using the OpenCL API to query device information and capabilities
  • Using the OpenCL API to create contexts, command queues, buffers, kernels, and events
  • Using the OpenCL API to enqueue commands, such as read, write, copy, map, unmap, execute, and wait
  • Using the OpenCL API to handle errors and exceptions

OpenCL C

  • Understanding the role of OpenCL C in the device program
  • Using OpenCL C to write kernels that execute on the device and manipulate data
  • Using OpenCL C data types, qualifiers, operators, and expressions
  • Using OpenCL C built-in functions, such as mathematical, geometric, relational, and others
  • Using OpenCL C extensions and built-in libraries, such as atomic operations, image access functions, the cl_khr_fp16 half-precision extension, and others

OpenCL Memory Model

  • Understanding the difference between host and device memory models
  • Using OpenCL memory spaces, such as global, local, constant, and private
  • Using OpenCL memory objects, such as buffers, images, and pipes
  • Using OpenCL memory access modes, such as read-only, write-only, read-write, and others
  • Understanding the OpenCL memory consistency model and synchronisation mechanisms

OpenCL Execution Model

  • Understanding the difference between host and device execution models
  • Using OpenCL work-items, work-groups, and ND-ranges to define parallelism
  • Using OpenCL work-item functions, such as get_global_id, get_local_id, get_group_id, and others
  • Using OpenCL work-group functions, such as barrier, work_group_reduce, work_group_scan, and others
  • Using OpenCL ND-range query functions, such as get_num_groups, get_global_size, get_local_size, and others

Debugging

  • Understanding common errors and bugs in OpenCL programs
  • Using the Visual Studio Code debugger to inspect variables, breakpoints, call stacks, and more
  • Using CodeXL to debug and analyse OpenCL programs on AMD devices
  • Using Intel VTune to debug and analyse OpenCL programs on Intel devices
  • Using NVIDIA Nsight to debug and analyse OpenCL programs on NVIDIA devices

Optimisation

  • Understanding the factors that affect the performance of OpenCL programs
  • Using OpenCL vector data types and vectorisation techniques to improve arithmetic throughput
  • Using OpenCL loop unrolling and loop tiling techniques to reduce control overhead and increase locality
  • Using OpenCL local memory and local memory functions to optimise memory accesses and bandwidth
  • Using OpenCL profiling and profiling tools to measure and improve execution time and resource utilisation

Summary and Next Steps

Requirements

  • An understanding of C or C++ and of parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors

Audience

  • Developers who wish to learn how to use OpenCL to program heterogeneous devices and exploit their parallelism
  • Developers who wish to write portable and scalable code capable of running on different platforms and devices
  • Programmers who wish to explore the low-level aspects of heterogeneous programming and optimise their code performance

Duration

  • 28 hours
