So you want your software to be much faster than the competition?

In four days, your software team learns the key techniques for writing extremely fast software.

Your team will learn how to write optimal code for GPUs and make better use of existing hardware. They will be able to write faster code immediately after the training: doubling the speed is the minimum to expect, and a 100-fold speed-up is possible. Your customers will notice the difference.

We cover modern, popular techniques such as OpenCL as well as established ones such as cache-flow optimisation. At the end of the training you'll receive a certificate from StreamHPC.

Want more information? Contact us.

About the training

Location and Time

OpenCL is a fairly new subject, and in past years fixing the location and date in advance has not worked well for trainers in this field. We therefore chose flexible dates, and initially offer the training in large and capital cities and in technology centres worldwide.

A final date for a city is set once 5 to 8 attendees have signed up; each training has a maximum of 12 attendees. You can specify your preferred cities and dates in the form below.

Some discounts are available for developing countries.

Agenda

Day 1: Introduction

Learn about GPU architectures and AVX/SSE: how to program them, and why they are faster.

Introduction to parallel programming and GPU-programming
An overview of parallel architectures
The OpenCL model: host-programming and kernel-programming
Comparison with NVIDIA’s CUDA and Intel’s Array Building Blocks
Data-parallel and task-parallel programming

Lab-session: an image-filter.

Note: since CUDA is very similar to OpenCL, you are free to do the lab-sessions in CUDA.

Day 2: Tools and advanced subjects

Learn about parallel-programming tactics, host-programming (transferring data), IDEs and tools.

Static kernel analysis
Profiling
Debugging
Data handling and preparation
Theoretical backgrounds for faster code
Cache flow optimisation

Lab-session: yesterday’s image-filter, applied to a video stream from a webcam or a file.

Day 3: Optimisation of memory and group-sizes

Learn the concept of “data-transport is expensive, computations are cheap”.

Register usage
Data-rearrangement
Local and private memory
Image/texture memory
Bank-conflicts
Memory coalescing
Prefetching

Lab-session: various small puzzles that can be solved with the techniques covered.

Day 4: Optimisation of algorithms

Learn techniques to help the compiler make better and faster code.

Precision tinkering
Vectorisation
Manual loop-unrolling
Unbranching

Lab-session: like day 3, but now with compute-oriented problems.

Enrolment

By filling in this form, you declare that you intend to attend the course. You can cancel at any time by e-mail or phone.

StreamHPC will keep you up to date about the training at your chosen location(s). Once the minimum of 5 attendees has been reached, a final date will be discussed. If you selected more than one location, you have the option to wait for a training in another city.

Put any remarks in the message field. If you have any questions, e-mail trainings@streamhpc.com.


Promotion for OpenCL Training (’12 Q4 – ’13 Q2)