Khronos OpenCL presentation at SIGGRAPH 2010

Here you find the videos Khronos uploaded of its OpenCL presentation. I added a time-line, so you can easily skip to the more interesting parts. The presentations by Ofer Rosenberg of Intel and Cliff Woolley of NVIDIA have not been uploaded (yet). Please note that for non-American listeners Affie Munshi's speech is hard to follow; luckily his sheets explain most of it.

http://www.youtube.com/watch?v=BdZFtcQ2LYw

For the first two presentations the sheets can be downloaded from the Khronos website. The time-line mentions the sheet numbers.

0:00 [sheet 1] Presentation by the president of Khronos and chair of the session: Neil Trevett of NVIDIA.
0:06 [sheet 2] Welcome and a quick overview
1:12 [sheet 3] The prizes for the attendees (not for us online viewers)
1:40 [4] Overview of all members of Khronos. Khronos is responsible not only for OpenCL but also for the better-known OpenGL and for projects like COLLADA.
2:26 [5] Processor Parallelism. CPUs are getting more parallel and GPUs more programmable. The overlapping area is called Heterogeneous Computing, and that is where OpenCL comes in.
3:10 [6] OpenCL timeline from version 1.0 to 1.1.
4:44 [7] OpenCL working group with only 30 logos. He mentions missing logos, like the one from Apple.
5:18 [8] The Visual Computing Ecosystem, showing how OpenCL interoperates with other standards. The video is cut off here, so I don't know whether he talks about DirectX.

http://www.youtube.com/watch?v=5s4KUCfUhCo

0:00 [sheet 9] Presentation by Affie Munshi, OpenCL specification editor.
0:12 [10-12] An overview of OpenCL, mentioning the design goals and the platform model. It is very hard to hear what he says.
4:53 [13] The execution model mentioning Work-item, kernel, program, context and queues. These all come back in his talk later, especially queues.
6:23 [14] The big idea behind OpenCL, here with a 1D problem.
6:42 Introduction to a simple example showing the difference between CPU code (one line repeated sequentially in a loop) and GPU code (a "kernel" executed many times in parallel). "get_global_id(0)" is discussed later.
7:13 [15] A 2D example. He explains the "global worksize" versus the "local worksize".
8:52 The image (global) is divided into smaller pieces (work-groups). He discusses the domain of synchronisation: synchronisation is only possible locally, within a work-group.
9:27 [16] The OpenCL memory model. We know this from the books.
9:45 Camera switch and sheets 17 and 18 have been skipped.
9:45 [19] Queues and events. Discusses in-order and out-of-order queues in more depth. In short: in an out-of-order queue you need events to enforce the order of dependent commands.
11:13 [20] The different queue-types get more clear here: out-of-order is the left picture, in-order the right one.
11:55 [21] An overview of the OpenCL kernel-language
13:36 [22] Explanation of what exactly a kernel is.
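To make sheets 14 and 15 concrete, here is a minimal plain-C sketch that compiles without an OpenCL SDK. It is my own illustration, not code from the sheets: the "kernel" receives its index as a parameter instead of calling get_global_id(0), and a loop stands in for the OpenCL runtime. All function names (mul_cpu, mul_kernel, launch_mul, split_global_id, num_groups) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Sheet 14 (1D): the same work written two ways ---- */

/* CPU style: one loop walks over all n elements sequentially. */
void mul_cpu(size_t n, const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] * b[i];
}

/* Kernel style: the body handles exactly ONE element. In OpenCL C the index
 * would come from get_global_id(0); here it is a parameter (gid) so that
 * this compiles as plain C. */
static void mul_kernel(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] * b[gid];
}

/* Stand-in for the OpenCL runtime: run one "work-item" per index. A device
 * would execute these in parallel; this loop only emulates the launch. */
void launch_mul(size_t global_size, const float *a, const float *b, float *c)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        mul_kernel(gid, a, b, c);
}

/* ---- Sheet 15 (2D): global vs local work size ---- */

typedef struct { size_t x, y; } idx2;

/* Which work-group a global 2D id falls into, and its position inside that
 * group (what get_group_id / get_local_id would report, assuming a zero
 * global offset). */
void split_global_id(idx2 gid, idx2 local_size, idx2 *group_id, idx2 *local_id)
{
    assert(local_size.x > 0 && local_size.y > 0);
    group_id->x = gid.x / local_size.x;
    group_id->y = gid.y / local_size.y;
    local_id->x = gid.x % local_size.x;
    local_id->y = gid.y % local_size.y;
}

/* OpenCL 1.x requires the global size to be a multiple of the local size
 * in each dimension; this returns the number of work-groups along one axis. */
size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}
```

For example, a 1024×768 image split into 16×16 work-groups gives 64×48 groups, and pixel (100, 50) lands in group (6, 3) at local position (4, 2). Only work-items inside the same group can synchronise with each other, which is the point made at 8:52.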

http://www.youtube.com/watch?v=5rwqrseAG6A

0:00 [sheet 22] Affie Munshi continues. Most of the following sheets are covered in 20 to 40 seconds each, so just read along.
0:10 [23] Work-items and work-groups
0:50 [24] Datatypes
1:31 [25] Vector Operations: assigning, accessing and basic operations
1:53 [26] What did Khronos add to OpenCL version 1.1? The following sheets explain in different words what I discussed in my article about OpenCL 1.1 changes.
2:28 [27] Thread-safety & Buffers: what (not) to worry about. Discusses why one function is not thread-safe, what to do with threads and multiple devices, and how this applies to CPUs.
5:00 [28] Events & callbacks by example. Discusses blocking calls and how to avoid deadlocks.
6:43 [29] Callbacks for memory-handling
8:02 [30] New queries, mainly to tell apart devices supporting 1.0 and 1.1.
9:54 [31] Other new features, like the C++ wrapper.
10:50 [32] Language features I: implicit widening
11:16 [33] Language features II.
12:41 [34] New built-in functions.

http://www.youtube.com/watch?v=HaXt0PB8YD0

0:00 [sheet 34] Affie Munshi continues on "built-in functions".
0:18 [35] New image-formats.
0:55 [36] OpenCL – OpenGL interoperability
2:10 [37] Code-example of object-sharing with OpenGL.
3:08 [38] Short summary of the changes in 1.1

Sheets for the following presentation by AMD and Graphic Remedy can be downloaded from the Khronos website.

Ben Gaster of AMD and Yaki Tebeka of Graphic Remedy give a talk titled "Rendering the Breeze".
3:46 [sheet 1] Introduction. Ben introduces his talk which focuses on a physics simulation. After his talk, Yaki will take over to demo debugging of OpenCL/OpenGL software.
4:13 [2] Acknowledgements and contact information.
4:48 [3] Introduction of the “Bullet Physics” library.
5:59 [4] Examples of three physics simulations done with OpenCL (pictures only).
6:31 [5] Cloth simulation introduction: how a piece of cloth can be represented.
7:05 [sheets 6 – 8] Springs & Masses: the three types of springs interconnecting the particles and how forces act on them.
7:49 [9 – 13] Parallelism explained with 4 particles.
8:33 [14 – 17] CPU approach for this problem.
9:15 [18] GPU approach, compared to the CPU. Each particle (or vertex) needs several updates, which makes the problem harder to parallelise.
10:50 [19] Vertex solver in a single batch (never mind the guy standing to the left of the camera for the next 10 minutes). Ben discusses the (dis)advantages of different techniques.
11:58 [20] What a GPU looks like from the perspective of the cloth problem.
13:00 [21] Why this first approach does not suffice.
13:34 [22-26] Another approach: batching the simulation.
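Ben's actual kernels are not reproduced here. The following is a minimal sketch under two assumptions of mine: that the three spring types of sheets 6 – 8 are the common structural / shear / bend set, and that "batching" means grouping links so that no two links in one batch share a particle (a greedy graph colouring). Under those assumptions every link in a batch can be solved by its own work-item without two work-items writing the same vertex. The names build_cloth_links and assign_batches are hypothetical and not taken from Bullet.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VERTS   256
#define MAX_BATCHES 32

typedef struct { int a, b; } link_t;  /* one spring between vertices a and b */

static int add_link(link_t *out, int n, int cap, int a, int b)
{
    assert(n < cap);
    out[n].a = a;
    out[n].b = b;
    return n + 1;
}

/* Links of a w x h particle grid (particle (x, y) has index y * w + x),
 * assuming the structural / shear / bend spring model. Returns the count. */
int build_cloth_links(int w, int h, link_t *out, int cap)
{
    int n = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int i = y * w + x;
            if (x + 1 < w) n = add_link(out, n, cap, i, i + 1);      /* structural */
            if (y + 1 < h) n = add_link(out, n, cap, i, i + w);      /* structural */
            if (x + 1 < w && y + 1 < h) {
                n = add_link(out, n, cap, i, i + w + 1);             /* shear */
                n = add_link(out, n, cap, i + 1, i + w);             /* shear */
            }
            if (x + 2 < w) n = add_link(out, n, cap, i, i + 2);      /* bend */
            if (y + 2 < h) n = add_link(out, n, cap, i, i + 2 * w);  /* bend */
        }
    return n;
}

/* Greedy batching: each link goes into the first batch in which neither of
 * its endpoints is used yet. Returns the number of batches, or -1 if
 * MAX_BATCHES is not enough. */
int assign_batches(const link_t *links, int n_links, int *batch_of)
{
    bool used[MAX_BATCHES][MAX_VERTS] = {{false}};
    int n_batches = 0;
    for (int i = 0; i < n_links; ++i) {
        int a = links[i].a, b = links[i].b;
        assert(a >= 0 && a < MAX_VERTS && b >= 0 && b < MAX_VERTS);
        int k = 0;
        while (k < MAX_BATCHES && (used[k][a] || used[k][b]))
            ++k;
        if (k == MAX_BATCHES)
            return -1;
        used[k][a] = used[k][b] = true;
        batch_of[i] = k;
        if (k + 1 > n_batches)
            n_batches = k + 1;
    }
    return n_batches;
}
```

On the host you would then dispatch one batch after another, solving all links of a batch in parallel. The number of batches is at least the highest number of links meeting in one particle, which is why the later sheets look for ways to get more work into each dispatch.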

http://www.youtube.com/watch?v=jvRv9bIyiio

0:00 [sheets 27 – 31] Ben Gaster continues with code examples of the batches.
0:39 [32 – 33] Dispatching a batch: the parallel stuff.
1:21 [34] “Link solver kernel header” – code example
1:36 [35 – 36] The body, part I – code example
2:15 [37] The body, part II – code example
2:49 [38] Back to batching without code.
3:29 [39] Higher efficiency by increasing the work-group size.
4:06 [40] Higher efficiency: the limits.
4:28 [41] “Solving clothes together”
5:25 [42] Discusses SIMD for this problem. The disadvantage is that it breaks platform independence. (You might have seen this kind of optimisation in CUDA examples that take e.g. "warps" into account.)
6:35 [43] Explanation why this works.
7:05 [44] Benefits of working at SIMD-level. The trade-off is the platform-independence.
8:10 [45] A nice overview of where we stand with today's three big processor architectures.

8:30 Yaki Tebeka of Graphic Remedy takes the microphone.
8:34 [45] Explanation of the sheet.
9:24 [46] Overview of the OpenCL development challenges.
9:48 [47] gDEBugger introduction.
10:07 [48] Demo-setup. Please stand by, or fast-forward.
11:06 Demo of gDEBugger.

http://www.youtube.com/watch?v=TjzzFa46wcg

not transcribed yet

http://www.youtube.com/watch?v=BPKhPY3bh5A

not transcribed yet

Please visit the gDEBugger site to learn more.

Sheets of the presentations by Ofer Rosenberg of Intel and Cliff Woolley of NVIDIA can be downloaded from the Khronos website.
