A list of Desktop GPU architectures

UPDATED in February 2017

Some optimisation tricks work really well on one architecture, and are useless on others. And even with better drivers, the older architectures need some help. In other words, it helps to know what architecture the GPU has. Therefore you get some help from your friends at StreamHPC.

Below you’ll find a list of the architecture names of all OpenCL-capable GPU models of Intel, NVIDIA and AMD. It does not contain the professional lines for now – first we are focusing on getting the general models right.

Please understand that it took a lot of time to gather the information below, and that we normally share such information only with our clients.

Continue reading “A list of Desktop GPU architectures”

Google blocked OpenCL on Nexus with Android 4.3

Important: this is only for Google-branded Nexus phones – other brands are free to do what they want, and they most probably will.

Also important: this doesn’t mean that OpenCL on Android devices is over, but that there is a bump in the road now that Google tries to lock customers in to its own APIs.

The big idea behind OpenCL is that higher level languages and libraries can be built on top of it. This is exactly what was done under Android: RenderScript Compute (a higher-level language) was implemented using OpenCL for ARM Mali GPUs.

Having OpenCL drivers on Android has several advantages: OpenCL can be used directly on Android, and there is room for other high-level languages that have OpenCL as back-end. Especially the latter is probably what made Google decide to cripple the OpenCL drivers.

Google seems to be afraid of competition, and that’s a shame, as competition is the key factor that drives innovation. The OpenCL community is not the only one complaining about Google’s intentions concerning Android. Read page 3 of that article to understand how Google is controlling handset-vendors and chip-makers.

Google’s statement

In February OpenCL drivers were discovered on two Nexus tablets using a MALI T604 GPU. Around the same time there was one public answer from Google employee Tim Murray (twitter) why Google did not want to choose OpenCL: Continue reading “Google blocked OpenCL on Nexus with Android 4.3”

OpenCL 2.0 book on Indiegogo

Edit: the project unfortunately did not get enough funding on Indiegogo

Launching a book takes a lot of effort. By using crowdfunding, we hope to get the book published much earlier and at a lower price.

Pre-order via Indiegogo – only in August 2013: http://igg.me/at/opencl20manual

What you’ll get

You will get the first OpenCL 2.0 book on the market, fully updated with the latest function references and power tips. It is also usable for OpenCL 1.1/1.2, to help you write backward-compatible software.

Reference pages for quick access to all OpenCL functions – available online and offline. This has nothing to do with the Khronos reference pages of OpenCL 2.0, as this is a complete rewrite and redesign of the description of each function definition.

Reference pages of functions

A lot of energy goes into completely revising the original OpenCL reference pages, to create real value for you. This is not just a small upgrade, but an alternative (and more complete) explanation of all the functions. Expect it to contain twice as much information.

Each function will be explained in clear language, with a full explanation of the background knowledge and an example. If the function can be used in several contexts, more examples are given.

At a glance you can see what is new per OpenCL version. All functions are also extensively tagged and grouped, so you can easily find similar functions.

Basic concepts and programming theories

Various new additions to the series of basic concepts and the series on programming theories will only be available in the book, not on the blog. These chapters will help you connect the dots and get a better overview of how OpenCL works.

This content is unique and not found anywhere else. It has its foundation in hundreds of articles and research papers, combined with the years of experience in the field as a developer and a trainer.

Hardware and Optimisation guide

An explanation of all OpenCL optimisation techniques, including a guide on how to use auto-tuning to find the best configuration for each optimisation.

How well does each optimisation work on the various architectures? The results of mini-benchmarks will give you a complete overview of what helps and what does not.

Tools & software

There are various tools out there – both open source and commercial. These tools make it easier to program more efficiently and faster. The top 10 OpenCL tools are described, including software not discussed online before.

For all contributors

Reference pages

You get access to the reference pages while I work on them. When they are finished, you also get a zip file with HTML files for times when you don’t have access to the internet. You will get updates for all 2.0 updates. You can give feedback at any time, and with this you have influence on the direction the manual is going.

E-book

At all times you get a progress report with a TOC. When it is finished, you’ll get the book sent as a PDF. After some time for feedback, you’ll receive a new version. People who bought the print edition will receive it with the second version.

All prices are including Dutch VAT.

Ways You Can Help

Have you supported this project? Thank you very much for your support!

Please also tell your friends and colleagues, and share it on Twitter, Facebook and LinkedIn!

Installing and using Portable Computing Language (PoCL)

PoCL, a perfect companion for Portable Apps?

Update August’13: 0.8 has been released

PoCL stands for Portable Computing Language, and the goal is to make a full and open source implementation of OpenCL 1.2 for LLVM.

This is about installing and using PoCL on 64-bit Ubuntu. If you want to put some effort into building it on Windows, you will certainly help the project. See also this TODO for version 0.8, if you want to help out (or want to know its current state). Not all functionality is implemented, but the project progresses using test-driven development – using the samples in the SDKs as a base.

Backends

The developers are eager for collaboration, so new backends can be added. From what I’ve seen, this project is one of the best starting points for new OpenCL drivers: first because of the work that has already been done (implement by example), second because it’s an active open source project (continuous post-development), and third because of the MIT license (which permits reuse within proprietary software). Here at StreamHPC we keep a close eye on the project.

On a normal desktop it has only one device and that’s the CPU. It has backends for several other types of CPUs (check ./lib/kernel in the source):

  • ARM
  • Cell SPU
  • Powerpc
  • Powerpc64
  • x86_64

The TCE libraries can also be used as a backend. The maturity of each backend differs.

More support is coming. For instance Radeon via the R600 project, but PoCL first needs to support LLVM 3.3 for that.

It is also a good start for building drivers for your own processor (contact us if you want us to assist you in building such a backend; see Supporting OpenCL on your own hardware for some related info).

Continue reading “Installing and using Portable Computing Language (PoCL)”

Starting with Altera OpenCL-on-FPGAs

Altera has been very busy adding resources and kicked off June by opening up their OpenCL program to the general public.
Only Stratix V devices are supported, but that could change later.

Below are all pages and PDFs concerning OpenCL I’ve found while searching Altera’s website.

Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms

Altera wanted to know where they could compete with GPUs and CPUs. For a big company their comparisons are quite honest (for instance about their limited access-speed to memory), but they don’t tell everything – like the hours(!) of compilation-time. The idea is that you develop on a GPU and when it’s correct, you port the correctly working software to the FPGA.

If you don’t have any experience working with their FPGAs, it is best to ask around.

Image taken from Altera website.

Continue reading “Starting with Altera OpenCL-on-FPGAs”

AMD vs NVIDIA – Two figures that can tell a whole story

Update September ’13: AMD gets their new GPUs “Volcanic Islands” with GCN 2.0 out in October. For this reason the HD 7970’s price has dropped to €250. This shakes up some of the things described in this article.

Update June ’14: It has become clear that Titan is not a consumer device and should be categorised as a “Quadro for compute”. All consumer devices of both AMD and NVIDIA show relatively low GFLOPS for double precision.

Update July’14: Graphs updated with GTX Titan Z and R9 290X.

AMD/ATI has always had the fastest GPU out there. Yes, there were lots of times in which NVIDIA approached the throne, or even held the crown for a while (at least theoretically), but in the end it was Radeon that had the rightful claim.

Nevertheless, some things have changed:

  • AMD has focused more on the new architecture, making it easier to program while keeping the GFLOPS the same.
  • AMD bets on their A-series APU with integrated GPU.
  • NVIDIA has increased both memory bandwidth and GFLOPS at a steady pace.
  • NVIDIA has done the nitro-trick for double precision.

With NVIDIA GTX Titan (see three of them in the image), NVIDIA snatched victory from the jaws of defeat.

I’m not saying you should now jump to CUDA; there’s more than just GFLOPS. We should also think of costs and the prevention of vendor lock-in. More particularly, I would like to show how unpredictable the market for accelerator processors is.

Let’s take a look at the figures. Continue reading “AMD vs NVIDIA – Two figures that can tell a whole story”

Sheets GPGPU-day 2012 online

Photos made by Cyrille Favreau

Better late than never. It has been almost a year, but finally they’re online: the sheets of the GPGPU-day Amsterdam 2012.

You can find the sheets at http://www.platformparallel.com/nl/gpgpu-day-2012/abstracts/ – don’t hotlink the files, but link to this page. The abstracts should introduce the sheets, but if you need more info, just ask in the comments here.

PDFs from two unmentioned talks:

I hope you enjoy the sheets. On 20 June the second edition will take place – see you there!

 

WebCL Widget for WordPress

See the widget on the right, showing whether your browser and computer support WebCL?

It is available under the GPL 2.0 license and based on code from WebCL@NokiaResearch (thanks guys for your great Firefox-plugin!)

Download from WordPress.org and unzip in /wp-content/plugins/. Or (better), search for a new plugin: “WebCL”. Feedback can be given in the comments.

I’d like to get your feedback on which features you would like to see in the next version.

Continue reading “WebCL Widget for WordPress”

The 13 application areas where OpenCL and CUDA can be used

Did you find your specialism in the list? The formula is the easiest introduction to GPGPU I could think of, including the need for auto-tuning.

Which algorithms map best to which accelerator? In other words: what kind of algorithms are faster when using accelerators and OpenCL/CUDA?

Professor Wu Feng and his group from Virginia Tech took a close look at which types of algorithms were a good fit for vector processors. This resulted in a document: “The 13 (computational) dwarves of OpenCL” (2011). It became an important document here at StreamHPC, as it gave a good starting point for investigating new problem spaces.

The document is inspired by Phil Colella, who identified seven numerical methods that are important for science and engineering. He named these algorithmic methods “dwarves”. With 6 more application areas in which GPUs and other vector-accelerated processors did well, the list was completed.

As a funny side-note, in Brothers Grimm’s “Snow White” there were 7 dwarves and in Tolkien’s “The Hobbit” there were 13.

Continue reading “The 13 application areas where OpenCL and CUDA can be used”

Applied GPGPU-days Amsterdam 2013

December 2013: Videos are not ready yet, but the link will be put here.

Amsterdam, 20 June – Applied GPGPU-days in Amsterdam. Keep your agenda free for this event.

What can you do with GPUs to speed up computations? This year we can see various examples where OpenCL and CUDA have been used. We hope to answer whether you can use GPUs for your software, research or algorithm.

After the success of last year (fully booked with 66 attendees), we have now reserved a larger location with room for 100 people. The difference with last year is that we focus more on applications and less on technical aspects.

The program has been made public recently:

Title of talk | Company/Institute | Presenter
Introduction to GPGPU and GPU-architectures | StreamHPC | Vincent Hindriksen
Blender Cycles & Tiles: Enhancing user experience | AtMind bv | Monique Dewanchand & Jeroen Bakker
XeonPhi vs K20: The fight of the titans | SURFsara | Evghenii Gaburov
A real-time simulation technique for ship-ship and ship-port interaction | PMH bv | Jo Pinkster
CUDA Accelerated Neural Networks | LIACS | Ana Balevic
Efficient Reconstruction of Biological Networks via Transitive Reduction on GPUs | TU Eindhoven | Anton Wijs
Running Petsc on GPUs with an example from fluid dynamics | SURFsara | Thomas Geenen
Connected Component Labelling, an embarrassingly sequential algorithm | Leeuwarden University | Jaap van de Loosdrecht
Visualizing sound and vibrations using a GPU and a 1024-channel microphone array | TU Eindhoven | Wouter Ouwens
Gravitational N-body simulations on 1 to many GPUs | Leiden observatory | Jeroen Bédorf

A few demos will be shown.

For more information, and to find other events by the platform, see the Platform Parallel webpage.

Tickets are €75,-. If you are from a Dutch university or research institute affiliated with SURF, your ticket has been fully sponsored by SURFsara.

Associated events in the Netherlands

For the technical aspects (GPU-programming techniques, optimisation, etc) we have a special day: the GPU Dev Day 2013. More information on the Platform Parallel webpage. Date and place will be made public in June.

The first Khronos Meetup Benelux will take place just before the Applied GPGPU day, on 19 June in Amsterdam. More information on the meetup-page.

GPU-developers, work for StreamHPC and friends

Bored at work? Go start working for one of the anti-boring GPU-expert companies: StreamHPC (Netherlands, EU), Appilo (Israel) or AccelerEyes (Georgia, US).

We all look for people who know how to code GPUs. Experience is key, so you need to have at least one year of experience in GPU-programming.

Submit your CV now and don’t give up on your dream of leaving the boring times behind.

StreamHPC

Amsterdam-based StreamHPC is a young startup with a typical Dutch, open atmosphere (gezellig). Each project is different from the previous one, as we work in various fields. Most work is in parallelisation and OpenCL coding. We are quite demanding of your knowledge of the various hardware platforms OpenCL works on (GPUs, mobile GPUs, array processors, DSPs, FPGAs).

See the jobs-page for more information and how to apply.

Appilo

North Israel based Appilo is seeking GPU-programmers for project-basis contracts. Depending on the project, most of the work could usually be performed remotely.

Use the mail on the contact-page at Appilo to send your CV.

AccelerEyes

Atlanta-based AccelerEyes delivers products which are used to accelerate C, C++, and Fortran codes on CUDA GPUs and OpenCL devices. They look for people who believe in their products and want to help make them better.

See their jobs-page for more information and how to apply.

Winning demo of Tokyo Demo Fest 2013 uses OpenCL

The Tokyo Demo Fest 2013 is one of the many demo parties around the globe. Such parties are where great programmers meet great artists and show off what came out of their collaborations.

This year’s winner used OpenCL to compute real-time procedurally generated geometries. For the rest, C++, OpenGL and GameMonkey Script were used.

Tech features: curl noise, volumetric textures, Perlin noise, mesh deformations, HDR/bloom, film grain, fractals, Hermite splines, Tweens and quaternion iridescent particles.

The creator, Tokyo-based Eddie Lee, has done more projects – be sure to visit his homepage. I hope more demosceners start using the power of OpenCL to get more out of their demos.

Do you see where the kernel below is used? Hint: check the other videos of Eddie.

__kernel void curlnel( 
                      __read_only image3d_t volume,  
                      sampler_t volumeSampler,  
                      __global float4 *output,  
                      float speed 
                      ) 
{ 
    uint index = get_global_id(0); 
    uint numParticles = get_global_size(0); 
    float indexNormalized = (float)index/numParticles; 

    // read the particle's current position from the output buffer 
    float4 vertData = output[index]; 

    float3 samplePos = vertData.s012; 
    samplePos = samplePos+(float3)(0.5f); 

    // sample the 3D volume texture at that position 
    float4 voxel = (read_imagef(volume, volumeSampler, 
                   (float4)(samplePos,1.0f))); 

    vertData.s012 += voxel.xyz*speed; 

    output[index] = vertData; 
}

According to GPUVerify (see previous post) the line starting with “float4 voxel” has an error.

Verify your OpenCL and CUDA kernels online for race conditions

GPUVerify is a tool for formal analysis of GPU kernels written in OpenCL and CUDA. The tool can prove that kernels are free from certain types of defect, such as data races and barrier divergence. This is quite useful feedback for any GPU programmer.

Below you find an online version of the tool (please don’t break it!). Play around and test your kernels. Be aware that the number of groups is the global work size divided by the local work size.
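The relation between global work size, local work size and number of groups can be sketched in plain C. This is only an illustration and not part of GPUVerify; the function name `num_work_groups` is made up for this example, and it assumes the OpenCL 1.x rule that the local work size has to divide the global work size evenly.

```c
#include <stddef.h>

/* Number of work-groups along one dimension.
 * Assumption for this sketch: the OpenCL 1.x rule applies, so a local size
 * of 0 or one that does not divide the global size evenly is treated as
 * invalid and reported as 0 groups. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    if (local_size == 0 || global_size % local_size != 0)
        return 0;
    return global_size / local_size;
}
```

For example, a global work size of 1024 with a local work size of 64 gives 16 groups, while 1000 with 64 is rejected by this helper.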

For demo-purposes some values have been pre-filled with a simple kernel – press “Check my OpenCL kernel” to find the results. Did you expect this from this kernel? Can you explain the result?

After the LEAP-conference I’ll extend this article – till then I’m too time-limited. For now I wanted to share the online version with you, especially with the people who will attend the tutorial at LEAP. Be sure to check out the GPUVerify website and paper to learn more about this fantastic tool! Continue reading “Verify your OpenCL and CUDA kernels online for race conditions”

21-23 August: OpenCL Training London

From 21 to 23 August StreamHPC will give a 3-day training in OpenCL. Here you will learn how to develop OpenCL-programs.

A separate ticket for only the first day can be bought, as that day is a crash course in OpenCL (module: basics).

The second and third day will be all about parallel-algorithm design, optimisation and error handling (module: optimisation, with several new subjects added).

The last part of the third day is reserved for special subjects, as requested by the attendees. Continue reading “21-23 August: OpenCL Training London”

OpenCL error codes (1.x and 2.x)

Little Britain: “Compu’er says no”. (links to Youtube movie)

Knowing all errors by heart is good for quick programming, but not always the best option. Therefore I started to create a full list with extra info, taken from cl.h and the reference documentation.

The problem with many error-codes is that they are sometimes context-dependent and then become quite useless in helping the programmer out. Also some drivers return different error-codes. Notice also that different errors are given per OpenCL-version for the same function. If you find problems, help make OpenCL better and give feedback.
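As a small illustration of what such a list looks like in code, the sketch below maps a handful of error codes to their names, so host code can log something more readable than a bare number. The helper name `cl_error_name` is hypothetical and the table is deliberately incomplete; the numeric values are the ones defined in cl.h.

```c
#include <stddef.h>

/* Translate an OpenCL error code into its symbolic name.
 * Only a small subset of cl.h is covered in this sketch; anything
 * else falls through to a generic string. */
const char *cl_error_name(int err)
{
    switch (err) {
    case   0: return "CL_SUCCESS";
    case  -1: return "CL_DEVICE_NOT_FOUND";
    case  -2: return "CL_DEVICE_NOT_AVAILABLE";
    case  -3: return "CL_COMPILER_NOT_AVAILABLE";
    case  -4: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case  -5: return "CL_OUT_OF_RESOURCES";
    case  -6: return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -34: return "CL_INVALID_CONTEXT";
    case -54: return "CL_INVALID_WORK_GROUP_SIZE";
    default:  return "UNKNOWN_CL_ERROR";
    }
}
```

In host code this would typically wrap the return value of each OpenCL call; since drivers do not always agree on which code they return, logging the name next to the call that failed tends to be more helpful than the code alone.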

Want it on your wall? You can easily copy these two tables into Excel or similar software and print them out.

Continue reading “OpenCL error codes (1.x and 2.x)”

ERSA-NVIDIA award for “Best Young Entrepreneur”

StreamHPC supports the ERSA conference, 22-25 July in Las Vegas. At that conference there will be an award given to “Best Young Entrepreneur” and I’d like you to send in a proposal. The winner gets an NVIDIA Tesla K20!

Young entrepreneurs and academics with a great product/project are invited to present their solution. As the event draws around 2000 people, you get the attention needed to show-case your new company or research-group. Your solution does not need to be based on FPGAs or GPUs, as long as Von Neumann’s architecture is not in it.

Read the information below or directly go to the ERSA-NVIDIA awards-homepage.
“Von Neumann’s architecture lasted for 75 years. That genius can no longer lead us into the new age of computing that is upon us. This competition seeks to acknowledge those pioneers that are helping to build the new computing landscape.”

Submission of Proposals for ERSA-NVIDIA award Candidates
Deadline: 31 May 2013 – extended deadline! (was 6 May 2013)
Send proposals to org@ersaconf.org

The award is intended for entrepreneurs developing tools, advanced technologies and opportunities for supporting applications, both academic and commercial, across the broad area of high-performance, embedded systems implemented as multicore systems and reconfigurable heterogeneous parallel processing systems.

The Award Committee includes:

Leading Universities

  • Stanford University, USA, Prof. Michael Flynn
  • Imperial College London, UK, Prof. Wayne Luk
  • Karlsruhe Institute of Technology, Germany, Prof. Joerg Henkel
  • Keio University, Japan, Prof. Hideharu Amano
  • Shanghai Jiao Tong University, China, Prof. Simon See

Leading Companies (tentative list):

  • NVIDIA, Can Ozdoruk, Product Manager
  • Altera, Steve Casselman, Principal Engineer
  • National Instruments, Hugo Andrade, Principal Architect

For more info go to: http://ersaconf.org/awards/

If you have any questions, just ask them in the comments or send us an email.

12-14 June: OpenCL Training Amsterdam

From 12 to 14 June StreamHPC will give a 3-day course in OpenCL (was 3 to 5 June). Here you will learn how to develop OpenCL-programs.

A separate ticket for only the first day can be bought, as that day is a crash course in OpenCL (module: basics).

The second and third day will be all about parallel-algorithm design, optimisation and error handling (module: optimisation, with several new subjects added).

The last part of the third day is reserved for special subjects, as requested by the attendees. Continue reading “12-14 June: OpenCL Training Amsterdam”

Scaling mobile GPUs to 1000 GFLOPS

On the 20th of April 2013 there was an interesting discussion between Jan Gray and David Kanter. Jan is a specialist in C++ and FPGAs (twitter, homepage). David is a specialist in CPU and GPU architectures (twitter, homepage). Both know their way around the field of semiconductors. It is always a joy to follow their short discussions when they happen, but there was something about this one that made me want to share it with special attention.

OpenCL on ARM: Growth-expectation of GFLOPS/Watt of mobile GPUs exceeds Moore’s law. That’s incredible!

Jan Gray: .@OpenCLonARM GFLOPS/W more a factor of almost-over Dennard Scaling. But plenty of waste still to quash. http://www.fpgacpu.org/papers/Gray_AutumnOfMooresLaw_SingularityUniversity_11-06-23.pdf

Jan Gray: .@openclonarm Scratch Dennard tweet: reduced capacitance of yet smaller devices shd improve GFLOPS/W even as we approach end of Vdd scaling.

David Kanter: @jangray @OpenCLonARM I think some companies would argue Vdd scaling isn’t dead…

Jan Gray: @TheKanter @openclonarm it’s not dead, but slowing, we’ve gone from 5V to 1V (25x power savings) and have maybe several hundred mVs to go.

David Kanter: @jangray I reckon we have at least 400mV, so ~2X; slower than ideal, but still significant

Jan Gray: @TheKanter We agree, I think.

David Kanter: @jangray I suspect that if GPU scaling > Moore’s Law then they are just spending more area or power; like discrete GPUs in the last decade

David Kanter: @jangray also, most positive comment I’ve heard from industry folks on mobile GPU software and drivers is “catastrophically terrible”

Jan Gray: @TheKanter Many ways to reduce power, soup to nuts. For ex HMC DRAM on interposer for lower energy signaling. I’m sure many tricks to come.

In a nutshell, these are all the reasons they think mobile GPUs can outpace Moore’s law while staying under a certain power usage.
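The “25x power savings” in the tweets follows from the usual first-order model in which dynamic switching power scales with the square of the supply voltage (P ≈ C·V²·f). Here is a minimal sketch of that arithmetic, assuming capacitance and frequency stay constant; the function name `dynamic_power_ratio` is hypothetical.

```c
/* Ratio of dynamic switching power before and after lowering the supply
 * voltage from v_old to v_new, using the first-order model P ~ C * V^2 * f.
 * Assumption for this sketch: capacitance C and frequency f are unchanged. */
double dynamic_power_ratio(double v_old, double v_new)
{
    return (v_old * v_old) / (v_new * v_new);
}
```

Going from 5 V to 1 V gives a ratio of 25. The tweets do not spell out which end voltage belongs to the “~2X”; as an illustration only, going from 1 V to 0.7 V gives a ratio of roughly 2.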

It needs some background info, so let’s start with the background of the first tweet, and then explain what has been said. Continue reading “Scaling mobile GPUs to 1000 GFLOPS”

Q&A with Adrien Plagnol and Frédéric Langlade-Bellone on WebCL

WebCL is a great technique to have compute-power in the browser. After WebGL which gives high-end graphics in the browser, this is a logical step on the road towards the browser-only operating system (like Chrome OS, but more will follow).

Another way to look at technologies like WebCL, is that it makes it possible to lift the standard base from the OS to the browser. If you remember the trial of Microsoft’s integration of Internet Explorer, the focus was on the OS needing the browser for working well. Now it is the other way around, but it can be any OS. This is because the push doesn’t come from below, but from above.

Last year two guys from Lyon (South-France) got quite some attention, as they wrote a WebCL-plugin. Their names: Adrien Plagnol and Frédéric Langlade-Bellone. Below you’ll find a Q&A with them on WebCL. Enjoy! Continue reading “Q&A with Adrien Plagnol and Frédéric Langlade-Bellone on WebCL”

LEAP-conference call for papers

Building bridges in a new industry

Embedded processors have always focused on low energy use. Now a combination of Moore’s law, the frequency wall and multi-processor developments has made it possible for these processors to compete in completely new market segments, most notably due to impressive advancements in graphics IP.
We are now looking at four groups who are interested in learning from each other:

  • The embedded processor market
  • The FPGA market
  • The HPC and server market
  • The GPGPU market

The question to answer is: how can we get more out of low-energy processors by looking at other industries?

The goal of the LEAP conference is to bring these four groups together, creating windows to each other and paving roads over the newly constructed bridges. This makes it one of a kind. Half of the conference is focused on quality information sharing and the other half on networking. For more information, check the website of the LEAP-conference. StreamHPC is a co-organiser.

Call for papers is now open! Programme is filled!

Continue reading “LEAP-conference call for papers”