OpenCL vs CUDA Misconceptions

Posted by Vincent Hindriksen on 22 June 2011 with 47 Comments

Last year I explained the main differences between CUDA and OpenCL. Now I want to clear up some old (and partly false) stories around CUDA-vs-OpenCL. While it has been claimed too often that one technique is simply better, it should also be said that CUDA is better in some aspects, whereas OpenCL is better in others.

Why did I write this article? I think NVIDIA is visionary in both technology and marketing. But as I’ve written before, the potential market for dedicated graphics cards is shrinking, and I am therefore forecasting the end of CUDA on the desktop. Not having this discussion opens the door for closed standards and delays the innovation that can happen on top of OpenCL. The sooner people and companies start choosing a standard that gives equal competitive advantages, the more we can expect from the upcoming hardware.

Let’s stand by what we learnt at school about gathering information sources: don’t put all your eggs in one basket! Gather as many sources and references as possible. Please also read articles which claim (and underpin!) why CUDA has a more promising future than OpenCL. If you can, post comments with links to articles you think others should read too. We appreciate contributions!

I also found that Google Insights agrees with the graph I constructed manually.

The trends

The word “CUDA” existed for a long time as slang for “could have”, and there is a party-bar with that name in Canada, a ’71 car (by Plymouth, a US car manufacturer based close to Canada) and an upcoming documentary (also from Canada… What would South Park say?!). If you peek at Google Trends, the first thing you see is that CUDA (red) is much bigger than OpenCL (blue). Not paying too much attention gives the common idea that OpenCL is just cute in comparison to CUDA. I fixed the graph by setting the pre-2007 values to zero (see the image above). Then you see clearly that CUDA is not as huge as it seemed, and that it has even been going down for the last 2 years. At the end of the year, you might see the two lines much closer together than NVIDIA wants you to see. In other words: if you had the feeling that CUDA was only rising, then note that OpenCL grew even faster according to Google Trends.

SimplyHired gives a comparable view on CUDA vs OpenCL (OpenMP is for comparison, MPI is much bigger). Though CUDA is still bigger, the two are comparable and the lines have sometimes even touched (might it be love?). Nice to see: you can recognise the dates of CUDA-releases in the peaks. I can’t explain the big decline for both CUDA and OpenCL that started in March ’11.

Then there is the potential R&D that can be put in developing new techniques. I found at Yahoo Finance the annual spending on R&D (based on last Quarter). For the most important X86-companies in OpenCL this is:

NVIDIA: 848M USD
AMD: 1,405M USD
Apple: 1,782M USD
IBM: 6,026M USD
Intel: 6,576M USD

You understand that once the time is right, NVIDIA is no match for the others. Not that all of this R&D will be put into OpenCL; NVIDIA doesn’t put all of its R&D into CUDA either.

Toolset

CUDA and OpenCL do mostly the same. It’s like Italians and French fighting over who has the most beautiful language, while they both come from the same Latin/Romance branch. There are some differences, though. CUDA tries to be an all-in-one package for developers, while OpenCL is mostly a language description only. For OpenCL the SDK, IDE, debugger, etc., all come from different vendors. So, if you have an Intel SandyBridge and an AMD Radeon, you need even more software when working on performance-optimizing kernels for different hardware. This is not ideal, but all you need is really there. You need to go to different places, but it is not true that the software is unavailable, as is claimed much too often.

Currently VisualStudio-support is very good from NVIDIA, AMD and Intel. On OS X, Xcode covers all the developer-needs around OpenCL. Last year the developer-support for CUDA was better, but the catch-up here is finished.

Libraries

Where CUDA comes in strong and OpenCL needs a lot of catch-up is with what they’ve built on top of the language. CUDA has support for templates, which brings nice advantages. Then there is a math library which comes for free:

cuFFT – Fast Fourier Transforms Library
cuBLAS – Complete BLAS Library
cuSPARSE – Sparse Matrix Library
cuRAND – Random Number Generation (RNG) Library
NPP – Performance Primitives for Image & Video Processing
Thrust – Templated Parallel Algorithms & Data Structures
math.h – C99 floating-point Library

For most of these there are alternatives, or you can easily build them yourself, but nothing matches the complete set. This will of course come in time for each architecture, but for now this is the big win for CUDA.

One example of free math-software for OpenCL is ViennaCL, a full linear algebra library with iterative solvers. More about CUDA math libraries in an upcoming article.
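
On the template point above: OpenCL C 1.1 has no templates of its own. Kernel-source generation, which comes up in the comments below, can cover part of that gap by stamping out one kernel per type on the host before compilation. What follows is only a minimal sketch in Python; the kernel name `scale`, the helper `generate_kernels` and the list of types are assumptions made up for illustration, not part of any library.

```python
from string import Template

# Hypothetical OpenCL C kernel text; $T is replaced by a concrete element type.
KERNEL_TEMPLATE = Template("""
__kernel void scale_$T(__global $T *data, const $T factor)
{
    size_t i = get_global_id(0);
    data[i] = data[i] * factor;
}
""")


def generate_kernels(types):
    """Return one OpenCL C source string with a scale_<type> kernel per type."""
    return "\n".join(KERNEL_TEMPLATE.substitute(T=t) for t in types)


# Illustrative type list; any OpenCL C scalar type name could be used here.
source = generate_kernels(["float", "int"])
print(source)
```

The generated string is ordinary OpenCL C source, so it could be handed to `clCreateProgramWithSource` like any hand-written kernel.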

Heterogeneous Programming

OpenCL works on different hardware, but the software needs to be adapted for each architecture. It is not something that will blow minds: you need different types of cars to be fastest on different kinds of terrain. If CUDA could work on both Intel CPUs and NVIDIA GPUs there would be a problem: a GPU-optimized kernel will not perform well on a CPU. Just as with OpenCL, you need to write the code specifically for CPUs. The claim that CUDA is better because its performance is about the same on each piece of hardware it runs on is bogus. It just does not touch the problem OpenCL tries to solve: having one programming concept for many types of hardware.
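
One way to picture this per-architecture adaptation is a small host-side table that picks a differently tuned kernel per device type. The sketch below is hypothetical throughout: the kernel names, the work-group sizes and the `pick_variant` helper are placeholders, not measured or recommended values.

```python
# Hypothetical tuning table: one kernel variant per device type.
# Names and work-group sizes are illustrative placeholders only.
KERNEL_VARIANTS = {
    "cpu": {"kernel": "reduce_cpu", "work_group_size": 1},
    "gpu": {"kernel": "reduce_gpu", "work_group_size": 256},
}


def pick_variant(device_type):
    """Return the tuned kernel settings for a device type such as 'CPU' or 'gpu'."""
    key = device_type.lower()
    if key not in KERNEL_VARIANTS:
        raise ValueError(f"no tuned kernel for device type {device_type!r}")
    return KERNEL_VARIANTS[key]


for device in ("CPU", "GPU"):
    variant = pick_variant(device)
    print(device, "->", variant["kernel"], "with work-group size", variant["work_group_size"])
```

The point of the table is that the host code stays the same while the kernel and its launch parameters change per device.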

This makes you think about why we never saw the X86-implementation of CUDA that was developed last year. Actually, it was announced as a public beta recently, but it is still not performance optimized and costs $299,- as part of the Portland Group compiler suite. A performance-optimized version will be released at the end of 2011, so let’s have another look then.

Performance-comparisons

OpenCL 1.1 brings some speed-ups with, for example, strided copies. Comparing CUDA to OpenCL 1.0 (since NVIDIA’s 1.1-drivers are a year old and have not been updated since) is just not fair. What is fair to say is that one piece of hardware is faster than another, and that certain compilers can be more advanced in optimizing. But since CUDA and OpenCL as languages are so much alike, it is impossible to put a verdict on which language is (potentially) faster. It would be like saying that Objective C is faster than C++. No, again it’s the compiler (and the programmer) which makes it faster.

I also still see some comparisons to RADEON HD4000-series, which are not really fit for GPGPU. The 5000 and 6000 series are. This problem will slowly fade away with more benchmarking, but not as fast as I hoped it would.

Bang per buck

A Tesla C2050 with 3GB of RAM costs $2400,-, giving 1 TFLOPS single precision (0.5 TFLOPS double precision). The fastest AMD Radeon, the HD6990 with 4GB, costs $715,- and gives 5.1 TFLOPS performance single precision (1.2 TFLOPS double precision). Three of them give more than 15 TFLOPS for $2145,-. Of course these are theoretical numbers and we still have the issue of the limits of PCIe. But for many problems, RADEONs are just much faster than TESLA/GeForce with GPGPU. TESLAs have higher transfer-rates and can have 6GB of memory, so they are a better fit for other problems. FFT and similar computations, for instance, still rock on NVIDIA-hardware.
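
The arithmetic behind these numbers is easy to reproduce. The sketch below only uses the prices and theoretical peak figures quoted above; the helper name `gflops_per_dollar` is made up for illustration, and real-world throughput will be lower than these peaks.

```python
# Prices (USD) and theoretical peak TFLOPS exactly as quoted in the text above.
cards = {
    "Tesla C2050": {"price_usd": 2400, "sp_tflops": 1.0, "dp_tflops": 0.5},
    "Radeon HD6990": {"price_usd": 715, "sp_tflops": 5.1, "dp_tflops": 1.2},
}


def gflops_per_dollar(tflops, price_usd):
    """Convert a peak TFLOPS figure and a price into GFLOPS per dollar."""
    return tflops * 1000.0 / price_usd


for name, card in cards.items():
    sp = gflops_per_dollar(card["sp_tflops"], card["price_usd"])
    dp = gflops_per_dollar(card["dp_tflops"], card["price_usd"])
    print(f"{name}: {sp:.2f} SP GFLOPS/$, {dp:.2f} DP GFLOPS/$")

# Three HD6990 cards, as in the text: total price and combined peak.
three_price = 3 * cards["Radeon HD6990"]["price_usd"]
three_sp_tflops = 3 * cards["Radeon HD6990"]["sp_tflops"]
print(f"3x Radeon HD6990: ${three_price} for {three_sp_tflops:.1f} SP TFLOPS peak")
```

On these quoted figures the Radeon delivers roughly 7.1 single-precision GFLOPS per dollar against roughly 0.4 for the Tesla, which is the marketing-level gap this section is about.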

Edit 28-1-2012: There were comments on the above comparison of Tesla to Radeon and GeForce. This is not a technical comparison between the graphics cards but more a marketing perspective. Many serious research and financial institutes were buying Tesla-cards because the marketing suggested they must be the best, being so expensive. People who chose GPGPU but did not know what to buy, bought Tesla-cards since it was the obvious choice according to the marketing-stories. A good reason to buy one is that you want, for example, ECC, but not that you want the fastest card (highest memory bandwidth + processor-power).

Books

Books are a very good measure of expected popularity. But they are also used to push technologies (books published by the company that makes the software/hardware).

Since there are more (English) books on CUDA than on OpenCL, you might think CUDA is the bigger one. A nice one is the recently released GPU gems. But the only to-be-released-soon book I could find that mentioned CUDA was Multi-core programming with CUDA and OpenCL, and there are 3 books in the making for OpenCL (actually three and a half, counting that one). I also understood that UK-based CUDA Developer is working on a book.

Edit 21-07-2011: Elsevier releases “CUDA” in August.
Edit 1-02-2012: As I mentioned on Twitter, “Multi-core programming with CUDA and OpenCL” was pulled back from release.

4.0 > 1.1?

This claim was made not long ago, and they were being serious: 4.0 is bigger than 1.1, so CUDA is much more advanced. This reminds me of the browser-discussions, where it was said that Firefox was behind since it had only reached version 4. But I understand; 1.0 sounds so new and just finished; 1.1 sounds like the first bugfix-release. But in reality OpenCL 1.1 has support for Cloud-computing, which CUDA only added recently. As said, CUDA still supports graphics cards only, while OpenCL has supported more than that since 1.0.

It is often said that CUDA has a 2 year advantage, but ATI had already done a lot of research on GPGPU (Close to Metal) years before AMD eventually chose OpenCL, and almost a year before CUDA 1.0 was launched. Close to Metal was replaced by AMD’s Stream and then by OpenCL. Don’t think all projects started from scratch, and be aware that OpenCL was co-designed by NVIDIA, AMD and others. This means that it has all the best the predecessors (including CUDA and AMD Stream) had to offer.

CUDA is said to be more mature, but since the languages are comparably mature, this must refer to the drivers and the programming environments (IDEs). This is the OpenCL driver-status:

AMD-drivers are mature (both GPU and CPU).
NVIDIA-drivers are still on 1.0.
Intel-drivers are in beta.
IBM-drivers are stable (POWER), but still in ‘alphaworks’.
ARM-drivers (various) are in closed beta.

So CUDA-drivers are as mature as AMD OpenCL drivers. Also, since many companies have put all their knowledge from other products into OpenCL, the technique is much older than the name and the version-number.

Conclusion (2011)

You might be completely missing the differences in the API. There are language-differences between CUDA 4.0, OpenCL 1.0 and OpenCL 1.1, but I will give an overview of differences later (and I’ll put the link here). We think we have enough to tell you how to port your CUDA-software to OpenCL.

My verdict:

CUDA

+ is marketed better.
+ has developer-support in one package.
+ has more built-in functions and features.
– only works on GPUs of NVIDIA.

OpenCL

+ has support for more types of processor architectures.
+ is a completely open standard.
– has mature drivers only from AMD and NVIDIA; Intel and IBM are expected to mature their drivers soon.
– is supplied by many vendors, not provided as one package or centrally orchestrated.

I hope you found that OpenCL is not a cute alternative to CUDA, but an equal standard which offers more potential. OpenCL has to do some catch-up, yes, but it will all happen soon this year.

http://www.google.com/products/catalog?client=ubuntu&channel=fs&q=tesla+s2050&oe=utf-8&um=1&ie=UTF-8&tbm=shop&cid=8721727382375528152&sa=X&ei=lQcBTuecNcWfOteR1YAO&ved=0CDIQ8wIwAw

47 thoughts on “OpenCL vs CUDA Misconceptions”

MySchizoBuddy 22 June 2011

Can you talk more about math libraries for opencl. plus what will be new in 1.2 and 1.3
- Vincent Hindriksen Post author 22 June 2011
  
  An upcoming article will discuss all math libraries for both CUDA and OpenCL, including LibJacket.
MySchizoBuddy 22 June 2011

one more thing i would like to mention that favors cuda is Jacket library for matlab so you can speed up your matlab codes.
Christophe Riccio 22 June 2011

I am all about open standard and on that stuff but for many “professional” context I really don’t see the point to use something else than a NVIDIA card because either with OpenCL and CUDA, it’s much much faster. The new AMD architecture may change this but when you wrote:
“I also still see some comparisons to RADEON HD4000-series, which are not so fit for GPGPU as the 5000 and 6000 series are.” I am sorry but the Radeon 5000 and 6000 series aren’t either!

Also the advantage on the CUDA compiler is a big one to me. Being able to use a subset of C++ instead of a custom language. Finally, even is feature CUDA a something huge: malloc!

The advantage of being “cross platform” isn’t really one as the code need to be tweaked a reach optimal performance but in any case Radeon are magnitudes slower than GeForce so they can’t be used the same way, especially when the user experience is involved. (real-time visualization / editing?)

However, I don’t believe in the long term future of CUDA, the future has more chances with OpenCL, (an OpenCL 2.0 using C++?) or something else resulting of the convergence of parallel programming languages, libraries, hardware.
- Vincent Hindriksen Post author 22 June 2011
  
  Thank you for your comment. Please send me links where it’s obvious that Radeons aren’t fit for GPGPU or much slower than Geforce-GPUs; your claim has no base.
  OpenCL is also a subset of C99, so I assume you’ve never worked with OpenCL. Actually starting a kernel in CUDA uses a custom <<<...>>> syntax and needs nvcc to compile, where OpenCL can use normal C/C++-compilers. Have you checked out the self-study section at the bottom of http://www.streamhpc.com/education/?
  I really like the cross-architecture support, but it is a choice to support it. If you only want to have support for NVIDIA, then it’s your choice to do so – analogy: Java runs on each platform, but Java-software most times is not tested on mobile phones or IBM POWER.
  Personally I hope C++-templates arrive in version 1.2.
- MySchizoBuddy 23 June 2011
  
  OpenCl is just a specification with a reference C99 implementation. It’s NOT a new language at all. Plus since it is just a specification you can have opencl in java, ruby and what not even C++.
  
  I would also like to see your source of “Radeon MAGNITUDES SLOWER than Geforce”.
Christophe Riccio 22 June 2011

How “OpenCL is also a subset of C99” compete with “Being able to use a subset of C++”? Sorry but I don’t have a base code written in C99 to please OpenCL!

With CUDA, basically we can do include “mylibrary.hpp” in a .cu and as far as we add __device__ __host__ where they belong to and be happy. Well the limitations are actually stronger be reusing existing code and keep this existing code building and working in the context it originally belong to is possible and not complex at all.
Christophe Riccio 22 June 2011

Starting peaking you own links:
http://www.sisoftware.net/?d=qa&f=gpgpu_gpu_perf&l=en&a=oca
Radeon 5850 barely compete with a GeForce 9500. Outstanding performance!

But other around Internet tend to go the same way.
http://www.geeks3d.com/20100330/geforce-gtx-480-opencl-performance-tested/

Indeed it all comes down to how a code is optimized for a hardware.
This link compare OpenCL on NVIDIA and AMD but actually OpenCL on NVIDIA tend to be slower than CUDA:
http://arxiv.org/ftp/arxiv/papers/1005/1005.2581.pdf
I am pretty sure that NVIDIA has implemented a slow down on OpenCL to make CUDA looks stronger because, as you said as well, ” it’s the compiler (and the programmer) which makes it faster.”
- Vincent Hindriksen Post author 28 June 2011
  
  The nqueen-program (from geeks3d) dropped my bitcoin-mining to 55% on one Radeon HD6870, giving a result of 2.9 seconds. I cannot conclude other than that it is optimised for GeForce, since otherwise it would make use of 100% of the GPU. I will come back to this, and will honestly admit it if I cannot solve the N-queen problem for size 17 in under a second.
Christophe Riccio 22 June 2011

I can admit that more recent review and test but also with more details are needed as think appears to have improve pretty well for AMD.

However in the end, OpenCL is just like CUDA, not cross platform and I would avoid the trouble to rewrite more code than I need.

“Personally I hope C++-templates arrive in version 1.2.” maybe… overloading as well then!
- Vincent Hindriksen Post author 28 June 2011
  
  C++-templates/overloading can be taken care of by using kernel-source generation – more about that later, since I’m still looking into all the details.
MySchizoBuddy 23 June 2011

btw how do you deliver an opencl application to customers. do they need to have some prerequisite installed. How will the opencl optimize itself depending on what hardware it encounters on the customers PC
- Vincent Hindriksen Post author 28 June 2011
  
  Optimised and tested for given hardware (SSE/AVX and GPU). My customers want two things: a very fast solution for now and the flexibility to switch to different hardware later. They buy the solution (explanation and understanding of the parallelised algorithm) and get the code for free.
justme 1 July 2011

Wow, what an ludicrously flawed analysis!. Funny you made a bang-per-buck comparison between NVIDIA enterprise and AMD consumer-level hardware. A quick search shows the GTX590 with 3GB selling for 750 dollars, did you even bothered to look?

http://www.newegg.com/Product/Product.aspx?Item=N82E16814130630&cm_re=gtx_590-_-14-130-630-_-Product

You also mentioned PCIe speed as an issue here… well if today you are limited by the (theoretical 128 Gbits) bandwith PCIe 3.0 has, then you should try to learn how to program those things.

“Currently VisualStudio-support is very good from NVIDIA, AMD and Intel. At OSX XCode gives all the developer-needs around OpenCL. Last year the developer-support for CUDA was better, but the catch-up here is finished.”

Yeah, do you know Parallel Nsight? Because I don’t know any comparable tool from ATI, Intel or Apple. Jesus…

The article is full of gratuitous statements and speculations like “Actually that makes you think why we never saw the X86-implementation of CUDA that has been developed last year” (guessing), ” I also pointed out that OpenCL has to do some catch-up, but that will all happen this year” (future reading powers) or “What is fair to say is that one piece of hardware is faster than one another” (which is faster a hexacore nehalem or a fermi based GPU?, of course it depends which kind of problem you are solving).

Please note I am not saying CUDA is better nor faster.
- Vincent Hindriksen Post author 1 July 2011
  
  Thank you for sharing your frustration; this article was written out of frustration too.
  
  In most discussions it is claimed that AMD has nothing like Tesla, so that’s why I compared to Tesla and not the GTX 590. This GPU has 2.5 TFLOPS single precision, by the way. I hope NVIDIA starts supporting OpenCL 1.1 on their CUDA 4.0 drivers, so I can experiment more to find the perfect device for certain algorithms.
  
  If you can show me a PCIe 3.0 GPU, then we have a start on that discussion. With 2.1 it can be a limit. Still heterogeneous processors have certain advantages over discrete GPUs.
  
  Parallel NSight is a nice piece of software, but there is really a catch-up!
  Debugging on Intel: http://software.intel.com/en-us/articles/performance-debugging-intro/
  Debugging on AMD: http://developer.amd.com/tools/gDEBugger/Pages/default.aspx
  
  About the X86-implementation. A common argument in favour of CUDA is that it is performance-portable on all supported devices. By some it is seen as a disadvantage that you need to rewrite kernels for specific devices in OpenCL, while this is not needed for CUDA. So I think I do have a point here with my “gratuitous statements and speculations” as you will see in the coming months (cannot tell all now).
  
  The claim I wanted to debunk is that CUDA is faster because benchmarks based on ported software say so. I totally agree with you that the one device is better in specific problems than the other, so that’s why I use OpenCL to have that flexibility.
nooron 15 July 2011

AMD’s got the new Southern Islands and their APUs. Intel’s got MIC and SandyBridge. NVIDIA is bringing some changes in Kepler and more in Maxwell. There are still other architectures such as Tilera that are interesting as well.

Indeed the world of GPGPU, especially its underlying hardware, is changing fast. OpenCL, however, is not well-positioned to take the lead in any changes. Due to its need to accommodate a variety of different hardware, the progress of OpenCL’s development would be more complicated and thus slower than that of CUDA.

NVIDIA, on the other hand, is in a better position to respond quickly to problems that emerge as GPGPU matures. After all, they have only their own hardware to deal with and they do not have to argue with anyone to decide how CUDA should be changed.

Vendors who rely solely on OpenCL will perhaps find OpenCL to be more of a hurdle than help in the long term.

That aside, I do think OpenCL will succeed in areas where compatibility is more important than performance. Low-end graphics software and the like perhaps wouldn’t need CUDA at all.

Also, I would like to point out that if you check the trend for CUDA and OpenCL of countries where “cuda” is not used in their languages, you will see a clearer divide between their search volumes.
- Vincent Hindriksen Post author 16 July 2011
  
  What you are missing is that OpenCL is only the low-level part and by that very flexible. CUDA is everything from low-to-high, where most of the changes happen in the higher level parts. OpenCL’s flexibility makes it possible for each vendor to make its own personalised adaptation – just like there is a C-compiler for ARM and a C-compiler for Intel. That said OpenCL is more a common language than a compatibility-layer; I really don’t get your suggestion that companies should not choose it for performance, so could you elaborate? Nvidia doesn’t have hybrid processors, so next year high-performance software will certainly not be CUDA-only.
  Google trends does not work well for regions, as http://www.google.com/trends?q=gpgpu&ctab=0&geo=de&date=all&sort=0 shows.
Vivek 16 July 2011

A true comparision between opencl and cuda.
Nice Article.
Ken Domino 4 August 2011

I like your comment “CUDA and OpenCL do mostly the same – it’s like Italians and French fighting over who has the most beautiful language, while they’re both Roman languages.” But, I would compare them more to dialects for Neanderthals instead of French/Italians. Both CUDA and OpenCL are still very primitive, with CUDA only slightly better than OpenCL in supporting kernel and host code in one source file, some C++ features, and the chevron syntax for kernel calls. But, kernels are essentially blocks for parallel “for” loops. Imagine if we had to place every block in C/C++ of the plain-ol’ “for(…;…;…)” loop into another function. Nobody would like that very much. But that is the current situation with CUDA and OpenCL currently. Microsoft’s C++ AMP fixes this, but who knows what it will look like exactly, and whether they’ll screw it up. (I’m a little worried about their automatic data copying mechanism between the GPU and CPU.)
nooron 2 September 2011

This google doc shows a clear divide between the popularity of OpenCL and CUDA: goo.gl/jm7TD It’s not exactly so big, though.

With the need for plugins and other vendor-specific tweaks, the greatest advantage that OpenCL has over CUDA – compatibility – is damaged, though not completely lost.

Also, OpenCL, as a standard, is fragmented due to the many plugins that different vendors will have to supply for their own hardware. Basically, my opinion on this is that right now it’s a time for change. A standard, at this time, only appears inappropriate to me.

Perhaps the argument I made with regard to performance was not a good one. OpenCL can deliver good performance with sufficient hardware-specific tweaking.

I think you are right to say that next year more performance-critical software will not be CUDA-only. However, there are other market factors that need to be considered as well: NVIDIA dominates two markets:
(1) professional graphics market (around 80% share by unit and 89% by revenue, though AMD is slowly catching up)
(2) Android tablet computer market: NVIDIA currently has no valid competitor in this field.
Developer for these markets are more likely to use CUDA instead of OpenCL.

I do think the most important factor that will decide the fate of CUDA will be Kepler’s relative performance comparing to MIC and Southern Island.

Currently CUDA’s greatest advantage is in the field of HPC. You see very few commercial HPC infrastructure built around OpenCL, but for CUDA there are a lot: Enzo, LibJacket, PGI CUDA compiler, GPU.NET and so on. Some vendors are so straight to the point that they simply state that their products support the Tesla line of NVIDIA products. This trend certainly will change when the performance of Kepler falls behind MIC or SI by a considerable margin. However, if Kepler only falls behind by a little bit, CUDA will continue to take the lead in HPC, because they one thing HPC people hate is rewriting their code. Computer scientists will have the time to do that, but people who specialise in other fields are less likely to change their code base once it’s written. Basically, my point here is that CUDA’s status in HPC is unlikely to be shaken, unless Kepler really falls behind by a lot. Of course, there’s still Project Denvor, which might be an interesting challenger to Intel’s No1 position in HPC.

My expectation is that CUDA will continue to thrive in the HPC market, while OpenCL will become increasingly influential in the consumer market.
- Vincent Hindriksen Post author 11 October 2011
  
  I think that the one who wins the consumer and non-HPC business market, wins the most. How many HPC-programmers are there? Exactly. So it was a very good bet to go with Tegra for reasons we both agree on.
academic 4 September 2011

i just read the “Bang per buck” part and the part “4.0 > 1.1?”… i think you like ATI too much to be objective to do any comparisons in complete
what we found out (we’re not the only ones): NVidia costs more in theorectical Peak-Flop comparisons, but it solves most numerical problem (especially in computer vision and pattern recognition) faster because they’re more efficient (due to architectural reasons… AMD is switching their very long instruction word stuff)… so you got more from the peak 🙂
and then… opencl is still bugged as hell… AMD / NVidia at least… i am testing intel these days… seems quite good
and yeah opencl claims itself not to be as “full” as NVidia: OpenCL was designed as close as possible to common technology (that time only CUDA… brook+ was a whole other thing)… i could make a quotation here… but i’m too lazy now… CUDA was way more specialized, opencl not! so there’s the difference… CUDA has abilities you only can enable through extensions in OpenCL, even with 1.1!
And the worst thing here: comparing Tesla to a Radeon is more or less like comparing a BMW to a Fiat -> either you compare FireStream and Tesla or Geforce and Radeon! Tesla and FireaStream have ECC and a lot other stuff you really want to have in servers! so these cards are for professionals! (CAD, number crunching) geforce and readeon don;t have these capabilities because they’re for gamers and don’t need these things! so they are much cheaper because some little error in your game screen doesn’t bother you as much as your simulation of 2 weeks duration to go rats without you taking any notice of it… believe me, you want all these funny things that Tesla and FireStream offers you… and of course their drivers are not made for games… they are made for professional applications
- Vincent Hindriksen Post author 11 October 2011
  
  Sorry for the late reply. If you want to know my coloured background: I like open standards too much to be objective about closed programming models – meaning I am in favour of Intel, IBM and ARM too. I want to compare hardware, as I have written in “keep the hardware focus” – mostly because you really want to be flexible with all those upcoming different processor architectures.
  As you said NVidia has a different architecture, meaning they can solve specific problems faster than AMD. But also other problems can be solved faster by CPUs from Intel and AMD (see my post about SSE/AVX). So yes, CUDA is more specialised as it runs on NVidia only (and on CPUs, but still unoptimised).
  Tesla has a quite nice bandwidth indeed, and ECC. But turning on ECC makes the device quite slow. So I disagree on Tesla being incomparable to GPUs. A BMW is indeed as expensive as 3 Fiats, meaning the latter can have more power after all. But seriously thanks for the feedback, I will look closer into it as I have focused most on peak-performance (20 minutes max) and not on weeks of computations.
- Mikey 29 July 2013
  
  Completely disagree here.
  
  Performance is definitely application dependent and the whole point of these wasn’t to say “AMD is better” it was specifically to set people like you straight who just believe and repeat the BS they hear.
  
  At price-per-megahash (MH/s), show me any nVidia graphics card that can beat an ATI when it comes to GPU bitcoin mining — go ahead and try you won’t find one that comes anywhere close! — And price of the card aside, let’s say you compare a really high-end nVidia to a low-end ATI, the price-of-power (watts) per MH/s will be much higher, likely an order of magnitude.
  
  (I’m not advocating Bitcoin here, although I am an enthusiast — more stressing the point that each has their own weaknesses and strengths.)
  
  nVidia cards seem to be able to do much less in parallel than the AMD/ATI, but at higher clock rates, and they’re able to do more complicated math with less resources.
  
  So if I needed to write a program that did LOTS of simple math (like simulate a neural network for example) — AMD/ATI, no question.
  
  But, if I wanted to do something that was only moderately parallel but much more complex (like offloading game AI/logic/physics), I would probably shoot for nVidia.
  
  And regarding ECC — yes paying more for a card with advanced features like ECC is definitely worth it — especially in servers, or mission critical number crunching — may I suggest you look into the HD 7000 line of Radeon cards. 😉
Zulu123 29 January 2012

As someone else mentioned comparing Radeon to Tesla makes no sense whatsoever. Compare GeForce to Radeon if you wish. After reading that sentence the credibility of this site went to zero for me.
- StreamHPC 29 January 2012
  
  It does make sense, as many people think a Tesla is much faster than a Radeon, which it is not per se. The reasons given by “academic” are good technical reasons to choose a professional card.
  Please also note that it was compared under “bang-per-buck”, not “extras you get for the extra bucks” – know the movie Herbie?
- noneyo_getit_0011232 21 June 2013
  
  Look at benchmarks on Julia sets and ray tracing. Tom’s hardware showed that the better card had more to do with application than architecture alone.
Squall 26 March 2012

cuda is always better on nvidia because the opencl to cuda wrapper nvidia uses is a performance crippler.
Dejan Lekic 29 May 2012

I am not advocating nVidia here just offering a personal perspective – people do not buy Tesla cards only because they are expensive, but also because of the well-know fact that nVidia offers Linux drivers for a decade, while AMD … well, you know the story…

The second thing is the quality of the drivers – the embarrassing fact is that the open-source AMD drivers are more stable than the official ones…
- Bobsander 19 September 2012
  
  That’s funny, I and many others have no issues with AMD drivers on Linux.
  
  Maybe you’re not really qualified to operate Linux and should instead be running something simple, like OSX.
  
  How about you site your sources, instead of spout highly opinionated rubbish?
  - The Tux L33t 8 October 2012
    
    Does ATi (AMD) support FreeBSD and Solaris like nVIDIA does? How many legacy cards does ATi support compared to nVIDIA?
    
    It’s not only about performance, it’s about support and driver stability.
    
    Check Phoronix for details about Radeon issues.
  - MiJyn 4 July 2013
    
    I used to use an AMD card, and MAN, did I have issues with it. When the drivers actually _worked_ (it was very hard to compile and install them, I remember I even had to edit the public source code for it to work!), it was a huge pain to debug OpenGL applications with (there were thousands of memory leaks everywhere, at every frame). When I finally got an nvidia card (and nvidia drivers), everything worked perfectly from the start.
    
    Though I know you weren’t talking to me about that (obviously), I can tell you, I think that just because you have no issues with it, it doesn’t mean that he’s speaking “highly opinionated rubbish”.
  - Mikey 29 July 2013
    
    I think you mean “cite” (not “site”). Also, I’ve had some issues running lower-end ATI hardware on Linux (had to go with the open source drivers in the end — the official AMD drivers were just too buggy).
- rishi 20 June 2013
  
  Dejan Lekic, your ignorance knows no limits.
  
  AMD, and even ATI, have and had excellent, unparalleled Linux support, which is cancelled out by not having any FreeBSD support… BOINC stuff is used on FreeBSD, and hence CUDA and Nvidia are the only things they use.
  - Mikey 29 July 2013
    
    Completely bogus.
    
    Don’t get me wrong, I love the ATI hardware, but I’m also a Windows guy. ATI only provides *proprietary* drivers for Linux; the open drivers are community produced and maintained (and neither one is completely better than the other, I’ve found). With Linux you have to experiment with several builds of each driver (open and closed) and do lots of testing (and even forum hunting) for each hardware/OS version combination.
    
    It’s a huge PITA! Next time I put Linux on a box, I’m going with nVidia.
  - Aditya 15 January 2017
    
    “..the open drivers are community produced and maintained.. ” Partly. ATI/AMD has a history of actually contributing to developing the open-source radeon driver. Nvidia does not do that – when Torvalds thought they were being deliberately unhelpful, he slammed them.
- Hugh Gribben 6 June 2014
  
  In the words of Linus Torvalds: “F*** you NvIdia!!!” That, to me, does not sound very Linux-friendly 😛
  - noneyo_getit_0011232 3 February 2015
    
    Oh spare us all…
    
    Linus Torvalds says all kinds of stuff involving four-letter words. He may be instrumental in making Linux the fantastic and unparalleled thing that it is, but that does not mean he is judicious enough with his words to “speak on behalf of Linux”. Quite the opposite, I would say…
    
    I don’t begrudge Nvidia having closed-source internals with an open-source front-end. It would not be unlike demanding that Intel publish their i7 microcode as open source. There is a legitimate question of what exactly is necessary for stuff outside the hardware to know and what hardware-level trade secrets would be compromised.
    
    Also, Nvidia actually goes for an elegant architecture and VERY nice equivalents to opensource tools… cuda-gdb literally works the same as gdb more-or-less with an extra namespace of commands available for GPU debugging. I have yet to see an equivalent “nod to open source mindset” from any other hardware company…
Guest 16 October 2012

The big problem with OpenCL is that it is developed by a consortium, so it’s very hard to get hardware specific stuff in there.
NVIDIA has full control over CUDA so they can add full support for their newest hardware immediately, whereas it would never make it into OpenCL.
Anybody who knows how Khronos operates with OpenCL understands why it’s stuck and why people use CUDA instead.
The thing I dislike most about OpenCL is the memory management model with delayed allocation and very little explicit control over what is happening.
- StreamHPC 16 October 2012
  
  That’s why OpenCL has extensions, so special functionality can be made available. In comparison: in CUDA you would only be able to launch the kernel that uses dynamic parallelism if the compute capability is 3.5 or higher – so you need one kernel for Kepler and one for older architectures.
  
  If I understand you correctly, you are talking about GPU-specific challenges (in both CUDA and OpenCL) and comparing them to CPU programming. You actually have a lot of control over memory operations, and I have never heard of “delayed allocation” in the context of GPGPU.
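
  As a rough sketch of the extensions point above: an OpenCL host program can query the device's space-separated extension string (via `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`) and choose a code path based on it. The helper below shows only the string check, in plain C++, so it runs without an OpenCL SDK. The name `has_extension` is hypothetical and not part of any OpenCL API.

  ```cpp
  #include <cassert>
  #include <sstream>
  #include <string>

  // Hypothetical helper: returns true if `name` appears as a whole token in
  // the space-separated extension list reported by the device.
  bool has_extension(const std::string& extension_list, const std::string& name) {
      assert(!name.empty() && "extension name must not be empty");
      std::istringstream tokens(extension_list);
      std::string token;
      while (tokens >> token) {
          if (token == name) {
              return true;  // exact match only, so prefixes do not count
          }
      }
      return false;
  }
  ```

  A host program would then branch on something like `has_extension(list, "cl_khr_fp64")` to decide whether a double-precision kernel can be used, much as a CUDA program branches on the compute capability.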
yyyy yyyy 19 April 2013

The lack of template support is a big pain for me. I chose OpenCL because CUDA only supports NVIDIA. I wonder whether OpenCL will ever support templates, since many C programmers hate templates (they hate many things that come with C++), just like Linus does. Those C hackers always denigrate templates and believe that we should use macros instead, but I have never found that macros can handle different types as simply as templates do. Templates are a very powerful code generator that is extremely good at handling different types; it is stupid to reject templates just because you hate them.

You can find some libraries which try to mimic the behavior of templates in OpenCL. I admire their contributions, but this is not a good solution. To solve the problem, OpenCL needs to change the standard and make the kernel part support templates.
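
To make the macro-versus-template comparison concrete, here is a minimal sketch in ordinary host-side C++ (not OpenCL kernel code). All names (`SQUARE_MACRO`, `square`, `next_id`, `demo`) are made up for illustration. It shows one well-known difference: a macro is plain text substitution, so its argument is evaluated twice, while a template is a real function instantiated per type.

```cpp
#include <cassert>

// Macro version: plain text substitution, so `x` is evaluated twice.
#define SQUARE_MACRO(x) ((x) * (x))

// Template version: one definition, instantiated for each type it is used
// with; the argument is evaluated exactly once.
template <typename T>
T square(T value) {
    return value * value;
}

// Hypothetical helper with a side effect, used to make the number of
// evaluations visible.
int next_id(int& counter) {
    return ++counter;
}

// Small usage example comparing both versions.
void demo() {
    int calls = 0;
    int a = square(next_id(calls));        // next_id runs once
    assert(a == 1 && calls == 1);

    calls = 0;
    int b = SQUARE_MACRO(next_id(calls));  // next_id runs twice
    assert(calls == 2);
    (void)b;

    assert(square(1.5) == 2.25);           // same template, different type
}
```

The macro also works for any type, but it gives no type checking and repeats side effects, which is part of why macro-based template emulation tends to feel like a workaround.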
- noneyo_getit_0011232 21 June 2013
  
  Templates exist in C++ for very good reasons but understandably have a bad reputation because of how much inexperienced programmers misuse them.
  
  A good C++ programmer avoids using high-level features (OOP and templates) unless the problem is complex enough to make them necessary. The number of people I meet who have degrees in computer science but have never worked with more than 500 lines of code, let alone gigantic projects with huge numbers of large modules to link up… it is absurd. Procedural programming (functions) handles small projects just fine.
- JJ Abrams 18 October 2013
  
  They hate C++ because they have no skill!
Jake 14 November 2014

A note to the author: Please don’t come up with examples like Objective-C being faster than C++. It is much slower due to the way things like functions work in Objective-C vs C++. In Objective-C, a function call is a HASHTABLE lookup, which is obviously much, much slower than a C function call (which gets even more optimizations, such as inlining).
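
The difference being described can be pictured with a loose C++ analogy. This is only a sketch of "look the target up by name at run time" versus "call a known function directly"; it is not how the Objective-C runtime is actually implemented, and `DynamicObject`, `send`, and `make_object` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Direct call: the target is known at compile time and can be inlined.
int doubled(int x) { return 2 * x; }

// Hypothetical "message send": the target is found by name at run time.
struct DynamicObject {
    std::unordered_map<std::string, std::function<int(int)>> methods;

    int send(const std::string& selector, int arg) const {
        auto it = methods.find(selector);  // hash-table lookup on every call
        assert(it != methods.end() && "unknown selector");
        return it->second(arg);
    }
};

// Builds an object that responds to the "doubled" selector.
DynamicObject make_object() {
    DynamicObject obj;
    obj.methods["doubled"] = doubled;
    return obj;
}
```

Both `doubled(21)` and `make_object().send("doubled", 21)` compute the same value; the second simply pays for a lookup first, and how expensive that is in practice depends on the implementation.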
- StreamHPC 9 December 2014
  
  You’re just explaining an implementation in the compiler, not a feature of the language.