Let’s enter the Top500 HPC list using GPUs

The #500 super-computer has only 24 TFlops (2010-06-06): http://www.top500.org/system/9677

update: scroll down to see the best configuration I have found. In other words: a cluster with at least 30 nodes with 4 high-end GPUs each (costing almost €2000,- per node and giving roughly 5 TFlops single precision, 1 TFLOPS double precision) would enter the Top500. 25 nodes to get to a theoretic 25TFlops and 5 extra for overcoming the overhead. So for about €60 000,- of hardware anyone can be on the list (and add at least €13 000 if you want to use Windows instead of Linux for some reason). Ok, you pay most for the services and actual building when buying such a cluster, but you get the idea it does not cost you a few millions any more. I’m curious: who is building these kind of clusters? Could you tell me the specs (theoretical TFlops, LinPack TFlops and watts/TFlop) of your (theoretical) cluster, which costs the customer less then €100 000,- in total? Or do you know companies who can do this? I’ll make a list of companies who will be building the clusters of tomorrow, the “Top €100.000,- HPC cluster list”. You can mail me via vincent [at] this domain, or put your answer in a comment.

Update: the hardware shopping-list

Nobody told in the remarks it is easy to build a faster machine than the one described above. So I’ll do it. We want the most flops per box, so here’s the wishlist:

  • A motherboard with as many slots as possible for PCI-E, CPU-sockets and memory-banks. This because the lag between the nodes is high.
  • A CPU with at least 4 cores.
  • Focus on the bandwidth, else we will not be able to use all power.
  • Focus on price per GFLOPS.

The following is what I found in local computer stores (which for some reason people there love to talk about extreme machines). AMD currently has the graphics cards with the most double precision power, so I chose for their products. I’m looking around for Intel + Nvidia, but currently they are far behind. Is AMD back on stage after being beaten by Intel’s Core-products for so many years?

The GigaByte GA-890FXA-UD7 (€245,-) has 1 AM3-socket, 6(!) PCI-e slots and supports up to 16GB of memory. We want some power, so we use the AMD Phenom II X6 1090T (€289,-), which I chose for the 6 cores and the low price per FLOPS. And to make it a monster, we add 6 times a AMD HD5970 (€599,-) giving 928 x 6 = 3264 DP-GLOPS. If it can handle 16GB DDR3 (€750,-), so we put it in. It needs about 3 Power-supplies of 700 Watt (€100,-). We add 128GB SSD (€350,-) for working data and a big 2 TB HDD (€100,-). Case needs to house the 3 power supplies (€100,-). Cooling is important and I suggest you compete with a wind-tunnel (€500,-). It will cost you €6228,- for 5,6 Double Precision TFLOPS, and 27 TFLOPS single precision. A cluster would be on the HPC500-list for around €38000,- (pure hardware-price, not taking network-devices too much into account, nor the price for man-hours).

Disclaimer: this is the price of a single node, excluding services, maintenance, software-installation, networking, engineering, etc. Please note that the above price is pure for building a single node for yourself, if you have the knowledge to do so.

Want to know more? Get in contact!

We are the acknowledged experts in OpenCL, CUDA and performance optimization. We and proudly boast a portfolio of satisfied customers worldwide, and can also help you to maximize the performance of your software. E-mail us today

  • sami ben salah

    GPU power cannot be takern into account in the Top500 list as Chinese figure out.

    There is no gpu benchmark accepted yet !

    Problem is more software than hardware cost (except network switch which will cost at least 10k just for a 10Gb Ethernet one). Hardware cost is of course lower than the big companies, but a software that can manage that amount of data is a bit tricky.