Company Blog

applyBored at work? Go start working for one of the anti-boring GPU-expert companies: StreamComputing (Netherlands, EU), Appilo (Israel) or AccelerEyes (Georgia, US).

We all look for people who know how to code GPUs. Experience is key, so you need to have at least one year of experience in GPU-programming.

Submit your CV now and don’t give up on your dream leaving behind the boring times.

StreamComputing

Amsterdam-based StreamComputing is a young startup with a typical Dutch, open atmosphere (gezellig). Projects are always different from the previous as we work in various fields. Most work is in parallisation and OpenCL-coding. We are quite demanding of your knowledge on the various hardware-platforms OpenCL works on (GPUs, mobile GPUs, array-processors, DSPs, FPGAs).

See the jobs-page for more information and how to apply.

Appilo

North Israel based Appilo is seeking GPU-programmers for project-basis contracts. Depending on the project, most of the work could usually be performed remotely.

Use the mail on the contact-page at Appilo to send your CV.

AccelerEyes

Atlanta-based AccelerEyes delivers products which are used to accelerate C, C++, and Fortran codes on CUDA GPUs and OpenCL devices. They look for people who believe in their products and want to help make them better.

See their jobs-page for more information and how to apply.

The Tokyo Demo Fest 2013 is one of the many demo-parties around the globe. At such parties is where great programmers meet great artists and show off what came out of their collaborations.

The winner of this year used OpenCL to compute real-time procedurally generated geometries. For the rest C++, OpenGL and Gamemonkey Script was used.

Tech features: curl noise, volumetric textures, Perlin noise, mesh deformations, HDR/bloom, film grain, fractals, Hermite splines, Tweens and quaternion iridescent particles.

The creator, Tokyo-based Eddie Lee, has done more projects – be sure to visit his homepage. I hope more demosceners start using the power of OpenCL to get more out of their demo’s.

Do you see where below kernel is used? Hint: check the other videos of Eddie.

__kernel void curlnel( 
                      __read_only image3d_t volume,  
                      sampler_t volumeSampler,  
                      __global float4 *output,  
                      float speed 
                      ) 
{ 
    uint index = get_global_id(0); 
    uint numParticles = get_global_size(0); 
    float indexNormalized = (float)index/numParticles; 

    // read from 3D texture 
    float4 vertData = output[index]; 

    float3 samplePos = vertData.s012; 
    samplePos = samplePos+(float3)(0.5f); 

    float4 voxel = (read_imagef(volume, volumeSampler, 
                   (float4)(samplePos,1.0f))); 

    vertData.s012 += voxel.xyz*speed; 

    output[index] = vertData; 
}

According to GPUVerify (see previous post) the line starting with “float4 voxel” has an error.

gpuverifyGPUVerify is a tool for formal analysis of GPU kernels written in OpenCL and CUDA. The tool can prove that kernels are free from certain types of defect, such as data races and bugs. This is quite useful feedback for any GPU-programmer.

Below you find a online version of the tool (please don’t break it!). Play around and test your kernels. Currently it is limited to OpenCL-kernels. Be aware the number of groups is different from global worksize.

For demo-purposes some values have been pre-filled with a simple kernel – press “Check my OpenCL kernel” to find the results. Did you expect this from this kernel? Can you explain the result?

After the LEAP-conference I’ll extend this article – till then I’m too time-limited. For now I wanted to share the online version with you, especially with the people who will attend the tutorial at LEAP. Be sure to check out the GPUVerify website and paper to learn more about this fantastic tool! Read more …

From 21 to 23 August StreamComputing will give a 3-day training in OpenCL.

A separate ticket for only the first day can be bought, as then will be a crash-course into OpenCL. Module basics.

The second and third day will all about parallel-algorithm design, optimisation and error-handling. Module optimisation with several new subjects added.

The afternoon of the third day is reserved for special subjects, as requested by the attendees.

If you are at LEAP, wait for the promotional code you get there.

More detailed information is coming soon.

computer-says-no

Little Britain: “Compu’er says no”. (links to Youtube movie)

Knowing all errors by heart is good for quick programming, but not always the best option. Therefore I started to create a full list with extra info, taken from cl.h and the reference documentation.

The problem with many error-codes is that they are sometimes context-dependent and then become quite useless in helping the programmer out. Also some drivers return different error-codes. Notice also that different errors are given per OpenCL-version for the same function. If you find problems, help make OpenCL better and give feedback.

Want it on your wall? You can easily copy these two tables into Excel or alike software and print it out.

Read more …

ersa-logoStreamComputing supports the ERSA conference, 22-25 July in Las Vegas. At that conference there will be an award given to “Best Young Entrepreneur” and I’d like you to send in a proposal. The winner gets an NVIDIA Tesla K20!

Young entrepreneurs and academics with a great product/project are invited to present their solution. As the event draws around 2000 people, you get the attention needed to show-case your new company or research-group. Your solution does not need to be based on FPGAs or GPUs, as long as Von Neumann’s architecture is not in it.

Read the information below or directly go to the ERSA-NVIDIA awards-homepage.
“Von Neumann’s architecture lasted for 75 years.”
That genius can no longer lead us into the new age of computing that is upon us. This competition seeks to acknowledge those pioneers that are helping to build the new computing landscape”

Submission of Proposals for ERSA-NVIDIA award Candidates
Deadline: May 6, 2013
Send proposals to org@ersaconf.org

The Award is devoted for entrepreneurs developing tools, advanced technologies and opportunities for supporting applications, both academic and commercial, across broad area of high-performance, embedded systems implemented as multicore systems and reconfigurable heterogeneous parallel processing systems.

The Award Committee includes:

Leading Universities

  •  Stanford University, USA, Prof. Michael Flynn
  •  Imperial College London, UK, Prof. Wayne Luk
  • Karlsruhe Institute of Technology, Germany, Prof. Joerg Henkel
  • Keio University, Japan, Prof. Hideharu Amano
  • Shanghai Jiao Tong University, China, Prof. Simon See

Leading Companies (tentative list):

  • NVIDIA, Can Ozdoruk, Product Manager
  • Altera, Steve Casselmanm, Principal Engineer
  • National Instruments, Hugo Andrade, Principal Architect

For more info go to: http://ersaconf.org/awards/

If you have any question, just ask them in the comments or send us an email.

From 3 to 5 June StreamComputing will give a 3-day course in OpenCL. Here you will learn how to program

A separate ticket for only the first day can be bought, as then will be a crash-course into OpenCL. Module basics.

The second and third day will all about parallel-algorithm design, optimisation and error-handling. Module optimisation with several new subjects added.

The last part of the third day is reserved for special subjects, as requested by the attendees.

More detailed information is coming soon.

arm_mali_cover_151112297646_640x360On 20 April there was a discussion between Jan Gray and David Kanter. Jan is a specialist in C++ and FPGAs (twitter, homepage), David a specialist in CPU and GPU architectures (twitterhomepage). Both know their ways well in the field of semiconductors. It is always a joy to follow their short discussions when they happen, and this one I’d specially like to share with you.

OpenCL on ARM: Growth-expectation of GFLOPS/Watt of mobile GPUs exceeds Moore’s law. That’s incredible!

Jan Gray: .@OpenCLonARM GFLOPS/W more a factor of almost-over Dennard Scaling. But plenty of waste still to quash. http://www.fpgacpu.org/papers/Gray_AutumnOfMooresLaw_SingularityUniversity_11-06-23.pdf

Jan Gray‏: .@openclonarm Scratch Dennard tweet: reduced capacitance of yet smaller devices shd improve GFLOPS/W even as we approach end of Vdd scaling.

David Kanter: @jangray @OpenCLonARM I think some companies would argue Vdd scaling isn’t dead…

Jan Gray: @TheKanter @openclonarm it’s not dead, but slowing, we’ve gone from 5V to 1V (25x power savings) and have maybe several hundred mVs to go.

David Kanter: @jangray I reckon we have at least 400mV, so ~2X; slower than ideal, but still significant

Jan Gray: @TheKanter We agree, I think.

David Kanter: @jangray I suspect that if GPU scaling > Moore’s Law then they are just spending more area or power; like discrete GPUs in the last decade

David Kanter: @jangray also, most positive comment I’ve heard from industry folks on mobile GPU software and drivers is “catastrophically terrible”

Jan Gray: @TheKanter Many ways to reduce power, soup to nuts. For ex HMC DRAM on interposer for lower energy signaling. I’m sure many tricks to come.

In a nutshell all the reasons they think mobile GPUs can outpace Moore’s law while staying under a certain power-usage.

It needs some background-info, so let’s start the background of the first tweet, and then explain what has been said. Read more …

WebCL_300WebCL is a great technique to have compute-power in the browser. After WebGL which gives high-end graphics in the browser, this is a logical step on the road towards the browser-only operating system (like Chrome OS, but more will follow).

Another way to look at technologies like WebCL, is that it makes it possible to lift the standard base from the OS to the browser. If you remember the trial of Microsoft’s integration of Internet Explorer, the focus was on the OS needing the browser for working well. Now it is the other way around, but it can be any OS. This is because the push doesn’t come from below, but from above.

Last year two guys from Lyon (South-France) got quite some attention, as they wrote a WebCL-plugin. Their names: Adrien Plagnol and Frédéric Langlade-Bellone. Below you’ll find a Q&A with them on WebCL. Enjoy! Read more …

921752_m

Building bridges in a new industry

Embedded processors always have had the focus on low-energy. Now a combination of Moore’s law, the frequency-wall and multi-processor developments have made it possible for these processors to compete in completely new market segments. Most notable due to impressive advancements in graphics IP.
We are now looking at four groups who are interested in learning from each other:

  • The embedded processor market
  • The FPGA market
  • The HPC and server market
  • The GPGPU market

And answer the question: how can we get more out of low-energy processors by looking at other industries?

The goal of the LEAP conference is to bring these three groups together. Creating the windows to each other and paving roads over the newly constructed bridges. This makes it one of its kind. Half of the conference is focused on quality information sharing and the other half on networking. For more information, check the website of the LEAP-conference. StreamComputing is a co-organiser.

Call for papers is now open! Programme is filled!

Read more …

flagsIn OpenCL large memory objects, residing in the main memory of the host or the global memory at the accelerator/GPU, need special treatment. First reason is that these memories are relatively slow. Second reason is that the most times serial copy of objects between these two memories take time.

In this post I’d like to discuss all the flags for when creating memory objects, and what they can do to assist in this special treatment.

This is explained on this page of clCreateBuffer in the specifications, but I think it is not really clear. The function clCreateBuffer (and the alike functions for creating images, sub-buffers, etc) suggests that you create a special OpenCL-object to be given as argument to the kernel. What actually happens is that space is made available in main memory of the accelerator and optionally a link with host-memory is made.

The flags are divided over three groups: device access, host access and host pointer flags.

Read more …

Curved iMac has your back…

Nuno Teixeira designed a large curved monitor in 2008 and assumed it would never be made. For a “few” thousand dollar NEC offers one to you right now. Also Samsung and LG have announced several new curved TVs at CES 2013 (with hdmi-port). We only need a workstation to go with it, where this blog-article might come in handy.

So you want to start developing for OpenCL? When you focus on developing OpenCL for X86, you have these three options: CPUs, GPUs and CPUs with and embedded GPU. This article is for you and represents the current state of hardware – if you want the best hardware for your specific algorithm, the below information is probably not sufficient.

In 2013 we focus on 3 groups: servers/cloud (FirePro, Tesla, XeonPhi), workstations (discussed here), low-power devices (SoCs) and special accelerators (FPGAs and DSPs). This article does not discuss high-end accelerators of a few thousands of Euro, which are laid out in here.

Before reading on, you need to set the goal for your workstation.

  • If you want to learn the basics of OpenCL-programming, first check if your current machine has OpenCL-support.
  • If you need more processing power, be sure you select the right hardware for the job. Don’t buy the most expensive hardware (FirePro, Tesla or XeonPhi), but take your time to find out which hardware supports your algorithms best. Feel free to ask us.
  • If you want to make sure your software works on various types of accelerators, you can choose between:
    • swapping PCIe-cards – disadvantage is the drivers-hazzle and time-consumption.
    • more accelerators in one machine – disadvantage is that only GPU 1 can do OpenGL/DirectX.
    • identical machines with different accelerators – disadvantage is the price.
  • If you want to focus on multi-GPU development, you need:
    • or enough power-supply and the motherboard supports many lanes,
    • or buy a videocard with two GPUs.

This article has the goal to help you with buying a good machine for OpenCL-development. Prices are of January 2013. If you think I make the wrong suggestions, please give feedback via the comments.

My contacts at various companies can tell: I want to stay independent no matter what. No deals have been made nor was there any outside influence, except the friendly people of the local computer shops. I was surprised I ended up with suggestion so much AMD hardware, that I felt quite uncomfortable with it – I finally decided to keep to my first conclusions and leave the comments completely open.

Read more …

Every now and then I read stories on Bitcoins (Wikipedia-article), as GPUs are used a lot to “mine” Bitcoins. They have some extensive benchmarks, and also their discussions giving me insights in specific parts of accelerators like GPUs. Also is this group very upwards if it comes to accepting new techniques. Today something changed: they are a bank now. One of the thoughts I had with this, I’d like to share with you.

If you look at various types of currencies, you see they all have various goals (trade, power, resources, energy, properties, etc). The inequality and differences are even more important than the amount. Various currencies are entangled to a certain goal or resource, but there is nothing entangled strongly to technology. Here is where Bitcoins come in…

Bitcoins are entangled with compute-power – a current benchmark for technological progress.

In this article I’d like to share how the tech-economy and Bitcoins are entangled, seen from the perspective of computing. I left out a lot of the “rules of economy” and hope you can put these in – the below text is just to guide you through the thought-process only. Disagreement is only good – as we learn all from it.

Read more …

Get in contact now!

We offer training in GPU-programming (OpenCL, CUDA, etc),
and consultancy-services for performance engineering.

Mail to info@streamcomputing.eu or fill in below form.

The web-form currently does not work.
Please send an e-mail while we resolve the issue.