Basic Concepts: Writing OpenCL code for single and double precision


What’s precise enough?

Support for the double-precision floating-point type double in OpenCL kernels requires an extension. AMD provides cl_khr_fp64 for newer high-end hardware, and a not fully compliant cl_amd_fp64 extension for other hardware. NVIDIA and Intel support cl_khr_fp64, so no exceptions need to be made for those drivers.

The code below is based on a page on Bealto, written by Eric Bainville. I added extra typedefs, removed a constant, and added DOUBLE_SUPPORT_AVAILABLE for easier fallback.


#if CONFIG_USE_DOUBLE  // double precision requested at build time?

#if defined(cl_khr_fp64)  // Khronos extension available?
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#define DOUBLE_SUPPORT_AVAILABLE
#elif defined(cl_amd_fp64)  // AMD extension available?
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#define DOUBLE_SUPPORT_AVAILABLE
#endif

#endif  // CONFIG_USE_DOUBLE

#if defined(DOUBLE_SUPPORT_AVAILABLE)

// double
typedef double real_t;
typedef double2 real2_t;
typedef double3 real3_t;
typedef double4 real4_t;
typedef double8 real8_t;
typedef double16 real16_t;
#define PI 3.14159265358979323846

#else

// float (also the fallback when no fp64 extension is available)
typedef float real_t;
typedef float2 real2_t;
typedef float3 real3_t;
typedef float4 real4_t;
typedef float8 real8_t;
typedef float16 real16_t;
#define PI 3.14159265359f

#endif


The OpenCL C compiler defines a macro for each available extension, which is cl_khr_fp64 in this example. Testing that macro tells you whether the extension can be enabled with #pragma OPENCL EXTENSION cl_khr_fp64 : enable.
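The macro test happens inside the kernel compiler. The host can make the same decision before building by inspecting the space-separated extension string that clGetDeviceInfo returns for CL_DEVICE_EXTENSIONS. The sketch below only covers the string matching, so it compiles without the OpenCL headers; has_extension and device_supports_double are hypothetical helper names, not part of the OpenCL API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` appears as a whole token in the space-separated
 * list `extensions` (the format of the CL_DEVICE_EXTENSIONS string).
 * Matching whole tokens avoids false hits on longer names that merely
 * contain `name` as a prefix or suffix. */
bool has_extension(const char *extensions, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = extensions;

    if (name_len == 0)
        return false;

    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == extensions) || (p[-1] == ' ');
        bool ends_token = (p[name_len] == '\0') || (p[name_len] == ' ');
        if (starts_token && ends_token)
            return true;
        p += name_len; /* keep searching after this partial match */
    }
    return false;
}

/* Hypothetical helper: true if either fp64 extension is listed. */
bool device_supports_double(const char *extensions)
{
    return has_extension(extensions, "cl_khr_fp64") ||
           has_extension(extensions, "cl_amd_fp64");
}
```

In a real host program, the extensions argument would be the buffer filled by clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...).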

Now use the defined constant(s) and the real_t, real2_t and related types instead of float or double. CONFIG_USE_DOUBLE is passed as a compilation option to clBuildProgram (for example -D CONFIG_USE_DOUBLE=1) to switch between double and single precision. If the device has no double support, the code falls back to single precision.
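One consequence for the host: buffers holding real_t values have to be sized to match whatever precision the kernel ends up using. A minimal sketch of keeping the two in sync is shown below, assuming the host already knows whether the device offers an fp64 extension. The precision_config struct and choose_precision function are hypothetical names used for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bundle of the two values the host must keep consistent. */
struct precision_config {
    const char *build_options; /* `options` argument for clBuildProgram */
    size_t real_size;          /* host-side size of one real_t element  */
};

/* Picks double precision only when it is both requested and supported,
 * mirroring the fallback performed by the kernel-side preprocessor code.
 * Assumes the host float/double sizes match cl_float/cl_double. */
struct precision_config choose_precision(bool want_double, bool device_has_fp64)
{
    struct precision_config cfg;

    if (want_double && device_has_fp64) {
        cfg.build_options = "-D CONFIG_USE_DOUBLE=1";
        cfg.real_size = sizeof(double);
    } else {
        cfg.build_options = ""; /* macro undefined: kernel uses float */
        cfg.real_size = sizeof(float);
    }
    return cfg;
}
```

The build_options string would then be passed as the fourth argument of clBuildProgram, and real_size used when computing the byte size of buffers created with clCreateBuffer.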
