Basic Concepts: Writing OpenCL code for single and double precision

Support for double precision floating-point type double in OpenCL kernels requires an extension. AMD provides cl_khr_fp64 for newer high-edn hardware, but also a non-fully compliant cl_amd_fp64 extension for other hardware. NVIDIA and Intel support the cl_khr_fp64, so no exceptions need to be made for those drivers.

The code you see bellow these lines is based on a page you can find on Bealto and it was written by Eric Bainville. I added extra typedefs, removed a constant and added DOUBLE_SUPPORT_AVAILABLE for easier fallback.

#if CONFIG_USE_DOUBLE

#if defined(cl_khr_fp64)  // Khronos extension available?
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#define DOUBLE_SUPPORT_AVAILABLE
#elif defined(cl_amd_fp64)  // AMD extension available?
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#define DOUBLE_SUPPORT_AVAILABLE
#endif

#endif // CONFIG_USE_DOUBLE

#if defined(DOUBLE_SUPPORT_AVAILABLE)

// double
typedef double real_t;
typedef double2 real2_t;
typedef double3 real3_t;
typedef double4 real4_t;
typedef double8 real8_t;
typedef double16 real16_t;
#define PI 3.14159265358979323846

#else

// float
typedef float real_t;
typedef float2 real2_t;
typedef float3 real3_t;
typedef float4 real4_t;
typedef float8 real8_t;
typedef float16 real16_t;
#define PI 3.14159265359f

#endif

A macro is defined by the OpenCL C compiler for each available extension, which is cl_khr_fp64 in this example. This macro can be tested to enable the extension with #pragma OPENCL EXTENSION cl_khr_fp64 : enable.

Now, you need to use the defined constant(s) and real_t, real2_t types instead of float or double. The definition of CONFIG_USE_DOUBLE is passed as compilation option to clBuildProgram to make the switch between double and single precision. If there is no double-support, it falls back to single precision.

Enjoyed this post? Share it!

StreamHPC communications

Basic Concepts: Writing OpenCL code for single and double precision

Related Posts

NVIDIA’s answer to FirePro S9000: the TESLA K20

Intel’s answer to AMD and NVIDIA: the XEON Phi 5110P

Basic Concepts: out of resources with clEnqueueReadBuffer

Our training concepts for GPGPU