Swan: A simple tool for porting CUDA to OpenCL

Download latest version
For the impatient: download the latest version of Swan here (by downloading this file, you accept the GPL version 2 license, http://www.gnu.org/licenses/gpl-2.0.txt). To build, untar the archive and follow the instructions in the README. The requirements are:

A modern(ish) Linux OS
The CUDA toolkit and/or an OpenCL runtime, plus a matching GPU
Perl (with the Text::Balanced module)
gcc/g++ 3.x or later
What is it?
Swan is a small tool that aids the reversible conversion of existing CUDA codebases to OpenCL. It does several useful things:

Translates CUDA kernel source-code to OpenCL.
Provides a common API that abstracts both CUDA and OpenCL runtimes.
Preserves the convenience of the CUDA <<< grid, block >>> kernel launch syntax by generating C source code for kernel entry-point functions.

It is also useful for compiling and managing kernels written directly in OpenCL.

Why might you want it?
Possible uses include:

Evaluating the OpenCL performance of an existing CUDA code.
Maintaining a dual-target OpenCL and CUDA code.
Reducing dependence on nvcc when compiling host code.
Supporting multiple CUDA compute capabilities in a single binary.
Using it as a runtime library for managing OpenCL kernels in new development.
It is not a drop-in replacement for nvcc: every kernel invocation and CUDA API call in the host code must be rewritten.

There are a few things Swan does not support, in particular:

CUDA C++ templating in kernel code.
OpenCL images and samplers (the analogue of CUDA textures); texture interpolation is done in software instead.
Multiple device management in a single process.
Compiling kernels for the CPU.
CUDA device-emulation mode.
Furthermore, Swan is a work in progress: it works for our code, but there is no guarantee it will work for yours.

The essential steps for converting CUDA code are:

Move the CUDA kernels into their own source files, separate from the host code.
Compile the kernel source using swan. For a CUDA target:

swan --cuda kernel.kh kernel.cu
or, for an OpenCL target:

swan --opencl kernel.kh kernel.cu
The output file kernel.kh contains:

a data block that holds the compiled kernel code (PTX or IL);
autogenerated entry-point functions for launching each of the kernels.
Add #include "kernel.kh" to the host source.

Rewrite all kernel invocations to use the corresponding entry-point functions. For example, the kernel with the declaration:

__global__ void vec_add( float *in, float *out, int N );
will have an entry point defined as:

void k_vec_add( int3 grid, int3 block, int shmem, float *in, float *out, int N );
where grid, block and shmem correspond to the first three arguments of the <<< >>> syntax.
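To make the mapping from launch syntax to entry point concrete, here is a minimal CPU-only sketch. It is not the code Swan generates: it only emulates what a function with the k_vec_add signature does, by running the kernel body once per (block, thread) pair. The int3 struct is a local stand-in for CUDA's vector type, and the body assumes vec_add accumulates in into out, which the declaration alone does not say. Under this convention a host-side launch vec_add<<<grid, block, 0>>>(in, out, N) becomes k_vec_add(grid, block, 0, in, out, N).

```c
#include <assert.h>

/* Local stand-in for the int3 type used in the entry-point signature
   (in CUDA it comes from vector_types.h). */
typedef struct { int x, y, z; } int3;

/* Hypothetical body of vec_add, rewritten as plain C that receives the
   thread coordinates explicitly. ASSUMPTION: vec_add accumulates in[]
   into out[]. It indexes in 1-D, so launch it with y = z = 1. */
static void vec_add_body(int3 blockIdx, int3 blockDim, int3 threadIdx,
                         float *in, float *out, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; /* global 1-D index */
    if (i < N)                                     /* surplus threads idle */
        out[i] += in[i];
}

/* CPU emulation of an entry point with the k_vec_add signature: executes
   the body for every block in grid and every thread in block, as
   vec_add<<<grid, block, shmem>>>(in, out, N) would on the device.
   shmem (dynamic shared memory in bytes) has no meaning on the CPU. */
void k_vec_add(int3 grid, int3 block, int shmem,
               float *in, float *out, int N)
{
    int3 b, t;
    (void)shmem;
    assert(grid.x > 0 && grid.y > 0 && grid.z > 0);
    assert(block.x > 0 && block.y > 0 && block.z > 0);
    for (b.z = 0; b.z < grid.z; b.z++)
        for (b.y = 0; b.y < grid.y; b.y++)
            for (b.x = 0; b.x < grid.x; b.x++)
                for (t.z = 0; t.z < block.z; t.z++)
                    for (t.y = 0; t.y < block.y; t.y++)
                        for (t.x = 0; t.x < block.x; t.x++)
                            vec_add_body(b, block, t, in, out, N);
}
```

With grid = {3, 1, 1} and block = {4, 1, 1}, a call for N = 10 runs the body 12 times; the i < N guard makes the two surplus threads do nothing, just as it would on the GPU.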

Entry points are named systematically. For any kernel kernel(), there are two entry points: k_kernel() and k_kernel_async(). The former is a synchronous call (it blocks until the kernel completes execution); the latter, as its name suggests, returns without waiting for the kernel to finish.
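The observable difference between the two entry points can be modelled without a GPU. The sketch below is a toy, single-threaded model, not Swan's runtime: k_vec_add_async() only records the launch in a queue, the hypothetical wait_for_device() executes pending launches in submission order, and k_vec_add() is written as the asynchronous launch followed by a wait. The grid, block and shmem arguments are omitted for brevity, and the kernel is again assumed to accumulate in into out.

```c
#include <assert.h>

/* Toy model of an asynchronous device queue (NOT Swan's implementation):
   launches are recorded in a FIFO and only executed when the host waits. */
#define QUEUE_CAP 16

typedef struct { float *in, *out; int n; } vec_add_launch; /* hypothetical record */

static vec_add_launch queue[QUEUE_CAP];
static int queued = 0;

/* Stand-in for the kernel's effect. ASSUMPTION: out[i] += in[i]. */
static void run_vec_add(const vec_add_launch *l)
{
    for (int i = 0; i < l->n; i++)
        l->out[i] += l->in[i];
}

/* Hypothetical asynchronous entry point: enqueue and return at once. */
void k_vec_add_async(float *in, float *out, int n)
{
    assert(queued < QUEUE_CAP);
    queue[queued].in = in;
    queue[queued].out = out;
    queue[queued].n = n;
    queued++;
}

/* Hypothetical wait: execute all pending launches in order, then clear. */
void wait_for_device(void)
{
    for (int i = 0; i < queued; i++)
        run_vec_add(&queue[i]);
    queued = 0;
}

/* Synchronous entry point: the same launch, then block until it is done. */
void k_vec_add(float *in, float *out, int n)
{
    k_vec_add_async(in, out, n);
    wait_for_device();
}
```

The practical consequence: after an asynchronous launch returns, the host should not read the output (or overwrite the input) until it has synchronised with the device, whereas after the synchronous call returns the results are already in place.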

Replace all CUDA API calls with their Swan analogues (see swan_api.h).

Compile and link the code against the appropriate Swan library (libswan_ocl or libswan_cuda).
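The last two steps follow a familiar pattern: host code is written once against a single interface, and the backend is chosen by which implementation is linked in. The sketch below illustrates that pattern only; backend_name() and running_on_cuda() are hypothetical names, not part of swan_api.h, and a preprocessor switch (BUILD_OPENCL, also hypothetical) stands in for choosing between libswan_cuda and libswan_ocl at link time so that everything fits in one file.

```c
#include <assert.h>
#include <string.h>

/* Shared interface (hypothetical): host code is compiled against this
   declaration only, in the way Swan host code uses swan_api.h. */
const char *backend_name(void);

/* Two interchangeable implementations of the same interface. The macro
   stands in for linking one library or the other. */
#ifdef BUILD_OPENCL
const char *backend_name(void) { return "opencl"; }
#else
const char *backend_name(void) { return "cuda"; }
#endif

/* Host code: no CUDA- or OpenCL-specific calls, only the shared interface. */
int running_on_cuda(void)
{
    const char *name = backend_name();
    assert(name != NULL);
    return strcmp(name, "cuda") == 0;
}
```

Because running_on_cuda() never names a runtime, the same source builds unchanged for either target; only the implementation supplied at build time differs.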

Examples can be found in the package.

For queries, problems and suggestions, contact Matt @ M.J.Harvey (at) imperial (dot) ic (dot) ac (dot) uk.

Experiences porting from CUDA to OpenCL: presentation at the Daresbury Machine Evaluation Workshop, 2009.

Swan is licensed under the GPL version 2.

For other types of license, and for converting legacy code, please contact Acellera (web: http://www.acellera.com).