from __future__ import division
import numpy as np
import pyopencl as cl
import pyopencl.array
Load the PyOpenCL IPython extension:
%load_ext pyopencl.ipython_ext
Create an OpenCL context and a command queue:
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
Define an OpenCL kernel using the %%cl_kernel magic:
%%cl_kernel
__kernel void sum_vector(__global const float *a,
__global const float *b, __global float *c)
{
int gid = get_global_id(0);
c[gid] = a[gid] + b[gid];
}
The magic looks for cl_ctx or ctx in the user namespace to find a PyOpenCL context.
Kernel names are automatically injected into the user namespace, so we can call sum_vector directly from Python below.
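Outside IPython, the same kernel can be built by hand. The following is a minimal sketch of roughly what the magic does, not its actual implementation: `KERNEL_SRC` and `build_kernels` are hypothetical names introduced here for illustration, while `cl.Program(...).build()`, `Program.all_kernels()` and `Kernel.function_name` are regular PyOpenCL calls. The import is deferred into the function so the snippet can be loaded on a machine without an OpenCL runtime.

```python
# OpenCL C source, identical to the %%cl_kernel cell above.
KERNEL_SRC = """
__kernel void sum_vector(__global const float *a,
                         __global const float *b,
                         __global float *c)
{
    int gid = get_global_id(0);
    c[gid] = a[gid] + b[gid];
}
"""


def build_kernels(ctx, source=KERNEL_SRC):
    """Compile `source` for `ctx` and return a {kernel_name: kernel} dict.

    Hypothetical helper: it mimics the name injection performed by the
    magic, but returns a dict instead of touching the user namespace.
    """
    import pyopencl as cl  # deferred: needs an installed OpenCL runtime

    program = cl.Program(ctx, source).build()
    return {k.function_name: k for k in program.all_kernels()}
```

With a context at hand, `build_kernels(ctx)["sum_vector"]` would then be called with the same arguments as the injected `sum_vector`.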
Now create some data to work on:
n = 10000
a = cl.array.empty(queue, n, dtype=np.float32)
a.fill(15)
b_host = np.random.randn(n).astype(np.float32)
b = cl.array.to_device(queue, b_host)
c = cl.array.empty_like(a)
Run the kernel:
sum_vector(queue, (n,), None, a.data, b.data, c.data)
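The call above passes a global size of (n,) and a local size of None, so one work-item runs per element and the OpenCL implementation picks the work-group size. As a rough mental model only, the sketch below emulates that launch sequentially in plain Python. `emulate_sum_vector` is a hypothetical name, and `array("f", ...)` stands in for float32 device buffers; a real device runs the work-items in parallel and in no guaranteed order.

```python
from array import array


def emulate_sum_vector(global_size, a, b, c):
    """Run the kernel body once per work-item id, one after another."""
    for gid in range(global_size):  # plays the role of get_global_id(0)
        c[gid] = a[gid] + b[gid]


n = 8  # small stand-in for the n = 10000 used above
a = array("f", [15.0] * n)                   # like a.fill(15)
b = array("f", [0.5 * i for i in range(n)])  # arbitrary float32 test data
c = array("f", [0.0] * n)                    # output buffer

emulate_sum_vector(n, a, b, c)
```

Because each work-item writes only to its own `c[gid]`, the order of execution does not affect the result, which is why this kernel needs no synchronization.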
Check the result using NumPy operations:
assert (c.get() == b_host + 15).all()