import pymbolic.primitives as p
u = p.Variable("u")
We'll write code that evaluates $\triangle u$ using finite differences.
To that end, we define a new expression 'thing': An operator for the Laplacian.
class Laplacian(p.Expression):
def __init__(self, child):
self.child = child
def __getinitargs__(self):
return (self.child,)
mapper_method = "map_laplacian"
pde = Laplacian(u)+u**2-1
pde
Now we'll write code to turn Laplacians into their discrete finite difference forms, using i
and j
as formal indices, using
Pay close attention to indexing!
from pymbolic.mapper import IdentityMapper
i = p.Variable("i")
j = p.Variable("j")
ii = i+1
jj = j+1
class FDMapper(IdentityMapper):
def map_variable(self, expr):
return expr[i, j]
def map_laplacian(self, expr):
var = expr.child
return (-4*var[ii,jj] + var[ii+1,jj] + var[ii-1,jj]
+ var[ii,jj+1] + var[ii,jj-1])
fd_mapper = FDMapper()
print(fd_mapper(pde))
Now let's generate some code for this, using loopy
:
import loopy as lp
result = p.Variable("result")
Observe the two parts of the loopy
kernel description:
knl = lp.make_kernel(
"{[i,j]: 0<=i,j<n}",
[lp.ExpressionInstruction(result[i,j], fd_mapper(pde))],
)
Kernels can always be inspected--simply use print
:
print(knl)
Let's move towards running this code. To do so, we need pyopencl
:
import numpy as np
import pyopencl as cl
import pyopencl.array
import pyopencl.clrandom
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
And some data to work with:
n = 1000
u = cl.clrandom.rand(queue, (n+2,n+2), dtype=np.float32)
Now run the code, and tell loopy to print what it generates:
knl = lp.set_options(knl, write_cl=True)
evt, (result,) = knl(queue, u=u, n=n)
That's obviously not very parallel. Introduce parallelism:
tknl = knl
tknl = lp.tag_inames(tknl, {"i": "g.0", "j": "g.1"})
evt, (result,) = tknl(queue, u=u)
But OpenCL/CUDA require blocking to be efficient!
sknl = knl
sknl = lp.split_iname(sknl,
"i", 16, outer_tag="g.1", inner_tag="l.1")
sknl = lp.split_iname(sknl,
"j", 16, outer_tag="g.0", inner_tag="l.0")
evt, (result,) = sknl(queue, u=u)
How about some data reuse?
sknl = knl
sknl = lp.split_iname(sknl,
"i", 16, outer_tag="g.1", inner_tag="l.1")
sknl = lp.split_iname(sknl,
"j", 16, outer_tag="g.0", inner_tag="l.0")
sknl = lp.add_prefetch(sknl, "u",
["i_inner", "j_inner"],
fetch_bounding_box=True)
evt, (result,) = sknl(queue, u=u, n=n)