Code-generation based on a device
In numba-dppy, kernels are written in a device-agnostic fashion, which makes it
easy to write portable code. A kernel is compiled for the device on which it is
enqueued for execution. The device is specified using a dpctl.device_context
context manager. In the following example, two versions of the sum kernel are
compiled, one for a GPU and another for a CPU, depending on the context in which
the kernel is invoked. Currently, numba-dppy supports OpenCL CPU and GPU devices
and Level Zero GPU devices. In the future, compilation support may be extended
to other types of SYCL devices that are supported by DPC++’s runtime.
```python
import numpy as np
import numba_dppy as dppy
import dpctl


@dppy.kernel
def sum(a, b, c):
    # Each work item adds one pair of elements.
    i = dppy.get_global_id(0)
    c[i] = a[i] + b[i]


a = np.array(np.random.random(20), dtype=np.float32)
b = np.array(np.random.random(20), dtype=np.float32)
c = np.ones_like(a)

# Compiled for and executed on a Level Zero GPU device.
with dpctl.device_context("level_zero:gpu"):
    sum[20, dppy.DEFAULT_LOCAL_SIZE](a, b, c)

# Compiled for and executed on an OpenCL CPU device.
with dpctl.device_context("opencl:cpu"):
    sum[20, dppy.DEFAULT_LOCAL_SIZE](a, b, c)
```
Automatic offload of NumPy expressions
A key distinction between numba-dppy and the other GPU backends in Numba is
the ability to automatically offload specific data-parallel sections of a
Numba jit function.
Todo
Details and examples to be added.
Controllable Fallback
By default, if a section of code cannot be offloaded to the GPU, it is automatically
executed on the CPU and a warning is printed. This behavior applies only to jit
functions, auto-offloading of NumPy calls, array expressions, and prange loops.
To disable this fallback and force the code to run on the GPU, set the environment
variable NUMBA_DPPY_FALLBACK_OPTION to false (e.g. export NUMBA_DPPY_FALLBACK_OPTION=0).
In this case the code does not fall back to the CPU, and any failure to offload is
reported as an error.
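When the shell environment is not under your control, the same setting can be applied from Python. The following is a minimal sketch, not a documented numba-dppy API: the helper name disable_fallback is hypothetical, and the sketch assumes that numba-dppy reads NUMBA_DPPY_FALLBACK_OPTION when it is first imported, so the variable is set before any numba_dppy import.

```python
import os


def disable_fallback(environ=None):
    """Set NUMBA_DPPY_FALLBACK_OPTION=0 in an environment mapping.

    Hypothetical helper. With no argument it modifies os.environ of the
    current process; a plain dict can be passed instead.
    """
    if environ is None:
        environ = os.environ
    environ["NUMBA_DPPY_FALLBACK_OPTION"] = "0"
    return environ


# Assumption: the variable is read at import time, so call this first
# and import numba_dppy only afterwards.
disable_fallback()
```

Passing a dict instead of relying on os.environ keeps the helper usable for building the environment of a child process.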
Offload Diagnostics
Setting the debug environment variable NUMBA_DPPY_OFFLOAD_DIAGNOSTICS
(e.g. export NUMBA_DPPY_OFFLOAD_DIAGNOSTICS=1) emits parallel and offload
diagnostics based on the parallel transforms that were produced. The level of detail
depends on the integer value, between 1 and 4, assigned to the environment variable
(higher is more detailed).
The “Auto-offloading” section of the diagnostics reports the device (by name) to which
each parfor or kernel was offloaded.
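To compare diagnostic levels without changing the current shell, a script can be launched in a child process with the variable set. The sketch below only builds the environment for such a child process and rejects values outside the documented range of 1 to 4. The helper name diagnostics_env is hypothetical and is not part of numba-dppy.

```python
import os


def diagnostics_env(level, base_env=None):
    """Return a copy of an environment with offload diagnostics enabled.

    Hypothetical helper. `level` must be an integer from 1 to 4
    (higher is more detailed). The base environment is not modified.
    """
    if not isinstance(level, int) or not 1 <= level <= 4:
        raise ValueError("diagnostics level must be an integer from 1 to 4")
    env = dict(os.environ if base_env is None else base_env)
    env["NUMBA_DPPY_OFFLOAD_DIAGNOSTICS"] = str(level)
    return env


# Build an environment that requests level-2 diagnostics.
env = diagnostics_env(2)
```

The returned mapping can be handed to the env argument of subprocess.run together with the command that starts your own script, for example [sys.executable, "my_script.py"], where my_script.py stands for a script of your choosing.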