Writing SYCL Kernels¶
Introduction¶
numba-dppy offers a way to program SYCL-supporting devices using Python.
Similar to SYCL's C++ programming model for heterogeneous computing,
numba-dppy offers Python abstractions for expressing data parallelism using
a hierarchical syntax. Note that not all SYCL concepts are currently supported
in numba-dppy, and some of the concepts may not be a good fit for Python.
The explicit kernel programming mode of numba-dppy bears many
similarities to Numba's other GPU backends: numba.cuda and numba.roc.
Readers who are familiar with either of those backends, or in
general with OpenCL, CUDA, or SYCL programming, should find writing kernels in
numba-dppy extremely intuitive. Irrespective of the reader's level of
familiarity with GPU programming frameworks, this documentation should serve
as a guide for using the features currently available in numba-dppy.
Kernel declaration¶
A kernel function is a device function that is meant to be called from host
code, where a device can be any SYCL-supported device such as a GPU, CPU, or an
FPGA. The present development focus of numba-dppy is mainly on Intel's
GPU hardware. The main characteristics of a kernel function are:

- Kernels cannot explicitly return a value; all result data must be written to an array passed to the function (if computing a scalar, you will probably pass a one-element array).
- Kernels explicitly declare their thread hierarchy when called, i.e. the number of thread blocks and the number of threads per block. Note that while a kernel is compiled once, it can be called multiple times with different block or grid sizes.
Example¶
#! /usr/bin/env python
# Copyright 2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import print_function
from timeit import default_timer as time
import sys
import numpy as np
import numpy.testing as testing
import numba_dppy, numba_dppy as dppy
import dpctl


@dppy.kernel
def data_parallel_sum(a, b, c):
    i = dppy.get_global_id(0)
    c[i] = a[i] + b[i]


def driver(a, b, c, global_size):
    print("A : ", a)
    print("B : ", b)
    data_parallel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)
    print("A + B = ")
    print("C ", c)
    testing.assert_equal(c, a + b)


def main():
    global_size = 10
    N = global_size
    print("N", N)

    a = np.array(np.random.random(N), dtype=np.float32)
    b = np.array(np.random.random(N), dtype=np.float32)
    c = np.ones_like(a)

    if dpctl.has_gpu_queues():
        print("\nScheduling on OpenCL GPU\n")
        with dpctl.device_context("opencl:gpu") as gpu_queue:
            driver(a, b, c, global_size)
    else:
        print("\nSkip scheduling on OpenCL GPU\n")

    if dpctl.has_gpu_queues(dpctl.backend_type.level_zero):
        print("\nScheduling on Level Zero GPU\n")
        with dpctl.device_context("level0:gpu") as gpu_queue:
            driver(a, b, c, global_size)
    else:
        print("\nSkip scheduling on Level Zero GPU\n")

    if dpctl.has_cpu_queues():
        print("\nScheduling on OpenCL CPU\n")
        with dpctl.device_context("opencl:cpu") as cpu_queue:
            driver(a, b, c, global_size)
    else:
        print("\nSkip scheduling on OpenCL CPU\n")

    print("Done...")


if __name__ == "__main__":
    main()
Kernel invocation¶
A kernel is typically launched in the following way:
def driver(a, b, c, global_size):
    print("A : ", a)
    print("B : ", b)
    data_parallel_sum[global_size, dppy.DEFAULT_LOCAL_SIZE](a, b, c)
    print("A + B = ")
    print("C ", c)
    testing.assert_equal(c, a + b)
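The two values in the square brackets form the launch configuration: the first is the global size (the total number of work items), and the second is the work-group (local) size; passing dppy.DEFAULT_LOCAL_SIZE lets the runtime choose the latter. As a rough sketch of how an explicit configuration decomposes the index space (plain Python, no device required; the helper name launch_config is hypothetical, not part of the numba-dppy API):

```python
def launch_config(global_size, local_size):
    # Mirroring the OpenCL/SYCL ND-range requirement, the global size
    # must be evenly divisible by the work-group size.
    if global_size % local_size != 0:
        raise ValueError("global_size must be divisible by local_size")
    # Number of work-groups that the runtime would launch.
    return global_size // local_size

# A kernel launched as kernel[1024, 64](...) would run 1024 work items
# split into work-groups of 64 work items each.
print(launch_config(1024, 64))  # 16 work-groups
```
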
Indexing functions¶
Currently, numba-dppy supports the following indexing functions, which have
the same semantics as their OpenCL counterparts:

- numba_dppy.get_local_id
- numba_dppy.get_local_size
- numba_dppy.get_group_id
- numba_dppy.get_num_groups
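These functions relate to each other in the same way as in OpenCL: for a one-dimensional range, a work item's global index equals its group index times the work-group size plus its local index. The following plain-Python sketch illustrates that identity outside of a kernel (no device needed; the function and variable names below are illustrative, not part of the numba-dppy API):

```python
def global_ids(num_groups, local_size):
    # Enumerate (group_id, local_id) pairs and recover each global id
    # via the OpenCL identity:
    #   get_global_id(0) == get_group_id(0) * get_local_size(0) + get_local_id(0)
    ids = []
    for group_id in range(num_groups):      # values of get_group_id(0)
        for local_id in range(local_size):  # values of get_local_id(0)
            ids.append(group_id * local_size + local_id)
    return ids

print(global_ids(2, 4))  # [0, 1, 2, 3, 4, 5, 6, 7]
```
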