**************************
Release Notes for PoCL 7.1
**************************

This is mostly a bug-fix/maintenance release after the large 7.0 one.
The most notable improvements are in solidifying and documenting the
RISC-V port. For example, 99% of chipStar's internal tests now pass when run
through PoCL on a Milk-V Jupiter board, with RVV vector emission confirmed.

===========================
Release highlights
===========================

* Support for LLVM 21 for the CPU devices and LevelZero devices.
* Support for cl_khr_icd v2.0.0, cl_khr_spirv_queries and
  SPV_KHR_expect_assume.
* Various stability and ease-of-setup improvements on the Windows port, for
  example no longer requiring MS Visual Studio Build Tools for linking CPU
  device kernels.

=============================
Notable user-facing changes
=============================

* Reduced the overhead of clEnqueueNDRangeKernel() calls in cases where there
  are several hundred SVM/USM allocations. For example, on OpenVINO running
  resnet50 inference, the call time dropped from about 20 us to a few
  microseconds.

* Improved the error message shown when a recursive function is encountered:
  print the offending function in addition to the function where the
  recursion was encountered, and demangle C++ function names.

* Windows builds no longer require MS Visual Studio Build Tools for linking
  CPU device kernels. This only works with 1) a static LLVM built with
  lld-link and 2) PoCL built with the MSVC compiler for an x86(-64) target.
  The only remaining runtime dependency is the MSVC runtime library.

================
Notable bugfixes
================

* Multiple fixes to the fine-grain sub-buffer migration code.

===========================
Driver-specific features
===========================

* Implemented version 1.0.0 of the cl_khr_spirv_queries extension for
  drivers that support SPIR-V.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CPU drivers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Report SPIR-V 1.3 and 1.4 support when using LLVM 18.
* Support SPV_KHR_expect_assume.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Level Zero driver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Various bugfixes.
* Enable SPV_INTEL_memory_access_aliasing.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CUDA driver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Reimplement support for global offsets and work dim.
* Implement {read_|write_|}mem_fence() and get_{global|local}_linear_id().
* Note that the CUDA driver does not support LLVM 21, due to a bug in
  upstream Clang code. Users must use LLVM 17 to 20 with CUDA.

===================================
Experimental and work-in-progress
===================================

* Expanded the existing defined built-in kernels, introduced a minimal set
  of new ones, and implemented them on level0/npu to support llama.cpp
  single-batch inference on Intel NPU devices. Tested with ~1B-parameter fp16
  variants of the Gemma 3.1, Qwen 3 and Llama 3.2 models using the
  experimental branch https://github.com/linehill/llama.cpp/tree/opencl-dbk.

===================================
Deprecation/feature removal notices
===================================

* Support for LLVM versions older than 17.0 was removed.
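
The LLVM version constraints scattered through these notes (LLVM 21 added for the CPU and Level Zero devices, LLVM 17 to 20 for CUDA, nothing older than 17) can be summarized in a small lookup. The sketch below is only an illustration: the driver keys and the ``llvm_supported`` helper are hypothetical names, not PoCL identifiers, and it assumes the 17.0 lower bound applies equally to every driver. Individual drivers may have further requirements not listed here.

```python
# LLVM major versions per driver, as read from these release notes.
# Keys and function name are illustrative only, not part of PoCL.
# Assumption: the "older than 17.0 removed" notice applies to all drivers.
SUPPORTED_LLVM = {
    "cpu": range(17, 22),     # LLVM 17 to 21
    "level0": range(17, 22),  # LLVM 17 to 21
    "cuda": range(17, 21),    # LLVM 17 to 20 (LLVM 21 is not supported)
}


def llvm_supported(driver: str, llvm_major: int) -> bool:
    """Return True if these notes list llvm_major as usable with driver."""
    return llvm_major in SUPPORTED_LLVM[driver]


if __name__ == "__main__":
    # Print the usable LLVM major versions for each driver.
    for driver in SUPPORTED_LLVM:
        usable = [v for v in range(16, 23) if llvm_supported(driver, v)]
        print(driver, usable)
```

In practice this means a build that enables the CUDA driver alongside the CPU driver has to stay on LLVM 20 or older.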