Engininja2
cuda : replace remaining shfl_xor with calls to warp_reduce functions (llama/5744)
753b30d unverified