On Tue, Nov 03, 2020 at 06:00:32PM +0000, Dave Martin wrote: > On Tue, Nov 03, 2020 at 03:34:27PM +0100, Ard Biesheuvel wrote: > > First of all, I don't think it is safe at the moment to use SVE in the > > kernel, as we don't preserve all state IIRC. My memory is a bit hazy, > I'm not convinced that it's safe right now. SVE in the kernel is > unsupported, partly due to cost and partly due to the lack of a > compelling use case. I think at a minimum we'd want to handle the vector length explicitly for kernel mode SVE, vector length independent code will work most of the time but at the very least it feels like a landmine waiting to cause trouble. If nothing else there's probably going to be cases where it makes a difference for performance. Other than that I'm not currently seeing any issues since we're handling SVE in the same paths we handle the rest of the FPSIMD stuff. > I think it would be preferable to see this algo accelerated for NEON > first, since all AArch64 hardware can benefit from that. ... > much of a problem. kernel_neon_begin() may incur a save of the full SVE > state anyway, so in some ways it would be a good thing if we could > actually make use of all those registers. > SVE hardware remains rare, so as a general policy I don't think we > should accept SVE implementations of any algorithm that does not > already have a NEON implementation -- unless the contributor can > explain why nobody with non-SVE hardware is going to care about the > performance of that algo. I tend to agree here, my concerns are around the cost of maintaining a SVE implementation relative to the number of users who'd benefit from it rather than around the basic idea of using SVE at all. If we were seeing substantial performance benefits over an implementation using NEON, or had some other strong push to use SVE like a really solid understanding of why SVE is a good fit for the algorithm but NEON isn't, then it'd be worth finishing up the infrastructure. The infrastructure itself doesn't seem fundamentally problematic.