All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Dave Martin <Dave.Martin@arm.com>
Cc: Florian Weimer <fweimer@redhat.com>,
	Christoffer Dall <christoffer.dall@linaro.org>,
	gdb@sourceware.org, Marc Zyngier <Marc.Zyngier@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Yao Qi <Yao.Qi@arm.com>, Will Deacon <will.deacon@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Szabolcs Nagy <szabolcs.nagy@arm.com>,
	Richard Sandiford <richard.sandiford@arm.com>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	Alan Hayward <alan.hayward@arm.com>,
	libc-alpha@sourceware.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Torvald Riegel <triegel@redhat.com>,
	Joseph Myers <joseph@codesourcery.com>
Subject: Re: [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support
Date: Mon, 3 Apr 2017 11:01:28 +0100	[thread overview]
Message-ID: <CAKv+Gu8vUq0A8BEuLHT8e3w9xJ4fbQ3j7K+mVwJafbQP3Eku5A@mail.gmail.com> (raw)
In-Reply-To: <20170403094501.GN3750@e103592.cambridge.arm.com>

On 3 April 2017 at 10:45, Dave Martin <Dave.Martin@arm.com> wrote:
> On Fri, Mar 31, 2017 at 04:28:16PM +0100, Ard Biesheuvel wrote:
>> On 22 March 2017 at 14:50, Dave Martin <Dave.Martin@arm.com> wrote:
>>
>> Hi Dave,
>>
>> > The Scalable Vector Extension (SVE) [1] is an extension to AArch64 which
>> > adds extra SIMD functionality and supports much larger vectors.
>> >
>> > This series implements core Linux support for SVE.
>> >
>> [...]
>> > KERNEL_MODE_NEON (non-)support
>> > ------------------------------
>> >
>> > "arm64/sve: [BROKEN] Basic support for KERNEL_MODE_NEON" is broken.
>> > There are significant design issues here that need discussion -- see the
>> > commit message for details.
>> >
>> > Options:
>> >
>> >  * Make KERNEL_MODE_NEON a runtime choice, and disable it if SVE is
>> >    present.
>> >
>> >  * Fully SVE-ise the KERNEL_MODE_NEON code: this will involve complexity
>> >    and effort, and may involve unfavourable (and VL-dependent) tradeoffs
>> >    compared with the no-SVE case.
>> >
>> >    We will nonetheless need something like this if there is a desire to
>> >    support "kernel mode SVE" in the future.  The fact that with SVE,
>> >    KERNEL_MODE_NEON brings the cost of kernel-mode SVE but only the
>> >    benefits of kernel-mode NEON argues in favour of this.
>> >
>> >  * Make KERNEL_MODE_NEON a dynamic choice, and have clients run fallback
>> >    C code instead if at runtime on a case-by-case basis, if SVE regs
>> >    would otherwise need saving.
>> >
>> >    This is an interface break, but all NEON-optimised kernel code
>> >    necessarily requires a fallback C implementation to exist anyway, and
>> >    the number of clients is not huge.
>> >
>> > We could go for a stopgap solution that at least works but is suboptimal
>> > for SVE systems (such as the first choice above), and then improve it
>> > later.
>> >
>>
>> Without having looked at the patches in detail yet, let me reiterate
>> my position after we discussed this when this series was sent out the
>> first time around.
>>
>> - The primary use case for kernel mode NEON is special purpose
>> instructions, i.e., AES is 20x faster when using the NEON, simply
>> because that is how one accesses the logic gates that implement the
>> AES algorithm. There is nothing SIMD or FP in nature about AES.
>> Compare the CRC extensions, which use scalar registers and
>> instructions. Of course, there are a couple of exceptions in the form
>> of bit-slicing algorithms, but in general, like general SIMD, I don't
>> think it is highly likely that SVE in kernel mode is something we will
>> have a need for in the foreseeable future.
>
> Certainly there is no immediate need for this, and if we decide we never
> need it then that helps us avoid some complexity.
>
> My main concern is around the extra save/restore cost, given that if
> the SVE registers are live then save/restore of the full SVE vectors
> is needed even if only FPSIMD is used in the meantime.
>
>> - The current way of repeatedly stacking/unstacking NEON register
>> contents in interrupt context is highly inefficient, given that we are
>> usually interrupting user mode, not kernel mode, and so stacking once
>> and unstacking when returning from the exception (which is how we
>> usually deal with it) would be much better. So changing the current
>> implementation to perform the eager stack/unstack only when a kernel
>> mode NEON call is already in progress is likely to improve our current
>> situation already, regardless of whether such a change is needed to
>> accommodate SVE. Note that to my knowledge, the only in-tree users of
>> kernel mode NEON operate in process context or softirq context, never
>> in hardirq context.
>
> Reassuring.
>
>> Given the above, I think it is perfectly reasonable to conditionally
>> disallow kernel mode NEON in hardirq context. The crypto routines that
>> rely on it can easily be fixed up (I already wrote the patches for
>> that). This would only be necessary on SVE systems, and the reason for
>> doing so is that - given how preserving and restoring the NEON
>> register file blows away the upper SVE part of the registers - whoever
>> arrives at the SVE-aware preserve routine first should be allowed to
>> run to completion. This does require disabling softirqs during the
>> time the preserved NEON state is being manipulated but that does not
>> strike me as a huge price to pay. Note that both restrictions
>> (disallowing kernel mode NEON in hardirq context, and the need to
>> disable softirqs while manipulating the state) could be made runtime
>> dependent on whether we are actually running on an SVE system.
>
> Given that we already bail out of kernel_neon_begin() with a WARN() if
> the hardware lacks FPSIMD, I'm not sure it would be worse to do the same
> if SVE is present.
>
> However, we should probably abstract that check: currently, drivers
> seemt to be using a cocktail of Kconfig dependencies,
> MODULE_DEVICE_TABLE() and runtime hwcap checks in order to deduce
> whether kernel_neon_begin() is safe to use.
>

Not quite. The Kconfig dependencies are about
CONFIG_KERNEL_MODE_NEON=y being set, without which it is pretty
pointless to build the modules that depend on it.

The hwcap dependencies are mostly about the actual instructions
themselves, i.e., AES, PMULL, SHA1/2 etc

> Would you be happy with a single API for checking whether
> kernel_neon_begin() works?  Maintaining this check in every driver
> doesn't feel very scalable.
>

Yes, that makes sense, but it is unrelated to hwcap. I'd like to see a
kernel_neon_allowed() that would return '!IS_ENABLED(CONFIG_SVE) ||
!(elf_hwcap & HWCAP_SVE) || !in_irq()' at some point, although I
understand you want to remove the last part for now.
To short-circuit some decisions in the driver when
kernel_neon_allowed() is always true (as it would be currently),
something like kernel_neon_need_fallback() comes to mind.

>
> This would allow SVE to disable KERNEL_MODE_NEON without having to make
> them conflict in Kconfig.  This wouldn't be our end goal, but it allows
> the dependency to be decoupled while we figure out a better solution.
>

I still don't think there is a need to disable kernel mode NEON
altogether. But we can defer that decision until later, but prepare
the existing code to deal with that in the mean time.

As I mentioned, I already looked into this a while ago, so I can
quickly respin the changes to the crypto code to take this into
account.

WARNING: multiple messages have this Message-ID (diff)
From: ard.biesheuvel@linaro.org (Ard Biesheuvel)
To: linux-arm-kernel@lists.infradead.org
Subject: [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support
Date: Mon, 3 Apr 2017 11:01:28 +0100	[thread overview]
Message-ID: <CAKv+Gu8vUq0A8BEuLHT8e3w9xJ4fbQ3j7K+mVwJafbQP3Eku5A@mail.gmail.com> (raw)
In-Reply-To: <20170403094501.GN3750@e103592.cambridge.arm.com>

On 3 April 2017 at 10:45, Dave Martin <Dave.Martin@arm.com> wrote:
> On Fri, Mar 31, 2017 at 04:28:16PM +0100, Ard Biesheuvel wrote:
>> On 22 March 2017 at 14:50, Dave Martin <Dave.Martin@arm.com> wrote:
>>
>> Hi Dave,
>>
>> > The Scalable Vector Extension (SVE) [1] is an extension to AArch64 which
>> > adds extra SIMD functionality and supports much larger vectors.
>> >
>> > This series implements core Linux support for SVE.
>> >
>> [...]
>> > KERNEL_MODE_NEON (non-)support
>> > ------------------------------
>> >
>> > "arm64/sve: [BROKEN] Basic support for KERNEL_MODE_NEON" is broken.
>> > There are significant design issues here that need discussion -- see the
>> > commit message for details.
>> >
>> > Options:
>> >
>> >  * Make KERNEL_MODE_NEON a runtime choice, and disable it if SVE is
>> >    present.
>> >
>> >  * Fully SVE-ise the KERNEL_MODE_NEON code: this will involve complexity
>> >    and effort, and may involve unfavourable (and VL-dependent) tradeoffs
>> >    compared with the no-SVE case.
>> >
>> >    We will nonetheless need something like this if there is a desire to
>> >    support "kernel mode SVE" in the future.  The fact that with SVE,
>> >    KERNEL_MODE_NEON brings the cost of kernel-mode SVE but only the
>> >    benefits of kernel-mode NEON argues in favour of this.
>> >
>> >  * Make KERNEL_MODE_NEON a dynamic choice, and have clients run fallback
>> >    C code instead if at runtime on a case-by-case basis, if SVE regs
>> >    would otherwise need saving.
>> >
>> >    This is an interface break, but all NEON-optimised kernel code
>> >    necessarily requires a fallback C implementation to exist anyway, and
>> >    the number of clients is not huge.
>> >
>> > We could go for a stopgap solution that at least works but is suboptimal
>> > for SVE systems (such as the first choice above), and then improve it
>> > later.
>> >
>>
>> Without having looked at the patches in detail yet, let me reiterate
>> my position after we discussed this when this series was sent out the
>> first time around.
>>
>> - The primary use case for kernel mode NEON is special purpose
>> instructions, i.e., AES is 20x faster when using the NEON, simply
>> because that is how one accesses the logic gates that implement the
>> AES algorithm. There is nothing SIMD or FP in nature about AES.
>> Compare the CRC extensions, which use scalar registers and
>> instructions. Of course, there are a couple of exceptions in the form
>> of bit-slicing algorithms, but in general, like general SIMD, I don't
>> think it is highly likely that SVE in kernel mode is something we will
>> have a need for in the foreseeable future.
>
> Certainly there is no immediate need for this, and if we decide we never
> need it then that helps us avoid some complexity.
>
> My main concern is around the extra save/restore cost, given that if
> the SVE registers are live then save/restore of the full SVE vectors
> is needed even if only FPSIMD is used in the meantime.
>
>> - The current way of repeatedly stacking/unstacking NEON register
>> contents in interrupt context is highly inefficient, given that we are
>> usually interrupting user mode, not kernel mode, and so stacking once
>> and unstacking when returning from the exception (which is how we
>> usually deal with it) would be much better. So changing the current
>> implementation to perform the eager stack/unstack only when a kernel
>> mode NEON call is already in progress is likely to improve our current
>> situation already, regardless of whether such a change is needed to
>> accommodate SVE. Note that to my knowledge, the only in-tree users of
>> kernel mode NEON operate in process context or softirq context, never
>> in hardirq context.
>
> Reassuring.
>
>> Given the above, I think it is perfectly reasonable to conditionally
>> disallow kernel mode NEON in hardirq context. The crypto routines that
>> rely on it can easily be fixed up (I already wrote the patches for
>> that). This would only be necessary on SVE systems, and the reason for
>> doing so is that - given how preserving and restoring the NEON
>> register file blows away the upper SVE part of the registers - whoever
>> arrives at the SVE-aware preserve routine first should be allowed to
>> run to completion. This does require disabling softirqs during the
>> time the preserved NEON state is being manipulated but that does not
>> strike me as a huge price to pay. Note that both restrictions
>> (disallowing kernel mode NEON in hardirq context, and the need to
>> disable softirqs while manipulating the state) could be made runtime
>> dependent on whether we are actually running on an SVE system.
>
> Given that we already bail out of kernel_neon_begin() with a WARN() if
> the hardware lacks FPSIMD, I'm not sure it would be worse to do the same
> if SVE is present.
>
> However, we should probably abstract that check: currently, drivers
> seemt to be using a cocktail of Kconfig dependencies,
> MODULE_DEVICE_TABLE() and runtime hwcap checks in order to deduce
> whether kernel_neon_begin() is safe to use.
>

Not quite. The Kconfig dependencies are about
CONFIG_KERNEL_MODE_NEON=y being set, without which it is pretty
pointless to build the modules that depend on it.

The hwcap dependencies are mostly about the actual instructions
themselves, i.e., AES, PMULL, SHA1/2 etc

> Would you be happy with a single API for checking whether
> kernel_neon_begin() works?  Maintaining this check in every driver
> doesn't feel very scalable.
>

Yes, that makes sense, but it is unrelated to hwcap. I'd like to see a
kernel_neon_allowed() that would return '!IS_ENABLED(CONFIG_SVE) ||
!(elf_hwcap & HWCAP_SVE) || !in_irq()' at some point, although I
understand you want to remove the last part for now.
To short-circuit some decisions in the driver when
kernel_neon_allowed() is always true (as it would be currently),
something like kernel_neon_need_fallback() comes to mind.

>
> This would allow SVE to disable KERNEL_MODE_NEON without having to make
> them conflict in Kconfig.  This wouldn't be our end goal, but it allows
> the dependency to be decoupled while we figure out a better solution.
>

I still don't think there is a need to disable kernel mode NEON
altogether. But we can defer that decision until later, but prepare
the existing code to deal with that in the mean time.

As I mentioned, I already looked into this a while ago, so I can
quickly respin the changes to the crypto code to take this into
account.

  reply	other threads:[~2017-04-03 10:01 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-22 14:50 [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support Dave Martin
2017-03-22 14:50 ` Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 01/41] arm64: signal: Refactor sigcontext parsing in rt_sigreturn Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 02/41] arm64: signal: factor frame layout and population into separate passes Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 03/41] arm64: signal: factor out signal frame record allocation Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 04/41] arm64: signal: Allocate extra sigcontext space as needed Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 05/41] arm64: signal: Parse extra_context during sigreturn Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 06/41] arm64: efi: Add missing Kconfig dependency on KERNEL_MODE_NEON Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 07/41] arm64/sve: Allow kernel-mode NEON to be disabled in Kconfig Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 08/41] arm64/sve: Low-level save/restore code Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 09/41] arm64/sve: Boot-time feature detection and reporting Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 10/41] arm64/sve: Boot-time feature enablement Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 11/41] arm64/sve: Expand task_struct for Scalable Vector Extension state Dave Martin
2017-03-22 16:20   ` Mark Rutland
2017-03-23 10:49     ` Dave Martin
2017-03-23 11:26       ` Mark Rutland
2017-03-22 14:50 ` [RFC PATCH v2 12/41] arm64/sve: Save/restore SVE state on context switch paths Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 13/41] arm64/sve: [BROKEN] Basic support for KERNEL_MODE_NEON Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 14/41] Revert "arm64/sve: Allow kernel-mode NEON to be disabled in Kconfig" Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 15/41] arm64/sve: Restore working FPSIMD save/restore around signals Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 16/41] arm64/sve: signal: Add SVE state record to sigcontext Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 17/41] arm64/sve: signal: Dump Scalable Vector Extension registers to user stack Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 18/41] arm64/sve: signal: Restore FPSIMD/SVE state in rt_sigreturn Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 19/41] arm64/sve: Avoid corruption when replacing the SVE state Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 20/41] arm64/sve: traps: Add descriptive string for SVE exceptions Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 21/41] arm64/sve: Enable SVE on demand for userspace Dave Martin
2017-03-22 16:48   ` Mark Rutland
2017-03-23 11:24     ` Dave Martin
2017-03-23 11:30       ` Suzuki K Poulose
2017-03-23 11:52         ` Mark Rutland
2017-03-23 12:07           ` Dave Martin
2017-03-23 13:40             ` Mark Rutland
2017-03-23 13:45               ` Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 22/41] arm64/sve: Implement FPSIMD-only context for tasks not using SVE Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 23/41] arm64/sve: Move ZEN handling to the common task_fpsimd_load() path Dave Martin
2017-03-22 16:55   ` Mark Rutland
2017-03-23 11:52     ` Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 24/41] arm64/sve: Discard SVE state on system call Dave Martin
2017-03-22 17:03   ` Mark Rutland
2017-03-23 11:59     ` Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 25/41] arm64/sve: Avoid preempt_disable() during sigreturn Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 26/41] arm64/sve: Avoid stale user register state after SVE access exception Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 27/41] arm64/sve: ptrace support Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 28/41] arm64: KVM: Treat SVE use by guests as undefined instruction execution Dave Martin
2017-03-22 17:06   ` Mark Rutland
2017-03-23 12:10     ` Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 29/41] prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls Dave Martin
2017-03-22 14:50   ` [RFC PATCH v2 29/41] prctl: Add skeleton for PR_SVE_{SET, GET}_VL controls Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 30/41] arm64/sve: Track vector length for each task Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 31/41] arm64/sve: Set CPU vector length to match current task Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 32/41] arm64/sve: Factor out clearing of tasks' SVE regs Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 33/41] arm64/sve: Wire up vector length control prctl() calls Dave Martin
2017-03-22 14:51   ` Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 34/41] arm64/sve: Disallow VL setting for individual threads by default Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 35/41] arm64/sve: Add vector length inheritance control Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 36/41] arm64/sve: ptrace: Wire up vector length control and reporting Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 37/41] arm64/sve: Enable default vector length control via procfs Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 38/41] arm64/sve: Detect SVE via the cpufeature framework Dave Martin
2017-03-23 14:11   ` Suzuki K Poulose
2017-03-23 14:37     ` Dave Martin
2017-03-23 14:43       ` Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 39/41] arm64/sve: Migrate to cpucap based detection for runtime SVE code Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 40/41] arm64/sve: Allocate task SVE context storage dynamically Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 41/41] arm64/sve: Documentation: Add overview of the SVE userspace ABI Dave Martin
2017-03-22 14:51   ` Dave Martin
2017-03-31 15:28 ` [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support Ard Biesheuvel
2017-03-31 15:28   ` Ard Biesheuvel
2017-04-03  9:45   ` Dave Martin
2017-04-03  9:45     ` Dave Martin
2017-04-03 10:01     ` Ard Biesheuvel [this message]
2017-04-03 10:01       ` Ard Biesheuvel
2017-04-03 10:51       ` Dave Martin
2017-04-03 10:51         ` Dave Martin
2017-04-03 10:55         ` Ard Biesheuvel
2017-04-03 10:55           ` Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKv+Gu8vUq0A8BEuLHT8e3w9xJ4fbQ3j7K+mVwJafbQP3Eku5A@mail.gmail.com \
    --to=ard.biesheuvel@linaro.org \
    --cc=Dave.Martin@arm.com \
    --cc=Marc.Zyngier@arm.com \
    --cc=Yao.Qi@arm.com \
    --cc=akpm@linux-foundation.org \
    --cc=alan.hayward@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=christoffer.dall@linaro.org \
    --cc=fweimer@redhat.com \
    --cc=gdb@sourceware.org \
    --cc=joseph@codesourcery.com \
    --cc=libc-alpha@sourceware.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=richard.sandiford@arm.com \
    --cc=szabolcs.nagy@arm.com \
    --cc=triegel@redhat.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.