* [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support
@ 2017-03-22 14:50 Dave Martin
  2017-03-22 14:50 ` [RFC PATCH v2 29/41] prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls Dave Martin
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Dave Martin @ 2017-03-22 14:50 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Catalin Marinas, Marc Zyngier, Ard Biesheuvel,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, Andrew Morton,
	linux-kernel, Alan Hayward, Yao Qi, gdb, Christoffer Dall,
	libc-alpha, Richard Sandiford, Torvald Riegel

The Scalable Vector Extension (SVE) [1] is an extension to AArch64 which
adds extra SIMD functionality and supports much larger vectors.

This series implements core Linux support for SVE.

Recipients not copied on the whole series can find the rest of the
patches in the linux-arm-kernel archives [2].

Major changes since v1: [3]

 * SVE vector length now configurable via prctl() and ptrace()
   (based on previously posted work [4]);

 * improved CPU feature detection to allow for mismatched CPUs;

 * dynamic allocation of per-task storage for the SVE registers.


There are a lot of outstanding issues that reviewers should be aware of,
including some design and implementation issues that I'd appreciate
input on.

Due to the length of the cover note, I've split it up as follows:

 * Missing Features and Limitations

 * ABI Design Issues
   (implemented interfaces that may need improvement)

 * Security
   (outstanding security-related design considerations)

 * Bugs and Implementation Issues
   (known and suspected problems with the implementation)


For reviewers, I recommend quickly skimming the remainder of this cover
note and the final (documentation) patch, before deciding what to focus
on in more detail.

Because of the length of the series, be aware that some code added by
earlier patches is substantially rewritten by later patches -- so also
look at the final result of applying the series before commenting
heavily on earlier additions.

Review and comments appreciated.

Cheers
---Dave

[1]
https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture

[2]
http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/thread.html
linux-arm-kernel archive

[3]
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/470507.html
[RFC PATCH 00/29] arm64: Scalable Vector Extension core support

[4]
http://lists.infradead.org/pipermail/linux-arm-kernel/2017-January/478941.html
[RFC PATCH 00/10] arm64/sve: Add userspace vector length control API


Missing Features and Limitations
================================

Sparse vector length support
----------------------------

Currently, the code assumes that all possible vector lengths are
supported up to the maximum supported by the CPU.  The SVE architecture
doesn't require this, so it will be necessary to probe each possible VL
on every CPU and derive the set of common VLs after the secondaries come
up.

The patches don't currently implement this, which will cause incorrect
context save/restore and userspace malfunctions if a VL is configured
that the CPU implementation does not support.
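For illustration only, one way the common set could be derived is sketched
below; the helper sve_probe_vqs(), the map name and the bound are
assumptions for this sketch, not code from this series:

#include <linux/bitmap.h>

#define SVE_VQ_MAX	512	/* assumed upper bound on vector quadwords */

/* Set of vector quadword counts (VQs) usable on every CPU seen so far */
static DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);

/* Assumed helper: mark in @map each VQ that the calling CPU implements */
void sve_probe_vqs(unsigned long *map);

void sve_init_vq_map(void)		/* boot CPU */
{
	sve_probe_vqs(sve_vq_map);
}

void sve_update_vq_map(void)		/* each secondary CPU as it comes up */
{
	DECLARE_BITMAP(tmp, SVE_VQ_MAX);

	sve_probe_vqs(tmp);
	bitmap_and(sve_vq_map, sve_vq_map, tmp, SVE_VQ_MAX);
}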


KVM
---

Use of SVE by KVM guests is not supported yet.

Guests still detect SVE as present because ID_AA64PFR0_EL1 is read
directly from the hardware, even by the guest.  As a result, a guest
kernel configured with CONFIG_ARM64_SVE=y will currently go into an
illegal-instruction spin during early boot.

Sanitising the ID registers for guests is a broader problem.  It may
be appropriate to implement a stopgap solution for SVE in the meantime,
either:

 * Require guests to be configured with CONFIG_ARM64_SVE=n, and kill
   affected guests instead of injecting an undef

   (not great)

 * Add a point hack for trapping the CPU ID regs and hiding (just) SVE
   from the guest.

   For one or two features this may be acceptable and this may serve as
   a stepping stone towards proper ID register sanitisation, but this
   approach won't scale well as the number of affected features
   increases over time.

 * Implement minimal KVM support so that a guest can at least boot and run,
   possibly suboptimally, if it uses SVE.  Full userspace ioctl()
   extensions for management of the guest VM's SVE support might be
   omitted to begin with.

   This is the cleanest approach, but would involve more work
   and might delay merge.


KERNEL_MODE_NEON (non-)support
------------------------------

"arm64/sve: [BROKEN] Basic support for KERNEL_MODE_NEON" is broken.
There are significant design issues here that need discussion -- see the
commit message for details.

Options:

 * Make KERNEL_MODE_NEON a runtime choice, and disable it if SVE is
   present.

 * Fully SVE-ise the KERNEL_MODE_NEON code: this will involve complexity
   and effort, and may involve unfavourable (and VL-dependent) tradeoffs
   compared with the no-SVE case.

   We will nonetheless need something like this if there is a desire to
   support "kernel mode SVE" in the future.  The fact that with SVE,
   KERNEL_MODE_NEON brings the cost of kernel-mode SVE but only the
   benefits of kernel-mode NEON argues in favour of this.

 * Make KERNEL_MODE_NEON a dynamic choice, and have clients fall back to
   C code at runtime, on a case-by-case basis, if the SVE regs would
   otherwise need saving (the client-side pattern is sketched at the end
   of this section).

   This is an interface break, but all NEON-optimised kernel code
   necessarily requires a fallback C implementation to exist anyway, and
   the number of clients is not huge.

We could go for a stopgap solution that at least works but is suboptimal
for SVE systems (such as the first choice above), and then improve it
later.
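
For reference, the client-side shape of the third option might look
something like the sketch below; may_use_neon() and the do_blocks_*()
workers are hypothetical names, not existing interfaces:

void do_blocks(u8 *dst, const u8 *src, int nblocks)
{
	if (may_use_neon()) {		/* hypothetical runtime check */
		kernel_neon_begin();
		do_blocks_neon(dst, src, nblocks);	/* NEON-optimised path */
		kernel_neon_end();
	} else {
		do_blocks_c(dst, src, nblocks);		/* generic C fallback */
	}
}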


ABI Design Issues
=================

Vector length handling in sigcontext
------------------------------------

Currently, the vector length is not saved/restored around signals: it
is not saved in the signal frame, and sigreturn is not allowed to
change it.

It would not be difficult to add this ability now, and retrofitting it
in the future instead would require a kernel upgrade and a mechanism for
new software to know whether it's available.

However, it's unclear whether this feature will ever truly be needed, or
should be encouraged.

During a normal sigreturn, restoration of the VL would only be needed if
the signal handler returned with a different VL configured than the one
it was called with -- something that PCS-compliant functions are
generally not supposed to do.

A non-local return, such as invoking some userspace bottom-half or
scheduler function, or dispatching a userspace exception, could
conceivably legitimately want to change VL.


Choices:

 * Implement and support this ability: fairly straightforward, but it
   may be abused by userspace (particularly if we can't decide until
   later what counts as "abuse").

 * Implement it but don't document it and maybe add a pr_info_once() to
   warn about future incompatibility if userspace uses it.

 * Don't implement it: a caller must use PR_SVE_SET_VL prior to return if
   it wants a VL change, or to restore a VL it previously changed; see the
   sketch below.  (The caller must sweat the resulting safety issues
   itself.)
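
   A minimal sketch of what this would require of a handler that changes
   VL, assuming the prctl constants from this series and the
   PR_SVE_GET_VL_LEN() helper described in the documentation patch; error
   handling and flag restoration are omitted:

#include <signal.h>
#include <sys/prctl.h>		/* PR_SVE_{SET,GET}_VL from this series */

static void handler(int sig, siginfo_t *si, void *uc)
{
	int saved = prctl(PR_SVE_GET_VL);

	prctl(PR_SVE_SET_VL, 32, 0);	/* e.g. temporarily work at VL = 32 */
	/* ... handler body using SVE at the new VL ... */

	/* sigreturn will not restore VL, so put it back explicitly: */
	prctl(PR_SVE_SET_VL, PR_SVE_GET_VL_LEN(saved), 0);
}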


PR_GET_MINSIGSTKSZ (signal frame size discovery)
------------------------------------------------

(Not currently implemented, not 100% trivial to implement, but should be
fairly straightforward.)

It's not obvious whether the maximum possible frame size for the
_current_ thread configuration (e.g., current VL) should be reported, or
the maximum possible frame size irrespective of configuration.

I'd like to be able to hide this call behind sysconf(), which seems a
more natural and cleaner interface for userspace software than issuing
random prctls(), since there is nothing SVE-specific about the problem
of sizing stacks.  POSIX doesn't permit sysconf() values to vary over
the lifetime of a process, so this would require the configuration-
independent maximum frame size to be returned, but this may result in
the caller allocating more memory than is really needed.

Taking the system's maximum supported VL into account would mitigate
against this, since it's highly likely to be much smaller than
SVE_VL_MAX.
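
A hedged sketch of the intended userspace usage, assuming a
PR_GET_MINSIGSTKSZ prctl eventually exists (the constant below is a
placeholder for illustration, not an allocated number):

#include <signal.h>
#include <stdlib.h>
#include <sys/prctl.h>

#ifndef PR_GET_MINSIGSTKSZ
#define PR_GET_MINSIGSTKSZ	50	/* placeholder value, illustration only */
#endif

static void setup_sigstack(void)
{
	int sz = prctl(PR_GET_MINSIGSTKSZ);
	stack_t ss = { .ss_flags = 0 };

	if (sz < 0)
		sz = MINSIGSTKSZ;	/* old kernel: fall back to the static bound */

	ss.ss_sp = malloc(sz);
	ss.ss_size = sz;
	if (ss.ss_sp)
		sigaltstack(&ss, NULL);
}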


Reporting of supported vector lengths to userspace
--------------------------------------------------

Currently, the set of supported vector lengths and maximum vector length
are not directly reported to userspace.

Instead, userspace will need to probe by trying to set different vector
lengths and seeing what comes back.
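
For example, a library might probe as in the hedged sketch below (the
prctl numbers and the PR_SVE_GET_VL_LEN() helper are as described by this
series; the details are illustrative):

#include <stdio.h>
#include <sys/prctl.h>		/* PR_SVE_* definitions from this series */

static void probe_vls(void)
{
	int vl;

	for (vl = 16; vl <= 256; vl += 16) {	/* VL is always a multiple of 16 */
		if (prctl(PR_SVE_SET_VL, vl, 0) < 0)
			return;			/* no SVE support at all */

		/* The kernel picks a supported VL; read back what we got: */
		if (PR_SVE_GET_VL_LEN(prctl(PR_SVE_GET_VL)) == vl)
			printf("VL %d supported\n", vl);
	}
}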

This is unlikely to be a significant burden for now, and it could be
addressed later without backwards-incompatibility.


Maximum vector length
---------------------

For VL-setting interfaces (PR_SVE_SET_VL, ptrace, and possibly
sigreturn):

Is it reasonable to have a way to request "the maximum supported VL" via
these interfaces?  Up to now, I've assumed that this is reasonable and
useful, however...

Currently, SVE_VL_MAX is overloaded for this purpose, but this is
intended as an absolute limit encompassing future extensions to SVE --
i.e., this is the limit a remote debug protocol ought to scale up to,
for example.  Code compiled for the current SVE architecture is allowed
by the architecture to assume that VL <= 256, so requesting SVE_VL_MAX
may result in an impossibly large VL if executing on some future
hardware that supports vectors > 256 bytes.

This define should probably be forked in two, but confusion and misuse
seem highly likely.  Alternatively, the kernel could clamp VL to 256
bytes, with a new flag required in the future in order to allow larger
VLs to be set.


PR_SVE_SET_VL interface
-----------------------

Should the arguments to this prctl be merged?

In other interfaces, the vl and flags are separate, but an obvious use
of PR_SVE_SET_VL would be to restore the configuration previously
discovered via PR_SVE_GET_VL, which is rather ugly to do today.

Options include:

 * merge the PR_SVE_SET_VL arguments;

 * provide macros to extract the arguments from the PR_SVE_GET_VL return
   value; or

 * migrate both prctls to using a struct containing vl and flags (one
   possible shape is sketched below).
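
For the third option, one conceivable shape is sketched here; the struct
name and layout are invented purely for illustration:

struct sve_vl_config {
	unsigned long	vl;	/* vector length in bytes */
	unsigned long	flags;	/* PR_SVE_SET_VL_* flags */
};

/*
 * Hypothetical usage -- the round trip then becomes symmetrical:
 *
 *	struct sve_vl_config cfg;
 *
 *	prctl(PR_SVE_GET_VL, &cfg);
 *	...
 *	prctl(PR_SVE_SET_VL, &cfg);
 */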


Vector length setting versus restoration
----------------------------------------

Currently, PTRACE_SETREGSET(NT_ARM_SVE) will fail by default on a
multithreaded target process, even if the vector length is not being
changed.  This can be avoided by OR-ing SVE_PT_VL_THREAD into
user_sve_header.flags before calling PTRACE_SETREGSET, to indicate "I
know what I'm doing".  But it's weird to have to do this when restoring
the VL to a value it had previously, or when leaving the VL unchanged.
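
For illustration, this is roughly what a debugger has to do today when
writing back state it previously read from a multithreaded target
(structures and constants as defined by this series' ptrace ABI; error
handling trimmed):

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <linux/elf.h>		/* NT_ARM_SVE, as added by this series */
#include <asm/ptrace.h>		/* struct user_sve_header, SVE_PT_* */

static long write_back_sve(pid_t pid, void *regs, size_t len)
{
	struct user_sve_header *hdr = regs;
	struct iovec iov = { .iov_base = regs, .iov_len = len };

	hdr->flags |= SVE_PT_VL_THREAD;	/* "I know what I'm doing" */

	return ptrace(PTRACE_SETREGSET, pid, NT_ARM_SVE, &iov);
}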

A similar issue applies when calling PR_SVE_SET_VL based on the return
from a previous PR_SVE_GET_VL.  If sigreturn is extended to allow VL
changes, it would be affected too.

It's not obvious what the preferred semantics are here, or even if
they're the same in every case.

Options:

 * OR the _THREAD flag into the flags or result when reading the VL, as
   currently done for PR_SVE_SET_VL, but not for PTRACE_GETREGSET.

 * Require the caller to set this flag explicitly, even to restore the
   VL to something it was previously successfully set to.

and/or

 * Relax the behaviour not to treat VL setting without _THREAD as an
   error if the current VL for the thread already matches requested
   value.

Different interfaces might take different decisions about these (as at
present).


Coredump padding
----------------

Currently, the regset API and core ELF coredump implementation don't
allow for regsets to have a dynamic size.

NT_ARM_SVE is therefore declared with the theoretical maximum size based
on SVE_MAX_VL, which is ridiculously large.

This is relatively harmless, but it causes about a quarter of a megabyte
of useless padding to be written into the coredump for each thread.
Readers can skip this data, and software consuming coredumps usually
mmaps them rather than streaming them in, so this won't end the world.

I plan to add a regset method to discover the size at runtime, but for
now this is not implemented.
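
For concreteness, one conceivable shape for such a method is sketched
below; nothing like this exists in <linux/regset.h> yet, and the member
name and signature are invented:

struct task_struct;

/* Hypothetical addition to struct user_regset: */
struct user_regset {
	/* ... existing members (.get, .set, .active, .n, .size, ...) */

	/* Report the regset's current size for @target, in bytes: */
	unsigned int (*get_size)(struct task_struct *target,
				 const struct user_regset *regset);
};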


Security
========

Even though it is preferable for userspace code to work with any vector
length, it is legitimate for such code to prefer certain VLs, to work with
or be optimised for only certain VLs -- or only to have been tested
against certain VLs.

Thus, controlling the VL that code may execute with, while generally
useful, may have security implications when there is a change of
privilege.

At the moment, it's still unclear how much of this responsibility the
libc startup code should take on.  There may be merit in taking a
belt-and-braces approach in the kernel/user ABI, to at least apply some
sanity.

Thus:

 * A privilege-escalating execve() (i.e., execing a setuid/setgid binary
   or a binary that has filesystem capabilities set on it) could reset
   the VL to something "sane" instead of allowing the execve() caller to
   control it.

 * Currently, the system default VL (configured via
   /proc/cpu/sve_default_vl) is my best effort at defining a "sane" VL.
   This is writable only by root, but a decision needs to be made about
   the interaction of this control with containers.

Either each container needs its own version (cleanest option), or only
the root container should be able to configure it (simplest option).

(It would also be necessary to decide what counts as a "container" for
this purpose.)

Decisions will be needed on these issues -- neither is currently
addressed.


Bugs and Implementation Issues
==============================

Regarding the patches themselves, comment and review would be
particularly helpful on the following:

procfs
------

It feels wrong to implement /proc/cpu/sve_default_vl by hand (see patch
37), along with all the potential bugs, buffer overflows, and
behavioural inconsistencies this implies, for a rather trivial bit of
functionality.

This may not even belong in procfs at all, though sysfs doesn't seem
right either and there's no natural kobject to tie this control to.

If there's a better framework for this, I'm open to suggestions...


Race conditions
---------------

Because parts of the FPSIMD/SVE code can preempt other parts on the back
of a context switch or IRQ, various races can occur.

The following in particular need close scrutiny (the kind of guards these
constraints imply is sketched after the list):

 * Access with preemption enabled, to anything touched by
   fpsimd_thread_switch()

 * Access with IRQs enabled, to anything touched by
   kernel_neon_begin{,_partial}()
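
Schematically, the guards implied look like the following (illustration
only, not code from the series):

#include <linux/preempt.h>
#include <linux/irqflags.h>

static void example_touch_fpsimd_state(void)
{
	preempt_disable();	/* excludes fpsimd_thread_switch() on this CPU */
	/* ... safely access current's FPSIMD/SVE task state here ... */
	preempt_enable();

	local_irq_disable();	/* also excludes kernel_neon_begin() callers */
	/* ... touch state that an IRQ-context NEON user could clobber ... */
	local_irq_enable();
}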


SVE register flushing
---------------------

Synchronisation of the Z- (TIF_SVE, thread->sve_state) and V- (!TIF_SVE,
thread->fpsimd_state) views of the registers, and zeroing of the high
bits of the SVE Z-registers, are not consistently applied in all cases.
This may lead to noncompliance with the SVE programmer's model whereby,
say,

	// syscall
	// ...
	ldr	v0, [x0]
	// ...
	// context switch
	// ...
	str	z0, [x1]

might not result in the high bits stored from z0 all being zero (which
the SVE programmer's model demands), or there may be other similarly
weird effects -- such behaviour would be a bug, but there may be
outstanding cases I've missed.


Context management
------------------

There are up to four views of a task's FPSIMD/SVE state
(thread->fpsimd_state, thread->sve_state, the registers of the CPU
identified by smp_processor_id(), and the CPU recorded in
thread->fpsimd.cpu), and various synchronisations need to occur at
various times.  The desire to minimise preemption/IRQ blackouts when
synchronising complicates matters further by opening windows in which
races can occur.

With the addition of SVE on top of KERNEL_MODE_NEON, the code to manage
coherence between these views has grown organically into something
haphazard and hard to reason about and maintain.

I'd like to redesign the way these interactions are abstracted -- any
suggestions are welcome.


Coredump synchronisation
------------------------

In a related, non-SVE-specific issue, the FPSIMD (and SVE) registers are
not necessarily synchronised when generating a coredump, which may
result in stale FPSIMD/SVE register values in the dump compared with the
actual register state at the time the process died.

The series currently makes no attempt to fix this.  A fix may be added,
or this may be handled separately.


Bugs
----

An older version of this series exhibited buggy context switch behaviour
under stress.  This has not been reproduced on any recent version of the
code, but the test environment is currently not reproducible (involving
experimental KVM support that is not portable to the current branch).

To date, the bug (or bugs) remain undiagnosed.  I have reason to believe
that there were multiple contributory bugs in the original code, and it
seems likely that they haven't all been fixed.

The possibility of a bug in the CPU simulation used to run the test has
also never been conclusively ruled out.

The failures:

 * were only ever observed in the host;

 * were only ever observed when running multiple guests, with all guest
   VCPUs kept busy;

 * were never observed to affect FPSIMD state, only the extra SVE state;

 * were never observed to leak data between tasks, between the kernel
   and userspace, or between host and guest;

 * did not seem to involve buffer overruns or memory corruption: high
   bits of SVE Z-registers, or (more rarely) P-registers or FFR would be
   unexpectedly replaced with zeros or stale data belonging to the same
   task.

Thus I have seen no evidence that suggests non-SVE systems can be
affected, but it's difficult to say for certain.

I have a strong suspicion that the complexity of the SVE/FPSIMD context
synchronisation code is the source of these issues, but this remains
unproven.


Alan Hayward (1):
  arm64/sve: ptrace support

Dave Martin (40):
  arm64: signal: Refactor sigcontext parsing in rt_sigreturn
  arm64: signal: factor frame layout and population into separate passes
  arm64: signal: factor out signal frame record allocation
  arm64: signal: Allocate extra sigcontext space as needed
  arm64: signal: Parse extra_context during sigreturn
  arm64: efi: Add missing Kconfig dependency on KERNEL_MODE_NEON
  arm64/sve: Allow kernel-mode NEON to be disabled in Kconfig
  arm64/sve: Low-level save/restore code
  arm64/sve: Boot-time feature detection and reporting
  arm64/sve: Boot-time feature enablement
  arm64/sve: Expand task_struct for Scalable Vector Extension state
  arm64/sve: Save/restore SVE state on context switch paths
  arm64/sve: [BROKEN] Basic support for KERNEL_MODE_NEON
  Revert "arm64/sve: Allow kernel-mode NEON to be disabled in Kconfig"
  arm64/sve: Restore working FPSIMD save/restore around signals
  arm64/sve: signal: Add SVE state record to sigcontext
  arm64/sve: signal: Dump Scalable Vector Extension registers to user
    stack
  arm64/sve: signal: Restore FPSIMD/SVE state in rt_sigreturn
  arm64/sve: Avoid corruption when replacing the SVE state
  arm64/sve: traps: Add descriptive string for SVE exceptions
  arm64/sve: Enable SVE on demand for userspace
  arm64/sve: Implement FPSIMD-only context for tasks not using SVE
  arm64/sve: Move ZEN handling to the common task_fpsimd_load() path
  arm64/sve: Discard SVE state on system call
  arm64/sve: Avoid preempt_disable() during sigreturn
  arm64/sve: Avoid stale user register state after SVE access exception
  arm64: KVM: Treat SVE use by guests as undefined instruction execution
  prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls
  arm64/sve: Track vector length for each task
  arm64/sve: Set CPU vector length to match current task
  arm64/sve: Factor out clearing of tasks' SVE regs
  arm64/sve: Wire up vector length control prctl() calls
  arm64/sve: Disallow VL setting for individual threads by default
  arm64/sve: Add vector length inheritance control
  arm64/sve: ptrace: Wire up vector length control and reporting
  arm64/sve: Enable default vector length control via procfs
  arm64/sve: Detect SVE via the cpufeature framework
  arm64/sve: Migrate to cpucap based detection for runtime SVE code
  arm64/sve: Allocate task SVE context storage dynamically
  arm64/sve: Documentation: Add overview of the SVE userspace ABI

 Documentation/arm64/sve.txt              | 475 ++++++++++++++++++++++++
 arch/arm64/Kconfig                       |  12 +
 arch/arm64/include/asm/cpu.h             |   3 +
 arch/arm64/include/asm/cpucaps.h         |   3 +-
 arch/arm64/include/asm/cpufeature.h      |  13 +
 arch/arm64/include/asm/esr.h             |   3 +-
 arch/arm64/include/asm/fpsimd.h          |  72 ++++
 arch/arm64/include/asm/fpsimdmacros.h    | 150 ++++++++
 arch/arm64/include/asm/kvm_arm.h         |   1 +
 arch/arm64/include/asm/processor.h       |  14 +
 arch/arm64/include/asm/sysreg.h          |  15 +
 arch/arm64/include/asm/thread_info.h     |   2 +
 arch/arm64/include/uapi/asm/hwcap.h      |   1 +
 arch/arm64/include/uapi/asm/ptrace.h     | 130 +++++++
 arch/arm64/include/uapi/asm/sigcontext.h | 117 ++++++
 arch/arm64/kernel/cpufeature.c           |  39 ++
 arch/arm64/kernel/cpuinfo.c              |  14 +
 arch/arm64/kernel/entry-fpsimd.S         |  17 +
 arch/arm64/kernel/entry.S                |  18 +-
 arch/arm64/kernel/fpsimd.c               | 613 ++++++++++++++++++++++++++++++-
 arch/arm64/kernel/head.S                 |  15 +-
 arch/arm64/kernel/process.c              |   6 +-
 arch/arm64/kernel/ptrace.c               | 253 ++++++++++++-
 arch/arm64/kernel/setup.c                |   1 +
 arch/arm64/kernel/signal.c               | 500 +++++++++++++++++++++++--
 arch/arm64/kernel/signal32.c             |   2 +-
 arch/arm64/kernel/traps.c                |   1 +
 arch/arm64/kvm/handle_exit.c             |   8 +
 arch/arm64/mm/proc.S                     |  14 +-
 include/uapi/linux/elf.h                 |   1 +
 include/uapi/linux/prctl.h               |  11 +
 kernel/sys.c                             |  12 +
 32 files changed, 2474 insertions(+), 62 deletions(-)
 create mode 100644 Documentation/arm64/sve.txt

-- 
2.1.4


* [RFC PATCH v2 29/41] prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls
  2017-03-22 14:50 [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support Dave Martin
@ 2017-03-22 14:50 ` Dave Martin
  2017-03-22 14:51 ` [RFC PATCH v2 33/41] arm64/sve: Wire up vector length control prctl() calls Dave Martin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Dave Martin @ 2017-03-22 14:50 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Catalin Marinas, Marc Zyngier, Ard Biesheuvel,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, Andrew Morton,
	linux-kernel

This patch adds a do-nothing skeleton for the arm64 Scalable Vector
Extension control prctls.

These prctls are only available with

    CONFIG_ARM64=y
    CONFIG_ARM64_SVE=y

Otherwise they will compile out and return -EINVAL if attempted
from userspace.

The backend functions sve_{set,get}_task_vl() will be fleshed out
with actual functionality in subsequent patches.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h    | 16 ++++++++++++++++
 arch/arm64/include/asm/processor.h |  4 ++++
 arch/arm64/kernel/fpsimd.c         | 13 +++++++++++++
 include/uapi/linux/prctl.h         |  4 ++++
 kernel/sys.c                       | 12 ++++++++++++
 5 files changed, 49 insertions(+)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 71b94ee..7dba890 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -17,6 +17,7 @@
 #define __ASM_FP_H
 
 #include <asm/ptrace.h>
+#include <asm/errno.h>
 
 #ifndef __ASSEMBLY__
 
@@ -107,12 +108,27 @@ extern void fpsimd_sync_to_sve(struct task_struct *task);
 extern void sve_sync_to_fpsimd(struct task_struct *task);
 extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task);
 
+extern int sve_set_task_vl(struct task_struct *task,
+			   unsigned long vector_length, unsigned long flags);
+extern int sve_get_task_vl(struct task_struct *task);
+
 #else /* ! CONFIG_ARM64_SVE */
 
 static void __maybe_unused sve_sync_to_fpsimd(struct task_struct *task) { }
 static void __maybe_unused sve_sync_from_fpsimd_zeropad(
 	struct task_struct *task) { }
 
+static int __maybe_unused sve_set_task_vl(struct task_struct *task,
+	unsigned long vector_length, unsigned long flags)
+{
+	return -EINVAL;
+}
+
+static int __maybe_unused sve_get_task_vl(struct task_struct *task)
+{
+	return -EINVAL;
+}
+
 #endif /* ! CONFIG_ARM64_SVE */
 
 #endif
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index c97b8bd..865c279 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -189,4 +189,8 @@ static inline void spin_lock_prefetch(const void *ptr)
 int cpu_enable_pan(void *__unused);
 int cpu_enable_cache_maint_trap(void *__unused);
 
+#define SVE_SET_VL(task, vector_length, flags) \
+	sve_set_task_vl(task, vector_length, flags)
+#define SVE_GET_VL(task) sve_get_task_vl(task)
+
 #endif /* __ASM_PROCESSOR_H */
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index ee59325..4102d13 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -204,6 +204,19 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 	task_fpsimd_load(current);
 }
 
+/* PR_SVE_SET_VL */
+int sve_set_task_vl(struct task_struct *task,
+		    unsigned long vector_length, unsigned long flags)
+{
+	return -EINVAL;
+}
+
+/* PR_SVE_GET_VL */
+int sve_get_task_vl(struct task_struct *task)
+{
+	return -EINVAL;
+}
+
 #else /* ! CONFIG_ARM64_SVE */
 
 /* Dummy declarations for usage protected with IS_ENABLED(CONFIG_ARM64_SVE): */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..e32e2da 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,8 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/* arm64 Scalable Vector Extension controls */
+#define PR_SVE_SET_VL			48	/* set task vector length */
+#define PR_SVE_GET_VL			49	/* get task vector length */
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 7ff6d1b..ea5d37e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -110,6 +110,12 @@
 #ifndef SET_FP_MODE
 # define SET_FP_MODE(a,b)	(-EINVAL)
 #endif
+#ifndef SVE_SET_VL
+# define SVE_SET_VL(a,b,c)	(-EINVAL)
+#endif
+#ifndef SVE_GET_VL
+# define SVE_GET_VL(a)		(-EINVAL)
+#endif
 
 /*
  * this is where the system-wide overflow UID and GID are defined, for
@@ -2290,6 +2296,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_GET_FP_MODE:
 		error = GET_FP_MODE(me);
 		break;
+	case PR_SVE_SET_VL:
+		error = SVE_SET_VL(me, arg2, arg3);
+		break;
+	case PR_SVE_GET_VL:
+		error = SVE_GET_VL(me);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
2.1.4


* [RFC PATCH v2 33/41] arm64/sve: Wire up vector length control prctl() calls
  2017-03-22 14:50 [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support Dave Martin
  2017-03-22 14:50 ` [RFC PATCH v2 29/41] prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls Dave Martin
@ 2017-03-22 14:51 ` Dave Martin
  2017-03-22 14:51 ` [RFC PATCH v2 41/41] arm64/sve: Documentation: Add overview of the SVE userspace ABI Dave Martin
  2017-03-31 15:28 ` [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support Ard Biesheuvel
  3 siblings, 0 replies; 9+ messages in thread
From: Dave Martin @ 2017-03-22 14:51 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Catalin Marinas, Marc Zyngier, Ard Biesheuvel,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, Andrew Morton,
	linux-kernel

This patch provides implementation for the PR_SVE_SET_VL and
PR_SVE_GET_VL prctls, which allow a task to set and query its
vector length and associated control flags (although no flags are
defined in this patch).

Currently any thread can set its VL, allowing a mix of VLs within
a single process -- this behaviour will be refined in subsequent
patches.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h |  3 ++
 arch/arm64/kernel/fpsimd.c      | 65 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 557e755..764da0f 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -109,6 +109,9 @@ extern void fpsimd_sync_to_sve(struct task_struct *task);
 extern void sve_sync_to_fpsimd(struct task_struct *task);
 extern void sve_sync_from_fpsimd_zeropad(struct task_struct *task);
 
+extern int sve_set_vector_length(struct task_struct *task,
+				 unsigned long vl, unsigned long flags);
+
 extern int sve_set_task_vl(struct task_struct *task,
 			   unsigned long vector_length, unsigned long flags);
 extern int sve_get_task_vl(struct task_struct *task);
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index daceeae..c9934bb 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -221,17 +221,78 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
 	task_fpsimd_load(current);
 }
 
+int sve_set_vector_length(struct task_struct *task,
+			  unsigned long vl, unsigned long flags)
+{
+	BUG_ON(task == current && preemptible());
+
+	if (flags)
+		return -EINVAL; /* No flags defined yet */
+
+	if (!sve_vl_valid(vl))
+		return -EINVAL;
+
+	if (vl > sve_max_vl) {
+		BUG_ON(!sve_vl_valid(sve_max_vl));
+		vl = sve_max_vl;
+	}
+
+	/*
+	 * To ensure the FPSIMD bits of the SVE vector registers are preserved,
+	 * write any live register state back to task_struct, and convert to a
+	 * non-SVE thread.
+	 */
+	if (vl != task->thread.sve_vl) {
+		if (task == current &&
+		    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+			task_fpsimd_save(current);
+
+		if (test_and_clear_tsk_thread_flag(task, TIF_SVE))
+			sve_to_fpsimd(task);
+
+		/*
+		 * To avoid surprises, also zero out the SVE regs storage.
+		 * This means that the P-regs, FFR and high bits of Z-regs
+		 * will read as zero on next access:
+		 */
+		clear_sve_regs(task);
+	}
+
+	task->thread.sve_vl = vl;
+
+	fpsimd_flush_task_state(task);
+
+	return 0;
+}
+
 /* PR_SVE_SET_VL */
 int sve_set_task_vl(struct task_struct *task,
 		    unsigned long vector_length, unsigned long flags)
 {
-	return -EINVAL;
+	int ret;
+
+	if (!(elf_hwcap & HWCAP_SVE))
+		return -EINVAL;
+
+	BUG_ON(task != current);
+
+	preempt_disable();
+	ret = sve_set_vector_length(current, vector_length, flags);
+	preempt_enable();
+
+	if (ret)
+		return ret;
+
+	return task->thread.sve_vl;
 }
 
 /* PR_SVE_GET_VL */
 int sve_get_task_vl(struct task_struct *task)
 {
-	return -EINVAL;
+	if (!(elf_hwcap & HWCAP_SVE))
+		return -EINVAL;
+
+	return task->thread.sve_vl;
 }
 
 #else /* ! CONFIG_ARM64_SVE */
-- 
2.1.4


* [RFC PATCH v2 41/41] arm64/sve: Documentation: Add overview of the SVE userspace ABI
  2017-03-22 14:50 [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support Dave Martin
  2017-03-22 14:50 ` [RFC PATCH v2 29/41] prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls Dave Martin
  2017-03-22 14:51 ` [RFC PATCH v2 33/41] arm64/sve: Wire up vector length control prctl() calls Dave Martin
@ 2017-03-22 14:51 ` Dave Martin
  2017-03-31 15:28 ` [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support Ard Biesheuvel
  3 siblings, 0 replies; 9+ messages in thread
From: Dave Martin @ 2017-03-22 14:51 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Will Deacon, Catalin Marinas, Marc Zyngier, Ard Biesheuvel,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, Andrew Morton,
	linux-kernel, Alan Hayward, Yao Qi, gdb, Christoffer Dall,
	libc-alpha, Richard Sandiford, Torvald Riegel

This patch adds initial documentation of the ABI provided by the
kernel to enable userspace software to make use of SVE.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 Documentation/arm64/sve.txt | 475 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 475 insertions(+)
 create mode 100644 Documentation/arm64/sve.txt

diff --git a/Documentation/arm64/sve.txt b/Documentation/arm64/sve.txt
new file mode 100644
index 0000000..f82fbf8
--- /dev/null
+++ b/Documentation/arm64/sve.txt
@@ -0,0 +1,475 @@
+            Scalable Vector Extension support for AArch64 Linux
+            ===================================================
+
+Author: Dave Martin <Dave.Martin@arm.com>
+Date:   20 March 2017
+
+This document outlines briefly the interface provided to userspace by Linux in
+order to support use of the ARM Scalable Vector Extension (SVE).
+
+This is an outline of the most important features and issues only and not
+intended to be exhaustive.
+
+This document does not aim to describe the SVE architecture or programmer's
+model.  To aid understanding, a minimal description of relevant programmer's
+model features for SVE is included in Appendix A.
+
+
+1.  General
+-----------
+
+* SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are
+  tracked per-thread.
+
+* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
+  AT_HWCAP entry.  SVE is reported in /proc/cpuinfo as "sve".
+
+
+2.  Vector length terminology
+-----------------------------
+
+The size of an SVE vector (Z) register is referred to as the "vector length".
+
+To avoid confusion about the units used to express vector length, the kernel
+adopts the following conventions:
+
+* Vector length (VL) = size of a Z-register in bytes
+
+* Vector quadwords (VQ) = size of a Z-register in units of 128 bits
+
+(So, VL = 16 * VQ.)
+
+The VQ convention is used where the underlying granularity is important, such
+as in data structure definitions.  In most other situations, the VL convention
+is used.  This is consistent with the meaning of the "VL" pseudo-register in
+the SVE instruction set architecture.
+
+
+3.  System call behaviour
+-------------------------
+
+* On syscall, V0..V31 are preserved (as without SVE).  Thus, bits [127:0] of
+  Z0..Z31 are preserved.  All other bits of Z0..Z31, and all of P0..P15 and FFR
+  become unspecified on return from a syscall.
+
+* The SVE registers are not used to pass arguments to or receive results from
+  any syscall.
+
+* In practice the affected registers/bits will be preserved or will be replaced
+  with zeros on return from a syscall, but userspace should not make
+  assumptions about this.  The kernel behaviour may vary on a case-by-case
+  basis.
+
+
+4.  Signal handling
+-------------------
+
+* A new signal frame record sve_context encodes the SVE registers on signal
+  delivery. [1]
+
+* This record is supplementary to fpsimd_context.  The FPSR and FPCR registers
+  are only present in fpsimd_context.  For convenience, the content of V0..V31
+  is duplicated between sve_context and fpsimd_context.
+
+* The sve_context record is included if and only if the SVE registers are live
+  at the time of signal delivery.
+
+If the sve_context record is present in the signal frame:
+
+* sve_context.vl encodes the current vector length of the thread.
+
+* The remainder of the data has a vl-dependent size and layout.  Macros
+  SIG_SVE_* are defined [1] to facilitate access to the members.
+
+* If the SVE context is too big to fit in sigcontext.__reserved[], then extra
+  space is allocated on the stack, an extra_context record is written in
+  __reserved[] referencing this space.  sve_context is then written in the
+  extra space.  Refer to [1] for further details about this mechanism.
+
+
+5.  Signal return
+-----------------
+
+When returning from a signal handler:
+
+* If there is no sve_context record in the signal frame, the SVE registers/bits
+  become non-live and unspecified.  (In practice, this is likely to mean that
+  these bits become zero, but software should not make assumptions about this.)
+
+* If sve_context is present in the signal frame, the SVE registers become live
+  and are populated appropriately.  However, for backward compatibility
+  reasons, bits [127:0] of Z0..Z31 are always restored from the corresponding
+  members of fpsimd_context.vregs[] and not from sve_context.  The remaining
+  bits are restored from sve_context.
+
+* Inclusion of fpsimd_context in the signal frame remains mandatory,
+  irrespective of whether sve_context is present or not.
+
+* The vector length cannot be changed via signal return.  If sve_context.vl in
+  the signal frame does not match the current vector length, the signal return
+  attempt is treated as illegal, resulting in a forced SIGSEGV.
+
+
+6.  prctl extensions
+--------------------
+
+Some new prctl() calls are added to allow programs to manage the SVE vector
+length:
+
+prctl(PR_SVE_SET_VL, unsigned long vl, unsigned long flags)
+
+    Sets the vector length of the calling thread.
+
+    flags:
+
+	PR_SVE_SET_VL_THREAD
+
+	    Permit setting even if the calling process has multiple threads.
+	    This is otherwise forbidden in order to prevent accidents.
+
+	    Worker threads with different vector lengths can be created in this
+	    way, within a single process.
+
+	    General-purpose userspace code may not work correctly unless all
+	    threads have the same vector length.
+
+	PR_SVE_SET_VL_INHERIT
+
+	    Inherit the current vector length across execve().  Otherwise, the
+	    vector length is reset to the system default at execve().  (See
+	    Section 9.)
+
+	PR_SVE_SET_VL_ONEXEC
+
+	    Defer the requested vector length change until the next execve().
+	    This allows launching of a new program with a different vector
+	    length, while avoiding runtime side effects in the caller.
+
+	    This also overrides the effect of PR_SVE_SET_VL_INHERIT for the
+	    first execve().
+
+	    Without PR_SVE_SET_VL_ONEXEC, any outstanding deferred vector
+	    length change is cancelled.
+
+    Return value: a nonnegative value on success, or a negative value on error:
+	EINVAL: SVE not supported, invalid vector length requested, or
+	    invalid flags.
+
+    The returned value describes the resulting configuration, encoded as for
+    PR_SVE_GET_VL.
+
+    Changing the vector length causes all of P0..P15, FFR and all bits of
+    Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
+    unspecified.  Calling PR_SVE_SET_VL with vl equal to the thread's current
+    vector length does not constitute a change to the vector length for this
+    purpose.
+
+    If vl is greater than the maximum vector length supported by the system,
+    the vector length is set to the maximum supported vector length.
+
+
+prctl(PR_SVE_GET_VL)
+
+    Gets the vector length of the calling thread.
+
+    The following flag may be OR-ed into the result:
+
+	PR_SVE_SET_VL_INHERIT
+
+	    Vector length will be inherited across execve().
+
+    There is no way to determine whether there is an outstanding deferred
+    vector length change (which would only normally be the case between a
+    fork() or vfork() and the corresponding execve() in typical use).
+
+    To extract the vector length from the result, use PR_SVE_GET_VL_LEN().
+
+    Return value: a nonnegative value on success, or a negative value on error:
+	EINVAL: SVE not supported.
+
+
+prctl(PR_GET_MINSIGSTKSZ)
+
+    Returns the minimum amount of stack space in bytes required for guaranteed
+    signal delivery.
+
+    The returned value is analogous to the POSIX macro MINSIGSTKSZ, except that
+    expansion of the signal frame is taken into account.  This allows correct
+    allocation of stacks even for large vector lengths.
+
+    Return value: a nonnegative value on success, or a negative value on error:
+	EINVAL: Function not implemented.
+
+    If this call fails, the value provided for MINSIGSTKSZ by [1] can be
+    assumed.
+
+
+7.  ptrace extensions
+---------------------
+
+* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and
+  PTRACE_SETREGSET.
+
+  Refer to [2] for definitions.
+
+The regset data starts with struct user_sve_header, containing:
+
+    size
+
+	Size of the complete regset, in bytes.
+	This depends on vl and possibly on other things in the future.
+
+	If a call to PTRACE_GETREGSET requests less data than the value of
+	size, the caller can allocate a larger buffer and retry in order to
+	read the complete regset.
+
+    max_size
+
+	Maximum size in bytes that the regset can grow to for the target
+	thread.  The regset won't grow bigger than this even if the target
+	thread changes its vector length etc.
+
+    vl
+
+	Target thread's current vector length, in bytes.
+
+    max_vl
+
+	Maximum possible vector length for the target thread.
+
+    flags
+
+	either
+
+	    SVE_PT_REGS_FPSIMD
+
+		SVE registers are not live (GETREGSET) or are to be made
+		non-live (SETREGSET).
+
+		The payload is of type struct user_fpsimd_state, with the same
+		meaning as for NT_PRFPREG, starting at offset
+		SVE_PT_FPSIMD_OFFSET from the start of user_sve_header.
+
+		Extra data might be appended in the future: the size of the
+		payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).
+
+		vq should be obtained using sve_vq_from_vl(vl).
+
+		or
+
+	    SVE_PT_REGS_SVE
+
+		SVE registers are live (GETREGSET) or are to be made live
+		(SETREGSET).
+
+		The payload contains the SVE register data, starting at offset
+		SVE_PT_SVE_OFFSET from the start of user_sve_header, and with
+		size SVE_PT_SVE_SIZE(vq, flags);
+
+	... OR-ed with zero or more of the following flags, which have the same
+	meaning and behaviour as the corresponding PR_SVE_SET_VL_* flags:
+
+	    SVE_PT_VL_THREAD (SETREGSET only)
+
+	    SVE_PT_VL_INHERIT
+
+	    SVE_PT_VL_ONEXEC (SETREGSET only).
+
+* The effects of changing the vector length and/or flags are equivalent to
+  those documented for PR_SVE_SET_VL.
+
+* In the SVE_PT_REGS_SVE case, the size and layout of the payload depends on
+  the header fields.  The SVE_PT_SVE_*() macros are provided to facilitate
+  access to the members.
+
+* In either case, for SETREGSET it is permissible to omit the payload, in which
+  case only the vector length and flags are changed (along with any
+  consequences of those changes).
+
+  The effect of writing a partial, incomplete payload is unspecified.
+
+
+8.  ELF coredump extensions
+---------------------------
+
+* A NT_ARM_SVE note will be added to each coredump for each thread of the
+  dumped process.  The contents will be equivalent to the data that would have
+  been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread
+  when the coredump was generated.
+
+  Each of these notes may be padded up to a larger size: if so, the surplus
+  data should be ignored when parsing the coredump.
+
+
+9.  System runtime configuration
+--------------------------------
+
+* To mitigate the ABI impact of expansion of the signal frame, a policy
+  mechanism is provided for administrators, distro maintainers and developers
+  to set the default vector length for userspace processes:
+
+/proc/cpu/sve_default_vector_length
+
+    Writing the text representation of an integer to this file sets the system
+    default vector length to the specified value, unless the value is greater
+    than the maximum vector length supported by the system in which case the
+    default vector length is set to that maximum.
+
+    The result can be determined by reopening the file and reading its
+    contents.
+
+    At boot, the default vector length is initially set to 64 or the maximum
+    supported vector length, whichever is smaller.  This determines the initial
+    vector length of the init process (PID 1).
+
+    Reading this file returns the current system default vector length.
+
+* At every execve() call, the new vector length of the new process is set to
+  the system default vector length, unless
+
+    * PR_SVE_SET_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the
+      calling thread, or
+
+    * a deferred vector length change is pending, established via the
+      PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC).
+
+* Modifying the system default vector length does not affect the vector length
+  of any existing process or thread that does not make an execve() call.
+
+
+Appendix A.  SVE programmer's model (informative)
+=================================================
+
+This section provides a minimal description of the additions made by SVE to the
+ARMv8-A programmer's model that are relevant to this document.
+
+Note: This section is for information only and not intended to be complete or
+to replace any architectural specification.
+
+A.1.  Registers
+---------------
+
+In A64 state, SVE adds the following:
+
+* 32 8VL-bit vector registers Z0..Z31
+  For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn.
+
+  A register write using a Vn register name zeros all bits of the corresponding
+  Zn except for bits [127:0].
+
+* 16 VL-bit predicate registers P0..P15
+
+* 1 VL-bit special-purpose predicate register FFR (the "first-fault register")
+
+* a VL "pseudo-register" that determines the size of each vector register
+
+  The SVE instruction set architecture provides no way to write VL directly.
+  Instead, it can be modified only by EL1 and above, by writing appropriate
+  system registers.
+
+* The value of VL can be configured at runtime by EL1 and above:
+  16 <= VL <= VLmax, where VL must be a multiple of 16.
+
+* The maximum vector length is determined by the hardware:
+  16 <= VLmax <= 256.
+
+  (The SVE architecture specifies 256, but permits future architecture
+  revisions to raise this limit.)
+
+* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
+  operations in a similar way to the way in which they interact with ARMv8
+  floating-point operations.
+
+         8VL-1                       128               0  bit index
+        +----          ////            -----------------+
+     Z0 |                               :       V0      |
+      :                                          :
+     Z7 |                               :       V7      |
+     Z8 |                               :     * V8      |
+      :                                       :  :
+    Z15 |                               :     *V15      |
+    Z16 |                               :      V16      |
+      :                                          :
+    Z31 |                               :      V31      |
+        +----          ////            -----------------+
+                                                 31    0
+         VL-1                  0                +-------+
+        +----       ////      --+          FPSR |       |
+     P0 |                       |               +-------+
+      : |                       |         *FPCR |       |
+    P15 |                       |               +-------+
+        +----       ////      --+
+    FFR |                       |               +-----+
+        +----       ////      --+            VL |     |
+                                                +-----+
+
+(*) callee-save:
+    This only applies to bits [63:0] of Z-/V-registers.
+    FPCR contains callee-save and caller-save bits.  See [3] for details.
+
+
+A.2.  Procedure call standard
+-----------------------------
+
+The ARMv8-A base procedure call standard is extended as follows with respect to
+the additional SVE register state:
+
+* All SVE register bits that are not shared with FP/SIMD are caller-save.
+
+* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save.
+
+  This follows from the way these bits are mapped to V8..V15, which are
+  callee-save in the base procedure call standard.
+
+
+Appendix B.  ARMv8-A FP/SIMD programmer's model
+===============================================
+
+Note: This section is for information only and not intended to be complete or
+to replace any architectural specification.
+
+Refer to [3] for more information.
+
+ARMv8-A defines the following floating-point / SIMD register state:
+
+* 32 128-bit vector registers V0..V31
+* 2 32-bit status/control registers FPSR, FPCR
+
+         127           0  bit index
+        +---------------+
+     V0 |               |
+      : :               :
+     V7 |               |
+   * V8 |               |
+   :  : :               :
+   *V15 |               |
+    V16 |               |
+      : :               :
+    V31 |               |
+        +---------------+
+
+                 31    0
+                +-------+
+           FPSR |       |
+                +-------+
+          *FPCR |       |
+                +-------+
+
+(*) callee-save:
+    This only applies to bits [63:0] of V-registers.
+    FPCR contains a mixture of callee-save and caller-save bits.
+
+
+References
+==========
+
+[1] arch/arm64/include/uapi/asm/sigcontext.h
+    AArch64 Linux signal ABI definitions
+
+[2] arch/arm64/include/uapi/asm/ptrace.h
+    AArch64 Linux ptrace ABI definitions
+
+[3] ARM IHI0055C
+    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
+    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
+    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
-- 
2.1.4


* Re: [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support
  2017-03-22 14:50 [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support Dave Martin
                   ` (2 preceding siblings ...)
  2017-03-22 14:51 ` [RFC PATCH v2 41/41] arm64/sve: Documentation: Add overview of the SVE userspace ABI Dave Martin
@ 2017-03-31 15:28 ` Ard Biesheuvel
  2017-04-03  9:45   ` Dave Martin
  3 siblings, 1 reply; 9+ messages in thread
From: Ard Biesheuvel @ 2017-03-31 15:28 UTC (permalink / raw)
  To: Dave Martin
  Cc: linux-arm-kernel, Will Deacon, Catalin Marinas, Marc Zyngier,
	Florian Weimer, Joseph Myers, Szabolcs Nagy, Andrew Morton,
	linux-kernel, Alan Hayward, Yao Qi, gdb, Christoffer Dall,
	libc-alpha, Richard Sandiford, Torvald Riegel

On 22 March 2017 at 14:50, Dave Martin <Dave.Martin@arm.com> wrote:

Hi Dave,

> The Scalable Vector Extension (SVE) [1] is an extension to AArch64 which
> adds extra SIMD functionality and supports much larger vectors.
>
> This series implements core Linux support for SVE.
>
[...]
> KERNEL_MODE_NEON (non-)support
> ------------------------------
>
> "arm64/sve: [BROKEN] Basic support for KERNEL_MODE_NEON" is broken.
> There are significant design issues here that need discussion -- see the
> commit message for details.
>
> Options:
>
>  * Make KERNEL_MODE_NEON a runtime choice, and disable it if SVE is
>    present.
>
>  * Fully SVE-ise the KERNEL_MODE_NEON code: this will involve complexity
>    and effort, and may involve unfavourable (and VL-dependent) tradeoffs
>    compared with the no-SVE case.
>
>    We will nonetheless need something like this if there is a desire to
>    support "kernel mode SVE" in the future.  The fact that with SVE,
>    KERNEL_MODE_NEON brings the cost of kernel-mode SVE but only the
>    benefits of kernel-mode NEON argues in favour of this.
>
>  * Make KERNEL_MODE_NEON a dynamic choice, and have clients fall back to
>    C code at runtime, on a case-by-case basis, if the SVE regs would
>    otherwise need saving.
>
>    This is an interface break, but all NEON-optimised kernel code
>    necessarily requires a fallback C implementation to exist anyway, and
>    the number of clients is not huge.
>
> We could go for a stopgap solution that at least works but is suboptimal
> for SVE systems (such as the first choice above), and then improve it
> later.
>

Without having looked at the patches in detail yet, let me reiterate
my position after we discussed this when this series was sent out the
first time around.

- The primary use case for kernel mode NEON is special purpose
instructions, i.e., AES is 20x faster when using the NEON, simply
because that is how one accesses the logic gates that implement the
AES algorithm. There is nothing SIMD or FP in nature about AES.
Compare the CRC extensions, which use scalar registers and
instructions. Of course, there are a couple of exceptions in the form
of bit-slicing algorithms, but in general, like general SIMD, I don't
think it is highly likely that SVE in kernel mode is something we will
have a need for in the foreseeable future.

- The current way of repeatedly stacking/unstacking NEON register
contents in interrupt context is highly inefficient, given that we are
usually interrupting user mode, not kernel mode, and so stacking once
and unstacking when returning from the exception (which is how we
usually deal with it) would be much better. So changing the current
implementation to perform the eager stack/unstack only when a kernel
mode NEON call is already in progress is likely to improve our current
situation already, regardless of whether such a change is needed to
accommodate SVE. Note that to my knowledge, the only in-tree users of
kernel mode NEON operate in process context or softirq context, never
in hardirq context.

Given the above, I think it is perfectly reasonable to conditionally
disallow kernel mode NEON in hardirq context. The crypto routines that
rely on it can easily be fixed up (I already wrote the patches for
that). This would only be necessary on SVE systems, and the reason for
doing so is that - given how preserving and restoring the NEON
register file blows away the upper SVE part of the registers - whoever
arrives at the SVE-aware preserve routine first should be allowed to
run to completion. This does require disabling softirqs during the
time the preserved NEON state is being manipulated but that does not
strike me as a huge price to pay. Note that both restrictions
(disallowing kernel mode NEON in hardirq context, and the need to
disable softirqs while manipulating the state) could be made runtime
dependent on whether we are actually running on an SVE system.


* Re: [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support
  2017-03-31 15:28 ` [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support Ard Biesheuvel
@ 2017-04-03  9:45   ` Dave Martin
  2017-04-03 10:01     ` Ard Biesheuvel
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Martin @ 2017-04-03  9:45 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Florian Weimer, Christoffer Dall, gdb, Marc Zyngier,
	Catalin Marinas, Yao Qi, Will Deacon, linux-kernel,
	Szabolcs Nagy, Richard Sandiford, linux-arm-kernel, Alan Hayward,
	libc-alpha, Andrew Morton, Torvald Riegel, Joseph Myers

On Fri, Mar 31, 2017 at 04:28:16PM +0100, Ard Biesheuvel wrote:
> On 22 March 2017 at 14:50, Dave Martin <Dave.Martin@arm.com> wrote:
> 
> Hi Dave,
> 
> > The Scalable Vector Extension (SVE) [1] is an extension to AArch64 which
> > adds extra SIMD functionality and supports much larger vectors.
> >
> > This series implements core Linux support for SVE.
> >
> [...]
> > KERNEL_MODE_NEON (non-)support
> > ------------------------------
> >
> > "arm64/sve: [BROKEN] Basic support for KERNEL_MODE_NEON" is broken.
> > There are significant design issues here that need discussion -- see the
> > commit message for details.
> >
> > Options:
> >
> >  * Make KERNEL_MODE_NEON a runtime choice, and disable it if SVE is
> >    present.
> >
> >  * Fully SVE-ise the KERNEL_MODE_NEON code: this will involve complexity
> >    and effort, and may involve unfavourable (and VL-dependent) tradeoffs
> >    compared with the no-SVE case.
> >
> >    We will nonetheless need something like this if there is a desire to
> >    support "kernel mode SVE" in the future.  The fact that with SVE,
> >    KERNEL_MODE_NEON brings the cost of kernel-mode SVE but only the
> >    benefits of kernel-mode NEON argues in favour of this.
> >
> >  * Make KERNEL_MODE_NEON a dynamic choice, and have clients fall back to
> >    C code at runtime, on a case-by-case basis, if the SVE regs would
> >    otherwise need saving.
> >
> >    This is an interface break, but all NEON-optimised kernel code
> >    necessarily requires a fallback C implementation to exist anyway, and
> >    the number of clients is not huge.
> >
> > We could go for a stopgap solution that at least works but is suboptimal
> > for SVE systems (such as the first choice above), and then improve it
> > later.
> >
> 
> Without having looked at the patches in detail yet, let me reiterate
> my position after we discussed this when this series was sent out the
> first time around.
> 
> - The primary use case for kernel mode NEON is special purpose
> instructions, i.e., AES is 20x faster when using the NEON, simply
> because that is how one accesses the logic gates that implement the
> AES algorithm. There is nothing SIMD or FP in nature about AES.
> Compare the CRC extensions, which use scalar registers and
> instructions. Of course, there are a couple of exceptions in the form
> of bit-slicing algorithms, but in general, like general SIMD, I don't
> think it is highly likely that SVE in kernel mode is something we will
> have a need for in the foreseeable future.

Certainly there is no immediate need for this, and if we decide we never
need it then that helps us avoid some complexity.

My main concern is around the extra save/restore cost, given that if
the SVE registers are live then save/restore of the full SVE vectors
is needed even if only FPSIMD is used in the meantime.

> - The current way of repeatedly stacking/unstacking NEON register
> contents in interrupt context is highly inefficient, given that we are
> usually interrupting user mode, not kernel mode, and so stacking once
> and unstacking when returning from the exception (which is how we
> usually deal with it) would be much better. So changing the current
> implementation to perform the eager stack/unstack only when a kernel
> mode NEON call is already in progress is likely to improve our current
> situation already, regardless of whether such a change is needed to
> accommodate SVE. Note that to my knowledge, the only in-tree users of
> kernel mode NEON operate in process context or softirq context, never
> in hardirq context.

Reassuring.

> Given the above, I think it is perfectly reasonable to conditionally
> disallow kernel mode NEON in hardirq context. The crypto routines that
> rely on it can easily be fixed up (I already wrote the patches for
> that). This would only be necessary on SVE systems, and the reason for
> doing so is that - given how preserving and restoring the NEON
> register file blows away the upper SVE part of the registers - whoever
> arrives at the SVE-aware preserve routine first should be allowed to
> run to completion. This does require disabling softirqs during the
> time the preserved NEON state is being manipulated but that does not
> strike me as a huge price to pay. Note that both restrictions
> (disallowing kernel mode NEON in hardirq context, and the need to
> disable softirqs while manipulating the state) could be made runtime
> dependent on whether we are actually running on an SVE system.

Given that we already bail out of kernel_neon_begin() with a WARN() if
the hardware lacks FPSIMD, I'm not sure it would be worse to do the same
if SVE is present.

However, we should probably abstract that check: currently, drivers
seem to be using a cocktail of Kconfig dependencies,
MODULE_DEVICE_TABLE() and runtime hwcap checks in order to deduce
whether kernel_neon_begin() is safe to use.

Would you be happy with a single API for checking whether
kernel_neon_begin() works?  Maintaining this check in every driver
doesn't feel very scalable.


This would allow SVE to disable KERNEL_MODE_NEON without having to make
them conflict in Kconfig.  This wouldn't be our end goal, but it allows
the dependency to be decoupled while we figure out a better solution.

I'll try and come up with a patch.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support
  2017-04-03  9:45   ` Dave Martin
@ 2017-04-03 10:01     ` Ard Biesheuvel
  2017-04-03 10:51       ` Dave Martin
  0 siblings, 1 reply; 9+ messages in thread
From: Ard Biesheuvel @ 2017-04-03 10:01 UTC (permalink / raw)
  To: Dave Martin
  Cc: Florian Weimer, Christoffer Dall, gdb, Marc Zyngier,
	Catalin Marinas, Yao Qi, Will Deacon, linux-kernel,
	Szabolcs Nagy, Richard Sandiford, linux-arm-kernel, Alan Hayward,
	libc-alpha, Andrew Morton, Torvald Riegel, Joseph Myers

On 3 April 2017 at 10:45, Dave Martin <Dave.Martin@arm.com> wrote:
> On Fri, Mar 31, 2017 at 04:28:16PM +0100, Ard Biesheuvel wrote:
>> On 22 March 2017 at 14:50, Dave Martin <Dave.Martin@arm.com> wrote:
>>
>> Hi Dave,
>>
>> > The Scalable Vector Extension (SVE) [1] is an extension to AArch64 which
>> > adds extra SIMD functionality and supports much larger vectors.
>> >
>> > This series implements core Linux support for SVE.
>> >
>> [...]
>> > KERNEL_MODE_NEON (non-)support
>> > ------------------------------
>> >
>> > "arm64/sve: [BROKEN] Basic support for KERNEL_MODE_NEON" is broken.
>> > There are significant design issues here that need discussion -- see the
>> > commit message for details.
>> >
>> > Options:
>> >
>> >  * Make KERNEL_MODE_NEON a runtime choice, and disable it if SVE is
>> >    present.
>> >
>> >  * Fully SVE-ise the KERNEL_MODE_NEON code: this will involve complexity
>> >    and effort, and may involve unfavourable (and VL-dependent) tradeoffs
>> >    compared with the no-SVE case.
>> >
>> >    We will nonetheless need something like this if there is a desire to
>> >    support "kernel mode SVE" in the future.  The fact that with SVE,
>> >    KERNEL_MODE_NEON brings the cost of kernel-mode SVE but only the
>> >    benefits of kernel-mode NEON argues in favour of this.
>> >
>> >  * Make KERNEL_MODE_NEON a dynamic choice, and have clients run fallback
>> >    C code instead at runtime on a case-by-case basis, if SVE regs
>> >    would otherwise need saving.
>> >
>> >    This is an interface break, but all NEON-optimised kernel code
>> >    necessarily requires a fallback C implementation to exist anyway, and
>> >    the number of clients is not huge.
>> >
>> > We could go for a stopgap solution that at least works but is suboptimal
>> > for SVE systems (such as the first choice above), and then improve it
>> > later.
>> >
>>
>> Without having looked at the patches in detail yet, let me reiterate
>> my position after we discussed this when this series was sent out the
>> first time around.
>>
>> - The primary use case for kernel mode NEON is special purpose
>> instructions, i.e., AES is 20x faster when using the NEON, simply
>> because that is how one accesses the logic gates that implement the
>> AES algorithm. There is nothing SIMD or FP in nature about AES.
>> Compare the CRC extensions, which use scalar registers and
>> instructions. Of course, there are a couple of exceptions in the form
>> of bit-slicing algorithms, but in general, like general SIMD, I don't
>> think it is highly likely that SVE in kernel mode is something we will
>> have a need for in the foreseeable future.
>
> Certainly there is no immediate need for this, and if we decide we never
> need it then that helps us avoid some complexity.
>
> My main concern is around the extra save/restore cost, given that if
> the SVE registers are live then save/restore of the full SVE vectors
> is needed even if only FPSIMD is used in the meantime.
>
>> - The current way of repeatedly stacking/unstacking NEON register
>> contents in interrupt context is highly inefficient, given that we are
>> usually interrupting user mode, not kernel mode, and so stacking once
>> and unstacking when returning from the exception (which is how we
>> usually deal with it) would be much better. So changing the current
>> implementation to perform the eager stack/unstack only when a kernel
>> mode NEON call is already in progress is likely to improve our current
>> situation already, regardless of whether such a change is needed to
>> accommodate SVE. Note that to my knowledge, the only in-tree users of
>> kernel mode NEON operate in process context or softirq context, never
>> in hardirq context.
>
> Reassuring.
>
>> Given the above, I think it is perfectly reasonable to conditionally
>> disallow kernel mode NEON in hardirq context. The crypto routines that
>> rely on it can easily be fixed up (I already wrote the patches for
>> that). This would only be necessary on SVE systems, and the reason for
>> doing so is that - given how preserving and restoring the NEON
>> register file blows away the upper SVE part of the registers - whoever
>> arrives at the SVE-aware preserve routine first should be allowed to
>> run to completion. This does require disabling softirqs during the
>> time the preserved NEON state is being manipulated but that does not
>> strike me as a huge price to pay. Note that both restrictions
>> (disallowing kernel mode NEON in hardirq context, and the need to
>> disable softirqs while manipulating the state) could be made runtime
>> dependent on whether we are actually running on an SVE system.
>
> Given that we already bail out of kernel_neon_begin() with a WARN() if
> the hardware lacks FPSIMD, I'm not sure it would be worse to do the same
> if SVE is present.
>
> However, we should probably abstract that check: currently, drivers
> seem to be using a cocktail of Kconfig dependencies,
> MODULE_DEVICE_TABLE() and runtime hwcap checks in order to deduce
> whether kernel_neon_begin() is safe to use.
>

Not quite. The Kconfig dependencies are about
CONFIG_KERNEL_MODE_NEON=y being set, without which it is pretty
pointless to build the modules that depend on it.

The hwcap dependencies are mostly about the actual instructions
themselves, i.e., AES, PMULL, SHA1/2 etc
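
For illustration, the shape of a typical driver today is roughly the
following (simplified, not lifted from any particular module; the
AES/PMULL hwcaps are just examples):

#include <linux/cpufeature.h>
#include <linux/module.h>
#include <asm/hwcap.h>

static bool use_pmull;

static int __init example_ce_mod_init(void)
{
	/* Optional instruction groups probed at runtime via the hwcaps */
	use_pmull = !!(elf_hwcap & HWCAP_PMULL);

	/* A real driver registers its crypto algorithms here */
	return 0;
}

static void __exit example_ce_mod_exit(void)
{
}

/*
 * Autoload (and refuse to load) based on the AES hwcap; this generates
 * the MODULE_DEVICE_TABLE(cpu, ...) entry.  The Kconfig entry for such
 * a module additionally has "depends on KERNEL_MODE_NEON".
 */
module_cpu_feature_match(AES, example_ce_mod_init);
module_exit(example_ce_mod_exit);

MODULE_LICENSE("GPL");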

> Would you be happy with a single API for checking whether
> kernel_neon_begin() works?  Maintaining this check in every driver
> doesn't feel very scalable.
>

Yes, that makes sense, but it is unrelated to hwcap. I'd like to see a
kernel_neon_allowed() that would return '!IS_ENABLED(CONFIG_SVE) ||
!(elf_hwcap & HWCAP_SVE) || !in_irq()' at some point, although I
understand you want to remove the last part for now.
To short-circuit some decisions in the driver when
kernel_neon_allowed() is always true (as it would be currently),
something like kernel_neon_need_fallback() comes to mind.
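
In other words, something like this (sketch only; CONFIG_SVE and
HWCAP_SVE are the names used in this series, and
kernel_neon_need_fallback() is obviously a made-up name):

#include <linux/hardirq.h>
#include <asm/hwcap.h>

/* May kernel_neon_begin() be called from the current context? */
static inline bool kernel_neon_allowed(void)
{
	return !IS_ENABLED(CONFIG_SVE) ||
	       !(elf_hwcap & HWCAP_SVE) ||
	       !in_irq();
}

/*
 * True when kernel_neon_allowed() can ever return false, i.e. when a
 * driver must keep a non-NEON fallback path around at all.
 */
static inline bool kernel_neon_need_fallback(void)
{
	return IS_ENABLED(CONFIG_SVE) && (elf_hwcap & HWCAP_SVE);
}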

>
> This would allow SVE to disable KERNEL_MODE_NEON without having to make
> them conflict in Kconfig.  This wouldn't be our end goal, but it allows
> the dependency to be decoupled while we figure out a better solution.
>

I still don't think there is a need to disable kernel mode NEON
altogether, but we can defer that decision until later and prepare
the existing code to deal with it in the meantime.

As I mentioned, I already looked into this a while ago, so I can
quickly respin the changes to the crypto code to take this into
account.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support
  2017-04-03 10:01     ` Ard Biesheuvel
@ 2017-04-03 10:51       ` Dave Martin
  2017-04-03 10:55         ` Ard Biesheuvel
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Martin @ 2017-04-03 10:51 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Florian Weimer, Richard Sandiford, Joseph Myers, Marc Zyngier,
	gdb, Yao Qi, Alan Hayward, Will Deacon, linux-kernel,
	Szabolcs Nagy, linux-arm-kernel, Catalin Marinas, libc-alpha,
	Andrew Morton, Torvald Riegel, Christoffer Dall

On Mon, Apr 03, 2017 at 11:01:28AM +0100, Ard Biesheuvel wrote:
> On 3 April 2017 at 10:45, Dave Martin <Dave.Martin@arm.com> wrote:
> > On Fri, Mar 31, 2017 at 04:28:16PM +0100, Ard Biesheuvel wrote:

[...]

> >> Given the above, I think it is perfectly reasonable to conditionally
> >> disallow kernel mode NEON in hardirq context. The crypto routines that
> >> rely on it can easily be fixed up (I already wrote the patches for
> >> that). This would only be necessary on SVE systems, and the reason for
> >> doing so is that - given how preserving and restoring the NEON
> >> register file blows away the upper SVE part of the registers - whoever
> >> arrives at the SVE-aware preserve routine first should be allowed to
> >> run to completion. This does require disabling softirqs during the
> >> time the preserved NEON state is being manipulated but that does not
> >> strike me as a huge price to pay. Note that both restrictions
> >> (disallowing kernel mode NEON in hardirq context, and the need to
> >> disable softirqs while manipulating the state) could be made runtime
> >> dependent on whether we are actually running on an SVE system.
> >
> > Given that we already bail out of kernel_neon_begin() with a WARN() if
> > the hardware lacks FPSIMD, I'm not sure it would be worse to do the same
> > if SVE is present.
> >
> > However, we should probably abstract that check: currently, drivers
> > seem to be using a cocktail of Kconfig dependencies,
> > MODULE_DEVICE_TABLE() and runtime hwcap checks in order to deduce
> > whether kernel_neon_begin() is safe to use.
> >
> 
> Not quite. The Kconfig dependencies are about
> CONFIG_KERNEL_MODE_NEON=y being set, without which it is pretty
> pointless to build the modules that depend on it.
> 
> The hwcap dependencies are mostly about the actual instructions
> themselves, i.e., AES, PMULL, SHA1/2 etc

Sure, there are multiple reasons why these checks are being done:
checking that kernel_neon_begin() is available is part of the reason,
but not the whole reason for some of the checks.

Doing other checks for specific ISA extensions that the driver
wants to use remains perfectly appropriate.

> > Would you be happy with a single API for checking whether
> > kernel_neon_begin() works?  Maintaining this check in every driver
> > doesn't feel very scalable.
> >
> 
> Yes, that makes sense, but it is unrelated to hwcap. I'd like to see a
> kernel_neon_allowed() that would return '!IS_ENABLED(CONFIG_SVE) ||
> !(elf_hwcap & HWCAP_SVE) || !in_irq()' at some point, although I
> understand you want to remove the last part for now.
> To short-circuit some decisions in the driver when
> kernel_neon_allowed() is always true (as it would be currently),
> something like kernel_neon_need_fallback() comes to mind.
> 
> >
> > This would allow SVE to disable KERNEL_MODE_NEON without having to make
> > them conflict in Kconfig.  This wouldn't be our end goal, but it allows
> > the dependency to be decoupled while we figure out a better solution.
> >
> 
> I still don't think there is a need to disable kernel mode NEON
> altogether, but we can defer that decision until later and prepare
> the existing code to deal with it in the meantime.

Agreed -- given that SVE hardware won't exist in the wild for some time,
I prefer to at least have SVE buildable in defconfig without impacting
existing systems.

This means that integrating KMN with SVE could be worked on later, and
also means that the improvements to KMN already discussed can be worked
on independently without necessarily considering SVE.

> As I mentioned, I already looked into this a while ago, so I can
> quickly respin the changes to the crypto code to take this into
> account.

It would be interesting to see what that looks like.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support
  2017-04-03 10:51       ` Dave Martin
@ 2017-04-03 10:55         ` Ard Biesheuvel
  0 siblings, 0 replies; 9+ messages in thread
From: Ard Biesheuvel @ 2017-04-03 10:55 UTC (permalink / raw)
  To: Dave Martin
  Cc: Florian Weimer, Richard Sandiford, Joseph Myers, Marc Zyngier,
	gdb, Yao Qi, Alan Hayward, Will Deacon, linux-kernel,
	Szabolcs Nagy, linux-arm-kernel, Catalin Marinas, libc-alpha,
	Andrew Morton, Torvald Riegel, Christoffer Dall

(- list)

On 3 April 2017 at 11:51, Dave Martin <Dave.Martin@arm.com> wrote:
> On Mon, Apr 03, 2017 at 11:01:28AM +0100, Ard Biesheuvel wrote:
>> On 3 April 2017 at 10:45, Dave Martin <Dave.Martin@arm.com> wrote:
>> > On Fri, Mar 31, 2017 at 04:28:16PM +0100, Ard Biesheuvel wrote:
>
> [...]
>
>> >> Given the above, I think it is perfectly reasonable to conditionally
>> >> disallow kernel mode NEON in hardirq context. The crypto routines that
>> >> rely on it can easily be fixed up (I already wrote the patches for
>> >> that). This would only be necessary on SVE systems, and the reason for
>> >> doing so is that - given how preserving and restoring the NEON
>> >> register file blows away the upper SVE part of the registers - whoever
>> >> arrives at the SVE-aware preserve routine first should be allowed to
>> >> run to completion. This does require disabling softirqs during the
>> >> time the preserved NEON state is being manipulated but that does not
>> >> strike me as a huge price to pay. Note that both restrictions
>> >> (disallowing kernel mode NEON in hardirq context, and the need to
>> >> disable softirqs while manipulating the state) could be made runtime
>> >> dependent on whether we are actually running on an SVE system.
>> >
>> > Given that we already bail out of kernel_neon_begin() with a WARN() if
>> > the hardware lacks FPSIMD, I'm not sure it would be worse to do the same
>> > if SVE is present.
>> >
>> > However, we should probably abstract that check: currently, drivers
>> > seem to be using a cocktail of Kconfig dependencies,
>> > MODULE_DEVICE_TABLE() and runtime hwcap checks in order to deduce
>> > whether kernel_neon_begin() is safe to use.
>> >
>>
>> Not quite. The Kconfig dependencies are about
>> CONFIG_KERNEL_MODE_NEON=y being set, without which it is pretty
>> pointless to build the modules that depend on it.
>>
>> The hwcap dependencies are mostly about the actual instructions
>> themselves, i.e., AES, PMULL, SHA1/2 etc
>
> Sure, there are multiple reasons why these checks are being done:
> checking that kernel_neon_begin() is available is part of the reason,
> but not the whole reason for some of the checks.
>
> Doing other checks for specific ISA extensions that the driver
> wants to use remains perfectly appropriate.
>
>> > Would you be happy with a single API for checking whether
>> > kernel_neon_begin() works?  Maintaining this check in every driver
>> > doesn't feel very scalable.
>> >
>>
>> Yes, that makes sense, but it is unrelated to hwcap. I'd like to see a
>> kernel_neon_allowed() that would return '!IS_ENABLED(CONFIG_SVE) ||
>> !(elf_hwcap & HWCAP_SVE) || !in_irq()' at some point, although I
>> understand you want to remove the last part for now.
>> To short-circuit some decisions in the driver when
>> kernel_neon_allowed() is always true (as it would be currently),
>> something like kernel_neon_need_fallback() comes to mind.
>>
>> >
>> > This would allow SVE to disable KERNEL_MODE_NEON without having to make
>> > them conflict in Kconfig.  This wouldn't be our end goal, but it allows
>> > the dependency to be decoupled while we figure out a better solution.
>> >
>>
>> I still don't think there is a need to disable kernel mode NEON
>> altogether, but we can defer that decision until later and prepare
>> the existing code to deal with it in the meantime.
>
> Agreed -- given that SVE hardware won't exist in the wild for some time,
> I prefer to at least have SVE buildable in defconfig without impacting
> existing systems.
>
> This means that integrating KMN with SVE could be worked on later, and
> also means that the improvements to KMN already discussed can be worked
> on independently without necessarily considering SVE.
>
>> As I mentioned, I already looked into this a while ago, so I can
>> quickly respin the changes to the crypto code to take this into
>> account.
>
> It would be interesting to see what that looks like.
>

This stuff is quite stale and out of date, and overloading
may_use_simd() is inappropriate since async crypto algorithms may
still prefer to defer their work to process context even if crypto is
allowed in hardirq context. But this should give you an idea:

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-sve-crypto
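
The general shape of those changes is the usual "use the NEON if the
current context allows it, otherwise take the C path" pattern, roughly
like the below (not a literal excerpt from that branch;
transform_neon()/transform_generic() are placeholders):

#include <linux/types.h>
#include <asm/neon.h>
#include <asm/simd.h>

/* Placeholders for a NEON asm routine and its generic C fallback */
void transform_neon(u8 *dst, const u8 *src, int blocks);
void transform_generic(u8 *dst, const u8 *src, int blocks);

static void do_transform(u8 *dst, const u8 *src, int blocks)
{
	if (may_use_simd()) {
		/* Safe to use the NEON from this context */
		kernel_neon_begin();
		transform_neon(dst, src, blocks);
		kernel_neon_end();
	} else {
		/* e.g. called from hardirq context: use the C fallback */
		transform_generic(dst, src, blocks);
	}
}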

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-04-03 10:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-22 14:50 [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support Dave Martin
2017-03-22 14:50 ` [RFC PATCH v2 29/41] prctl: Add skeleton for PR_SVE_{SET,GET}_VL controls Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 33/41] arm64/sve: Wire up vector length control prctl() calls Dave Martin
2017-03-22 14:51 ` [RFC PATCH v2 41/41] arm64/sve: Documentation: Add overview of the SVE userspace ABI Dave Martin
2017-03-31 15:28 ` [RFC PATCH v2 00/41] Scalable Vector Extension (SVE) core support Ard Biesheuvel
2017-04-03  9:45   ` Dave Martin
2017-04-03 10:01     ` Ard Biesheuvel
2017-04-03 10:51       ` Dave Martin
2017-04-03 10:55         ` Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).