* [RFC PATCH v3 0/4] Simplify kernel-mode NEON
@ 2017-05-25 18:24 Dave Martin
  2017-05-25 18:24 ` [RFC PATCH v3 1/4] arm64: neon: Add missing header guard in <asm/neon.h> Dave Martin
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Dave Martin @ 2017-05-25 18:24 UTC (permalink / raw)
  To: linux-arm-kernel

This series aims to simplify kernel-mode NEON.


Changes since RFC v2:

 * Re-enable softirqs between kernel_neon_begin() and kernel_neon_end()

   (This was the original intention, but I was overtaken by irrational
   paranoia when drafting the previous version of the patch.)

 * softirq disable/enable is removed from fpsimd_thread_switch() (which
   runs with hardirqs disabled in any case, so local_bh_enable() there
   would be treated as an error by the core code).  The comment block is
   updated to clarify this situation.

   This also means that no cost is added to the context switch critical
   path after all.

 * Move the this_cpu_write(fpsimd_last_state, NULL) back outside the
   conditional in kernel_neon_begin() (i.e., where it was originally
   before this series).  RFC v2 accidentally moved this invalidation
   inside the conditional, which leads to state corruption when switching
   back to a user task previously running on the same CPU, after
   kernel-mode NEON is used by a kernel thread.

 * Minor formatting and spurious #include tidyups.

Testing:

 * For now, I've hacked linux/kernel/time/timer.c:run_timer_softirq() to
   call kernel_neon_begin() and erase the FPSIMD registers, and
   jacked CONFIG_HZ to 1000.  I also added a test syscall sys_blatt_neon
   that userspace can hammer to trigger the nested kernel_neon_begin()
   scenario.

   See [2].

   This is not hugely realistic, but was sufficient to diagnose the
   corruption problem described above, when running on a model.


Original blurb:

The main motivation for these changes is that supporting kernel-mode
NEON alongside SVE is tricky with the current framework: the current
partial save/restore mechanisms would need additional porting to
save/restore the extended SVE vector bits, and this renders the cost
saving of partial save/restore rather doubtful -- even if not all vector
registers are saved, the save/restore cost will still grow with
increasing vector length.  We could get the mechanics of this to work in
principle, but it doesn't feel like the right thing to do.

If we withdraw kernel-mode NEON support for hardirq context, accept some
extra softirq latency and disable nesting, then we can simplify the code
by always saving the task context directly into task_struct and
deferring all restore to ret_to_user.  Previous discussions with Ard
Biesheuvel suggest that this is a feasible approach and reasonably
aligned with other architectures.

The main impact on client code is that callers must check that
kernel-mode NEON is usable in the current context and fall back to a
non-NEON implementation when necessary.  Ard has already done some work
on this. [1]

The interesting changes are all in patch 2: the first patch just adds a
header inclusion guard that I noted was missing.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git kernel-mode-neon

Dave Martin (4):
  arm64: neon: Add missing header guard in <asm/neon.h>
  arm64: fpsimd: Consistently use __this_cpu_ ops where appropriate
  arm64: neon: Remove support for nested or hardirq kernel-mode NEON
  arm64: neon: Add backwards compatibility kernel_neon_begin_partial()

 arch/arm64/include/asm/fpsimd.h       |  14 ----
 arch/arm64/include/asm/fpsimdmacros.h |  56 ----------------
 arch/arm64/include/asm/neon.h         |  12 +++-
 arch/arm64/include/asm/simd.h         |  56 ++++++++++++++++
 arch/arm64/kernel/entry-fpsimd.S      |  24 -------
 arch/arm64/kernel/fpsimd.c            | 119 +++++++++++++++++++++++-----------
 6 files changed, 146 insertions(+), 135 deletions(-)
 create mode 100644 arch/arm64/include/asm/simd.h

-- 

[2] Test hacks, not for merging

From 584f1187269dcee04cb43fc6cb443d8e56454e2b Mon Sep 17 00:00:00 2001
From: Dave Martin <Dave.Martin@arm.com>
Date: Thu, 25 May 2017 19:08:42 +0100
Subject: [PATCH] test hacks

---
 arch/arm64/kernel/fpsimd.c        | 63 +++++++++++++++++++++++++++++++++-
 arch/arm64/kernel/sys.c           | 58 +++++++++++++++++++++++++++++++
 blatt-neon.c                      | 31 +++++++++++++++++
 include/uapi/asm-generic/unistd.h |  5 ++-
 kernel/time/timer.c               | 72 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 227 insertions(+), 2 deletions(-)
 create mode 100644 blatt-neon.c

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 14329d6..6156e56 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -22,6 +22,7 @@
 #include <linux/cpu_pm.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/irqflags.h>
 #include <linux/percpu.h>
 #include <linux/sched/signal.h>
 #include <linux/signal.h>
@@ -37,6 +38,8 @@
 #define FPEXC_IXF	(1 << 4)
 #define FPEXC_IDF	(1 << 7)
 
+static void print_suspicious_vregs(char const *str, struct fpsimd_state const *st);
+
 /*
  * In order to reduce the number of times the FPSIMD state is needlessly saved
  * and restored, we need to keep track of two things:
@@ -138,13 +141,18 @@ void fpsimd_thread_switch(struct task_struct *next)
 {
 	if (!system_supports_fpsimd())
 		return;
+
+	BUG_ON(!irqs_disabled());
+
 	/*
 	 * Save the current FPSIMD state to memory, but only if whatever is in
 	 * the registers is in fact the most recent userland FPSIMD state of
 	 * 'current'.
 	 */
-	if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
+	if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		fpsimd_save_state(&current->thread.fpsimd_state);
+		print_suspicious_vregs("sched-out", &current->thread.fpsimd_state);
+	}
 
 	if (next->mm) {
 		/*
@@ -197,6 +205,55 @@ void fpsimd_preserve_current_state(void)
 	local_bh_enable();
 }
 
+static bool suspicious_vreg(u8 const *reg)
+{
+	u8 b = *reg;
+	unsigned int i;
+
+	if (b < 0xc2 || b > 0xc7)
+		return false;
+
+	for (i = 1; i < 16; ++i)
+		if (b != reg[i])
+			return false;
+
+	return true;
+}
+
+static int print_vreg_if_suspicious(u8 const *reg, unsigned int num)
+{
+	char s[16 * 2 + 1], *p = s;
+	int n;
+	unsigned int i;
+
+	if (!suspicious_vreg(reg))
+		return 0;
+
+	for (i = 0; i < 16; ++i) {
+		n = snprintf(p, s + sizeof(s) - p, "%.2hhx", reg[i]);
+		if (n < 0)
+			break;
+
+		p += n;
+	}
+
+	*p = '\0';
+	return 1;
+}
+
+static void print_suspicious_vregs(char const *str, struct fpsimd_state const *st)
+{
+	int bad = 0;
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(st->vregs); ++i)
+		bad |= print_vreg_if_suspicious((u8 const *)&st->vregs[i], i);
+
+	if (bad)
+		pr_info("%s[%d]: %s: suspicious vreg(s) detected\n",
+			current->comm, task_pid_nr(current), str);
+}
+
 /*
  * Load the userland FPSIMD state of 'current' from memory, but only if the
  * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
@@ -215,6 +272,8 @@ void fpsimd_restore_current_state(void)
 		fpsimd_load_state(st);
 		__this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
+
+		print_suspicious_vregs("fpsimd_restore_current_state", st);
 	}
 
 	local_bh_enable();
@@ -290,6 +349,8 @@ void kernel_neon_begin(void)
 	/* Invalidate any task state remaining in the fpsimd regs: */
 	__this_cpu_write(fpsimd_last_state, NULL);
 
+	print_suspicious_vregs("kernel_neon_begin", &current->thread.fpsimd_state);
+
 	preempt_disable();
 
 	local_bh_enable();
diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c
index 26fe8ea..ab73b51 100644
--- a/arch/arm64/kernel/sys.c
+++ b/arch/arm64/kernel/sys.c
@@ -26,6 +26,8 @@
 #include <linux/slab.h>
 #include <linux/syscalls.h>
 #include <asm/cpufeature.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
 
 asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
 			 unsigned long prot, unsigned long flags,
@@ -45,6 +47,62 @@ SYSCALL_DEFINE1(arm64_personality, unsigned int, personality)
 	return sys_personality(personality);
 }
 
+SYSCALL_DEFINE0(blatt_neon)
+{
+	int ret = 0;
+	unsigned int i;
+
+	BUG_ON(!may_use_simd());
+
+	kernel_neon_begin();
+
+	for (i = 0; i < 1000; ++i) {
+		asm volatile (
+			"movi	v0.16b, #0\n\t"
+			"movi	v1.16b, #0\n\t"
+			"movi	v2.16b, #0\n\t"
+			"movi	v3.16b, #0\n\t"
+			"movi	v4.16b, #0\n\t"
+			"movi	v5.16b, #0\n\t"
+			"movi	v6.16b, #0\n\t"
+			"movi	v7.16b, #0\n\t"
+			"movi	v8.16b, #0\n\t"
+			"movi	v9.16b, #0\n\t"
+			"movi	v10.16b, #0\n\t"
+			"movi	v11.16b, #0\n\t"
+			"movi	v12.16b, #0\n\t"
+			"movi	v13.16b, #0\n\t"
+			"movi	v14.16b, #0\n\t"
+			"movi	v15.16b, #0\n\t"
+			"movi	v16.16b, #0\n\t"
+			"movi	v17.16b, #0\n\t"
+			"movi	v18.16b, #0\n\t"
+			"movi	v19.16b, #0\n\t"
+			"movi	v20.16b, #0\n\t"
+			"movi	v21.16b, #0\n\t"
+			"movi	v22.16b, #0\n\t"
+			"movi	v23.16b, #0\n\t"
+			"movi	v24.16b, #0\n\t"
+			"movi	v25.16b, #0\n\t"
+			"movi	v26.16b, #0\n\t"
+			"movi	v27.16b, #0\n\t"
+			"movi	v28.16b, #0\n\t"
+			"movi	v29.16b, #0\n\t"
+			"movi	v30.16b, #0\n\t"
+			"movi	v31.16b, #0"
+		);
+
+		if (test_thread_flag(TIF_SIGPENDING)) {
+			ret = -EINTR;
+			break;
+		}
+	}
+
+	kernel_neon_end();
+
+	return ret;
+}
+
 /*
  * Wrappers to pass the pt_regs argument.
  */
diff --git a/blatt-neon.c b/blatt-neon.c
new file mode 100644
index 0000000..fafb144
--- /dev/null
+++ b/blatt-neon.c
@@ -0,0 +1,31 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <asm/unistd.h>
+
+static long blatt_neon(void)
+{
+	register long ret asm("x0");
+	register unsigned int scno asm("x8") = __NR_blatt_neon;
+
+	asm volatile (
+		"svc	#0"
+		: "=r" (ret)
+		: "r" (scno)
+	);
+
+	return ret;
+}
+
+int main(void)
+{
+	long ret;
+
+	while (1) {
+		ret = blatt_neon();
+		if (ret) {
+			fprintf(stderr, "%s\n", strerror(ret));
+			return EXIT_FAILURE;
+		}
+	}
+}
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 061185a..bf2d3e9 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -732,8 +732,11 @@ __SYSCALL(__NR_pkey_free,     sys_pkey_free)
 #define __NR_statx 291
 __SYSCALL(__NR_statx,     sys_statx)
 
+#define __NR_blatt_neon 292
+__SYSCALL(__NR_blatt_neon, sys_blatt_neon)
+
 #undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 152a706..dc79771 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -34,6 +34,7 @@
 #include <linux/posix-timers.h>
 #include <linux/cpu.h>
 #include <linux/syscalls.h>
+#include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/tick.h>
 #include <linux/kallsyms.h>
@@ -50,6 +51,8 @@
 #include <asm/div64.h>
 #include <asm/timex.h>
 #include <asm/io.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
 
 #include "tick-internal.h"
 
@@ -1604,6 +1607,21 @@ static inline void __run_timers(struct timer_base *base)
 	spin_unlock_irq(&base->lock);
 }
 
+static u64 neon_busy_count, neon_idle_count;
+
+static int __init timer_softirq_debugfs_init(void)
+{
+	struct dentry *dir = debugfs_create_dir("neon-softirq", NULL);
+
+	if (!dir ||
+	    !debugfs_create_u64("busy", 0444, dir, &neon_busy_count) ||
+	    !debugfs_create_u64("idle", 0444, dir, &neon_idle_count))
+		WARN_ON(1);
+
+	return 0;
+}
+early_initcall(timer_softirq_debugfs_init);
+
 /*
  * This function runs timers and the timer-tq in bottom half context.
  */
@@ -1611,6 +1629,60 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
+	BUG_ON(preemptible());
+	BUG_ON(!softirq_count());
+
+	if (!may_use_simd())
+		neon_busy_count++;
+	else /* may_use_simd() */ {
+		static u8 n = 0xc2;
+
+		neon_idle_count++;
+
+		kernel_neon_begin();
+		asm volatile (
+			"dup	v0.16b, %w0\n\t"
+			"dup	v1.16b, %w0\n\t"
+			"dup	v2.16b, %w0\n\t"
+			"dup	v3.16b, %w0\n\t"
+			"dup	v4.16b, %w0\n\t"
+			"dup	v5.16b, %w0\n\t"
+			"dup	v6.16b, %w0\n\t"
+			"dup	v7.16b, %w0\n\t"
+			"dup	v8.16b, %w0\n\t"
+			"dup	v9.16b, %w0\n\t"
+			"dup	v10.16b, %w0\n\t"
+			"dup	v11.16b, %w0\n\t"
+			"dup	v12.16b, %w0\n\t"
+			"dup	v13.16b, %w0\n\t"
+			"dup	v14.16b, %w0\n\t"
+			"dup	v15.16b, %w0\n\t"
+			"dup	v16.16b, %w0\n\t"
+			"dup	v17.16b, %w0\n\t"
+			"dup	v18.16b, %w0\n\t"
+			"dup	v19.16b, %w0\n\t"
+			"dup	v20.16b, %w0\n\t"
+			"dup	v21.16b, %w0\n\t"
+			"dup	v22.16b, %w0\n\t"
+			"dup	v23.16b, %w0\n\t"
+			"dup	v24.16b, %w0\n\t"
+			"dup	v25.16b, %w0\n\t"
+			"dup	v26.16b, %w0\n\t"
+			"dup	v27.16b, %w0\n\t"
+			"dup	v28.16b, %w0\n\t"
+			"dup	v29.16b, %w0\n\t"
+			"dup	v30.16b, %w0\n\t"
+			"dup	v31.16b, %w0"
+			:: "r" (n)
+		);
+
+		++n;
+		if (n == 0xc8)
+			n = 0xc2;
+
+		kernel_neon_end();
+	} /* may_use_simd() */
+
 	__run_timers(base);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH v3 1/4] arm64: neon: Add missing header guard in <asm/neon.h>
  2017-05-25 18:24 [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin
@ 2017-05-25 18:24 ` Dave Martin
  2017-05-31 11:41   ` Ard Biesheuvel
  2017-05-25 18:24 ` [RFC PATCH v3 2/4] arm64: fpsimd: Consistently use __this_cpu_ ops where appropriate Dave Martin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Dave Martin @ 2017-05-25 18:24 UTC (permalink / raw)
  To: linux-arm-kernel

asm/neon.h doesn't have a header inclusion guard, but it should
have one for consistency with other headers.

This patch adds a suitable guard.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/neon.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index ad4cdc9..5368bd0 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -8,6 +8,9 @@
  * published by the Free Software Foundation.
  */
 
+#ifndef __ASM_NEON_H
+#define __ASM_NEON_H
+
 #include <linux/types.h>
 #include <asm/fpsimd.h>
 
@@ -17,3 +20,5 @@
 
 void kernel_neon_begin_partial(u32 num_regs);
 void kernel_neon_end(void);
+
+#endif /* ! __ASM_NEON_H */
-- 
2.1.4


* [RFC PATCH v3 2/4] arm64: fpsimd: Consistently use __this_cpu_ ops where appropriate
  2017-05-25 18:24 [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin
  2017-05-25 18:24 ` [RFC PATCH v3 1/4] arm64: neon: Add missing header guard in <asm/neon.h> Dave Martin
@ 2017-05-25 18:24 ` Dave Martin
  2017-05-31 11:43   ` Ard Biesheuvel
  2017-05-25 18:25 ` [RFC PATCH v3 3/4] arm64: neon: Remove support for nested or hardirq kernel-mode NEON Dave Martin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Dave Martin @ 2017-05-25 18:24 UTC (permalink / raw)
  To: linux-arm-kernel

__this_cpu_ ops are not used consistently with their this_cpu_
counterparts in a couple of places in fpsimd.c.

Since preemption is explicitly disabled in
fpsimd_restore_current_state() and fpsimd_update_current_state(),
this patch converts this_cpu_ ops in those functions to __this_cpu_
ops.  This saves no cost on arm64, but benefits from the additional
preemption-state assertions made by the core code.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 06da8ea..d7e5f8a 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -194,7 +194,7 @@ void fpsimd_restore_current_state(void)
 		struct fpsimd_state *st = &current->thread.fpsimd_state;
 
 		fpsimd_load_state(st);
-		this_cpu_write(fpsimd_last_state, st);
+		__this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
 	}
 	preempt_enable();
@@ -214,7 +214,7 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		struct fpsimd_state *st = &current->thread.fpsimd_state;
 
-		this_cpu_write(fpsimd_last_state, st);
+		__this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
 	}
 	preempt_enable();
-- 
2.1.4


* [RFC PATCH v3 3/4] arm64: neon: Remove support for nested or hardirq kernel-mode NEON
  2017-05-25 18:24 [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin
  2017-05-25 18:24 ` [RFC PATCH v3 1/4] arm64: neon: Add missing header guard in <asm/neon.h> Dave Martin
  2017-05-25 18:24 ` [RFC PATCH v3 2/4] arm64: fpsimd: Consistently use __this_cpu_ ops where appropriate Dave Martin
@ 2017-05-25 18:25 ` Dave Martin
  2017-05-31 11:51   ` Ard Biesheuvel
  2017-05-25 18:25 ` [RFC PATCH v3 4/4] arm64: neon: Add backwards compatibility kernel_neon_begin_partial() Dave Martin
  2017-05-30 18:02 ` [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin
  4 siblings, 1 reply; 13+ messages in thread
From: Dave Martin @ 2017-05-25 18:25 UTC (permalink / raw)
  To: linux-arm-kernel

Support for kernel-mode NEON to be nested and/or used in hardirq
context adds significant complexity, and the benefits may be
marginal.  In practice, kernel-mode NEON is not used in hardirq
context, and is rarely used in softirq context (by certain mac80211
drivers).

This patch implements an arm64 may_use_simd() function to allow
clients to check whether kernel-mode NEON is usable in the current
context, and simplifies kernel_neon_{begin,end}() to handle only
saving of the task FPSIMD state (if any).  Without nesting, there
is no other state to save.

The partial fpsimd save/restore functions become redundant as a
result of these changes, so they are removed too.

The save/restore model is changed to operate directly on
task_struct without additional percpu storage.  This simplifies the
code and saves a bit of memory, but means that softirqs must now be
disabled when manipulating the task fpsimd state from task context:
correspondingly, preempt_{dis,en}able() calls are upgraded to
local_bh_{dis,en}able() as appropriate.  fpsimd_thread_switch()
already runs with hardirqs disabled and so is already protected
from softirqs.

These changes should make it easier to support kernel-mode NEON in
the presence of the Scalable Vector extension in the future.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h       |  14 -----
 arch/arm64/include/asm/fpsimdmacros.h |  56 -----------------
 arch/arm64/include/asm/neon.h         |   4 +-
 arch/arm64/include/asm/simd.h         |  56 +++++++++++++++++
 arch/arm64/kernel/entry-fpsimd.S      |  24 -------
 arch/arm64/kernel/fpsimd.c            | 115 +++++++++++++++++++++++-----------
 6 files changed, 136 insertions(+), 133 deletions(-)
 create mode 100644 arch/arm64/include/asm/simd.h

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 50f559f..ff2f6cd 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -41,16 +41,6 @@ struct fpsimd_state {
 	unsigned int cpu;
 };
 
-/*
- * Struct for stacking the bottom 'n' FP/SIMD registers.
- */
-struct fpsimd_partial_state {
-	u32		fpsr;
-	u32		fpcr;
-	u32		num_regs;
-	__uint128_t	vregs[32];
-};
-
 
 #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
 /* Masks for extracting the FPSR and FPCR from the FPSCR */
@@ -77,10 +67,6 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 
-extern void fpsimd_save_partial_state(struct fpsimd_partial_state *state,
-				      u32 num_regs);
-extern void fpsimd_load_partial_state(struct fpsimd_partial_state *state);
-
 #endif
 
 #endif
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index a2daf12..0f5fdd3 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -75,59 +75,3 @@
 	ldr	w\tmpnr, [\state, #16 * 2 + 4]
 	fpsimd_restore_fpcr x\tmpnr, \state
 .endm
-
-.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2
-	mrs	x\tmpnr1, fpsr
-	str	w\numnr, [\state, #8]
-	mrs	x\tmpnr2, fpcr
-	stp	w\tmpnr1, w\tmpnr2, [\state]
-	adr	x\tmpnr1, 0f
-	add	\state, \state, x\numnr, lsl #4
-	sub	x\tmpnr1, x\tmpnr1, x\numnr, lsl #1
-	br	x\tmpnr1
-	stp	q30, q31, [\state, #-16 * 30 - 16]
-	stp	q28, q29, [\state, #-16 * 28 - 16]
-	stp	q26, q27, [\state, #-16 * 26 - 16]
-	stp	q24, q25, [\state, #-16 * 24 - 16]
-	stp	q22, q23, [\state, #-16 * 22 - 16]
-	stp	q20, q21, [\state, #-16 * 20 - 16]
-	stp	q18, q19, [\state, #-16 * 18 - 16]
-	stp	q16, q17, [\state, #-16 * 16 - 16]
-	stp	q14, q15, [\state, #-16 * 14 - 16]
-	stp	q12, q13, [\state, #-16 * 12 - 16]
-	stp	q10, q11, [\state, #-16 * 10 - 16]
-	stp	q8, q9, [\state, #-16 * 8 - 16]
-	stp	q6, q7, [\state, #-16 * 6 - 16]
-	stp	q4, q5, [\state, #-16 * 4 - 16]
-	stp	q2, q3, [\state, #-16 * 2 - 16]
-	stp	q0, q1, [\state, #-16 * 0 - 16]
-0:
-.endm
-
-.macro fpsimd_restore_partial state, tmpnr1, tmpnr2
-	ldp	w\tmpnr1, w\tmpnr2, [\state]
-	msr	fpsr, x\tmpnr1
-	fpsimd_restore_fpcr x\tmpnr2, x\tmpnr1
-	adr	x\tmpnr1, 0f
-	ldr	w\tmpnr2, [\state, #8]
-	add	\state, \state, x\tmpnr2, lsl #4
-	sub	x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1
-	br	x\tmpnr1
-	ldp	q30, q31, [\state, #-16 * 30 - 16]
-	ldp	q28, q29, [\state, #-16 * 28 - 16]
-	ldp	q26, q27, [\state, #-16 * 26 - 16]
-	ldp	q24, q25, [\state, #-16 * 24 - 16]
-	ldp	q22, q23, [\state, #-16 * 22 - 16]
-	ldp	q20, q21, [\state, #-16 * 20 - 16]
-	ldp	q18, q19, [\state, #-16 * 18 - 16]
-	ldp	q16, q17, [\state, #-16 * 16 - 16]
-	ldp	q14, q15, [\state, #-16 * 14 - 16]
-	ldp	q12, q13, [\state, #-16 * 12 - 16]
-	ldp	q10, q11, [\state, #-16 * 10 - 16]
-	ldp	q8, q9, [\state, #-16 * 8 - 16]
-	ldp	q6, q7, [\state, #-16 * 6 - 16]
-	ldp	q4, q5, [\state, #-16 * 4 - 16]
-	ldp	q2, q3, [\state, #-16 * 2 - 16]
-	ldp	q0, q1, [\state, #-16 * 0 - 16]
-0:
-.endm
diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index 5368bd0..fb9d137 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -16,9 +16,7 @@
 
 #define cpu_has_neon()		system_supports_fpsimd()
 
-#define kernel_neon_begin()	kernel_neon_begin_partial(32)
-
-void kernel_neon_begin_partial(u32 num_regs);
+void kernel_neon_begin(void);
 void kernel_neon_end(void);
 
 #endif /* ! __ASM_NEON_H */
diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
new file mode 100644
index 0000000..bbfc68f
--- /dev/null
+++ b/arch/arm64/include/asm/simd.h
@@ -0,0 +1,56 @@
+/*
+ * Copyright (C) 2017  ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SIMD_H
+#define __ASM_SIMD_H
+
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+
+DECLARE_PER_CPU(bool, kernel_neon_busy);
+
+/*
+ * Returns true if and only if it is safe to call kernel_neon_begin().
+ * Callers must not assume that the result remains true beyond the next
+ * preempt_enable() or return from softirq context.
+ */
+static inline bool may_use_simd(void)
+{
+	/*
+	 * The raw_cpu_read() is racy if called with preemption enabled.
+	 * This is not a bug: kernel_neon_busy is only set when
+	 * preemption is disabled, so we cannot migrate to another CPU
+	 * while it is set, nor can we migrate to a CPU where it is set.
+	 * So, if we find it clear on some CPU then we're guaranteed to
+	 * find it clear on any CPU we could migrate to.
+	 *
+	 * If we are in between kernel_neon_begin()...kernel_neon_end(),
+	 * the flag will be set, but preemption is also disabled, so we
+	 * can't migrate to another CPU and spuriously see it become
+	 * false.
+	 */
+	return !in_irq() && !in_nmi() && !raw_cpu_read(kernel_neon_busy);
+}
+
+#else /* ! CONFIG_KERNEL_MODE_NEON */
+
+static inline bool may_use_simd(void) { return false; }
+
+#endif /* ! CONFIG_KERNEL_MODE_NEON */
+
+#endif /* ! __ASM_SIMD_H */
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index c44a82f..6a27cd6 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -41,27 +41,3 @@ ENTRY(fpsimd_load_state)
 	fpsimd_restore x0, 8
 	ret
 ENDPROC(fpsimd_load_state)
-
-#ifdef CONFIG_KERNEL_MODE_NEON
-
-/*
- * Save the bottom n FP registers.
- *
- * x0 - pointer to struct fpsimd_partial_state
- */
-ENTRY(fpsimd_save_partial_state)
-	fpsimd_save_partial x0, 1, 8, 9
-	ret
-ENDPROC(fpsimd_save_partial_state)
-
-/*
- * Load the bottom n FP registers.
- *
- * x0 - pointer to struct fpsimd_partial_state
- */
-ENTRY(fpsimd_load_partial_state)
-	fpsimd_restore_partial x0, 8, 9
-	ret
-ENDPROC(fpsimd_load_partial_state)
-
-#endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index d7e5f8a..14329d6 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -17,16 +17,18 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/bottom_half.h>
 #include <linux/cpu.h>
 #include <linux/cpu_pm.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/percpu.h>
 #include <linux/sched/signal.h>
 #include <linux/signal.h>
-#include <linux/hardirq.h>
 
 #include <asm/fpsimd.h>
 #include <asm/cputype.h>
+#include <asm/simd.h>
 
 #define FPEXC_IOF	(1 << 0)
 #define FPEXC_DZF	(1 << 1)
@@ -62,6 +64,13 @@
  * CPU currently contain the most recent userland FPSIMD state of the current
  * task.
  *
+ * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may
+ * save the task's FPSIMD context back to task_struct from softirq context.
+ * To prevent this from racing with the manipulation of the task's FPSIMD state
+ * from task context and thereby corrupting the state, it is necessary to
+ * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE
+ * flag with local_bh_disable() unless softirqs are already masked.
+ *
  * For a certain task, the sequence may look something like this:
  * - the task gets scheduled in; if both the task's fpsimd_state.cpu field
  *   contains the id of the current CPU, and the CPU's fpsimd_last_state per-cpu
@@ -161,9 +170,14 @@ void fpsimd_flush_thread(void)
 {
 	if (!system_supports_fpsimd())
 		return;
+
+	local_bh_disable();
+
 	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
 	fpsimd_flush_task_state(current);
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
+
+	local_bh_enable();
 }
 
 /*
@@ -174,10 +188,13 @@ void fpsimd_preserve_current_state(void)
 {
 	if (!system_supports_fpsimd())
 		return;
-	preempt_disable();
+
+	local_bh_disable();
+
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
 		fpsimd_save_state(&current->thread.fpsimd_state);
-	preempt_enable();
+
+	local_bh_enable();
 }
 
 /*
@@ -189,7 +206,9 @@ void fpsimd_restore_current_state(void)
 {
 	if (!system_supports_fpsimd())
 		return;
-	preempt_disable();
+
+	local_bh_disable();
+
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		struct fpsimd_state *st = &current->thread.fpsimd_state;
 
@@ -197,7 +216,8 @@ void fpsimd_restore_current_state(void)
 		__this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
 	}
-	preempt_enable();
+
+	local_bh_enable();
 }
 
 /*
@@ -209,7 +229,9 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
 {
 	if (!system_supports_fpsimd())
 		return;
-	preempt_disable();
+
+	local_bh_disable();
+
 	fpsimd_load_state(state);
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		struct fpsimd_state *st = &current->thread.fpsimd_state;
@@ -217,7 +239,8 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
 		__this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
 	}
-	preempt_enable();
+
+	local_bh_enable();
 }
 
 /*
@@ -230,49 +253,69 @@ void fpsimd_flush_task_state(struct task_struct *t)
 
 #ifdef CONFIG_KERNEL_MODE_NEON
 
-static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
-static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
+DEFINE_PER_CPU(bool, kernel_neon_busy);
 
 /*
  * Kernel-side NEON support functions
  */
-void kernel_neon_begin_partial(u32 num_regs)
+
+/*
+ * kernel_neon_begin(): obtain the CPU FPSIMD registers for use by the calling
+ * context
+ *
+ * Must not be called unless may_use_simd() returns true.
+ * Task context in the FPSIMD registers is saved back to memory as necessary.
+ *
+ * A matching call to kernel_neon_end() must be made before returning from the
+ * calling context.
+ *
+ * The caller may freely use the FPSIMD registers until kernel_neon_end() is
+ * called.
+ */
+void kernel_neon_begin(void)
 {
 	if (WARN_ON(!system_supports_fpsimd()))
 		return;
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
 
-		BUG_ON(num_regs > 32);
-		fpsimd_save_partial_state(s, roundup(num_regs, 2));
-	} else {
-		/*
-		 * Save the userland FPSIMD state if we have one and if we
-		 * haven't done so already. Clear fpsimd_last_state to indicate
-		 * that there is no longer userland FPSIMD state in the
-		 * registers.
-		 */
-		preempt_disable();
-		if (current->mm &&
-		    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
-			fpsimd_save_state(&current->thread.fpsimd_state);
-		this_cpu_write(fpsimd_last_state, NULL);
-	}
+	BUG_ON(!may_use_simd());
+
+	local_bh_disable();
+
+	__this_cpu_write(kernel_neon_busy, true);
+
+	/* Save unsaved task fpsimd state, if any: */
+	if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+		fpsimd_save_state(&current->thread.fpsimd_state);
+
+	/* Invalidate any task state remaining in the fpsimd regs: */
+	__this_cpu_write(fpsimd_last_state, NULL);
+
+	preempt_disable();
+
+	local_bh_enable();
 }
-EXPORT_SYMBOL(kernel_neon_begin_partial);
+EXPORT_SYMBOL(kernel_neon_begin);
 
+/*
+ * kernel_neon_end(): give the CPU FPSIMD registers back to the current task
+ *
+ * Must be called from a context in which kernel_neon_begin() was previously
+ * called, with no call to kernel_neon_end() in the meantime.
+ *
+ * The caller must not use the FPSIMD registers after this function is called,
+ * unless kernel_neon_begin() is called again in the meantime.
+ */
 void kernel_neon_end(void)
 {
+	bool busy;
+
 	if (!system_supports_fpsimd())
 		return;
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
-		fpsimd_load_partial_state(s);
-	} else {
-		preempt_enable();
-	}
+
+	busy = __this_cpu_xchg(kernel_neon_busy, false);
+	WARN_ON(!busy);	/* No matching kernel_neon_begin()? */
+
+	preempt_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
 
-- 
2.1.4


* [RFC PATCH v3 4/4] arm64: neon: Add backwards compatibility kernel_neon_begin_partial()
  2017-05-25 18:24 [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin
                   ` (2 preceding siblings ...)
  2017-05-25 18:25 ` [RFC PATCH v3 3/4] arm64: neon: Remove support for nested or hardirq kernel-mode NEON Dave Martin
@ 2017-05-25 18:25 ` Dave Martin
  2017-05-30 18:02 ` [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin
  4 siblings, 0 replies; 13+ messages in thread
From: Dave Martin @ 2017-05-25 18:25 UTC (permalink / raw)
  To: linux-arm-kernel

kernel_neon_begin_partial() is no longer available, but the
implementation of kernel_neon_begin() should be sufficient to
satisfy kernel_neon_begin_partial() callers.

Because of the changed semantics of may_use_simd() and
kernel_neon_begin(), such code really needs to be ported.

In the meantime, this patch restores buildability of kernel-mode
NEON client code.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/neon.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index fb9d137..885f7a6 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -19,4 +19,7 @@
 void kernel_neon_begin(void);
 void kernel_neon_end(void);
 
+/* FIXME: Backwards compatibility only, should go away: */
+#define kernel_neon_begin_partial(num_regs) kernel_neon_begin()
+
 #endif /* ! __ASM_NEON_H */
-- 
2.1.4

* [RFC PATCH v3 0/4] Simplify kernel-mode NEON
  2017-05-25 18:24 [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin
                   ` (3 preceding siblings ...)
  2017-05-25 18:25 ` [RFC PATCH v3 4/4] arm64: neon: Add backwards compatibility kernel_neon_begin_partial() Dave Martin
@ 2017-05-30 18:02 ` Dave Martin
  2017-05-31  8:41   ` Ard Biesheuvel
  4 siblings, 1 reply; 13+ messages in thread
From: Dave Martin @ 2017-05-30 18:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 25, 2017 at 07:24:57PM +0100, Dave Martin wrote:
> This series aims to simplify kernel-mode NEON.

Hi Ard, do you have any further comments on this series?

I'd like to have it finalised as far as possible (modulo minor tweaks
and bugfixes) so that I can port the SVE patches on top of it.

Also, how do you think we should handle merging of this change?  There's
a flag-day issue here, since the kernel-mode NEON API is being changed
in an incompatible way.

Cheers
---Dave

[...]

* [RFC PATCH v3 0/4] Simplify kernel-mode NEON
  2017-05-30 18:02 ` [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin
@ 2017-05-31  8:41   ` Ard Biesheuvel
  2017-05-31 10:08     ` Dave Martin
  0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2017-05-31  8:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 30 May 2017 at 18:02, Dave Martin <Dave.Martin@arm.com> wrote:
> On Thu, May 25, 2017 at 07:24:57PM +0100, Dave Martin wrote:
>> This series aims to simplify kernel-mode NEON.
>
> Hi Ard, do you have any further comments on this series?
>
> I'd like to have it finalised as far as possible (modulo minor tweaks
> and bugfixes) so that I can port the SVE patches on top of it.
>
> Also, how do you think we should handle merging of this change?  There's
> a flag-day issue here, since the kernel-mode NEON API is being changed
> in an incompatible way.
>

I think the patches look fine now. The best way to merge these imo is
to start with the changes in the clients, i.e., add an arm64 specific
asm/simd.h that defines may_use_simd() as { return true; }, update all
the crypto code with the fallbacks, and put this stuff on top of that.
That way, there is a small window where the 'hint' is interpreted
differently in the sha256 code, but apart from that, we should be
bisection proof without a flag day AFAICT.

BTW I got my ZD1211 working on my MacchiatoBin board. The performance
is terrible, but that should not matter: if I can saturate a CPU doing
NEON from userland and/or kernel process context, the softirq
interruptions by the mac80211 code should exercise the updated code
paths. I haven't tried that yet: let me get the code changes out
today, so you can put your stuff on top. Then we can give it a good
spin.

-- 
Ard.

* [RFC PATCH v3 0/4] Simplify kernel-mode NEON
  2017-05-31  8:41   ` Ard Biesheuvel
@ 2017-05-31 10:08     ` Dave Martin
  2017-05-31 11:07       ` Ard Biesheuvel
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Martin @ 2017-05-31 10:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 31, 2017 at 08:41:01AM +0000, Ard Biesheuvel wrote:
> On 30 May 2017 at 18:02, Dave Martin <Dave.Martin@arm.com> wrote:
> > On Thu, May 25, 2017 at 07:24:57PM +0100, Dave Martin wrote:
> >> This series aims to simplify kernel-mode NEON.
> >
> > Hi Ard, do you have any further comments on this series?
> >
> > I'd like to have it finalised as far as possible (modulo minor tweaks
> > and bugfixes) so that I can port the SVE patches on top of it.
> >
> > Also, how do you think we should handle merging of this change?  There's
> > a flag-day issue here, since the kernel-mode NEON API is being changed
> > in an incompatible way.
> >
> 
> I think the patches look fine now. The best way to merge these imo is
> to start with the changes in the clients, i.e., add an arm64 specific
> asm/simd.h that defines may_use_simd() as { return true; }, update all
> the crypto code with the fallbacks, and put this stuff on top of that.

Yes, that sounds feasible.

Something like [1] below?  Either way, it probably makes sense for that
stub function to be added by your series.

> That way, there is a small window where the 'hint' is interpreted
> differently in the sha256 code, but apart from that, we should be
> bisection proof without a flag day AFAICT.
> 
> BTW I got my ZD1211 working on my MacchiatoBin board. The performance
> is terrible, but that should not matter: if I can saturate a CPU doing

Do you mean that my series causes a performance regression here, or is
the performance terrible anyway?

> NEON from userland and/or kernel process context, the softirq
> interruptions by the mac80211 code should exercise the updated code
> paths. I haven't tried that yet: let me get the code changes out
> today, so you can put your stuff on top. Then we can give it a good
> spin.

That would be great, thanks.

Cheers
---Dave

[1]:

From 325ba5718ee0f136bfb3b4a43f7f42b1d8f2ab12 Mon Sep 17 00:00:00 2001
From: Dave Martin <Dave.Martin@arm.com>
Date: Wed, 31 May 2017 10:31:26 +0100
Subject: [PATCH] arm64: neon: Add stub may_use_simd() in preparation for
 refactoring

In preparation for refactoring that will make the conditions for
kernel-mode NEON use non-trivial, this patch adds a stub
may_use_simd() function.

Since arm64 currently supports kernel-mode NEON from any context
(excluding NMI) this function will return true for now.

This should allow drivers to be ported to use
kernel_neon_begin()/_end() based on this check, without impacting
rebaseability.

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/simd.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 arch/arm64/include/asm/simd.h

diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
new file mode 100644
index 0000000..bd3acd8
--- /dev/null
+++ b/arch/arm64/include/asm/simd.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2017  ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SIMD_H
+#define __ASM_SIMD_H
+
+/* arm64 kernel-mode NEON currently supports all contexts up to hardirq */
+static inline bool may_use_simd(void) { return true; }
+
+#endif /* ! __ASM_SIMD_H */
-- 
2.1.4

* [RFC PATCH v3 0/4] Simplify kernel-mode NEON
  2017-05-31 10:08     ` Dave Martin
@ 2017-05-31 11:07       ` Ard Biesheuvel
  2017-05-31 11:33         ` Dave Martin
  0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2017-05-31 11:07 UTC (permalink / raw)
  To: linux-arm-kernel

On 31 May 2017 at 10:08, Dave Martin <Dave.Martin@arm.com> wrote:
> On Wed, May 31, 2017 at 08:41:01AM +0000, Ard Biesheuvel wrote:
>> On 30 May 2017 at 18:02, Dave Martin <Dave.Martin@arm.com> wrote:
>> > On Thu, May 25, 2017 at 07:24:57PM +0100, Dave Martin wrote:
>> >> This series aims to simplify kernel-mode NEON.
>> >
>> > Hi Ard, do you have any further comments on this series?
>> >
>> > I'd like to have it finalised as far as possible (modulo minor tweaks
>> > and bugfixes) so that I can port the SVE patches on top of it.
>> >
>> > Also, how do you think we should handle merging of this change?  There's
>> > a flag-day issue here, since the kernel-mode NEON API is being changed
>> > in an incompatible way.
>> >
>>
>> I think the patches look fine now. The best way to merge these imo is
>> to start with the changes in the clients, i.e., add an arm64 specific
>> asm/simd.h that defines may_use_simd() as { return true; }, update all
>> the crypto code with the fallbacks, and put this stuff on top of that.
>
> Yes, that sounds feasible.
>
> Something like [1] below?  Either way, it probably makes sense for that
> stub function to be added by your series.
>

Pretty much, yeah. But don't forget to remove simd.h from
arch/arm64/include/asm/Kbuild

>> That way, there is a small window where the 'hint' is interpreted
>> differently in the sha256 code, but apart from that, we should be
>> bisection proof without a flag day AFAICT.
>>
>> BTW I got my ZD1211 working on my MacchiatoBin board. The performance
>> is terrible, but that should not matter: if I can saturate a CPU doing
>
> Do you mean that my series causes a performance regression here, or is
> the performance terrible anyway?
>

No, the performance is terrible anyway, which shouldn't matter per se, but it
would be nice if the load induced by the mac80211 code were visible in
'top' as wait, sys or whatever-it-is-called time. Currently, the 3
Mbit/s throughput combined with the 2.2 cycles per byte performance of
the AES-CCM code makes the load unnoticeable.
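(A quick back-of-the-envelope check of why it is unnoticeable — the ~2 GHz clock below is an assumed figure for the board's cores, not something measured in this thread:)

```python
# Estimate the CPU load from AES-CCM crypto at the observed throughput.
throughput_bps = 3e6            # 3 Mbit/s of wireless traffic
cycles_per_byte = 2.2           # quoted AES-CCM cost
cpu_hz = 2e9                    # assumed ~2 GHz core clock

crypto_cycles = (throughput_bps / 8) * cycles_per_byte
load = crypto_cycles / cpu_hz   # fraction of one core spent in crypto
print(f"{load * 100:.3f}% of one core")
```

which comes out at well under a tenth of a percent of a single core, far below anything 'top' would show.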

>> NEON from userland and/or kernel process context, the softirq
>> interruptions by the mac80211 code should exercise the updated code
>> paths. I haven't tried that yet: let me get the code changes out
>> today, so you can put your stuff on top. Then we can give it a good
>> spin.
>
> That would be great, thanks.
>

I have updated my branch here:
https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=kernel-mode-neon

I removed all kernel_neon_begin_partial() invocations as well.

-- 
Ard.

* [RFC PATCH v3 0/4] Simplify kernel-mode NEON
  2017-05-31 11:07       ` Ard Biesheuvel
@ 2017-05-31 11:33         ` Dave Martin
  0 siblings, 0 replies; 13+ messages in thread
From: Dave Martin @ 2017-05-31 11:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, May 31, 2017 at 11:07:56AM +0000, Ard Biesheuvel wrote:
> On 31 May 2017 at 10:08, Dave Martin <Dave.Martin@arm.com> wrote:
> > On Wed, May 31, 2017 at 08:41:01AM +0000, Ard Biesheuvel wrote:

[...]

> > Something like [1] below?  Either way, it probably makes sense for that
> > stub function to be added by your series.
> >
> 
> Pretty much, yeah. But don't forget to remove simd.h from
> arch/arm64/include/asm/Kbuild

Oh wow, I thought that was done by magic.

> >> BTW I got my ZD1211 working on my MacchiatoBin board. The performance
> >> is terrible, but that should not matter: if I can saturate a CPU doing
> >
> > Do you mean that my series causes a performance regression here, or is
> > the performance terrible anyway?
> >
> 
> No, the performance is terrible, which shouldn't matter per se, but it
> would be nice if the load induced by the mac80211 were visible in
> 'top' as wait, sys or whatever-it-is-called time. Currently, the 3
> Mbit/s throughput combined with the 2.2 cycles per byte performance of
> the AES-CCM code makes the code unnoticeable.
> 
> >> NEON from userland and/or kernel process context, the softirq
> >> interruptions by the mac80211 code should exercise the updated code
> >> paths. I haven't tried that yet: let me get the code changes out
> >> today, so you can put your stuff on top. Then we can give it a good
> >> spin.
> >
> > That would be great, thanks.
> >
> 
> I have updated my branch here:
> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=kernel-mode-neon

Looks good.

> I removed all kernel_neon_begin_partial() invocations as well.

OK, I will drop that from my series.

Cheers
---Dave

* [RFC PATCH v3 1/4] arm64: neon: Add missing header guard in <asm/neon.h>
  2017-05-25 18:24 ` [RFC PATCH v3 1/4] arm64: neon: Add missing header guard in <asm/neon.h> Dave Martin
@ 2017-05-31 11:41   ` Ard Biesheuvel
  0 siblings, 0 replies; 13+ messages in thread
From: Ard Biesheuvel @ 2017-05-31 11:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 25 May 2017 at 18:24, Dave Martin <Dave.Martin@arm.com> wrote:
> asm/neon.h doesn't have a header inclusion guard, but it should
> have one for consistency with other headers.
>
> This patch adds a suitable guard.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---
>  arch/arm64/include/asm/neon.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
> index ad4cdc9..5368bd0 100644
> --- a/arch/arm64/include/asm/neon.h
> +++ b/arch/arm64/include/asm/neon.h
> @@ -8,6 +8,9 @@
>   * published by the Free Software Foundation.
>   */
>
> +#ifndef __ASM_NEON_H
> +#define __ASM_NEON_H
> +
>  #include <linux/types.h>
>  #include <asm/fpsimd.h>
>
> @@ -17,3 +20,5 @@
>
>  void kernel_neon_begin_partial(u32 num_regs);
>  void kernel_neon_end(void);
> +
> +#endif /* ! __ASM_NEON_H */
> --
> 2.1.4
>

* [RFC PATCH v3 2/4] arm64: fpsimd: Consistently use __this_cpu_ ops where appropriate
  2017-05-25 18:24 ` [RFC PATCH v3 2/4] arm64: fpsimd: Consistently use __this_cpu_ ops where appropriate Dave Martin
@ 2017-05-31 11:43   ` Ard Biesheuvel
  0 siblings, 0 replies; 13+ messages in thread
From: Ard Biesheuvel @ 2017-05-31 11:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 25 May 2017 at 18:24, Dave Martin <Dave.Martin@arm.com> wrote:
> __this_cpu_ ops are not used consistently with regard to this_cpu_
> ops in a couple of places in fpsimd.c.
>
> Since preemption is explicitly disabled in
> fpsimd_restore_current_state() and fpsimd_update_current_state(),
> this patch converts this_cpu_ ops in those functions to __this_cpu_
> ops.  This doesn't save cost on arm64, but benefits from additional
> assertions in the core code.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---
>  arch/arm64/kernel/fpsimd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 06da8ea..d7e5f8a 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -194,7 +194,7 @@ void fpsimd_restore_current_state(void)
>                 struct fpsimd_state *st = &current->thread.fpsimd_state;
>
>                 fpsimd_load_state(st);
> -               this_cpu_write(fpsimd_last_state, st);
> +               __this_cpu_write(fpsimd_last_state, st);
>                 st->cpu = smp_processor_id();
>         }
>         preempt_enable();
> @@ -214,7 +214,7 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
>         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>                 struct fpsimd_state *st = &current->thread.fpsimd_state;
>
> -               this_cpu_write(fpsimd_last_state, st);
> +               __this_cpu_write(fpsimd_last_state, st);
>                 st->cpu = smp_processor_id();
>         }
>         preempt_enable();
> --
> 2.1.4
>

* [RFC PATCH v3 3/4] arm64: neon: Remove support for nested or hardirq kernel-mode NEON
  2017-05-25 18:25 ` [RFC PATCH v3 3/4] arm64: neon: Remove support for nested or hardirq kernel-mode NEON Dave Martin
@ 2017-05-31 11:51   ` Ard Biesheuvel
  0 siblings, 0 replies; 13+ messages in thread
From: Ard Biesheuvel @ 2017-05-31 11:51 UTC (permalink / raw)
  To: linux-arm-kernel

On 25 May 2017 at 18:25, Dave Martin <Dave.Martin@arm.com> wrote:
> Support for kernel-mode NEON to be nested and/or used in hardirq
> context adds significant complexity, and the benefits may be
> marginal.  In practice, kernel-mode NEON is not used in hardirq
> context, and is rarely used in softirq context (by certain mac80211
> drivers).
>

To clarify: it is actually the core mac80211 code that invokes the AES
crypto routines, but only if the crypto is not performed by the
hardware.

> This patch implements an arm64 may_use_simd() function to allow
> clients to check whether kernel-mode NEON is usable in the current
> context, and simplifies kernel_neon_{begin,end}() to handle only
> saving of the task FPSIMD state (if any).  Without nesting, there
> is no other state to save.
>
> The partial fpsimd save/restore functions become redundant as a
> result of these changes, so they are removed too.
>
> The save/restore model is changed to operate directly on
> task_struct without additional percpu storage.  This simplifies the
> code and saves a bit of memory, but means that softirqs must now be
> disabled when manipulating the task fpsimd state from task context:
> correspondingly, preempt_{en,dis}able() calls are upgraded to
> local_bh_{en,dis}able() as appropriate.

I don't think the latter implies the former: please look at the
difference between SOFTIRQ_DISABLE_OFFSET and SOFTIRQ_LOCK_OFFSET

>  fpsimd_thread_switch()
> already runs with hardirqs disabled and so is already protected
> from softirqs.
>
> These changes should make it easier to support kernel-mode NEON in
> the presence of the Scalable Vector extension in the future.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>
> ---
>  arch/arm64/include/asm/fpsimd.h       |  14 -----
>  arch/arm64/include/asm/fpsimdmacros.h |  56 -----------------
>  arch/arm64/include/asm/neon.h         |   4 +-
>  arch/arm64/include/asm/simd.h         |  56 +++++++++++++++++
>  arch/arm64/kernel/entry-fpsimd.S      |  24 -------
>  arch/arm64/kernel/fpsimd.c            | 115 +++++++++++++++++++++++-----------
>  6 files changed, 136 insertions(+), 133 deletions(-)
>  create mode 100644 arch/arm64/include/asm/simd.h
>
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index 50f559f..ff2f6cd 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -41,16 +41,6 @@ struct fpsimd_state {
>         unsigned int cpu;
>  };
>
> -/*
> - * Struct for stacking the bottom 'n' FP/SIMD registers.
> - */
> -struct fpsimd_partial_state {
> -       u32             fpsr;
> -       u32             fpcr;
> -       u32             num_regs;
> -       __uint128_t     vregs[32];
> -};
> -
>
>  #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
>  /* Masks for extracting the FPSR and FPCR from the FPSCR */
> @@ -77,10 +67,6 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state);
>
>  extern void fpsimd_flush_task_state(struct task_struct *target);
>
> -extern void fpsimd_save_partial_state(struct fpsimd_partial_state *state,
> -                                     u32 num_regs);
> -extern void fpsimd_load_partial_state(struct fpsimd_partial_state *state);
> -
>  #endif
>
>  #endif
> diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
> index a2daf12..0f5fdd3 100644
> --- a/arch/arm64/include/asm/fpsimdmacros.h
> +++ b/arch/arm64/include/asm/fpsimdmacros.h
> @@ -75,59 +75,3 @@
>         ldr     w\tmpnr, [\state, #16 * 2 + 4]
>         fpsimd_restore_fpcr x\tmpnr, \state
>  .endm
> -
> -.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2
> -       mrs     x\tmpnr1, fpsr
> -       str     w\numnr, [\state, #8]
> -       mrs     x\tmpnr2, fpcr
> -       stp     w\tmpnr1, w\tmpnr2, [\state]
> -       adr     x\tmpnr1, 0f
> -       add     \state, \state, x\numnr, lsl #4
> -       sub     x\tmpnr1, x\tmpnr1, x\numnr, lsl #1
> -       br      x\tmpnr1
> -       stp     q30, q31, [\state, #-16 * 30 - 16]
> -       stp     q28, q29, [\state, #-16 * 28 - 16]
> -       stp     q26, q27, [\state, #-16 * 26 - 16]
> -       stp     q24, q25, [\state, #-16 * 24 - 16]
> -       stp     q22, q23, [\state, #-16 * 22 - 16]
> -       stp     q20, q21, [\state, #-16 * 20 - 16]
> -       stp     q18, q19, [\state, #-16 * 18 - 16]
> -       stp     q16, q17, [\state, #-16 * 16 - 16]
> -       stp     q14, q15, [\state, #-16 * 14 - 16]
> -       stp     q12, q13, [\state, #-16 * 12 - 16]
> -       stp     q10, q11, [\state, #-16 * 10 - 16]
> -       stp     q8, q9, [\state, #-16 * 8 - 16]
> -       stp     q6, q7, [\state, #-16 * 6 - 16]
> -       stp     q4, q5, [\state, #-16 * 4 - 16]
> -       stp     q2, q3, [\state, #-16 * 2 - 16]
> -       stp     q0, q1, [\state, #-16 * 0 - 16]
> -0:
> -.endm
> -
> -.macro fpsimd_restore_partial state, tmpnr1, tmpnr2
> -       ldp     w\tmpnr1, w\tmpnr2, [\state]
> -       msr     fpsr, x\tmpnr1
> -       fpsimd_restore_fpcr x\tmpnr2, x\tmpnr1
> -       adr     x\tmpnr1, 0f
> -       ldr     w\tmpnr2, [\state, #8]
> -       add     \state, \state, x\tmpnr2, lsl #4
> -       sub     x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1
> -       br      x\tmpnr1
> -       ldp     q30, q31, [\state, #-16 * 30 - 16]
> -       ldp     q28, q29, [\state, #-16 * 28 - 16]
> -       ldp     q26, q27, [\state, #-16 * 26 - 16]
> -       ldp     q24, q25, [\state, #-16 * 24 - 16]
> -       ldp     q22, q23, [\state, #-16 * 22 - 16]
> -       ldp     q20, q21, [\state, #-16 * 20 - 16]
> -       ldp     q18, q19, [\state, #-16 * 18 - 16]
> -       ldp     q16, q17, [\state, #-16 * 16 - 16]
> -       ldp     q14, q15, [\state, #-16 * 14 - 16]
> -       ldp     q12, q13, [\state, #-16 * 12 - 16]
> -       ldp     q10, q11, [\state, #-16 * 10 - 16]
> -       ldp     q8, q9, [\state, #-16 * 8 - 16]
> -       ldp     q6, q7, [\state, #-16 * 6 - 16]
> -       ldp     q4, q5, [\state, #-16 * 4 - 16]
> -       ldp     q2, q3, [\state, #-16 * 2 - 16]
> -       ldp     q0, q1, [\state, #-16 * 0 - 16]
> -0:
> -.endm
> diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
> index 5368bd0..fb9d137 100644
> --- a/arch/arm64/include/asm/neon.h
> +++ b/arch/arm64/include/asm/neon.h
> @@ -16,9 +16,7 @@
>
>  #define cpu_has_neon()         system_supports_fpsimd()
>
> -#define kernel_neon_begin()    kernel_neon_begin_partial(32)
> -
> -void kernel_neon_begin_partial(u32 num_regs);
> +void kernel_neon_begin(void);
>  void kernel_neon_end(void);
>
>  #endif /* ! __ASM_NEON_H */
> diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
> new file mode 100644
> index 0000000..bbfc68f
> --- /dev/null
> +++ b/arch/arm64/include/asm/simd.h
> @@ -0,0 +1,56 @@
> +/*
> + * Copyright (C) 2017  ARM Limited
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __ASM_SIMD_H
> +#define __ASM_SIMD_H
> +
> +#include <linux/percpu.h>
> +#include <linux/preempt.h>
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_KERNEL_MODE_NEON
> +
> +DECLARE_PER_CPU(bool, kernel_neon_busy);
> +
> +/*
> + * Returns true if and only if it is safe to call kernel_neon_begin().
> + * Callers must not assume that the result remains true beyond the next
> + * preempt_enable() or return from softirq context.
> + */
> +static inline bool may_use_simd(void)
> +{
> +       /*
> +        * The raw_cpu_read() is racy if called with preemption enabled.
> +        * This is not a bug: kernel_neon_busy is only set when
> +        * preemption is disabled, so we cannot migrate to another CPU
> +        * while it is set, nor can we migrate to a CPU where it is set.
> +        * So, if we find it clear on some CPU then we're guaranteed to
> +        * find it clear on any CPU we could migrate to.
> +        *
> +        * If we are in between kernel_neon_begin()...kernel_neon_end(),
> +        * the flag will be set, but preemption is also disabled, so we
> +        * can't migrate to another CPU and spuriously see it become
> +        * false.
> +        */
> +       return !in_irq() && !in_nmi() && !raw_cpu_read(kernel_neon_busy);
> +}
> +
> +#else /* ! CONFIG_KERNEL_MODE_NEON */
> +
> +static inline bool may_use_simd(void) { return false; }
> +
> +#endif /* ! CONFIG_KERNEL_MODE_NEON */
> +
> +#endif /* ! __ASM_SIMD_H */
> diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
> index c44a82f..6a27cd6 100644
> --- a/arch/arm64/kernel/entry-fpsimd.S
> +++ b/arch/arm64/kernel/entry-fpsimd.S
> @@ -41,27 +41,3 @@ ENTRY(fpsimd_load_state)
>         fpsimd_restore x0, 8
>         ret
>  ENDPROC(fpsimd_load_state)
> -
> -#ifdef CONFIG_KERNEL_MODE_NEON
> -
> -/*
> - * Save the bottom n FP registers.
> - *
> - * x0 - pointer to struct fpsimd_partial_state
> - */
> -ENTRY(fpsimd_save_partial_state)
> -       fpsimd_save_partial x0, 1, 8, 9
> -       ret
> -ENDPROC(fpsimd_save_partial_state)
> -
> -/*
> - * Load the bottom n FP registers.
> - *
> - * x0 - pointer to struct fpsimd_partial_state
> - */
> -ENTRY(fpsimd_load_partial_state)
> -       fpsimd_restore_partial x0, 8, 9
> -       ret
> -ENDPROC(fpsimd_load_partial_state)
> -
> -#endif
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index d7e5f8a..14329d6 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -17,16 +17,18 @@
>   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>   */
>
> +#include <linux/bottom_half.h>
>  #include <linux/cpu.h>
>  #include <linux/cpu_pm.h>
>  #include <linux/kernel.h>
>  #include <linux/init.h>
> +#include <linux/percpu.h>
>  #include <linux/sched/signal.h>
>  #include <linux/signal.h>
> -#include <linux/hardirq.h>
>
>  #include <asm/fpsimd.h>
>  #include <asm/cputype.h>
> +#include <asm/simd.h>
>
>  #define FPEXC_IOF      (1 << 0)
>  #define FPEXC_DZF      (1 << 1)
> @@ -62,6 +64,13 @@
>   * CPU currently contain the most recent userland FPSIMD state of the current
>   * task.
>   *
> + * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may
> + * save the task's FPSIMD context back to task_struct from softirq context.
> + * To prevent this from racing with the manipulation of the task's FPSIMD state
> + * from task context and thereby corrupting the state, it is necessary to
> + * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE
> + * flag with local_bh_disable() unless softirqs are already masked.
> + *
>   * For a certain task, the sequence may look something like this:
>   * - the task gets scheduled in; if both the task's fpsimd_state.cpu field
>   *   contains the id of the current CPU, and the CPU's fpsimd_last_state per-cpu
> @@ -161,9 +170,14 @@ void fpsimd_flush_thread(void)
>  {
>         if (!system_supports_fpsimd())
>                 return;
> +
> +       local_bh_disable();
> +
>         memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
>         fpsimd_flush_task_state(current);
>         set_thread_flag(TIF_FOREIGN_FPSTATE);
> +
> +       local_bh_enable();
>  }
>
>  /*
> @@ -174,10 +188,13 @@ void fpsimd_preserve_current_state(void)
>  {
>         if (!system_supports_fpsimd())
>                 return;
> -       preempt_disable();
> +
> +       local_bh_disable();
> +
>         if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
>                 fpsimd_save_state(&current->thread.fpsimd_state);
> -       preempt_enable();
> +
> +       local_bh_enable();
>  }
>
>  /*
> @@ -189,7 +206,9 @@ void fpsimd_restore_current_state(void)
>  {
>         if (!system_supports_fpsimd())
>                 return;
> -       preempt_disable();
> +
> +       local_bh_disable();
> +
>         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>                 struct fpsimd_state *st = &current->thread.fpsimd_state;
>
> @@ -197,7 +216,8 @@ void fpsimd_restore_current_state(void)
>                 __this_cpu_write(fpsimd_last_state, st);
>                 st->cpu = smp_processor_id();
>         }
> -       preempt_enable();
> +
> +       local_bh_enable();
>  }
>
>  /*
> @@ -209,7 +229,9 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
>  {
>         if (!system_supports_fpsimd())
>                 return;
> -       preempt_disable();
> +
> +       local_bh_disable();
> +
>         fpsimd_load_state(state);
>         if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>                 struct fpsimd_state *st = &current->thread.fpsimd_state;
> @@ -217,7 +239,8 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
>                 __this_cpu_write(fpsimd_last_state, st);
>                 st->cpu = smp_processor_id();
>         }
> -       preempt_enable();
> +
> +       local_bh_enable();
>  }
>
>  /*
> @@ -230,49 +253,69 @@ void fpsimd_flush_task_state(struct task_struct *t)
>
>  #ifdef CONFIG_KERNEL_MODE_NEON
>
> -static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
> -static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
> +DEFINE_PER_CPU(bool, kernel_neon_busy);
>
>  /*
>   * Kernel-side NEON support functions
>   */
> -void kernel_neon_begin_partial(u32 num_regs)
> +
> +/*
> + * kernel_neon_begin(): obtain the CPU FPSIMD registers for use by the calling
> + * context
> + *
> + * Must not be called unless may_use_simd() returns true.
> + * Task context in the FPSIMD registers is saved back to memory as necessary.
> + *
> + * A matching call to kernel_neon_end() must be made before returning from the
> + * calling context.
> + *
> + * The caller may freely use the FPSIMD registers until kernel_neon_end() is
> + * called.
> + */
> +void kernel_neon_begin(void)
>  {
>         if (WARN_ON(!system_supports_fpsimd()))
>                 return;
> -       if (in_interrupt()) {
> -               struct fpsimd_partial_state *s = this_cpu_ptr(
> -                       in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
>
> -               BUG_ON(num_regs > 32);
> -               fpsimd_save_partial_state(s, roundup(num_regs, 2));
> -       } else {
> -               /*
> -                * Save the userland FPSIMD state if we have one and if we
> -                * haven't done so already. Clear fpsimd_last_state to indicate
> -                * that there is no longer userland FPSIMD state in the
> -                * registers.
> -                */
> -               preempt_disable();
> -               if (current->mm &&
> -                   !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
> -                       fpsimd_save_state(&current->thread.fpsimd_state);
> -               this_cpu_write(fpsimd_last_state, NULL);
> -       }
> +       BUG_ON(!may_use_simd());
> +
> +       local_bh_disable();
> +
> +       __this_cpu_write(kernel_neon_busy, true);
> +
> +       /* Save unsaved task fpsimd state, if any: */
> +       if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
> +               fpsimd_save_state(&current->thread.fpsimd_state);
> +
> +       /* Invalidate any task state remaining in the fpsimd regs: */
> +       __this_cpu_write(fpsimd_last_state, NULL);
> +
> +       preempt_disable();
> +
> +       local_bh_enable();
>  }
> -EXPORT_SYMBOL(kernel_neon_begin_partial);
> +EXPORT_SYMBOL(kernel_neon_begin);
>
> +/*
> + * kernel_neon_end(): give the CPU FPSIMD registers back to the current task
> + *
> + * Must be called from a context in which kernel_neon_begin() was previously
> + * called, with no call to kernel_neon_end() in the meantime.
> + *
> + * The caller must not use the FPSIMD registers after this function is called,
> + * unless kernel_neon_begin() is called again in the meantime.
> + */
>  void kernel_neon_end(void)
>  {
> +       bool busy;
> +
>         if (!system_supports_fpsimd())
>                 return;
> -       if (in_interrupt()) {
> -               struct fpsimd_partial_state *s = this_cpu_ptr(
> -                       in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
> -               fpsimd_load_partial_state(s);
> -       } else {
> -               preempt_enable();
> -       }
> +
> +       busy = __this_cpu_xchg(kernel_neon_busy, false);
> +       WARN_ON(!busy); /* No matching kernel_neon_begin()? */
> +
> +       preempt_enable();
>  }
>  EXPORT_SYMBOL(kernel_neon_end);
>
> --
> 2.1.4
>

Thread overview: 13+ messages
2017-05-25 18:24 [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin
2017-05-25 18:24 ` [RFC PATCH v3 1/4] arm64: neon: Add missing header guard in <asm/neon.h> Dave Martin
2017-05-31 11:41   ` Ard Biesheuvel
2017-05-25 18:24 ` [RFC PATCH v3 2/4] arm64: fpsimd: Consistently use __this_cpu_ ops where appropriate Dave Martin
2017-05-31 11:43   ` Ard Biesheuvel
2017-05-25 18:25 ` [RFC PATCH v3 3/4] arm64: neon: Remove support for nested or hardirq kernel-mode NEON Dave Martin
2017-05-31 11:51   ` Ard Biesheuvel
2017-05-25 18:25 ` [RFC PATCH v3 4/4] arm64: neon: Add backwards compatibility kernel_neon_begin_partial() Dave Martin
2017-05-30 18:02 ` [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin
2017-05-31  8:41   ` Ard Biesheuvel
2017-05-31 10:08     ` Dave Martin
2017-05-31 11:07       ` Ard Biesheuvel
2017-05-31 11:33         ` Dave Martin
