* [RFC PATCH v3 0/4] Simplify kernel-mode NEON
  2017-05-25 18:24 UTC, Dave Martin <Dave.Martin@arm.com>
  To: linux-arm-kernel

This series aims to simplify kernel-mode NEON.

Changes since RFC v2:

 * Re-enable softirqs between kernel_neon_begin() and kernel_neon_end().
   (This was the original intention, but I was overtaken by irrational
   paranoia when drafting the previous version of the patch.)

 * Remove the softirq disable/enable from fpsimd_thread_switch(), which
   runs with hardirqs disabled in any case (so local_bh_enable() there
   would be treated as an error by the core code).  The comment block is
   updated to clarify this situation.  This also means that no cost is
   added to the context-switch critical path after all.

 * Move the this_cpu_write(fpsimd_last_state, NULL) back outside the
   conditional in kernel_neon_begin() (i.e., where it was originally,
   before this series).  RFC v2 accidentally moved this invalidation
   inside the conditional, which leads to state corruption when
   switching back to a user task that previously ran on the same CPU,
   after kernel_neon_begin() is used by a kernel thread.

 * Minor formatting and spurious #include tidyups.

Testing:

 * For now, I've hacked kernel/time/timer.c:run_timer_softirq() to call
   kernel_neon_begin() and erase the FPSIMD registers, and jacked
   CONFIG_HZ up to 1000.  I also added a test syscall sys_blatt_neon
   that userspace can hammer to trigger the nested kernel_neon_begin()
   scenario.  See [2].

   This is not hugely realistic, but it was sufficient to diagnose the
   corruption problem described above when running on a model.
Original blurb:

The main motivation for these changes is that supporting kernel-mode
NEON alongside SVE is tricky with the current framework: the current
partial save/restore mechanisms would need additional porting to
save/restore the extended SVE vector bits, and this renders the cost
saving of partial save/restore rather doubtful -- even if not all
vector registers are saved, the save/restore cost will still grow with
increasing vector length.

We could get the mechanics of this to work in principle, but it doesn't
feel like the right thing to do.

If we withdraw kernel-mode NEON support for hardirq context, accept
some extra softirq latency and disable nesting, then we can simplify
the code by always saving the task context directly into task_struct
and deferring all restore to ret_to_user.

Previous discussions with Ard Biesheuvel suggest that this is a
feasible approach and reasonably well aligned with other architectures.

The main impact on client code is that callers must check that
kernel-mode NEON is usable in the current context and fall back to a
non-NEON implementation when necessary.  Ard has already done some work
on this. [1]

The interesting changes are all in patch 2: the first patch just adds a
header inclusion guard that I noted was missing.
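Since the series makes kernel-mode NEON unavailable in some contexts rather than transparently usable everywhere, callers gain a dispatch responsibility. The following userspace sketch models that caller-side pattern; the kernel interfaces (may_use_simd(), kernel_neon_begin(), kernel_neon_end()) are stubbed with hypothetical stand-ins so the model compiles and runs outside the kernel -- only the control flow mirrors the real API, and xor_blocks() is an invented example client, not code from this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* --- stubs standing in for <asm/simd.h> / <asm/neon.h> --- */
static bool fake_context_allows_simd = true;	/* pretend context state */
static bool may_use_simd(void) { return fake_context_allows_simd; }
static void kernel_neon_begin(void) { }
static void kernel_neon_end(void) { }

/* Scalar fallback, usable in any context. */
static void xor_blocks_scalar(uint8_t *dst, const uint8_t *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] ^= src[i];
}

/* Stand-in for a NEON-accelerated body (same semantics here). */
static void xor_blocks_neon(uint8_t *dst, const uint8_t *src, size_t n)
{
	xor_blocks_scalar(dst, src, n);
}

/*
 * The pattern clients must follow once hardirq/nested kernel-mode NEON
 * is gone: probe the context first, fall back when NEON is off-limits.
 */
static void xor_blocks(uint8_t *dst, const uint8_t *src, size_t n)
{
	if (may_use_simd()) {
		kernel_neon_begin();		/* claim the FPSIMD regs */
		xor_blocks_neon(dst, src, n);
		kernel_neon_end();		/* give them back */
	} else {
		/* e.g. hardirq context, or a softirq that interrupted
		 * another kernel-mode NEON user: */
		xor_blocks_scalar(dst, src, n);
	}
}
```

The same shape appears in Ard's conversion work referenced at [1]: the NEON path becomes an optimisation that is skipped, not an invariant.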
[1] git://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git kernel-mode-neon

Dave Martin (4):
  arm64: neon: Add missing header guard in <asm/neon.h>
  arm64: fpsimd: Consistently use __this_cpu_ ops where appropriate
  arm64: neon: Remove support for nested or hardirq kernel-mode NEON
  arm64: neon: Add backwards compatibility kernel_neon_begin_partial()

 arch/arm64/include/asm/fpsimd.h       |  14 ----
 arch/arm64/include/asm/fpsimdmacros.h |  56 ----------------
 arch/arm64/include/asm/neon.h         |  12 +++-
 arch/arm64/include/asm/simd.h         |  56 ++++++++++++++++
 arch/arm64/kernel/entry-fpsimd.S      |  24 -------
 arch/arm64/kernel/fpsimd.c            | 119 +++++++++++++++++++++++-----------
 6 files changed, 146 insertions(+), 135 deletions(-)
 create mode 100644 arch/arm64/include/asm/simd.h

--

[2] Test hacks, not for merging

>From 584f1187269dcee04cb43fc6cb443d8e56454e2b Mon Sep 17 00:00:00 2001
From: Dave Martin <Dave.Martin@arm.com>
Date: Thu, 25 May 2017 19:08:42 +0100
Subject: [PATCH] test hacks

---
 arch/arm64/kernel/fpsimd.c        | 63 +++++++++++++++++++++++++++-
 arch/arm64/kernel/sys.c           | 58 ++++++++++++++++++++++++++
 blatt-neon.c                      | 31 +++++++++++++++
 include/uapi/asm-generic/unistd.h |  5 ++-
 kernel/time/timer.c               | 72 ++++++++++++++++++++++++++++++++
 5 files changed, 227 insertions(+), 2 deletions(-)
 create mode 100644 blatt-neon.c

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 14329d6..6156e56 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -22,6 +22,7 @@
 #include <linux/cpu_pm.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/irqflags.h>
 #include <linux/percpu.h>
 #include <linux/sched/signal.h>
 #include <linux/signal.h>
@@ -37,6 +38,8 @@
 #define FPEXC_IXF	(1 << 4)
 #define FPEXC_IDF	(1 << 7)
 
+static void print_suspicious_vregs(char const *str, struct fpsimd_state const *st);
+
 /*
  * In order to reduce the number of times the FPSIMD state is needlessly saved
  * and restored, we need to keep track of two things:
@@ -138,13 +141,18 @@ void fpsimd_thread_switch(struct task_struct *next)
 {
 	if (!system_supports_fpsimd())
 		return;
+
+	BUG_ON(!irqs_disabled());
+
 	/*
 	 * Save the current FPSIMD state to memory, but only if whatever is in
 	 * the registers is in fact the most recent userland FPSIMD state of
 	 * 'current'.
 	 */
-	if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE))
+	if (current->mm && !test_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		fpsimd_save_state(&current->thread.fpsimd_state);
+		print_suspicious_vregs("sched-out", &current->thread.fpsimd_state);
+	}
 
 	if (next->mm) {
 		/*
@@ -197,6 +205,55 @@ void fpsimd_preserve_current_state(void)
 	local_bh_enable();
 }
 
+static bool suspicious_vreg(u8 const *reg)
+{
+	u8 b = *reg;
+	unsigned int i;
+
+	if (b < 0xc2 || b > 0xc7)
+		return false;
+
+	for (i = 1; i < 16; ++i)
+		if (b != reg[i])
+			return false;
+
+	return true;
+}
+
+static int print_vreg_if_suspicious(u8 const *reg, unsigned int num)
+{
+	char s[16 * 2 + 1], *p = s;
+	int n;
+	unsigned int i;
+
+	if (!suspicious_vreg(reg))
+		return 0;
+
+	for (i = 0; i < 16; ++i) {
+		n = snprintf(p, s + sizeof(s) - p, "%.2hhx", reg[i]);
+		if (n < 0)
+			break;
+
+		p += n;
+	}
+
+	*p = '\0';
+	return 1;
+}
+
+static void print_suspicious_vregs(char const *str, struct fpsimd_state const *st)
+{
+	int bad = 0;
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(st->vregs); ++i)
+		bad |= print_vreg_if_suspicious((u8 const *)&st->vregs[i], i);
+
+	if (bad)
+		pr_info("%s[%d]: %s: suspicious vreg(s) detected\n",
+			current->comm, task_pid_nr(current), str);
+}
+
 /*
  * Load the userland FPSIMD state of 'current' from memory, but only if the
  * FPSIMD state already held in the registers is /not/ the most recent FPSIMD
@@ -215,6 +272,8 @@ void fpsimd_restore_current_state(void)
 		fpsimd_load_state(st);
 		__this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
+
+		print_suspicious_vregs("fpsimd_restore_current_state", st);
 	}
 
 	local_bh_enable();
@@ -290,6 +349,8 @@ void kernel_neon_begin(void)
 	/* Invalidate any task state remaining in the fpsimd regs: */
 	__this_cpu_write(fpsimd_last_state, NULL);
 
+	print_suspicious_vregs("kernel_neon_begin", &current->thread.fpsimd_state);
+
 	preempt_disable();
 
 	local_bh_enable();
diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c
index 26fe8ea..ab73b51 100644
--- a/arch/arm64/kernel/sys.c
+++ b/arch/arm64/kernel/sys.c
@@ -26,6 +26,8 @@
 #include <linux/slab.h>
 #include <linux/syscalls.h>
 #include <asm/cpufeature.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
 
 asmlinkage long sys_mmap(unsigned long addr, unsigned long len,
 			 unsigned long prot, unsigned long flags,
@@ -45,6 +47,62 @@ SYSCALL_DEFINE1(arm64_personality, unsigned int, personality)
 	return sys_personality(personality);
 }
 
+SYSCALL_DEFINE0(blatt_neon)
+{
+	int ret = 0;
+	unsigned int i;
+
+	BUG_ON(!may_use_simd());
+
+	kernel_neon_begin();
+
+	for (i = 0; i < 1000; ++i) {
+		asm volatile (
+			"movi v0.16b, #0\n\t"
+			"movi v1.16b, #0\n\t"
+			"movi v2.16b, #0\n\t"
+			"movi v3.16b, #0\n\t"
+			"movi v4.16b, #0\n\t"
+			"movi v5.16b, #0\n\t"
+			"movi v6.16b, #0\n\t"
+			"movi v7.16b, #0\n\t"
+			"movi v8.16b, #0\n\t"
+			"movi v9.16b, #0\n\t"
+			"movi v10.16b, #0\n\t"
+			"movi v11.16b, #0\n\t"
+			"movi v12.16b, #0\n\t"
+			"movi v13.16b, #0\n\t"
+			"movi v14.16b, #0\n\t"
+			"movi v15.16b, #0\n\t"
+			"movi v16.16b, #0\n\t"
+			"movi v17.16b, #0\n\t"
+			"movi v18.16b, #0\n\t"
+			"movi v19.16b, #0\n\t"
+			"movi v20.16b, #0\n\t"
+			"movi v21.16b, #0\n\t"
+			"movi v22.16b, #0\n\t"
+			"movi v23.16b, #0\n\t"
+			"movi v24.16b, #0\n\t"
+			"movi v25.16b, #0\n\t"
+			"movi v26.16b, #0\n\t"
+			"movi v27.16b, #0\n\t"
+			"movi v28.16b, #0\n\t"
+			"movi v29.16b, #0\n\t"
+			"movi v30.16b, #0\n\t"
+			"movi v31.16b, #0"
+		);
+
+		if (test_thread_flag(TIF_SIGPENDING)) {
+			ret = -EINTR;
+			break;
+		}
+	}
+
+	kernel_neon_end();
+
+	return ret;
+}
+
 /*
  * Wrappers to pass the pt_regs argument.
  */
diff --git a/blatt-neon.c b/blatt-neon.c
new file mode 100644
index 0000000..fafb144
--- /dev/null
+++ b/blatt-neon.c
@@ -0,0 +1,31 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <asm/unistd.h>
+
+static long blatt_neon(void)
+{
+	register long ret asm("x0");
+	register unsigned int scno asm("x8") = __NR_blatt_neon;
+
+	asm volatile (
+		"svc #0"
+		: "=r" (ret)
+		: "r" (scno)
+	);
+
+	return ret;
+}
+
+int main(void)
+{
+	long ret;
+
+	while (1) {
+		ret = blatt_neon();
+		if (ret) {
+			fprintf(stderr, "%s\n", strerror(ret));
+			return EXIT_FAILURE;
+		}
+	}
+}
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 061185a..bf2d3e9 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -732,8 +732,11 @@ __SYSCALL(__NR_pkey_free, sys_pkey_free)
 #define __NR_statx 291
 __SYSCALL(__NR_statx, sys_statx)
 
+#define __NR_blatt_neon 292
+__SYSCALL(__NR_blatt_neon, sys_blatt_neon)
+
 #undef __NR_syscalls
-#define __NR_syscalls 292
+#define __NR_syscalls 293
 
 /*
  * All syscalls below here should go away really,
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 152a706..dc79771 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -34,6 +34,7 @@
 #include <linux/posix-timers.h>
 #include <linux/cpu.h>
 #include <linux/syscalls.h>
+#include <linux/debugfs.h>
 #include <linux/delay.h>
 #include <linux/tick.h>
 #include <linux/kallsyms.h>
@@ -50,6 +51,8 @@
 #include <asm/div64.h>
 #include <asm/timex.h>
 #include <asm/io.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
 
 #include "tick-internal.h"
 
@@ -1604,6 +1607,21 @@ static inline void __run_timers(struct timer_base *base)
 	spin_unlock_irq(&base->lock);
 }
 
+static u64 neon_busy_count, neon_idle_count;
+
+static int __init timer_softirq_debugfs_init(void)
+{
+	struct dentry *dir = debugfs_create_dir("neon-softirq", NULL);
+
+	if (!dir ||
+	    !debugfs_create_u64("busy", 0444, dir, &neon_busy_count) ||
+	    !debugfs_create_u64("idle", 0444, dir, &neon_idle_count))
+		WARN_ON(1);
+
+	return 0;
+}
+early_initcall(timer_softirq_debugfs_init);
+
 /*
  * This function runs timers and the timer-tq in bottom half context.
  */
@@ -1611,6 +1629,60 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
 {
 	struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
 
+	BUG_ON(preemptible());
+	BUG_ON(!softirq_count());
+
+	if (!may_use_simd())
+		neon_busy_count++;
+	else /* may_use_simd() */ {
+		static u8 n = 0xc2;
+
+		neon_idle_count++;
+
+		kernel_neon_begin();
+		asm volatile (
+			"dup v0.16b, %w0\n\t"
+			"dup v1.16b, %w0\n\t"
+			"dup v2.16b, %w0\n\t"
+			"dup v3.16b, %w0\n\t"
+			"dup v4.16b, %w0\n\t"
+			"dup v5.16b, %w0\n\t"
+			"dup v6.16b, %w0\n\t"
+			"dup v7.16b, %w0\n\t"
+			"dup v8.16b, %w0\n\t"
+			"dup v9.16b, %w0\n\t"
+			"dup v10.16b, %w0\n\t"
+			"dup v11.16b, %w0\n\t"
+			"dup v12.16b, %w0\n\t"
+			"dup v13.16b, %w0\n\t"
+			"dup v14.16b, %w0\n\t"
+			"dup v15.16b, %w0\n\t"
+			"dup v16.16b, %w0\n\t"
+			"dup v17.16b, %w0\n\t"
+			"dup v18.16b, %w0\n\t"
+			"dup v19.16b, %w0\n\t"
+			"dup v20.16b, %w0\n\t"
+			"dup v21.16b, %w0\n\t"
+			"dup v22.16b, %w0\n\t"
+			"dup v23.16b, %w0\n\t"
+			"dup v24.16b, %w0\n\t"
+			"dup v25.16b, %w0\n\t"
+			"dup v26.16b, %w0\n\t"
+			"dup v27.16b, %w0\n\t"
+			"dup v28.16b, %w0\n\t"
+			"dup v29.16b, %w0\n\t"
+			"dup v30.16b, %w0\n\t"
+			"dup v31.16b, %w0"
+			:: "r" (n)
+		);
+
+		++n;
+		if (n == 0xc8)
+			n = 0xc2;
+
+		kernel_neon_end();
+	} /* may_use_simd() */
+
 	__run_timers(base);
 	if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
 		__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread
* [RFC PATCH v3 1/4] arm64: neon: Add missing header guard in <asm/neon.h>
  2017-05-25 18:24 UTC, Dave Martin <Dave.Martin@arm.com>
  To: linux-arm-kernel

asm/neon.h doesn't have a header inclusion guard, but it should
have one for consistency with other headers.

This patch adds a suitable guard.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/neon.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index ad4cdc9..5368bd0 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -8,6 +8,9 @@
  * published by the Free Software Foundation.
  */
 
+#ifndef __ASM_NEON_H
+#define __ASM_NEON_H
+
 #include <linux/types.h>
 #include <asm/fpsimd.h>
 
@@ -17,3 +20,5 @@
 
 void kernel_neon_begin_partial(u32 num_regs);
 void kernel_neon_end(void);
+
+#endif /* ! __ASM_NEON_H */
-- 
2.1.4
* [RFC PATCH v3 1/4] arm64: neon: Add missing header guard in <asm/neon.h>
  2017-05-31 11:41 UTC, Ard Biesheuvel <ard.biesheuvel@linaro.org>
  To: linux-arm-kernel

On 25 May 2017 at 18:24, Dave Martin <Dave.Martin@arm.com> wrote:
> asm/neon.h doesn't have a header inclusion guard, but it should
> have one for consistency with other headers.
>
> This patch adds a suitable guard.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---
>  arch/arm64/include/asm/neon.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
> index ad4cdc9..5368bd0 100644
> --- a/arch/arm64/include/asm/neon.h
> +++ b/arch/arm64/include/asm/neon.h
> @@ -8,6 +8,9 @@
>   * published by the Free Software Foundation.
>   */
>
> +#ifndef __ASM_NEON_H
> +#define __ASM_NEON_H
> +
>  #include <linux/types.h>
>  #include <asm/fpsimd.h>
>
> @@ -17,3 +20,5 @@
>
>  void kernel_neon_begin_partial(u32 num_regs);
>  void kernel_neon_end(void);
> +
> +#endif /* ! __ASM_NEON_H */
> --
> 2.1.4
* [RFC PATCH v3 2/4] arm64: fpsimd: Consistently use __this_cpu_ ops where appropriate
  2017-05-25 18:24 UTC, Dave Martin <Dave.Martin@arm.com>
  To: linux-arm-kernel

__this_cpu_ ops are not used consistently with regard to this_cpu_
ops in a couple of places in fpsimd.c.

Since preemption is explicitly disabled in
fpsimd_restore_current_state() and fpsimd_update_current_state(),
this patch converts this_cpu_ ops in those functions to __this_cpu_
ops.  This doesn't save cost on arm64, but benefits from additional
assertions in the core code.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/kernel/fpsimd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 06da8ea..d7e5f8a 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -194,7 +194,7 @@ void fpsimd_restore_current_state(void)
 		struct fpsimd_state *st = &current->thread.fpsimd_state;
 
 		fpsimd_load_state(st);
-		this_cpu_write(fpsimd_last_state, st);
+		__this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
 	}
 	preempt_enable();
@@ -214,7 +214,7 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		struct fpsimd_state *st = &current->thread.fpsimd_state;
 
-		this_cpu_write(fpsimd_last_state, st);
+		__this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
 	}
 	preempt_enable();
-- 
2.1.4
* [RFC PATCH v3 2/4] arm64: fpsimd: Consistently use __this_cpu_ ops where appropriate
  2017-05-31 11:43 UTC, Ard Biesheuvel <ard.biesheuvel@linaro.org>
  To: linux-arm-kernel

On 25 May 2017 at 18:24, Dave Martin <Dave.Martin@arm.com> wrote:
> __this_cpu_ ops are not used consistently with regard to this_cpu_
> ops in a couple of places in fpsimd.c.
>
> Since preemption is explicitly disabled in
> fpsimd_restore_current_state() and fpsimd_update_current_state(),
> this patch converts this_cpu_ ops in those functions to __this_cpu_
> ops.  This doesn't save cost on arm64, but benefits from additional
> assertions in the core code.
>
> Signed-off-by: Dave Martin <Dave.Martin@arm.com>

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---
>  arch/arm64/kernel/fpsimd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 06da8ea..d7e5f8a 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -194,7 +194,7 @@ void fpsimd_restore_current_state(void)
>  		struct fpsimd_state *st = &current->thread.fpsimd_state;
>
>  		fpsimd_load_state(st);
> -		this_cpu_write(fpsimd_last_state, st);
> +		__this_cpu_write(fpsimd_last_state, st);
>  		st->cpu = smp_processor_id();
>  	}
>  	preempt_enable();
> @@ -214,7 +214,7 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
>  	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>  		struct fpsimd_state *st = &current->thread.fpsimd_state;
>
> -		this_cpu_write(fpsimd_last_state, st);
> +		__this_cpu_write(fpsimd_last_state, st);
>  		st->cpu = smp_processor_id();
>  	}
>  	preempt_enable();
> --
> 2.1.4
* [RFC PATCH v3 3/4] arm64: neon: Remove support for nested or hardirq kernel-mode NEON
  2017-05-25 18:25 UTC, Dave Martin <Dave.Martin@arm.com>
  To: linux-arm-kernel

Support for kernel-mode NEON to be nested and/or used in hardirq
context adds significant complexity, and the benefits may be
marginal.  In practice, kernel-mode NEON is not used in hardirq
context, and is rarely used in softirq context (by certain mac80211
drivers).

This patch implements an arm64 may_use_simd() function to allow
clients to check whether kernel-mode NEON is usable in the current
context, and simplifies kernel_neon_{begin,end}() to handle only
saving of the task FPSIMD state (if any).  Without nesting, there
is no other state to save.

The partial fpsimd save/restore functions become redundant as a
result of these changes, so they are removed too.

The save/restore model is changed to operate directly on
task_struct without additional percpu storage.  This simplifies the
code and saves a bit of memory, but means that softirqs must now be
disabled when manipulating the task fpsimd state from task context:
correspondingly, preempt_{en,dis}able() calls are upgraded to
local_bh_{en,dis}able() as appropriate.  fpsimd_thread_switch()
already runs with hardirqs disabled and so is already protected
from softirqs.

These changes should make it easier to support kernel-mode NEON in
the presence of the Scalable Vector Extension in the future.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
---
 arch/arm64/include/asm/fpsimd.h       |  14 -----
 arch/arm64/include/asm/fpsimdmacros.h |  56 ------------------
 arch/arm64/include/asm/neon.h         |   4 +-
 arch/arm64/include/asm/simd.h         |  56 ++++++++++++++++++
 arch/arm64/kernel/entry-fpsimd.S      |  24 --------
 arch/arm64/kernel/fpsimd.c            | 115 +++++++++++++++++++++++-----------
 6 files changed, 136 insertions(+), 133 deletions(-)
 create mode 100644 arch/arm64/include/asm/simd.h

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 50f559f..ff2f6cd 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -41,16 +41,6 @@ struct fpsimd_state {
 	unsigned int cpu;
 };
 
-/*
- * Struct for stacking the bottom 'n' FP/SIMD registers.
- */
-struct fpsimd_partial_state {
-	u32		fpsr;
-	u32		fpcr;
-	u32		num_regs;
-	__uint128_t	vregs[32];
-};
-
 #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
 /* Masks for extracting the FPSR and FPCR from the FPSCR */
@@ -77,10 +67,6 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state);
 
 extern void fpsimd_flush_task_state(struct task_struct *target);
 
-extern void fpsimd_save_partial_state(struct fpsimd_partial_state *state,
-				      u32 num_regs);
-extern void fpsimd_load_partial_state(struct fpsimd_partial_state *state);
-
 #endif
 
 #endif
diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h
index a2daf12..0f5fdd3 100644
--- a/arch/arm64/include/asm/fpsimdmacros.h
+++ b/arch/arm64/include/asm/fpsimdmacros.h
@@ -75,59 +75,3 @@
 	ldr	w\tmpnr, [\state, #16 * 2 + 4]
 	fpsimd_restore_fpcr x\tmpnr, \state
 .endm
-
-.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2
-	mrs	x\tmpnr1, fpsr
-	str	w\numnr, [\state, #8]
-	mrs	x\tmpnr2, fpcr
-	stp	w\tmpnr1, w\tmpnr2, [\state]
-	adr	x\tmpnr1, 0f
-	add	\state, \state, x\numnr, lsl #4
-	sub	x\tmpnr1, x\tmpnr1, x\numnr, lsl #1
-	br	x\tmpnr1
-	stp	q30, q31, [\state, #-16 * 30 - 16]
-	stp	q28, q29, [\state, #-16 * 28 - 16]
-	stp	q26, q27, [\state, #-16 * 26 - 16]
-	stp	q24, q25, [\state, #-16 * 24 - 16]
-	stp	q22, q23, [\state, #-16 * 22 - 16]
-	stp	q20, q21, [\state, #-16 * 20 - 16]
-	stp	q18, q19, [\state, #-16 * 18 - 16]
-	stp	q16, q17, [\state, #-16 * 16 - 16]
-	stp	q14, q15, [\state, #-16 * 14 - 16]
-	stp	q12, q13, [\state, #-16 * 12 - 16]
-	stp	q10, q11, [\state, #-16 * 10 - 16]
-	stp	q8, q9, [\state, #-16 * 8 - 16]
-	stp	q6, q7, [\state, #-16 * 6 - 16]
-	stp	q4, q5, [\state, #-16 * 4 - 16]
-	stp	q2, q3, [\state, #-16 * 2 - 16]
-	stp	q0, q1, [\state, #-16 * 0 - 16]
-0:
-.endm
-
-.macro fpsimd_restore_partial state, tmpnr1, tmpnr2
-	ldp	w\tmpnr1, w\tmpnr2, [\state]
-	msr	fpsr, x\tmpnr1
-	fpsimd_restore_fpcr x\tmpnr2, x\tmpnr1
-	adr	x\tmpnr1, 0f
-	ldr	w\tmpnr2, [\state, #8]
-	add	\state, \state, x\tmpnr2, lsl #4
-	sub	x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1
-	br	x\tmpnr1
-	ldp	q30, q31, [\state, #-16 * 30 - 16]
-	ldp	q28, q29, [\state, #-16 * 28 - 16]
-	ldp	q26, q27, [\state, #-16 * 26 - 16]
-	ldp	q24, q25, [\state, #-16 * 24 - 16]
-	ldp	q22, q23, [\state, #-16 * 22 - 16]
-	ldp	q20, q21, [\state, #-16 * 20 - 16]
-	ldp	q18, q19, [\state, #-16 * 18 - 16]
-	ldp	q16, q17, [\state, #-16 * 16 - 16]
-	ldp	q14, q15, [\state, #-16 * 14 - 16]
-	ldp	q12, q13, [\state, #-16 * 12 - 16]
-	ldp	q10, q11, [\state, #-16 * 10 - 16]
-	ldp	q8, q9, [\state, #-16 * 8 - 16]
-	ldp	q6, q7, [\state, #-16 * 6 - 16]
-	ldp	q4, q5, [\state, #-16 * 4 - 16]
-	ldp	q2, q3, [\state, #-16 * 2 - 16]
-	ldp	q0, q1, [\state, #-16 * 0 - 16]
-0:
-.endm
diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h
index 5368bd0..fb9d137 100644
--- a/arch/arm64/include/asm/neon.h
+++ b/arch/arm64/include/asm/neon.h
@@ -16,9 +16,7 @@
 
 #define cpu_has_neon()		system_supports_fpsimd()
 
-#define kernel_neon_begin()	kernel_neon_begin_partial(32)
-
-void kernel_neon_begin_partial(u32 num_regs);
+void kernel_neon_begin(void);
 void kernel_neon_end(void);
 
 #endif /* ! __ASM_NEON_H */
diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
new file mode 100644
index 0000000..bbfc68f
--- /dev/null
+++ b/arch/arm64/include/asm/simd.h
@@ -0,0 +1,56 @@
+/*
+ * Copyright (C) 2017 ARM Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __ASM_SIMD_H
+#define __ASM_SIMD_H
+
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+
+DECLARE_PER_CPU(bool, kernel_neon_busy);
+
+/*
+ * Returns true if and only if it is safe to call kernel_neon_begin().
+ * Callers must not assume that the result remains true beyond the next
+ * preempt_enable() or return from softirq context.
+ */
+static inline bool may_use_simd(void)
+{
+	/*
+	 * The raw_cpu_read() is racy if called with preemption enabled.
+	 * This is not a bug: kernel_neon_busy is only set when
+	 * preemption is disabled, so we cannot migrate to another CPU
+	 * while it is set, nor can we migrate to a CPU where it is set.
+	 * So, if we find it clear on some CPU then we're guaranteed to
+	 * find it clear on any CPU we could migrate to.
+	 *
+	 * If we are in between kernel_neon_begin()...kernel_neon_end(),
+	 * the flag will be set, but preemption is also disabled, so we
+	 * can't migrate to another CPU and spuriously see it become
+	 * false.
+	 */
+	return !in_irq() && !in_nmi() && !raw_cpu_read(kernel_neon_busy);
+}
+
+#else /* ! CONFIG_KERNEL_MODE_NEON */
+
+static inline bool may_use_simd(void) { return false; }
+
+#endif /* ! CONFIG_KERNEL_MODE_NEON */
+
+#endif /* ! __ASM_SIMD_H */
diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S
index c44a82f..6a27cd6 100644
--- a/arch/arm64/kernel/entry-fpsimd.S
+++ b/arch/arm64/kernel/entry-fpsimd.S
@@ -41,27 +41,3 @@ ENTRY(fpsimd_load_state)
 	fpsimd_restore x0, 8
 	ret
 ENDPROC(fpsimd_load_state)
-
-#ifdef CONFIG_KERNEL_MODE_NEON
-
-/*
- * Save the bottom n FP registers.
- *
- * x0 - pointer to struct fpsimd_partial_state
- */
-ENTRY(fpsimd_save_partial_state)
-	fpsimd_save_partial x0, 1, 8, 9
-	ret
-ENDPROC(fpsimd_save_partial_state)
-
-/*
- * Load the bottom n FP registers.
- *
- * x0 - pointer to struct fpsimd_partial_state
- */
-ENTRY(fpsimd_load_partial_state)
-	fpsimd_restore_partial x0, 8, 9
-	ret
-ENDPROC(fpsimd_load_partial_state)
-
-#endif
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index d7e5f8a..14329d6 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -17,16 +17,18 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/bottom_half.h>
 #include <linux/cpu.h>
 #include <linux/cpu_pm.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
+#include <linux/percpu.h>
 #include <linux/sched/signal.h>
 #include <linux/signal.h>
-#include <linux/hardirq.h>
 
 #include <asm/fpsimd.h>
 #include <asm/cputype.h>
+#include <asm/simd.h>
 
 #define FPEXC_IOF	(1 << 0)
 #define FPEXC_DZF	(1 << 1)
@@ -62,6 +64,13 @@
  * CPU currently contain the most recent userland FPSIMD state of the current
  * task.
  *
+ * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may
+ * save the task's FPSIMD context back to task_struct from softirq context.
+ * To prevent this from racing with the manipulation of the task's FPSIMD state
+ * from task context and thereby corrupting the state, it is necessary to
+ * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE
+ * flag with local_bh_disable() unless softirqs are already masked.
+ *
  * For a certain task, the sequence may look something like this:
  * - the task gets scheduled in; if both the task's fpsimd_state.cpu field
  *   contains the id of the current CPU, and the CPU's fpsimd_last_state per-cpu
@@ -161,9 +170,14 @@ void fpsimd_flush_thread(void)
 {
 	if (!system_supports_fpsimd())
 		return;
+
+	local_bh_disable();
+
 	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
 	fpsimd_flush_task_state(current);
 	set_thread_flag(TIF_FOREIGN_FPSTATE);
+
+	local_bh_enable();
 }
 
 /*
@@ -174,10 +188,13 @@ void fpsimd_preserve_current_state(void)
 {
 	if (!system_supports_fpsimd())
 		return;
-	preempt_disable();
+
+	local_bh_disable();
+
 	if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
 		fpsimd_save_state(&current->thread.fpsimd_state);
-	preempt_enable();
+
+	local_bh_enable();
 }
 
 /*
@@ -189,7 +206,9 @@ void fpsimd_restore_current_state(void)
 {
 	if (!system_supports_fpsimd())
 		return;
-	preempt_disable();
+
+	local_bh_disable();
+
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		struct fpsimd_state *st = &current->thread.fpsimd_state;
 
@@ -197,7 +216,8 @@ void fpsimd_restore_current_state(void)
 		__this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
 	}
-	preempt_enable();
+
+	local_bh_enable();
 }
 
 /*
@@ -209,7 +229,9 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
 {
 	if (!system_supports_fpsimd())
 		return;
-	preempt_disable();
+
+	local_bh_disable();
+
 	fpsimd_load_state(state);
 	if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
 		struct fpsimd_state *st = &current->thread.fpsimd_state;
 
@@ -217,7 +239,8 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
 		__this_cpu_write(fpsimd_last_state, st);
 		st->cpu = smp_processor_id();
 	}
-	preempt_enable();
+
+	local_bh_enable();
 }
 
 /*
@@ -230,49 +253,69 @@ void fpsimd_flush_task_state(struct task_struct *t)
 
 #ifdef CONFIG_KERNEL_MODE_NEON
 
-static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
-static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
+DEFINE_PER_CPU(bool, kernel_neon_busy);
 
 /*
  * Kernel-side NEON support functions
  */
-void kernel_neon_begin_partial(u32 num_regs)
+
+/*
+ * kernel_neon_begin(): obtain the CPU FPSIMD registers for use by the calling
+ * context
+ *
+ * Must not be called unless may_use_simd() returns true.
+ * Task context in the FPSIMD registers is saved back to memory as necessary.
+ *
+ * A matching call to kernel_neon_end() must be made before returning from the
+ * calling context.
+ *
+ * The caller may freely use the FPSIMD registers until kernel_neon_end() is
+ * called.
+ */
+void kernel_neon_begin(void)
 {
 	if (WARN_ON(!system_supports_fpsimd()))
 		return;
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
 
-		BUG_ON(num_regs > 32);
-		fpsimd_save_partial_state(s, roundup(num_regs, 2));
-	} else {
-		/*
-		 * Save the userland FPSIMD state if we have one and if we
-		 * haven't done so already. Clear fpsimd_last_state to indicate
-		 * that there is no longer userland FPSIMD state in the
-		 * registers.
-		 */
-		preempt_disable();
-		if (current->mm &&
-		    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
-			fpsimd_save_state(&current->thread.fpsimd_state);
-		this_cpu_write(fpsimd_last_state, NULL);
-	}
+	BUG_ON(!may_use_simd());
+
+	local_bh_disable();
+
+	__this_cpu_write(kernel_neon_busy, true);
+
+	/* Save unsaved task fpsimd state, if any: */
+	if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
+		fpsimd_save_state(&current->thread.fpsimd_state);
+
+	/* Invalidate any task state remaining in the fpsimd regs: */
+	__this_cpu_write(fpsimd_last_state, NULL);
+
+	preempt_disable();
+
+	local_bh_enable();
 }
-EXPORT_SYMBOL(kernel_neon_begin_partial);
+EXPORT_SYMBOL(kernel_neon_begin);
 
+/*
+ * kernel_neon_end(): give the CPU FPSIMD registers back to the current task
+ *
+ * Must be called from a context in which kernel_neon_begin() was previously
+ * called, with no call to kernel_neon_end() in the meantime.
+ *
+ * The caller must not use the FPSIMD registers after this function is called,
+ * unless kernel_neon_begin() is called again in the meantime.
+ */
 void kernel_neon_end(void)
 {
+	bool busy;
+
 	if (!system_supports_fpsimd())
 		return;
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
-		fpsimd_load_partial_state(s);
-	} else {
-		preempt_enable();
-	}
+
+	busy = __this_cpu_xchg(kernel_neon_busy, false);
+	WARN_ON(!busy);	/* No matching kernel_neon_begin()? */
+
+	preempt_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
-- 
2.1.4
* [RFC PATCH v3 3/4] arm64: neon: Remove support for nested or hardirq kernel-mode NEON
  2017-05-25 18:25 ` [RFC PATCH v3 3/4] arm64: neon: Remove support for nested or hardirq kernel-mode NEON Dave Martin
@ 2017-05-31 11:51   ` Ard Biesheuvel
  0 siblings, 0 replies; 13+ messages in thread
From: Ard Biesheuvel @ 2017-05-31 11:51 UTC (permalink / raw)
To: linux-arm-kernel

On 25 May 2017 at 18:25, Dave Martin <Dave.Martin@arm.com> wrote:
> Support for kernel-mode NEON to be nested and/or used in hardirq
> context adds significant complexity, and the benefits may be
> marginal. In practice, kernel-mode NEON is not used in hardirq
> context, and is rarely used in softirq context (by certain mac80211
> drivers).
>

To clarify: it is actually the core mac80211 code that invokes the AES
crypto routines, but only if the crypto is not performed by the hardware.

> This patch implements an arm64 may_use_simd() function to allow
> clients to check whether kernel-mode NEON is usable in the current
> context, and simplifies kernel_neon_{begin,end}() to handle only
> saving of the task FPSIMD state (if any). Without nesting, there
> is no other state to save.
>
> The partial fpsimd save/restore functions become redundant as a
> result of these changes, so they are removed too.
>
> The save/restore model is changed to operate directly on
> task_struct without additional percpu storage. This simplifies the
> code and saves a bit of memory, but means that softirqs must now be
> disabled when manipulating the task fpsimd state from task context:
> correspondingly, preempt_{dis,en}able() calls are upgraded to
> local_bh_{dis,en}able() as appropriate.

I don't think the latter implies the former: please look at the
difference between SOFTIRQ_DISABLE_OFFSET and SOFTIRQ_LOCK_OFFSET

> fpsimd_thread_switch()
> already runs with hardirqs disabled and so is already protected
> from softirqs.
> > These changes should make it easier to support kernel-mode NEON in > the presence of the Scalable Vector extension in the future. > > Signed-off-by: Dave Martin <Dave.Martin@arm.com> > --- > arch/arm64/include/asm/fpsimd.h | 14 ----- > arch/arm64/include/asm/fpsimdmacros.h | 56 ----------------- > arch/arm64/include/asm/neon.h | 4 +- > arch/arm64/include/asm/simd.h | 56 +++++++++++++++++ > arch/arm64/kernel/entry-fpsimd.S | 24 ------- > arch/arm64/kernel/fpsimd.c | 115 +++++++++++++++++++++++----------- > 6 files changed, 136 insertions(+), 133 deletions(-) > create mode 100644 arch/arm64/include/asm/simd.h > > diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h > index 50f559f..ff2f6cd 100644 > --- a/arch/arm64/include/asm/fpsimd.h > +++ b/arch/arm64/include/asm/fpsimd.h > @@ -41,16 +41,6 @@ struct fpsimd_state { > unsigned int cpu; > }; > > -/* > - * Struct for stacking the bottom 'n' FP/SIMD registers. > - */ > -struct fpsimd_partial_state { > - u32 fpsr; > - u32 fpcr; > - u32 num_regs; > - __uint128_t vregs[32]; > -}; > - > > #if defined(__KERNEL__) && defined(CONFIG_COMPAT) > /* Masks for extracting the FPSR and FPCR from the FPSCR */ > @@ -77,10 +67,6 @@ extern void fpsimd_update_current_state(struct fpsimd_state *state); > > extern void fpsimd_flush_task_state(struct task_struct *target); > > -extern void fpsimd_save_partial_state(struct fpsimd_partial_state *state, > - u32 num_regs); > -extern void fpsimd_load_partial_state(struct fpsimd_partial_state *state); > - > #endif > > #endif > diff --git a/arch/arm64/include/asm/fpsimdmacros.h b/arch/arm64/include/asm/fpsimdmacros.h > index a2daf12..0f5fdd3 100644 > --- a/arch/arm64/include/asm/fpsimdmacros.h > +++ b/arch/arm64/include/asm/fpsimdmacros.h > @@ -75,59 +75,3 @@ > ldr w\tmpnr, [\state, #16 * 2 + 4] > fpsimd_restore_fpcr x\tmpnr, \state > .endm > - > -.macro fpsimd_save_partial state, numnr, tmpnr1, tmpnr2 > - mrs x\tmpnr1, fpsr > - str w\numnr, [\state, #8] > - mrs 
x\tmpnr2, fpcr > - stp w\tmpnr1, w\tmpnr2, [\state] > - adr x\tmpnr1, 0f > - add \state, \state, x\numnr, lsl #4 > - sub x\tmpnr1, x\tmpnr1, x\numnr, lsl #1 > - br x\tmpnr1 > - stp q30, q31, [\state, #-16 * 30 - 16] > - stp q28, q29, [\state, #-16 * 28 - 16] > - stp q26, q27, [\state, #-16 * 26 - 16] > - stp q24, q25, [\state, #-16 * 24 - 16] > - stp q22, q23, [\state, #-16 * 22 - 16] > - stp q20, q21, [\state, #-16 * 20 - 16] > - stp q18, q19, [\state, #-16 * 18 - 16] > - stp q16, q17, [\state, #-16 * 16 - 16] > - stp q14, q15, [\state, #-16 * 14 - 16] > - stp q12, q13, [\state, #-16 * 12 - 16] > - stp q10, q11, [\state, #-16 * 10 - 16] > - stp q8, q9, [\state, #-16 * 8 - 16] > - stp q6, q7, [\state, #-16 * 6 - 16] > - stp q4, q5, [\state, #-16 * 4 - 16] > - stp q2, q3, [\state, #-16 * 2 - 16] > - stp q0, q1, [\state, #-16 * 0 - 16] > -0: > -.endm > - > -.macro fpsimd_restore_partial state, tmpnr1, tmpnr2 > - ldp w\tmpnr1, w\tmpnr2, [\state] > - msr fpsr, x\tmpnr1 > - fpsimd_restore_fpcr x\tmpnr2, x\tmpnr1 > - adr x\tmpnr1, 0f > - ldr w\tmpnr2, [\state, #8] > - add \state, \state, x\tmpnr2, lsl #4 > - sub x\tmpnr1, x\tmpnr1, x\tmpnr2, lsl #1 > - br x\tmpnr1 > - ldp q30, q31, [\state, #-16 * 30 - 16] > - ldp q28, q29, [\state, #-16 * 28 - 16] > - ldp q26, q27, [\state, #-16 * 26 - 16] > - ldp q24, q25, [\state, #-16 * 24 - 16] > - ldp q22, q23, [\state, #-16 * 22 - 16] > - ldp q20, q21, [\state, #-16 * 20 - 16] > - ldp q18, q19, [\state, #-16 * 18 - 16] > - ldp q16, q17, [\state, #-16 * 16 - 16] > - ldp q14, q15, [\state, #-16 * 14 - 16] > - ldp q12, q13, [\state, #-16 * 12 - 16] > - ldp q10, q11, [\state, #-16 * 10 - 16] > - ldp q8, q9, [\state, #-16 * 8 - 16] > - ldp q6, q7, [\state, #-16 * 6 - 16] > - ldp q4, q5, [\state, #-16 * 4 - 16] > - ldp q2, q3, [\state, #-16 * 2 - 16] > - ldp q0, q1, [\state, #-16 * 0 - 16] > -0: > -.endm > diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h > index 5368bd0..fb9d137 100644 > --- 
a/arch/arm64/include/asm/neon.h > +++ b/arch/arm64/include/asm/neon.h > @@ -16,9 +16,7 @@ > > #define cpu_has_neon() system_supports_fpsimd() > > -#define kernel_neon_begin() kernel_neon_begin_partial(32) > - > -void kernel_neon_begin_partial(u32 num_regs); > +void kernel_neon_begin(void); > void kernel_neon_end(void); > > #endif /* ! __ASM_NEON_H */ > diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h > new file mode 100644 > index 0000000..bbfc68f > --- /dev/null > +++ b/arch/arm64/include/asm/simd.h > @@ -0,0 +1,56 @@ > +/* > + * Copyright (C) 2017 ARM Limited > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program. If not, see <http://www.gnu.org/licenses/>. > + */ > +#ifndef __ASM_SIMD_H > +#define __ASM_SIMD_H > + > +#include <linux/percpu.h> > +#include <linux/preempt.h> > +#include <linux/types.h> > + > +#ifdef CONFIG_KERNEL_MODE_NEON > + > +DECLARE_PER_CPU(bool, kernel_neon_busy); > + > +/* > + * Returns true if and only if it is safe to call kernel_neon_begin(). > + * Callers must not assume that the result remains true beyond the next > + * preempt_enable() or return from softirq context. > + */ > +static inline bool may_use_simd(void) > +{ > + /* > + * The raw_cpu_read() is racy if called with preemption enabled. > + * This is not a bug: kernel_neon_busy is only set when > + * preemption is disabled, so we cannot migrate to another CPU > + * while it is set, nor can we migrate to a CPU where it is set. 
> + * So, if we find it clear on some CPU then we're guaranteed to > + * find it clear on any CPU we could migrate to. > + * > + * If we are in between kernel_neon_begin()...kernel_neon_end(), > + * the flag will be set, but preemption is also disabled, so we > + * can't migrate to another CPU and spuriously see it become > + * false. > + */ > + return !in_irq() && !in_nmi() && !raw_cpu_read(kernel_neon_busy); > +} > + > +#else /* ! CONFIG_KERNEL_MODE_NEON */ > + > +static inline bool may_use_simd(void) { return false; } > + > +#endif /* ! CONFIG_KERNEL_MODE_NEON */ > + > +#endif /* ! __ASM_SIMD_H */ > diff --git a/arch/arm64/kernel/entry-fpsimd.S b/arch/arm64/kernel/entry-fpsimd.S > index c44a82f..6a27cd6 100644 > --- a/arch/arm64/kernel/entry-fpsimd.S > +++ b/arch/arm64/kernel/entry-fpsimd.S > @@ -41,27 +41,3 @@ ENTRY(fpsimd_load_state) > fpsimd_restore x0, 8 > ret > ENDPROC(fpsimd_load_state) > - > -#ifdef CONFIG_KERNEL_MODE_NEON > - > -/* > - * Save the bottom n FP registers. > - * > - * x0 - pointer to struct fpsimd_partial_state > - */ > -ENTRY(fpsimd_save_partial_state) > - fpsimd_save_partial x0, 1, 8, 9 > - ret > -ENDPROC(fpsimd_save_partial_state) > - > -/* > - * Load the bottom n FP registers. > - * > - * x0 - pointer to struct fpsimd_partial_state > - */ > -ENTRY(fpsimd_load_partial_state) > - fpsimd_restore_partial x0, 8, 9 > - ret > -ENDPROC(fpsimd_load_partial_state) > - > -#endif > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c > index d7e5f8a..14329d6 100644 > --- a/arch/arm64/kernel/fpsimd.c > +++ b/arch/arm64/kernel/fpsimd.c > @@ -17,16 +17,18 @@ > * along with this program. If not, see <http://www.gnu.org/licenses/>. 
>  */
>
> +#include <linux/bottom_half.h>
>  #include <linux/cpu.h>
>  #include <linux/cpu_pm.h>
>  #include <linux/kernel.h>
>  #include <linux/init.h>
> +#include <linux/percpu.h>
>  #include <linux/sched/signal.h>
>  #include <linux/signal.h>
> -#include <linux/hardirq.h>
>
>  #include <asm/fpsimd.h>
>  #include <asm/cputype.h>
> +#include <asm/simd.h>
>
>  #define FPEXC_IOF (1 << 0)
>  #define FPEXC_DZF (1 << 1)
> @@ -62,6 +64,13 @@
>   * CPU currently contain the most recent userland FPSIMD state of the current
>   * task.
>   *
> + * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may
> + * save the task's FPSIMD context back to task_struct from softirq context.
> + * To prevent this from racing with the manipulation of the task's FPSIMD state
> + * from task context and thereby corrupting the state, it is necessary to
> + * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE
> + * flag with local_bh_disable() unless softirqs are already masked.
> + *
>   * For a certain task, the sequence may look something like this:
>   * - the task gets scheduled in; if both the task's fpsimd_state.cpu field
>   *   contains the id of the current CPU, and the CPU's fpsimd_last_state per-cpu
> @@ -161,9 +170,14 @@ void fpsimd_flush_thread(void)
>  {
>          if (!system_supports_fpsimd())
>                  return;
> +
> +        local_bh_disable();
> +
>          memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
>          fpsimd_flush_task_state(current);
>          set_thread_flag(TIF_FOREIGN_FPSTATE);
> +
> +        local_bh_enable();
>  }
>
>  /*
> @@ -174,10 +188,13 @@ void fpsimd_preserve_current_state(void)
>  {
>          if (!system_supports_fpsimd())
>                  return;
> -        preempt_disable();
> +
> +        local_bh_disable();
> +
>          if (!test_thread_flag(TIF_FOREIGN_FPSTATE))
>                  fpsimd_save_state(&current->thread.fpsimd_state);
> -        preempt_enable();
> +
> +        local_bh_enable();
>  }
>
>  /*
> @@ -189,7 +206,9 @@ void fpsimd_restore_current_state(void)
>  {
>          if (!system_supports_fpsimd())
>                  return;
> -        preempt_disable();
> +
> +        local_bh_disable();
> +
>          if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>                  struct fpsimd_state *st = &current->thread.fpsimd_state;
>
> @@ -197,7 +216,8 @@ void fpsimd_restore_current_state(void)
>                  __this_cpu_write(fpsimd_last_state, st);
>                  st->cpu = smp_processor_id();
>          }
> -        preempt_enable();
> +
> +        local_bh_enable();
>  }
>
>  /*
> @@ -209,7 +229,9 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
>  {
>          if (!system_supports_fpsimd())
>                  return;
> -        preempt_disable();
> +
> +        local_bh_disable();
> +
>          fpsimd_load_state(state);
>          if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) {
>                  struct fpsimd_state *st = &current->thread.fpsimd_state;
> @@ -217,7 +239,8 @@ void fpsimd_update_current_state(struct fpsimd_state *state)
>                  __this_cpu_write(fpsimd_last_state, st);
>                  st->cpu = smp_processor_id();
>          }
> -        preempt_enable();
> +
> +        local_bh_enable();
>  }
>
>  /*
> @@ -230,49 +253,69 @@ void fpsimd_flush_task_state(struct task_struct *t)
>
>  #ifdef CONFIG_KERNEL_MODE_NEON
>
> -static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
> -static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
> +DEFINE_PER_CPU(bool, kernel_neon_busy);
>
>  /*
>   * Kernel-side NEON support functions
>   */
> -void kernel_neon_begin_partial(u32 num_regs)
> +
> +/*
> + * kernel_neon_begin(): obtain the CPU FPSIMD registers for use by the calling
> + * context
> + *
> + * Must not be called unless may_use_simd() returns true.
> + * Task context in the FPSIMD registers is saved back to memory as necessary.
> + *
> + * A matching call to kernel_neon_end() must be made before returning from the
> + * calling context.
> + *
> + * The caller may freely use the FPSIMD registers until kernel_neon_end() is
> + * called.
> + */
> +void kernel_neon_begin(void)
>  {
>          if (WARN_ON(!system_supports_fpsimd()))
>                  return;
> -        if (in_interrupt()) {
> -                struct fpsimd_partial_state *s = this_cpu_ptr(
> -                        in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
> -                BUG_ON(num_regs > 32);
> -                fpsimd_save_partial_state(s, roundup(num_regs, 2));
> -        } else {
> -                /*
> -                 * Save the userland FPSIMD state if we have one and if we
> -                 * haven't done so already. Clear fpsimd_last_state to indicate
> -                 * that there is no longer userland FPSIMD state in the
> -                 * registers.
> -                 */
> -                preempt_disable();
> -                if (current->mm &&
> -                    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
> -                        fpsimd_save_state(&current->thread.fpsimd_state);
> -                this_cpu_write(fpsimd_last_state, NULL);
> -        }
> +        BUG_ON(!may_use_simd());
> +
> +        local_bh_disable();
> +
> +        __this_cpu_write(kernel_neon_busy, true);
> +
> +        /* Save unsaved task fpsimd state, if any: */
> +        if (current->mm && !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
> +                fpsimd_save_state(&current->thread.fpsimd_state);
> +
> +        /* Invalidate any task state remaining in the fpsimd regs: */
> +        __this_cpu_write(fpsimd_last_state, NULL);
> +
> +        preempt_disable();
> +
> +        local_bh_enable();
>  }
> -EXPORT_SYMBOL(kernel_neon_begin_partial);
> +EXPORT_SYMBOL(kernel_neon_begin);
>
> +/*
> + * kernel_neon_end(): give the CPU FPSIMD registers back to the current task
> + *
> + * Must be called from a context in which kernel_neon_begin() was previously
> + * called, with no call to kernel_neon_end() in the meantime.
> + *
> + * The caller must not use the FPSIMD registers after this function is called,
> + * unless kernel_neon_begin() is called again in the meantime.
> + */
>  void kernel_neon_end(void)
>  {
> +        bool busy;
> +
>          if (!system_supports_fpsimd())
>                  return;
> -        if (in_interrupt()) {
> -                struct fpsimd_partial_state *s = this_cpu_ptr(
> -                        in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
> -                fpsimd_load_partial_state(s);
> -        } else {
> -                preempt_enable();
> -        }
> +
> +        busy = __this_cpu_xchg(kernel_neon_busy, false);
> +        WARN_ON(!busy);        /* No matching kernel_neon_begin()? */
> +
> +        preempt_enable();
>  }
>  EXPORT_SYMBOL(kernel_neon_end);
>
> --
> 2.1.4
>

^ permalink raw reply	[flat|nested] 13+ messages in thread
* [RFC PATCH v3 4/4] arm64: neon: Add backwards compatibility kernel_neon_begin_partial() 2017-05-25 18:24 [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin ` (2 preceding siblings ...) 2017-05-25 18:25 ` [RFC PATCH v3 3/4] arm64: neon: Remove support for nested or hardirq kernel-mode NEON Dave Martin @ 2017-05-25 18:25 ` Dave Martin 2017-05-30 18:02 ` [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin 4 siblings, 0 replies; 13+ messages in thread From: Dave Martin @ 2017-05-25 18:25 UTC (permalink / raw) To: linux-arm-kernel kernel_neon_begin_partial() is no longer available, but the implementation of kernel_neon_begin() should be sufficient to satisfy kernel_neon_begin_partial() callers. Because of the changed semantics of may_use_simd() and kernel_neon_begin() such code really needs to be ported. In the meantime, this patch restores buildability of kernel-mode NEON client code. Signed-off-by: Dave Martin <Dave.Martin@arm.com> --- arch/arm64/include/asm/neon.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/include/asm/neon.h b/arch/arm64/include/asm/neon.h index fb9d137..885f7a6 100644 --- a/arch/arm64/include/asm/neon.h +++ b/arch/arm64/include/asm/neon.h @@ -19,4 +19,7 @@ void kernel_neon_begin(void); void kernel_neon_end(void); +/* FIXME: Backwards compatibility only, should go away: */ +#define kernel_neon_begin_partial(num_regs) kernel_neon_begin() + #endif /* ! __ASM_NEON_H */ -- 2.1.4 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH v3 0/4] Simplify kernel-mode NEON 2017-05-25 18:24 [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin ` (3 preceding siblings ...) 2017-05-25 18:25 ` [RFC PATCH v3 4/4] arm64: neon: Add backwards compatibility kernel_neon_begin_partial() Dave Martin @ 2017-05-30 18:02 ` Dave Martin 2017-05-31 8:41 ` Ard Biesheuvel 4 siblings, 1 reply; 13+ messages in thread From: Dave Martin @ 2017-05-30 18:02 UTC (permalink / raw) To: linux-arm-kernel On Thu, May 25, 2017 at 07:24:57PM +0100, Dave Martin wrote: > This series aims to simplify kernel-mode NEON. Hi Ard, do you have any further comments on this series? I'd like to have it finalised as far as possible (modulo minor tweaks and bugfixes) so that I can port the SVE patches on top of it. Also, how do you think we should handle merging of this change? There's a flag-day issue here, since the kernel_mode_neon() API is being changed in an incompatible way. Cheers ---Dave [...] ^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH v3 0/4] Simplify kernel-mode NEON 2017-05-30 18:02 ` [RFC PATCH v3 0/4] Simplify kernel-mode NEON Dave Martin @ 2017-05-31 8:41 ` Ard Biesheuvel 2017-05-31 10:08 ` Dave Martin 0 siblings, 1 reply; 13+ messages in thread From: Ard Biesheuvel @ 2017-05-31 8:41 UTC (permalink / raw) To: linux-arm-kernel On 30 May 2017 at 18:02, Dave Martin <Dave.Martin@arm.com> wrote: > On Thu, May 25, 2017 at 07:24:57PM +0100, Dave Martin wrote: >> This series aims to simplify kernel-mode NEON. > > Hi Ard, do you have any further comments on this series? > > I'd like to have it finalised as far as possible (modulo minor tweaks > and bugfixes) so that I can port the SVE patches on top of it. > > Also, how do you think we should handle merging of this change? There's > a flag-day issue here, since the kernel_mode_neon() API is being changed > in an incompatible way. > I think the patches look fine now. The best way to merge these imo is to start with the changes in the clients, i.e., add an arm64 specific asm/simd.h that defines may_use_simd() as { return true; }, update all the crypto code with the fallbacks, and put this stuff on top of that. That way, there is a small window where the 'hint' is interpreted differently in the sha256 code, but apart from that, we should be bisection proof without a flag day AFAICT. BTW I got my ZD1211 working on my MacchiatioBin board. The performance is terrible, but that should not matter: if I can saturate a CPU doing NEON from userland and/or kernel process context, the softirq interruptions by the mac80211 code should exercise the updated code paths. I haven't tried that yet: let me get the code changes out today, so you can put your stuff on top. Then we can give it a good spin. -- Ard. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH v3 0/4] Simplify kernel-mode NEON 2017-05-31 8:41 ` Ard Biesheuvel @ 2017-05-31 10:08 ` Dave Martin 2017-05-31 11:07 ` Ard Biesheuvel 0 siblings, 1 reply; 13+ messages in thread From: Dave Martin @ 2017-05-31 10:08 UTC (permalink / raw) To: linux-arm-kernel On Wed, May 31, 2017 at 08:41:01AM +0000, Ard Biesheuvel wrote: > On 30 May 2017 at 18:02, Dave Martin <Dave.Martin@arm.com> wrote: > > On Thu, May 25, 2017 at 07:24:57PM +0100, Dave Martin wrote: > >> This series aims to simplify kernel-mode NEON. > > > > Hi Ard, do you have any further comments on this series? > > > > I'd like to have it finalised as far as possible (modulo minor tweaks > > and bugfixes) so that I can port the SVE patches on top of it. > > > > Also, how do you think we should handle merging of this change? There's > > a flag-day issue here, since the kernel_mode_neon() API is being changed > > in an incompatible way. > > > > I think the patches look fine now. The best way to merge these imo is > to start with the changes in the clients, i.e., add an arm64 specific > asm/simd.h that defines may_use_simd() as { return true; }, update all > the crypto code with the fallbacks, and put this stuff on top of that. Yes, that sounds feasible. Something like [1] below? Either way, it probably makes sense for that stub function to be added by your series. > That way, there is a small window where the 'hint' is interpreted > differently in the sha256 code, but apart from that, we should be > bisection proof without a flag day AFAICT. > > BTW I got my ZD1211 working on my MacchiatioBin board. The performance > is terrible, but that should not matter: if I can saturate a CPU doing Do you mean that my series causes a performance regression here, or is the performance terrible anyway? > NEON from userland and/or kernel process context, the softirq > interruptions by the mac80211 code should exercise the updated code > paths. 
I haven't tried that yet: let me get the code changes out > today, so you can put your stuff on top. Then we can give it a good > spin. That would be great, thanks. Cheers ---Dave [1]: >From 325ba5718ee0f136bfb3b4a43f7f42b1d8f2ab12 Mon Sep 17 00:00:00 2001 From: Dave Martin <Dave.Martin@arm.com> Date: Wed, 31 May 2017 10:31:26 +0100 Subject: [PATCH] arm64: neon: Add stub may_use_simd() in preparation for refactoring In preparation for refactoring that will make the conditions for kernel-mode NEON use non-trivial, this patch adds a stub may_use_simd() function. Since arm64 currently supports kernel-mode NEON from any context (excluding NMI) this function will return true for now. This should allow drivers to be ported to use kernel_neon_begin()/_end() based on this check, without impacting rebaseability. Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dave Martin <Dave.Martin@arm.com> --- arch/arm64/include/asm/simd.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 arch/arm64/include/asm/simd.h diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h new file mode 100644 index 0000000..bd3acd8 --- /dev/null +++ b/arch/arm64/include/asm/simd.h @@ -0,0 +1,22 @@ +/* + * Copyright (C) 2017 ARM Limited + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. 
+ */ +#ifndef __ASM_SIMD_H +#define __ASM_SIMD_H + +/* arm64 kernel_mode_neon() currently supports all contexts up to hardirq */ +static inline bool may_use_simd(void) { return true; } + +#endif /* ! __ASM_SIMD_H */ -- 2.1.4 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [RFC PATCH v3 0/4] Simplify kernel-mode NEON 2017-05-31 10:08 ` Dave Martin @ 2017-05-31 11:07 ` Ard Biesheuvel 2017-05-31 11:33 ` Dave Martin 0 siblings, 1 reply; 13+ messages in thread From: Ard Biesheuvel @ 2017-05-31 11:07 UTC (permalink / raw) To: linux-arm-kernel On 31 May 2017 at 10:08, Dave Martin <Dave.Martin@arm.com> wrote: > On Wed, May 31, 2017 at 08:41:01AM +0000, Ard Biesheuvel wrote: >> On 30 May 2017 at 18:02, Dave Martin <Dave.Martin@arm.com> wrote: >> > On Thu, May 25, 2017 at 07:24:57PM +0100, Dave Martin wrote: >> >> This series aims to simplify kernel-mode NEON. >> > >> > Hi Ard, do you have any further comments on this series? >> > >> > I'd like to have it finalised as far as possible (modulo minor tweaks >> > and bugfixes) so that I can port the SVE patches on top of it. >> > >> > Also, how do you think we should handle merging of this change? There's >> > a flag-day issue here, since the kernel_mode_neon() API is being changed >> > in an incompatible way. >> > >> >> I think the patches look fine now. The best way to merge these imo is >> to start with the changes in the clients, i.e., add an arm64 specific >> asm/simd.h that defines may_use_simd() as { return true; }, update all >> the crypto code with the fallbacks, and put this stuff on top of that. > > Yes, that sounds feasible. > > Something like [1] below? Either way, it probably makes sense for that > stub function to be added by your series. > Pretty much, yeah. But don't forget to remove simd.h from arch/arm64/include/asm/Kbuild >> That way, there is a small window where the 'hint' is interpreted >> differently in the sha256 code, but apart from that, we should be >> bisection proof without a flag day AFAICT. >> >> BTW I got my ZD1211 working on my MacchiatioBin board. The performance >> is terrible, but that should not matter: if I can saturate a CPU doing > > Do you mean that my series causes a performance regression here, or is > the performance terrible anyway? 
> No, the performance is terrible, which shouldn't matter per se, but it would be nice if the load induced by the mac80211 were visible in 'top' as wait, sys or whatever-it-is-called time. Currently, the 3 Mbit/s throughput combined with the 2.2 cycles per byte performance of the AES-CCM code makes the code unnoticeable. >> NEON from userland and/or kernel process context, the softirq >> interruptions by the mac80211 code should exercise the updated code >> paths. I haven't tried that yet: let me get the code changes out >> today, so you can put your stuff on top. Then we can give it a good >> spin. > > That would be great, thanks. > I have updated my branch here: https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=kernel-mode-neon I removed all kernel_neon_begin_partial() invocations as well. -- Ard. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC PATCH v3 0/4] Simplify kernel-mode NEON 2017-05-31 11:07 ` Ard Biesheuvel @ 2017-05-31 11:33 ` Dave Martin 0 siblings, 0 replies; 13+ messages in thread From: Dave Martin @ 2017-05-31 11:33 UTC (permalink / raw) To: linux-arm-kernel On Wed, May 31, 2017 at 11:07:56AM +0000, Ard Biesheuvel wrote: > On 31 May 2017 at 10:08, Dave Martin <Dave.Martin@arm.com> wrote: > > On Wed, May 31, 2017 at 08:41:01AM +0000, Ard Biesheuvel wrote: [...] > > Something like [1] below? Either way, it probably makes sense for that > > stub function to be added by your series. > > > > Pretty much, yeah. But don't forget to remove simd.h from > arch/arm64/include/asm/Kbuild Oh wow, I thought that was done by magic. > >> BTW I got my ZD1211 working on my MacchiatioBin board. The performance > >> is terrible, but that should not matter: if I can saturate a CPU doing > > > > Do you mean that my series causes a performance regression here, or is > > the performance terrible anyway? > > > > No, the performance is terrible, which shouldn't matter per se, but it > would be nice if the load induced by the mac80211 were visible in > 'top' as wait, sys or whatever-it-is-called time. Currently, the 3 > Mbit/s throughput combined with the 2.2 cycles per byte performance of > the AES-CCM code makes the code unnoticeable. > > >> NEON from userland and/or kernel process context, the softirq > >> interruptions by the mac80211 code should exercise the updated code > >> paths. I haven't tried that yet: let me get the code changes out > >> today, so you can put your stuff on top. Then we can give it a good > >> spin. > > > > That would be great, thanks. > > > > I have updated my branch here: > https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=kernel-mode-neon Looks good. > I removed all kernel_neon_begin_partial() invocations as well. OK, I will drop that from my series. Cheers ---Dave ^ permalink raw reply [flat|nested] 13+ messages in thread