All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] kernel mode NEON support
@ 2013-06-25 20:24 Ard Biesheuvel
  2013-06-25 20:24 ` [PATCH v2 1/5] ARM: move VFP init to an earlier boot stage Ard Biesheuvel
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-25 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

This is version 2 of the kernel mode NEON patch set.

Changes since v1:
- changed the order of the patches, so kernel_neon_begin() does not
  appear before the required fixes are in place
- don't use might_sleep() to enforce that kernel_neon_begin() should
  not be called from interrupt context, as it also prevents it from
  being called with preemption disabled, which is perfectly acceptable
- prefer inc_preempt_count() over preempt_disable() so sleeping after
  calling kernel_neon_begin() gets flagged even with kernel preemption 
  disabled in Kconfig
- made the RAID6 patch suitable for both arm and arm64

Ard Biesheuvel (5):
  ARM: move VFP init to an earlier boot stage
  ARM: be strict about FP exceptions in kernel mode
  ARM: add support for kernel mode NEON
  ARM: crypto: add NEON accelerated XOR implementation
  lib/raid6: add ARM-NEON accelerated syndrome calculation

 arch/arm/Kconfig            |  7 ++++
 arch/arm/include/asm/neon.h | 36 ++++++++++++++++++++
 arch/arm/include/asm/xor.h  | 73 +++++++++++++++++++++++++++++++++++++++++
 arch/arm/lib/Makefile       |  6 ++++
 arch/arm/lib/xor-neon.c     | 42 ++++++++++++++++++++++++
 arch/arm/vfp/vfphw.S        |  5 +++
 arch/arm/vfp/vfpmodule.c    | 67 ++++++++++++++++++++++++++++++++++++-
 include/linux/raid/pq.h     |  5 +++
 lib/raid6/.gitignore        |  1 +
 lib/raid6/Makefile          | 40 +++++++++++++++++++++++
 lib/raid6/algos.c           |  6 ++++
 lib/raid6/neon.c            | 58 ++++++++++++++++++++++++++++++++
 lib/raid6/neon.uc           | 80 +++++++++++++++++++++++++++++++++++++++++++++
 lib/raid6/test/Makefile     | 26 ++++++++++++++-
 14 files changed, 450 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm/include/asm/neon.h
 create mode 100644 arch/arm/lib/xor-neon.c
 create mode 100644 lib/raid6/neon.c
 create mode 100644 lib/raid6/neon.uc

-- 
1.8.1.2

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/5] ARM: move VFP init to an earlier boot stage
  2013-06-25 20:24 [PATCH v2 0/5] kernel mode NEON support Ard Biesheuvel
@ 2013-06-25 20:24 ` Ard Biesheuvel
  2013-06-25 20:24 ` [PATCH v2 2/5] ARM: be strict about FP exceptions in kernel mode Ard Biesheuvel
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-25 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

In order to use the NEON unit in the kernel, we should
initialize it a bit earlier in the boot process so NEON users
that like to do a quick benchmark at load time (like the
xor_blocks or RAID-6 code) find the NEON/VFP unit already
enabled.

Replaced late_initcall() with core_initcall().

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/vfp/vfpmodule.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 5dfbb0b..791993a 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -731,4 +731,4 @@ static int __init vfp_init(void)
 	return 0;
 }
 
-late_initcall(vfp_init);
+core_initcall(vfp_init);
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 2/5] ARM: be strict about FP exceptions in kernel mode
  2013-06-25 20:24 [PATCH v2 0/5] kernel mode NEON support Ard Biesheuvel
  2013-06-25 20:24 ` [PATCH v2 1/5] ARM: move VFP init to an earlier boot stage Ard Biesheuvel
@ 2013-06-25 20:24 ` Ard Biesheuvel
  2013-06-25 20:24 ` [PATCH v2 3/5] ARM: add support for kernel mode NEON Ard Biesheuvel
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-25 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

The support code in vfp_support_entry does not care whether the
exception that caused it to be invoked occurred in kernel mode or
in user mode. However, neither condition that could trigger this
exception (lazy restore and VFP bounce to support code) is
currently allowable in kernel mode.

In the former case, we can just handle it as an undefined instruction.
In the latter case, we should flag it as a bug, as it implies that
the FP unit has been enabled and an attempt has been made to
execute FP instructions that are dependent on the support code, and
this is not supported in kernel mode.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/vfp/vfphw.S     |  5 +++++
 arch/arm/vfp/vfpmodule.c | 11 +++++++++++
 2 files changed, 16 insertions(+)

diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index 8d10dc8..3e5d311 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -78,6 +78,11 @@
 ENTRY(vfp_support_entry)
 	DBGSTR3	"instr %08x pc %08x state %p", r0, r2, r10
 
+	ldr	r3, [sp, #S_PSR]	@ Neither lazy restore nor FP exceptions
+	and	r3, r3, #MODE_MASK	@ are supported in kernel mode
+	teq	r3, #USR_MODE
+	bne	vfp_kmode_exception	@ Returns through lr
+
 	VFPFMRX	r1, FPEXC		@ Is the VFP enabled?
 	DBGSTR1	"fpexc %08x", r1
 	tst	r1, #FPEXC_EN
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 791993a..fd1466c 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -26,6 +26,7 @@
 #include <asm/system_info.h>
 #include <asm/thread_notify.h>
 #include <asm/vfp.h>
+#include <asm/bug.h>
 
 #include "vfpinstr.h"
 #include "vfp.h"
@@ -648,6 +649,16 @@ static int vfp_hotplug(struct notifier_block *b, unsigned long action,
 	return NOTIFY_OK;
 }
 
+void vfp_kmode_exception(void)
+{
+	/*
+	 * Taking an FP exception in kernel mode is always a bug, because
+	 * none of the FP instructions currently supported in kernel mode
+	 * (i.e., NEON) should ever be bounced back to the support code.
+	 */
+	BUG_ON(fmrx(FPEXC) & FPEXC_EN);
+}
+
 /*
  * VFP support code initialisation.
  */
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-25 20:24 [PATCH v2 0/5] kernel mode NEON support Ard Biesheuvel
  2013-06-25 20:24 ` [PATCH v2 1/5] ARM: move VFP init to an earlier boot stage Ard Biesheuvel
  2013-06-25 20:24 ` [PATCH v2 2/5] ARM: be strict about FP exceptions in kernel mode Ard Biesheuvel
@ 2013-06-25 20:24 ` Ard Biesheuvel
  2013-06-26 10:55   ` Ard Biesheuvel
  2013-06-25 20:24 ` [PATCH v2 4/5] ARM: crypto: add NEON accelerated XOR implementation Ard Biesheuvel
  2013-06-25 20:24 ` [PATCH v2 5/5] lib/raid6: add ARM-NEON accelerated syndrome calculation Ard Biesheuvel
  4 siblings, 1 reply; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-25 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

In order to safely support the use of NEON instructions in
kernel mode, some precautions need to be taken:
- the userland context that may be present in the registers (even
  if the NEON/VFP is currently disabled) must be stored under the
  correct task (which may not be 'current' in the UP case),
- to avoid having to keep track of additional vfpstates for the
  kernel side, disallow the use of NEON in interrupt context
  and run with preemption disabled,
- after use, re-enable preemption and re-enable the lazy restore
  machinery by disabling the NEON/VFP unit.

This patch adds the functions kernel_neon_begin() and
kernel_neon_end() which take care of the above. It also adds
the Kconfig symbol KERNEL_MODE_NEON to enable it.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/Kconfig            |  7 ++++++
 arch/arm/include/asm/neon.h | 36 ++++++++++++++++++++++++++++++
 arch/arm/vfp/vfpmodule.c    | 54 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 97 insertions(+)
 create mode 100644 arch/arm/include/asm/neon.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 2651b1d..1187e64 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2183,6 +2183,13 @@ config NEON
 	  Say Y to include support code for NEON, the ARMv7 Advanced SIMD
 	  Extension.
 
+config KERNEL_MODE_NEON
+	bool "Support for NEON in kernel mode"
+	default n
+	depends on NEON
+	help
+	  Say Y to include support for NEON in kernel mode.
+
 endmenu
 
 menu "Userspace binary formats"
diff --git a/arch/arm/include/asm/neon.h b/arch/arm/include/asm/neon.h
new file mode 100644
index 0000000..8f730fe
--- /dev/null
+++ b/arch/arm/include/asm/neon.h
@@ -0,0 +1,36 @@
+/*
+ * linux/arch/arm/include/asm/neon.h
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/hwcap.h>
+
+#define cpu_has_neon()		(!!(elf_hwcap & HWCAP_NEON))
+
+#ifdef __ARM_NEON__
+
+/*
+ * If you are affected by the BUILD_BUG below, it probably means that you are
+ * using NEON code /and/ calling the kernel_neon_begin() function from the same
+ * compilation unit. To prevent issues that may arise from GCC reordering or
+ * generating(1) NEON instructions outside of these begin/end functions, the
+ * only supported way of using NEON code in the kernel is by isolating it in a
+ * separate compilation unit, and calling it from another unit from inside a
+ * kernel_neon_begin/kernel_neon_end pair.
+ *
+ * (1) Current GCC (4.7) might generate NEON instructions at O3 level if
+ *     -mpfu=neon is set.
+ */
+
+#define kernel_neon_begin() \
+	BUILD_BUG_ON_MSG(1, "kernel_neon_begin() called from NEON code")
+
+#else
+void kernel_neon_begin(void);
+#endif
+void kernel_neon_end(void);
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index fd1466c..b64ca77 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -20,6 +20,7 @@
 #include <linux/init.h>
 #include <linux/uaccess.h>
 #include <linux/user.h>
+#include <linux/export.h>
 
 #include <asm/cp15.h>
 #include <asm/cputype.h>
@@ -659,6 +660,59 @@ void vfp_kmode_exception(void)
 	BUG_ON(fmrx(FPEXC) & FPEXC_EN);
 }
 
+#ifdef CONFIG_KERNEL_MODE_NEON
+
+/*
+ * Kernel-side NEON support functions
+ */
+void kernel_neon_begin(void)
+{
+	struct thread_info *thread = current_thread_info();
+	unsigned int cpu;
+	u32 fpexc;
+
+	/*
+	 * Kernel mode NEON is only allowed outside of interrupt context
+	 * with preemption disabled. This will make sure that the kernel
+	 * mode NEON register contents never need to be preserved.
+	 *
+	 * Use inc_preempt_count() instead of preempt_disable() so sleeping
+	 * complains noisily even on builds that have kernel preemption
+	 * disabled.
+	 */
+	BUG_ON(in_interrupt());
+	inc_preempt_count();
+	barrier();
+	cpu = smp_processor_id();
+
+	fpexc = fmrx(FPEXC) | FPEXC_EN;
+	fmxr(FPEXC, fpexc);
+
+	/*
+	 * Save the userland NEON/VFP state. Under UP,
+	 * the owner could be a task other than 'current'
+	 */
+	if (vfp_state_in_hw(cpu, thread))
+		vfp_save_state(&thread->vfpstate, fpexc);
+#ifndef CONFIG_SMP
+	else if (vfp_current_hw_state[cpu] != NULL)
+		vfp_save_state(vfp_current_hw_state[cpu], fpexc);
+#endif
+	vfp_current_hw_state[cpu] = NULL;
+}
+EXPORT_SYMBOL(kernel_neon_begin);
+
+void kernel_neon_end(void)
+{
+	/* Disable the NEON/VFP unit. */
+	fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
+	barrier();
+	dec_preempt_count();
+}
+EXPORT_SYMBOL(kernel_neon_end);
+
+#endif /* CONFIG_KERNEL_MODE_NEON */
+
 /*
  * VFP support code initialisation.
  */
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 4/5] ARM: crypto: add NEON accelerated XOR implementation
  2013-06-25 20:24 [PATCH v2 0/5] kernel mode NEON support Ard Biesheuvel
                   ` (2 preceding siblings ...)
  2013-06-25 20:24 ` [PATCH v2 3/5] ARM: add support for kernel mode NEON Ard Biesheuvel
@ 2013-06-25 20:24 ` Ard Biesheuvel
  2013-06-25 20:24 ` [PATCH v2 5/5] lib/raid6: add ARM-NEON accelerated syndrome calculation Ard Biesheuvel
  4 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-25 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

Add a source file xor-neon.c (which is really just the reference
C implementation passed through the GCC vectorizer) and hook it
up to the XOR framework.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/include/asm/xor.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++
 arch/arm/lib/Makefile      |  6 ++++
 arch/arm/lib/xor-neon.c    | 42 ++++++++++++++++++++++++++
 3 files changed, 121 insertions(+)
 create mode 100644 arch/arm/lib/xor-neon.c

diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
index 7604673..4ffb26d 100644
--- a/arch/arm/include/asm/xor.h
+++ b/arch/arm/include/asm/xor.h
@@ -7,7 +7,10 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
+#include <linux/hardirq.h>
 #include <asm-generic/xor.h>
+#include <asm/hwcap.h>
+#include <asm/neon.h>
 
 #define __XOR(a1, a2) a1 ^= a2
 
@@ -138,4 +141,74 @@ static struct xor_block_template xor_block_arm4regs = {
 		xor_speed(&xor_block_arm4regs);	\
 		xor_speed(&xor_block_8regs);	\
 		xor_speed(&xor_block_32regs);	\
+		NEON_TEMPLATES;			\
 	} while (0)
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+
+extern struct xor_block_template const xor_block_neon_inner;
+
+static void
+xor_neon_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
+{
+	if (in_interrupt()) {
+		xor_arm4regs_2(bytes, p1, p2);
+	} else {
+		kernel_neon_begin();
+		xor_block_neon_inner.do_2(bytes, p1, p2);
+		kernel_neon_end();
+	}
+}
+
+static void
+xor_neon_3(unsigned long bytes, unsigned long *p1, unsigned long *p2,
+		unsigned long *p3)
+{
+	if (in_interrupt()) {
+		xor_arm4regs_3(bytes, p1, p2, p3);
+	} else {
+		kernel_neon_begin();
+		xor_block_neon_inner.do_3(bytes, p1, p2, p3);
+		kernel_neon_end();
+	}
+}
+
+static void
+xor_neon_4(unsigned long bytes, unsigned long *p1, unsigned long *p2,
+		unsigned long *p3, unsigned long *p4)
+{
+	if (in_interrupt()) {
+		xor_arm4regs_4(bytes, p1, p2, p3, p4);
+	} else {
+		kernel_neon_begin();
+		xor_block_neon_inner.do_4(bytes, p1, p2, p3, p4);
+		kernel_neon_end();
+	}
+}
+
+static void
+xor_neon_5(unsigned long bytes, unsigned long *p1, unsigned long *p2,
+		unsigned long *p3, unsigned long *p4, unsigned long *p5)
+{
+	if (in_interrupt()) {
+		xor_arm4regs_5(bytes, p1, p2, p3, p4, p5);
+	} else {
+		kernel_neon_begin();
+		xor_block_neon_inner.do_5(bytes, p1, p2, p3, p4, p5);
+		kernel_neon_end();
+	}
+}
+
+static struct xor_block_template xor_block_neon = {
+	.name	= "neon",
+	.do_2	= xor_neon_2,
+	.do_3	= xor_neon_3,
+	.do_4	= xor_neon_4,
+	.do_5	= xor_neon_5
+};
+
+#define NEON_TEMPLATES	\
+	do { if (cpu_has_neon()) xor_speed(&xor_block_neon); } while (0)
+#else
+#define NEON_TEMPLATES
+#endif
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index af72969..aaf3a87 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -45,3 +45,9 @@ lib-$(CONFIG_ARCH_SHARK)	+= io-shark.o
 
 $(obj)/csumpartialcopy.o:	$(obj)/csumpartialcopygeneric.S
 $(obj)/csumpartialcopyuser.o:	$(obj)/csumpartialcopygeneric.S
+
+ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
+  NEON_FLAGS			:= -mfloat-abi=softfp -mfpu=neon
+  CFLAGS_xor-neon.o		+= $(NEON_FLAGS)
+  lib-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
+endif
diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
new file mode 100644
index 0000000..f485e5a
--- /dev/null
+++ b/arch/arm/lib/xor-neon.c
@@ -0,0 +1,42 @@
+/*
+ * linux/arch/arm/lib/xor-neon.c
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/raid/xor.h>
+
+#ifndef __ARM_NEON__
+#error You should compile this file with '-mfloat-abi=softfp -mfpu=neon'
+#endif
+
+/*
+ * Pull in the reference implementations while instructing GCC (through
+ * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
+ * NEON instructions.
+ */
+#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
+#pragma GCC optimize "tree-vectorize"
+#else
+/*
+ * While older versions of GCC do not generate incorrect code, they fail to
+ * recognize the parallel nature of these functions, and emit plain ARM code,
+ * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
+ */
+#warning This code requires at least version 4.6 of GCC
+#endif
+
+#pragma GCC diagnostic ignored "-Wunused-variable"
+#include <asm-generic/xor.h>
+
+struct xor_block_template const xor_block_neon_inner = {
+	.name	= "__inner_neon__",
+	.do_2	= xor_8regs_2,
+	.do_3	= xor_8regs_3,
+	.do_4	= xor_8regs_4,
+	.do_5	= xor_8regs_5,
+};
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 5/5] lib/raid6: add ARM-NEON accelerated syndrome calculation
  2013-06-25 20:24 [PATCH v2 0/5] kernel mode NEON support Ard Biesheuvel
                   ` (3 preceding siblings ...)
  2013-06-25 20:24 ` [PATCH v2 4/5] ARM: crypto: add NEON accelerated XOR implementation Ard Biesheuvel
@ 2013-06-25 20:24 ` Ard Biesheuvel
  4 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-25 20:24 UTC (permalink / raw)
  To: linux-arm-kernel

Rebased/reworked a patch contributed by Rob Herring that uses
NEON intrinsics to perform the RAID-6 syndrome calculations.
It uses the existing unroll.awk code to generate several
unrolled versions of which the best performing one is selected
at boot time.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: hpa at linux.intel.com
---
 include/linux/raid/pq.h |  5 ++++
 lib/raid6/.gitignore    |  1 +
 lib/raid6/Makefile      | 40 +++++++++++++++++++++++++
 lib/raid6/algos.c       |  6 ++++
 lib/raid6/neon.c        | 58 +++++++++++++++++++++++++++++++++++
 lib/raid6/neon.uc       | 80 +++++++++++++++++++++++++++++++++++++++++++++++++
 lib/raid6/test/Makefile | 26 +++++++++++++++-
 7 files changed, 215 insertions(+), 1 deletion(-)
 create mode 100644 lib/raid6/neon.c
 create mode 100644 lib/raid6/neon.uc

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 8dfaa2c..0f42469 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -114,6 +114,11 @@ extern const struct raid6_recov_calls raid6_recov_intx1;
 extern const struct raid6_recov_calls raid6_recov_ssse3;
 extern const struct raid6_recov_calls raid6_recov_avx2;
 
+extern const struct raid6_calls raid6_neonx1;
+extern const struct raid6_calls raid6_neonx2;
+extern const struct raid6_calls raid6_neonx4;
+extern const struct raid6_calls raid6_neonx8;
+
 /* Algorithm list */
 extern const struct raid6_calls * const raid6_algos[];
 extern const struct raid6_recov_calls *const raid6_recov_algos[];
diff --git a/lib/raid6/.gitignore b/lib/raid6/.gitignore
index 162beca..0a7e494 100644
--- a/lib/raid6/.gitignore
+++ b/lib/raid6/.gitignore
@@ -2,3 +2,4 @@ mktables
 altivec*.c
 int*.c
 tables.c
+neon?.c
diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 9f7c184..b462578 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -5,6 +5,7 @@ raid6_pq-y	+= algos.o recov.o tables.o int1.o int2.o int4.o \
 
 raid6_pq-$(CONFIG_X86) += recov_ssse3.o recov_avx2.o mmx.o sse1.o sse2.o avx2.o
 raid6_pq-$(CONFIG_ALTIVEC) += altivec1.o altivec2.o altivec4.o altivec8.o
+raid6_pq-$(CONFIG_KERNEL_MODE_NEON) += neon.o neon1.o neon2.o neon4.o neon8.o
 
 hostprogs-y	+= mktables
 
@@ -16,6 +17,21 @@ ifeq ($(CONFIG_ALTIVEC),y)
 altivec_flags := -maltivec -mabi=altivec
 endif
 
+# The GCC option -ffreestanding is required in order to compile code containing
+# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
+ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
+NEON_FLAGS := -ffreestanding
+ifeq ($(ARCH),arm)
+NEON_FLAGS += -mfloat-abi=softfp -mfpu=neon
+endif
+ifeq ($(ARCH),arm64)
+CFLAGS_REMOVE_neon1.o += -mgeneral-regs-only
+CFLAGS_REMOVE_neon2.o += -mgeneral-regs-only
+CFLAGS_REMOVE_neon4.o += -mgeneral-regs-only
+CFLAGS_REMOVE_neon8.o += -mgeneral-regs-only
+endif
+endif
+
 targets += int1.c
 $(obj)/int1.c:   UNROLL := 1
 $(obj)/int1.c:   $(src)/int.uc $(src)/unroll.awk FORCE
@@ -70,6 +86,30 @@ $(obj)/altivec8.c:   UNROLL := 8
 $(obj)/altivec8.c:   $(src)/altivec.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
+CFLAGS_neon1.o += $(NEON_FLAGS)
+targets += neon1.c
+$(obj)/neon1.c:   UNROLL := 1
+$(obj)/neon1.c:   $(src)/neon.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
+CFLAGS_neon2.o += $(NEON_FLAGS)
+targets += neon2.c
+$(obj)/neon2.c:   UNROLL := 2
+$(obj)/neon2.c:   $(src)/neon.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
+CFLAGS_neon4.o += $(NEON_FLAGS)
+targets += neon4.c
+$(obj)/neon4.c:   UNROLL := 4
+$(obj)/neon4.c:   $(src)/neon.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
+CFLAGS_neon8.o += $(NEON_FLAGS)
+targets += neon8.c
+$(obj)/neon8.c:   UNROLL := 8
+$(obj)/neon8.c:   $(src)/neon.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
 quiet_cmd_mktable = TABLE   $@
       cmd_mktable = $(obj)/mktables > $@ || ( rm -f $@ && exit 1 )
 
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 6d7316f..74e6f56 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -70,6 +70,12 @@ const struct raid6_calls * const raid6_algos[] = {
 	&raid6_intx2,
 	&raid6_intx4,
 	&raid6_intx8,
+#ifdef CONFIG_KERNEL_MODE_NEON
+	&raid6_neonx1,
+	&raid6_neonx2,
+	&raid6_neonx4,
+	&raid6_neonx8,
+#endif
 	NULL
 };
 
diff --git a/lib/raid6/neon.c b/lib/raid6/neon.c
new file mode 100644
index 0000000..36ad470
--- /dev/null
+++ b/lib/raid6/neon.c
@@ -0,0 +1,58 @@
+/*
+ * linux/lib/raid6/neon.c - RAID6 syndrome calculation using ARM NEON intrinsics
+ *
+ * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/raid/pq.h>
+
+#ifdef __KERNEL__
+#include <asm/neon.h>
+#else
+#define kernel_neon_begin()
+#define kernel_neon_end()
+#define cpu_has_neon()		(1)
+#endif
+
+/*
+ * There are 2 reasons these wrappers are kept in a separate compilation unit
+ * from the actual implementations in neonN.c (generated from neon.uc by
+ * unroll.awk):
+ * - the actual implementations use NEON intrinsics, and the GCC support header
+ *   (arm_neon.h) is not fully compatible (type wise) with the kernel;
+ * - the neonN.c files are compiled with -mfpu=neon and optimization enabled,
+ *   and we have to make sure that we never use *any* NEON/VFP instructions
+ *   outside a kernel_neon_begin()/kernel_neon_end() pair.
+ */
+
+#define RAID6_NEON_WRAPPER(_n)						\
+	static void raid6_neon ## _n ## _gen_syndrome(int disks,	\
+					size_t bytes, void **ptrs)	\
+	{								\
+		void raid6_neon ## _n  ## _gen_syndrome_real(int,	\
+						unsigned long, void**);	\
+		kernel_neon_begin();					\
+		raid6_neon ## _n ## _gen_syndrome_real(disks,		\
+					(unsigned long)bytes, ptrs);	\
+		kernel_neon_end();					\
+	}								\
+	struct raid6_calls const raid6_neonx ## _n = {			\
+		raid6_neon ## _n ## _gen_syndrome,			\
+		raid6_have_neon,					\
+		"neonx" #_n,						\
+		0							\
+	}
+
+static int raid6_have_neon(void)
+{
+	return cpu_has_neon();
+}
+
+RAID6_NEON_WRAPPER(1);
+RAID6_NEON_WRAPPER(2);
+RAID6_NEON_WRAPPER(4);
+RAID6_NEON_WRAPPER(8);
diff --git a/lib/raid6/neon.uc b/lib/raid6/neon.uc
new file mode 100644
index 0000000..1b9ed79
--- /dev/null
+++ b/lib/raid6/neon.uc
@@ -0,0 +1,80 @@
+/* -----------------------------------------------------------------------
+ *
+ *   neon.uc - RAID-6 syndrome calculation using ARM NEON instructions
+ *
+ *   Copyright (C) 2012 Rob Herring
+ *
+ *   Based on altivec.uc:
+ *     Copyright 2002-2004 H. Peter Anvin - All Rights Reserved
+ *
+ *   This program is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU General Public License as published by
+ *   the Free Software Foundation, Inc., 53 Temple Place Ste 330,
+ *   Boston MA 02111-1307, USA; either version 2 of the License, or
+ *   (at your option) any later version; incorporated herein by reference.
+ *
+ * ----------------------------------------------------------------------- */
+
+/*
+ * neon$#.c
+ *
+ * $#-way unrolled NEON intrinsics math RAID-6 instruction set
+ *
+ * This file is postprocessed using unroll.awk
+ */
+
+#include <arm_neon.h>
+
+typedef uint8x16_t unative_t;
+
+#define NBYTES(x) ((unative_t){x,x,x,x, x,x,x,x, x,x,x,x, x,x,x,x})
+#define NSIZE	sizeof(unative_t)
+
+/*
+ * The SHLBYTE() operation shifts each byte left by 1, *not*
+ * rolling over into the next byte
+ */
+static inline unative_t SHLBYTE(unative_t v)
+{
+	return vshlq_n_u8(v, 1);
+}
+
+/*
+ * The MASK() operation returns 0xFF in any byte for which the high
+ * bit is 1, 0x00 for any byte for which the high bit is 0.
+ */
+static inline unative_t MASK(unative_t v)
+{
+	const uint8x16_t temp = NBYTES(0);
+	return (unative_t)vcltq_s8((int8x16_t)v, (int8x16_t)temp);
+}
+
+void raid6_neon$#_gen_syndrome_real(int disks, unsigned long bytes, void **ptrs)
+{
+	uint8_t **dptr = (uint8_t **)ptrs;
+	uint8_t *p, *q;
+	int d, z, z0;
+
+	register unative_t wd$$, wq$$, wp$$, w1$$, w2$$;
+	const unative_t x1d = NBYTES(0x1d);
+
+	z0 = disks - 3;		/* Highest data disk */
+	p = dptr[z0+1];		/* XOR parity */
+	q = dptr[z0+2];		/* RS syndrome */
+
+	for ( d = 0 ; d < bytes ; d += NSIZE*$# ) {
+		wq$$ = wp$$ = vld1q_u8(&dptr[z0][d+$$*NSIZE]);
+		for ( z = z0-1 ; z >= 0 ; z-- ) {
+			wd$$ = vld1q_u8(&dptr[z][d+$$*NSIZE]);
+			wp$$ = veorq_u8(wp$$, wd$$);
+			w2$$ = MASK(wq$$);
+			w1$$ = SHLBYTE(wq$$);
+
+			w2$$ = vandq_u8(w2$$, x1d);
+			w1$$ = veorq_u8(w1$$, w2$$);
+			wq$$ = veorq_u8(w1$$, wd$$);
+		}
+		vst1q_u8(&p[d+NSIZE*$$], wp$$);
+		vst1q_u8(&q[d+NSIZE*$$], wq$$);
+	}
+}
diff --git a/lib/raid6/test/Makefile b/lib/raid6/test/Makefile
index 087332d..28afa1a 100644
--- a/lib/raid6/test/Makefile
+++ b/lib/raid6/test/Makefile
@@ -22,11 +22,23 @@ ifeq ($(ARCH),x86_64)
         IS_X86 = yes
 endif
 
+ifeq ($(ARCH),arm)
+        CFLAGS += -I../../../arch/arm/include -mfpu=neon
+        HAS_NEON = yes
+endif
+ifeq ($(ARCH),arm64)
+        CFLAGS += -I../../../arch/arm64/include
+        HAS_NEON = yes
+endif
+
 ifeq ($(IS_X86),yes)
         OBJS   += mmx.o sse1.o sse2.o avx2.o recov_ssse3.o recov_avx2.o
         CFLAGS += $(shell echo "vpbroadcastb %xmm0, %ymm1" |	\
                     gcc -c -x assembler - >&/dev/null &&	\
                     rm ./-.o && echo -DCONFIG_AS_AVX2=1)
+else ifeq ($(HAS_NEON),yes)
+        OBJS   += neon.o neon1.o neon2.o neon4.o neon8.o
+        CFLAGS += -DCONFIG_KERNEL_MODE_NEON=1
 else
         HAS_ALTIVEC := $(shell echo -e '\#include <altivec.h>\nvector int a;' |\
                          gcc -c -x c - >&/dev/null && \
@@ -55,6 +67,18 @@ raid6.a: $(OBJS)
 raid6test: test.c raid6.a
 	$(CC) $(CFLAGS) -o raid6test $^
 
+neon1.c: neon.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=1 < neon.uc > $@
+
+neon2.c: neon.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=2 < neon.uc > $@
+
+neon4.c: neon.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=4 < neon.uc > $@
+
+neon8.c: neon.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=8 < neon.uc > $@
+
 altivec1.c: altivec.uc ../unroll.awk
 	$(AWK) ../unroll.awk -vN=1 < altivec.uc > $@
 
@@ -89,7 +113,7 @@ tables.c: mktables
 	./mktables > tables.c
 
 clean:
-	rm -f *.o *.a mktables mktables.c *.uc int*.c altivec*.c tables.c raid6test
+	rm -f *.o *.a mktables mktables.c *.uc int*.c altivec*.c neon*.c tables.c raid6test
 
 spotless: clean
 	rm -f *~
-- 
1.8.1.2

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-25 20:24 ` [PATCH v2 3/5] ARM: add support for kernel mode NEON Ard Biesheuvel
@ 2013-06-26 10:55   ` Ard Biesheuvel
  2013-06-26 11:14     ` Will Deacon
  2013-06-28 13:46     ` Jean-Christophe PLAGNIOL-VILLARD
  0 siblings, 2 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-26 10:55 UTC (permalink / raw)
  To: linux-arm-kernel

Replying to self:

On 25 June 2013 22:24, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> +void kernel_neon_end(void)
> +{
> +       /* Disable the NEON/VFP unit. */
> +       fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
> +       barrier();
> +       dec_preempt_count();
> +}
> +EXPORT_SYMBOL(kernel_neon_end);

Meh. This is not going to please the RT crowd, as preempt_schedule()
will not be called on PREEMPT builds in this case.

Propose to replace it with

    preempt_enable();
#ifndef CONFIG_PREEMPT_COUNT
    /* in this case, the preempt_enable() right above is just a barrier() */
    dec_preempt_count();
#endif

(and the converse in kernel_neon_begin())

In that case, preempt_disable will either be just a barrier(), or it
will re-enable preemption, and potentially reschedule if the preempt
count has dropped to zero.

-- 
Ard.


> +
> +#endif /* CONFIG_KERNEL_MODE_NEON */
> +
>  /*
>   * VFP support code initialisation.
>   */
> --
> 1.8.1.2
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-26 10:55   ` Ard Biesheuvel
@ 2013-06-26 11:14     ` Will Deacon
  2013-06-26 11:28       ` Ard Biesheuvel
  2013-06-28 13:46     ` Jean-Christophe PLAGNIOL-VILLARD
  1 sibling, 1 reply; 21+ messages in thread
From: Will Deacon @ 2013-06-26 11:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 26, 2013 at 11:55:33AM +0100, Ard Biesheuvel wrote:
> Replying to self:
> 
> On 25 June 2013 22:24, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > +void kernel_neon_end(void)
> > +{
> > +       /* Disable the NEON/VFP unit. */
> > +       fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
> > +       barrier();
> > +       dec_preempt_count();
> > +}
> > +EXPORT_SYMBOL(kernel_neon_end);
> 
> Meh. This is not going to please the RT crowd, as preempt_schedule()
> will not be called on PREEMPT builds in this case.
> 
> Propose to replace it with
> 
>     preempt_enable();
> #ifndef CONFIG_PREEMPT_COUNT
>     /* in this case, the preempt_enable() right above is just a barrier() */
>     dec_preempt_count();
> #endif
> 
> (and the converse in kernel_neon_begin())

Yuck, that's ugly as sin! How does x86 deal with this? Looking at
kernel_fpu_{begin,end}, they just disable preemption so I guess that they
assume the caller is non-blocking? There's an aside about the use of
preempt-notifiers for KVM, so it does sound like the onus is on the caller
not to shoot themselves in the face.

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-26 11:14     ` Will Deacon
@ 2013-06-26 11:28       ` Ard Biesheuvel
  2013-06-26 12:40         ` Will Deacon
  0 siblings, 1 reply; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-26 11:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 26 June 2013 13:14, Will Deacon <will.deacon@arm.com> wrote:
> On Wed, Jun 26, 2013 at 11:55:33AM +0100, Ard Biesheuvel wrote:
>> Replying to self:
>>
>> On 25 June 2013 22:24, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> > +void kernel_neon_end(void)
>> > +{
>> > +       /* Disable the NEON/VFP unit. */
>> > +       fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
>> > +       barrier();
>> > +       dec_preempt_count();
>> > +}
>> > +EXPORT_SYMBOL(kernel_neon_end);
>>
>> Meh. This is not going to please the RT crowd, as preempt_schedule()
>> will not be called on PREEMPT builds in this case.
>>
>> Propose to replace it with
>>
>>     preempt_enable();
>> #ifndef CONFIG_PREEMPT_COUNT
>>     /* in this case, the preempt_enable() right above is just a barrier() */
>>     dec_preempt_count();
>> #endif
>>
>> (and the converse in kernel_neon_begin())
>
> Yuck, that's ugly as sin! How does x86 deal with this? Looking at
> kernel_fpu_{begin,end}, they just disable preemption so I guess that they
> assume the caller is non-blocking? There's an aside about the use of
> preempt-notifiers for KVM, so it does sound like the onus is on the caller
> not to shoot themselves in the face.
>

Even if x86 doesn't care about this, do you really think we should
take the risk of silently clobbering the NEON registers if the caller
does something that may end up sleeping? Anyway, I don't remember
exactly who suggested using inc_preempt_count() directly, but doing so
brings about the responsibility of calling preempt_schedule() when
leaving the critical section, and just using both (without the #ifdef)
is also not an option.

So can you suggest a better way of making sure schedule_debug() shoots
us down if calling schedule() between kernel_neon_begin and
kernel_neon_end, even on non-preempt builds?

-- 
Ard.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-26 11:28       ` Ard Biesheuvel
@ 2013-06-26 12:40         ` Will Deacon
  2013-06-26 12:52           ` Ard Biesheuvel
  0 siblings, 1 reply; 21+ messages in thread
From: Will Deacon @ 2013-06-26 12:40 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

On Wed, Jun 26, 2013 at 12:28:49PM +0100, Ard Biesheuvel wrote:
> On 26 June 2013 13:14, Will Deacon <will.deacon@arm.com> wrote:
> > On Wed, Jun 26, 2013 at 11:55:33AM +0100, Ard Biesheuvel wrote:
> >> Propose to replace it with
> >>
> >>     preempt_enable();
> >> #ifndef CONFIG_PREEMPT_COUNT
> >>     /* in this case, the preempt_enable() right above is just a barrier() */
> >>     dec_preempt_count();
> >> #endif
> >>
> >> (and the converse in kernel_neon_begin())
> >
> > Yuck, that's ugly as sin! How does x86 deal with this? Looking at
> > kernel_fpu_{begin,end}, they just disable preemption so I guess that they
> > assume the caller is non-blocking? There's an aside about the use of
> > preempt-notifiers for KVM, so it does sound like the onus is on the caller
> > not to shoot themselves in the face.
> >
> 
> Even if x86 doesn't care about this, do you really think we should
> take the risk of silently clobbering the NEON registers if the caller
> does something that may end up sleeping? Anyway, I don't remember
> exactly who suggested using inc_preempt_count() directly, but doing so
> brings about the responsibility of calling preempt_schedule() when
> leaving the critical section, and just using both (without the #ifdef)
> is also not an option.
> 
> So can you suggest a better way of making sure schedule_debug() shoots
> us down if calling schedule() between kernel_neon_begin and
> kernel_neon_end, even on non-preempt builds?

With what we currently have in the kernel, no, I can't think of a better
way. However, I also don't think that smuggling in a back-end hack is a good
idea either. How about we follow x86's lead on this and rely on the caller
not to sleep for the timebeing? Then, separately to this patch series, you
could look at augmenting the scheduler so that schedule_debug can complain
if it encounters a task that is not expected to sleep? That seems like the
right place to fix this problem, and will benefit other architectures too.

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-26 12:40         ` Will Deacon
@ 2013-06-26 12:52           ` Ard Biesheuvel
  2013-06-26 13:13             ` Ard Biesheuvel
  0 siblings, 1 reply; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-26 12:52 UTC (permalink / raw)
  To: linux-arm-kernel

On 26 June 2013 14:40, Will Deacon <will.deacon@arm.com> wrote:
> On Wed, Jun 26, 2013 at 12:28:49PM +0100, Ard Biesheuvel wrote:
>> So can you suggest a better way of making sure schedule_debug() shoots
>> us down if calling schedule() between kernel_neon_begin and
>> kernel_neon_end, even on non-preempt builds?
>
> With what we currently have in the kernel, no, I can't think of a better
> way. However, I also don't think that smuggling in a back-end hack is a good
> idea either. How about we follow x86's lead on this and rely on the caller
> not to sleep for the timebeing? Then, separately to this patch series, you
> could look at augmenting the scheduler so that schedule_debug can complain
> if it encounters a task that is not expected to sleep? That seems like the
> right place to fix this problem, and will benefit other architectures too.
>

Good point. As preempt_enable/disable already have this side effect on
PREEMPT builds, perhaps it wouldn't be such a bad idea to modify them
in the non-PREEMPT case to at least complain if schedule() is invoked
during such a section.

@Russell: you mentioned spinlocks at some point to prevent sleeping.
Are you ok with Will's suggestion instead, i.e., to rely on
preempt_disable() to do the right thing, and fix it later because
currently, it doesn't on non-PREEMPT?

-- 
Ard.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-26 12:52           ` Ard Biesheuvel
@ 2013-06-26 13:13             ` Ard Biesheuvel
  2013-06-27 13:11               ` Ard Biesheuvel
  0 siblings, 1 reply; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-26 13:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 26 June 2013 14:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 26 June 2013 14:40, Will Deacon <will.deacon@arm.com> wrote:
>> On Wed, Jun 26, 2013 at 12:28:49PM +0100, Ard Biesheuvel wrote:
>>> So can you suggest a better way of making sure schedule_debug() shoots
>>> us down if calling schedule() between kernel_neon_begin and
>>> kernel_neon_end, even on non-preempt builds?
>>
>> With what we currently have in the kernel, no, I can't think of a better
>> way. However, I also don't think that smuggling in a back-end hack is a good
>> idea either. How about we follow x86's lead on this and rely on the caller
>> not to sleep for the timebeing? Then, separately to this patch series, you
>> could look at augmenting the scheduler so that schedule_debug can complain
>> if it encounters a task that is not expected to sleep? That seems like the
>> right place to fix this problem, and will benefit other architectures too.
>>
>
> Good point. As preempt_enable/disable already have this side effect on
> PREEMPT builds, perhaps it wouldn't be such a bad idea to modify them
> in the non-PREEMPT case to at least complain if schedule() is invoked
> during such a section.
>

It appears we have this already, but in the non-PREEMPT case, it needs
CONFIG_DEBUG_ATOMIC_SLEEP to be defined.
Let's just rely on preempt_disable() to do the right thing...

-- 
Ard.


> @Russell: you mentioned spinlocks at some point to prevent sleeping.
> Are you ok with Will's suggestion instead, i.e., to rely on
> preempt_disable() to do the right thing, and fix it later because
> currently, it doesn't on non-PREEMPT?
>
> --
> Ard.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-26 13:13             ` Ard Biesheuvel
@ 2013-06-27 13:11               ` Ard Biesheuvel
  2013-06-27 15:09                 ` Will Deacon
  2013-06-27 15:13                 ` Catalin Marinas
  0 siblings, 2 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-27 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

On 26 June 2013 15:13, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 26 June 2013 14:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 26 June 2013 14:40, Will Deacon <will.deacon@arm.com> wrote:
>>> With what we currently have in the kernel, no, I can't think of a better
>>> way. However, I also don't think that smuggling in a back-end hack is a good
>>> idea either. How about we follow x86's lead on this and rely on the caller
>>> not to sleep for the timebeing? Then, separately to this patch series, you
>>> could look at augmenting the scheduler so that schedule_debug can complain
>>> if it encounters a task that is not expected to sleep? That seems like the
>>> right place to fix this problem, and will benefit other architectures too.
>>>
>>
>> Good point. As preempt_enable/disable already have this side effect on
>> PREEMPT builds, perhaps it wouldn't be such a bad idea to modify them
>> in the non-PREEMPT case to at least complain if schedule() is invoked
>> during such a section.
>>
>
> It appears we have this already, but in the non-PREEMPT case, it needs
> CONFIG_DEBUG_ATOMIC_SLEEP to be defined.
> Let's just rely on preempt_disable() to do the right thing...
>

OK, just one more question before I respin the next (hopefully final) version:
if a caller does sleep after calling kernel_neon_begin() (and thus
receives no warning if he runs a non-PREEMPT build with
CONFIG_DEBUG_ATOMIC_SLEEP disabled), he will most likely find the
NEON/VFP unit disabled after waking up (as we disable it on a context
switch), so any subsequent NEON instructions will trigger the undef
handler.

Should I perhaps expand the vfp_kmode_exception() function which gets
invoked in this case to be more helpful in identifying this condition?
Currently it just BUG()s on conditions that indicate dependence on
support code, and reports an undefined instruction otherwise.

-- 
Ard.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-27 13:11               ` Ard Biesheuvel
@ 2013-06-27 15:09                 ` Will Deacon
  2013-06-27 15:13                 ` Catalin Marinas
  1 sibling, 0 replies; 21+ messages in thread
From: Will Deacon @ 2013-06-27 15:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jun 27, 2013 at 02:11:27PM +0100, Ard Biesheuvel wrote:
> On 26 June 2013 15:13, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > On 26 June 2013 14:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >> On 26 June 2013 14:40, Will Deacon <will.deacon@arm.com> wrote:
> >>> With what we currently have in the kernel, no, I can't think of a better
> >>> way. However, I also don't think that smuggling in a back-end hack is a good
> >>> idea either. How about we follow x86's lead on this and rely on the caller
> >>> not to sleep for the timebeing? Then, separately to this patch series, you
> >>> could look at augmenting the scheduler so that schedule_debug can complain
> >>> if it encounters a task that is not expected to sleep? That seems like the
> >>> right place to fix this problem, and will benefit other architectures too.
> >>>
> >>
> >> Good point. As preempt_enable/disable already have this side effect on
> >> PREEMPT builds, perhaps it wouldn't be such a bad idea to modify them
> >> in the non-PREEMPT case to at least complain if schedule() is invoked
> >> during such a section.
> >>
> >
> > It appears we have this already, but in the non-PREEMPT case, it needs
> > CONFIG_DEBUG_ATOMIC_SLEEP to be defined.
> > Let's just rely on preempt_disable() to do the right thing...
> >
> 
> OK, just one more question before I respin the next (hopefully final) version:
> if a caller does sleep after calling kernel_neon_begin() (and thus
> receives no warning if he runs a non-PREEMPT build with
> CONFIG_DEBUG_ATOMIC_SLEEP disabled), he will most likely find the
> NEON/VFP unit disabled after waking up (as we disable it on a context
> switch), so any subsequent NEON instructions will trigger the undef
> handler.
> 
> Should I perhaps expand the vfp_kmode_exception() function which gets
> invoked in this case to be more helpful in identifying this condition?
> Currently it just BUG()s on conditions that indicate dependence on
> support code, and reports an undefined instruction otherwise.

I don't think you need to worry too much about this. We can enable the debug
option if we want proper debugging and the BUG is a good indicator to go and
investigate a potential problem.

Will

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-27 13:11               ` Ard Biesheuvel
  2013-06-27 15:09                 ` Will Deacon
@ 2013-06-27 15:13                 ` Catalin Marinas
  2013-06-27 15:43                   ` Ard Biesheuvel
  2013-06-28 10:25                   ` Ard Biesheuvel
  1 sibling, 2 replies; 21+ messages in thread
From: Catalin Marinas @ 2013-06-27 15:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jun 27, 2013 at 02:11:27PM +0100, Ard Biesheuvel wrote:
> On 26 June 2013 15:13, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > On 26 June 2013 14:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> >> On 26 June 2013 14:40, Will Deacon <will.deacon@arm.com> wrote:
> >>> With what we currently have in the kernel, no, I can't think of a better
> >>> way. However, I also don't think that smuggling in a back-end hack is a good
> >>> idea either. How about we follow x86's lead on this and rely on the caller
> >>> not to sleep for the timebeing? Then, separately to this patch series, you
> >>> could look at augmenting the scheduler so that schedule_debug can complain
> >>> if it encounters a task that is not expected to sleep? That seems like the
> >>> right place to fix this problem, and will benefit other architectures too.
> >>>
> >>
> >> Good point. As preempt_enable/disable already have this side effect on
> >> PREEMPT builds, perhaps it wouldn't be such a bad idea to modify them
> >> in the non-PREEMPT case to at least complain if schedule() is invoked
> >> during such a section.
> >>
> >
> > It appears we have this already, but in the non-PREEMPT case, it needs
> > CONFIG_DEBUG_ATOMIC_SLEEP to be defined.
> > Let's just rely on preempt_disable() to do the right thing...
> 
> OK, just one more question before I respin the next (hopefully final) version:
> if a caller does sleep after calling kernel_neon_begin() (and thus
> receives no warning if he runs a non-PREEMPT build with
> CONFIG_DEBUG_ATOMIC_SLEEP disabled), he will most likely find the
> NEON/VFP unit disabled after waking up (as we disable it on a context
> switch), so any subsequent NEON instructions will trigger the undef
> handler.

Can you check on the VFP context switch path whether kernel_neon_begin()
has been called and we are moving away from the task? You could even
store the LR in kernel_neon_begin() to give better error information.

-- 
Catalin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-27 15:13                 ` Catalin Marinas
@ 2013-06-27 15:43                   ` Ard Biesheuvel
  2013-06-28 10:25                   ` Ard Biesheuvel
  1 sibling, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-27 15:43 UTC (permalink / raw)
  To: linux-arm-kernel

On 27 June 2013 17:13, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Jun 27, 2013 at 02:11:27PM +0100, Ard Biesheuvel wrote:
>> OK, just one more question before I respin the next (hopefully final) version:
>> if a caller does sleep after calling kernel_neon_begin() (and thus
>> receives no warning if he runs a non-PREEMPT build with
>> CONFIG_DEBUG_ATOMIC_SLEEP disabled), he will most likely find the
>> NEON/VFP unit disabled after waking up (as we disable it on a context
>> switch), so any subsequent NEON instructions will trigger the undef
>> handler.
>
> Can you check on the VFP context switch path whether kernel_neon_begin()
> has been called and we are moving away from the task? You could even
> store the LR in kernel_neon_begin() to give better error information.
>

I guess it should be quite doable to add the LR of the most recent
kernel_neon_begin() call to the vfpstate (and clear it in
kernel_neon_end), so we can check it when vfp_notifier() is called.
We'll still hit the vfp undef handler on swapping the task back in, so
it may be a bit redundant.

What I could do is use this LR field in vfpstate when hitting the
undef handler to distinguish between (a) kernel_neon_begin()
erroneously not having called at all and (b) kernel_neon_begin()
having been called before sleeping (and print a stacktrace in the
latter case)

-- 
Ard.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-27 15:13                 ` Catalin Marinas
  2013-06-27 15:43                   ` Ard Biesheuvel
@ 2013-06-28 10:25                   ` Ard Biesheuvel
  1 sibling, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-28 10:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 27 June 2013 17:13, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Thu, Jun 27, 2013 at 02:11:27PM +0100, Ard Biesheuvel wrote:
>> OK, just one more question before I respin the next (hopefully final) version:
>> if a caller does sleep after calling kernel_neon_begin() (and thus
>> receives no warning if he runs a non-PREEMPT build with
>> CONFIG_DEBUG_ATOMIC_SLEEP disabled), he will most likely find the
>> NEON/VFP unit disabled after waking up (as we disable it on a context
>> switch), so any subsequent NEON instructions will trigger the undef
>> handler.
>
> Can you check on the VFP context switch path whether kernel_neon_begin()
> has been called and we are moving away from the task? You could even
> store the LR in kernel_neon_begin() to give better error information.
>

I wil take this suggestion for the arm64 case, and propose a new
version next week. For arm, I think we should be ok without this, as
Will also suggested, because you will always hit the BUG() in
vfp_kmode_exception() if you touch the NEON from the kernel after a
context switch (there are only two ways to get the NEON enabled, one
is through lazy restore, which now only works from userland, and the
other is through kernel_neon_begin())

-- 
Ard.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-26 10:55   ` Ard Biesheuvel
  2013-06-26 11:14     ` Will Deacon
@ 2013-06-28 13:46     ` Jean-Christophe PLAGNIOL-VILLARD
  2013-06-28 14:00       ` Ard Biesheuvel
  1 sibling, 1 reply; 21+ messages in thread
From: Jean-Christophe PLAGNIOL-VILLARD @ 2013-06-28 13:46 UTC (permalink / raw)
  To: linux-arm-kernel

On 12:55 Wed 26 Jun     , Ard Biesheuvel wrote:
> Replying to self:
> 
> On 25 June 2013 22:24, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > +void kernel_neon_end(void)
> > +{
> > +       /* Disable the NEON/VFP unit. */
> > +       fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
> > +       barrier();
> > +       dec_preempt_count();
> > +}
> > +EXPORT_SYMBOL(kernel_neon_end);
> 
> Meh. This is not going to please the RT crowd, as preempt_schedule()
> will not be called on PREEMPT builds in this case.
> 
> Propose to replace it with
> 
>     preempt_enable();
> #ifndef CONFIG_PREEMPT_COUNT
if (IS_ENABLED(CONFIG_xxx)) at least
>     /* in this case, the preempt_enable() right above is just a barrier() */
>     dec_preempt_count();
> #endif

but why do you need to call inc_preempt and dec directly?

Best Regards,
J.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-28 13:46     ` Jean-Christophe PLAGNIOL-VILLARD
@ 2013-06-28 14:00       ` Ard Biesheuvel
  2013-06-28 15:46         ` Catalin Marinas
  0 siblings, 1 reply; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-28 14:00 UTC (permalink / raw)
  To: linux-arm-kernel

On 28 June 2013 15:46, Jean-Christophe PLAGNIOL-VILLARD
<plagnioj@jcrosoft.com> wrote:
> On 12:55 Wed 26 Jun     , Ard Biesheuvel wrote:
>> Meh. This is not going to please the RT crowd, as preempt_schedule()
>> will not be called on PREEMPT builds in this case.
>>
>> Propose to replace it with
>>
>>     preempt_enable();
>> #ifndef CONFIG_PREEMPT_COUNT
> if (IS_ENABLED(CONFIG_xxx)) at least
>>     /* in this case, the preempt_enable() right above is just a barrier() */
>>     dec_preempt_count();
>> #endif
>
> but why do you need to call inc_preempt and dec directly?
>

There is a concern that violations of the rule that a task should not
sleep between kernel_neon_begin and kernel_neon_end calls may not be
spotted on non-PREEMPT builds that don't have
CONFIG_DEBUG_ATOMIC_SLEEP set. However, in this case (as I pointed out
in my previous mail), you will at least oops the kernel with a message
that points to in-kernel use of the NEON/VFP, so perhaps we should not
be too paranoid about this. On the other hand, considering that this
stuff is intended to be used for RAID-6 checksumming etc, it's better
to err on the side of caution.

On arm64, it's a bit worse, as there is not lazy restore for the FP
context (and hence no oops if you sleep in the wrong place), so
context switches that clobber the NEON register contents may not be
detectable at all.

-- 
Ard.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-28 14:00       ` Ard Biesheuvel
@ 2013-06-28 15:46         ` Catalin Marinas
  2013-06-28 20:17           ` Ard Biesheuvel
  0 siblings, 1 reply; 21+ messages in thread
From: Catalin Marinas @ 2013-06-28 15:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 28, 2013 at 03:00:02PM +0100, Ard Biesheuvel wrote:
> On 28 June 2013 15:46, Jean-Christophe PLAGNIOL-VILLARD
> <plagnioj@jcrosoft.com> wrote:
> > On 12:55 Wed 26 Jun     , Ard Biesheuvel wrote:
> >> Meh. This is not going to please the RT crowd, as preempt_schedule()
> >> will not be called on PREEMPT builds in this case.
> >>
> >> Propose to replace it with
> >>
> >>     preempt_enable();
> >> #ifndef CONFIG_PREEMPT_COUNT
> > if (IS_ENABLED(CONFIG_xxx)) at least
> >>     /* in this case, the preempt_enable() right above is just a barrier() */
> >>     dec_preempt_count();
> >> #endif
> >
> > but why do you need to call inc_preempt and dec directly?
> 
> There is a concern that violations of the rule that a task should not
> sleep between kernel_neon_begin and kernel_neon_end calls may not be
> spotted on non-PREEMPT builds that don't have
> CONFIG_DEBUG_ATOMIC_SLEEP set.

Would an explicit call to schedule() trigger with
CONFIG_DEBUG_ATOMIC_SLEEP? It looks that this config option only
triggers for explicit might_sleep() calls but we don't have one for
explicit schedule() calls (cond_resched() call has might_sleep()).

-- 
Catalin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 3/5] ARM: add support for kernel mode NEON
  2013-06-28 15:46         ` Catalin Marinas
@ 2013-06-28 20:17           ` Ard Biesheuvel
  0 siblings, 0 replies; 21+ messages in thread
From: Ard Biesheuvel @ 2013-06-28 20:17 UTC (permalink / raw)
  To: linux-arm-kernel

On 28 June 2013 17:46, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Fri, Jun 28, 2013 at 03:00:02PM +0100, Ard Biesheuvel wrote:
>> On 28 June 2013 15:46, Jean-Christophe PLAGNIOL-VILLARD
>> <plagnioj@jcrosoft.com> wrote:
>> > On 12:55 Wed 26 Jun     , Ard Biesheuvel wrote:
>> >> Meh. This is not going to please the RT crowd, as preempt_schedule()
>> >> will not be called on PREEMPT builds in this case.
>> >>
>> >> Propose to replace it with
>> >>
>> >>     preempt_enable();
>> >> #ifndef CONFIG_PREEMPT_COUNT
>> > if (IS_ENABLED(CONFIG_xxx)) at least
>> >>     /* in this case, the preempt_enable() right above is just a barrier() */
>> >>     dec_preempt_count();
>> >> #endif
>> >
>> > but why do you need to call inc_preempt and dec directly?
>>
>> There is a concern that violations of the rule that a task should not
>> sleep between kernel_neon_begin and kernel_neon_end calls may not be
>> spotted on non-PREEMPT builds that don't have
>> CONFIG_DEBUG_ATOMIC_SLEEP set.
>
> Would an explicit call to schedule() trigger with
> CONFIG_DEBUG_ATOMIC_SLEEP? It looks that this config option only
> triggers for explicit might_sleep() calls but we don't have one for
> explicit schedule() calls (cond_resched() call has might_sleep()).
>

CONFIG_DEBUG_ATOMIC_SLEEP enables CONFIG_PREEMPT_COUNT, which is
enough for schedule_debug() to barf if schedule() is called in a
preempt_enable/disable section. (Hence my original approach to
increase/decrease the preempt count, but the problem with that is that
it doesn't force the schedule() to occur when the preempt count drops
to zero)

-- 
Ard.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-06-28 20:17 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-25 20:24 [PATCH v2 0/5] kernel mode NEON support Ard Biesheuvel
2013-06-25 20:24 ` [PATCH v2 1/5] ARM: move VFP init to an earlier boot stage Ard Biesheuvel
2013-06-25 20:24 ` [PATCH v2 2/5] ARM: be strict about FP exceptions in kernel mode Ard Biesheuvel
2013-06-25 20:24 ` [PATCH v2 3/5] ARM: add support for kernel mode NEON Ard Biesheuvel
2013-06-26 10:55   ` Ard Biesheuvel
2013-06-26 11:14     ` Will Deacon
2013-06-26 11:28       ` Ard Biesheuvel
2013-06-26 12:40         ` Will Deacon
2013-06-26 12:52           ` Ard Biesheuvel
2013-06-26 13:13             ` Ard Biesheuvel
2013-06-27 13:11               ` Ard Biesheuvel
2013-06-27 15:09                 ` Will Deacon
2013-06-27 15:13                 ` Catalin Marinas
2013-06-27 15:43                   ` Ard Biesheuvel
2013-06-28 10:25                   ` Ard Biesheuvel
2013-06-28 13:46     ` Jean-Christophe PLAGNIOL-VILLARD
2013-06-28 14:00       ` Ard Biesheuvel
2013-06-28 15:46         ` Catalin Marinas
2013-06-28 20:17           ` Ard Biesheuvel
2013-06-25 20:24 ` [PATCH v2 4/5] ARM: crypto: add NEON accelerated XOR implementation Ard Biesheuvel
2013-06-25 20:24 ` [PATCH v2 5/5] lib/raid6: add ARM-NEON accelerated syndrome calculation Ard Biesheuvel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.