linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 0/2] xor-neon: Remove GCC warn & pragmas
@ 2021-01-19 13:17 Adrian Ratiu
  2021-01-19 13:17 ` [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning Adrian Ratiu
  2021-01-19 13:17 ` [PATCH v4 2/2] arm: lib: xor-neon: move pragma options to makefile Adrian Ratiu
  0 siblings, 2 replies; 17+ messages in thread
From: Adrian Ratiu @ 2021-01-19 13:17 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Nathan Chancellor, Nick Desaulniers, Arnd Bergmann, Russell King,
	Ard Biesheuvel, Arvind Sankar, clang-built-linux, kernel,
	Linux Kernel Mailing List

Dear all,

In v4 a Clang-specific vectorization warning was added at
Arnd suggestion.

This series does not address the Clang vectorize not working
bug itself which is a known pre-existing issued documented
at [1] [2] [3]. Clang vectorization needs to be investigated
in more deepth and fixed separately. The purpouse of this is
to only fix some low-hanging-fruit GCC related isues.

Tested on next-20210118 using GCC 10.2.0 and Clang 10.0.1.

[1] https://bugs.llvm.org/show_bug.cgi?id=40976
[2] https://github.com/ClangBuiltLinux/linux/issues/503
[3] https://github.com/ClangBuiltLinux/linux/issues/496

Kind regards,
Adrian

Adrian Ratiu (1):
  arm: lib: xor-neon: move pragma options to makefile

Nathan Chancellor (1):
  arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning

 arch/arm/lib/Makefile   |  2 +-
 arch/arm/lib/xor-neon.c | 18 +++++-------------
 2 files changed, 6 insertions(+), 14 deletions(-)

-- 
2.30.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-19 13:17 [PATCH v4 0/2] xor-neon: Remove GCC warn & pragmas Adrian Ratiu
@ 2021-01-19 13:17 ` Adrian Ratiu
  2021-01-19 21:18   ` Nick Desaulniers
                     ` (2 more replies)
  2021-01-19 13:17 ` [PATCH v4 2/2] arm: lib: xor-neon: move pragma options to makefile Adrian Ratiu
  1 sibling, 3 replies; 17+ messages in thread
From: Adrian Ratiu @ 2021-01-19 13:17 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Nathan Chancellor, Nick Desaulniers, Arnd Bergmann, Russell King,
	Ard Biesheuvel, Arvind Sankar, clang-built-linux, kernel,
	Linux Kernel Mailing List

From: Nathan Chancellor <natechancellor@gmail.com>

Drop warning because kernel now requires GCC >= v4.9 after
commit 6ec4476ac825 ("Raise gcc version requirement to 4.9")
and clarify that -ftree-vectorize now always needs enabling
for GCC by directly testing the presence of CONFIG_CC_IS_GCC.

Another reason to remove the warning is that Clang exposes
itself as GCC < 4.6 so it triggers the warning about GCC
which doesn't make much sense and misleads Clang users by
telling them to update GCC.

Because Clang is now supported by the kernel print a clear
Clang-specific warning.

Link: https://github.com/ClangBuiltLinux/linux/issues/496
Link: https://github.com/ClangBuiltLinux/linux/issues/503
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
---
 arch/arm/lib/xor-neon.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
index b99dd8e1c93f..f9f3601cc2d1 100644
--- a/arch/arm/lib/xor-neon.c
+++ b/arch/arm/lib/xor-neon.c
@@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
 #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
 #endif
 
+/*
+ * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
+ * compiler does not produce vectorized code due to its cost model.
+ * See: https://github.com/ClangBuiltLinux/linux/issues/503
+ */
+#ifdef CONFIG_CC_IS_CLANG
+#warning Clang does not vectorize code in this file.
+#endif
+
 /*
  * Pull in the reference implementations while instructing GCC (through
  * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
  * NEON instructions.
  */
-#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
+#ifdef CONFIG_CC_IS_GCC
 #pragma GCC optimize "tree-vectorize"
-#else
-/*
- * While older versions of GCC do not generate incorrect code, they fail to
- * recognize the parallel nature of these functions, and emit plain ARM code,
- * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
- */
-#warning This code requires at least version 4.6 of GCC
 #endif
 
 #pragma GCC diagnostic ignored "-Wunused-variable"
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v4 2/2] arm: lib: xor-neon: move pragma options to makefile
  2021-01-19 13:17 [PATCH v4 0/2] xor-neon: Remove GCC warn & pragmas Adrian Ratiu
  2021-01-19 13:17 ` [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning Adrian Ratiu
@ 2021-01-19 13:17 ` Adrian Ratiu
  1 sibling, 0 replies; 17+ messages in thread
From: Adrian Ratiu @ 2021-01-19 13:17 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Nathan Chancellor, Nick Desaulniers, Arnd Bergmann, Russell King,
	Ard Biesheuvel, Arvind Sankar, clang-built-linux, kernel,
	Linux Kernel Mailing List

Using a pragma like GCC optimize is a bad idea because it tags
all functions with an __attribute__((optimize)) which replaces
optimization options rather than appending so could result in
dropping important flags. Not recommended for production use.

Because these options should always be enabled for this file,
it's better to set them via command line. tree-vectorize is on
by default in Clang, but it doesn't hurt to make it explicit.

Suggested-by: Arvind Sankar <nivedita@alum.mit.edu>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
---
 arch/arm/lib/Makefile   |  2 +-
 arch/arm/lib/xor-neon.c | 10 ----------
 2 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 6d2ba454f25b..12d31d1a7630 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -45,6 +45,6 @@ $(obj)/csumpartialcopyuser.o:	$(obj)/csumpartialcopygeneric.S
 
 ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
   NEON_FLAGS			:= -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-  CFLAGS_xor-neon.o		+= $(NEON_FLAGS)
+  CFLAGS_xor-neon.o		+= $(NEON_FLAGS) -ftree-vectorize -Wno-unused-variable
   obj-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
 endif
diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
index f9f3601cc2d1..65125ce69044 100644
--- a/arch/arm/lib/xor-neon.c
+++ b/arch/arm/lib/xor-neon.c
@@ -23,16 +23,6 @@ MODULE_LICENSE("GPL");
 #warning Clang does not vectorize code in this file.
 #endif
 
-/*
- * Pull in the reference implementations while instructing GCC (through
- * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
- * NEON instructions.
- */
-#ifdef CONFIG_CC_IS_GCC
-#pragma GCC optimize "tree-vectorize"
-#endif
-
-#pragma GCC diagnostic ignored "-Wunused-variable"
 #include <asm-generic/xor.h>
 
 struct xor_block_template const xor_block_neon_inner = {
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-19 13:17 ` [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning Adrian Ratiu
@ 2021-01-19 21:18   ` Nick Desaulniers
  2021-01-19 21:34     ` Arnd Bergmann
  2021-01-20 13:11     ` Adrian Ratiu
  2021-01-19 21:54   ` Nathan Chancellor
  2021-01-19 22:04   ` David Laight
  2 siblings, 2 replies; 17+ messages in thread
From: Nick Desaulniers @ 2021-01-19 21:18 UTC (permalink / raw)
  To: Adrian Ratiu, Arnd Bergmann
  Cc: Linux ARM, Nathan Chancellor, Russell King, Ard Biesheuvel,
	Arvind Sankar, clang-built-linux, Collabora Kernel ML,
	Linux Kernel Mailing List

On Tue, Jan 19, 2021 at 5:17 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote:
>
> From: Nathan Chancellor <natechancellor@gmail.com>
>
> Drop warning because kernel now requires GCC >= v4.9 after
> commit 6ec4476ac825 ("Raise gcc version requirement to 4.9")
> and clarify that -ftree-vectorize now always needs enabling
> for GCC by directly testing the presence of CONFIG_CC_IS_GCC.
>
> Another reason to remove the warning is that Clang exposes
> itself as GCC < 4.6 so it triggers the warning about GCC
> which doesn't make much sense and misleads Clang users by
> telling them to update GCC.
>
> Because Clang is now supported by the kernel print a clear
> Clang-specific warning.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/496
> Link: https://github.com/ClangBuiltLinux/linux/issues/503
> Reported-by: Nick Desaulniers <ndesaulniers@google.com>
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>

This is not the version of the patch I had reviewed; please drop my
reviewed-by tag when you change a patch significantly, as otherwise it
looks like I approved this patch.

Nacked-by: Nick Desaulniers <ndesaulniers@google.com>

> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
> ---
>  arch/arm/lib/xor-neon.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index b99dd8e1c93f..f9f3601cc2d1 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
>  #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
>  #endif
>
> +/*
> + * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
> + * compiler does not produce vectorized code due to its cost model.
> + * See: https://github.com/ClangBuiltLinux/linux/issues/503
> + */
> +#ifdef CONFIG_CC_IS_CLANG
> +#warning Clang does not vectorize code in this file.
> +#endif

Arnd, remind me again why it's a bug that the compiler's cost model
says it's faster to not produce a vectorized version of these loops?
I stand by my previous comment: https://bugs.llvm.org/show_bug.cgi?id=40976#c8

> +
>  /*
>   * Pull in the reference implementations while instructing GCC (through
>   * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
>   * NEON instructions.
>   */
> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> +#ifdef CONFIG_CC_IS_GCC
>  #pragma GCC optimize "tree-vectorize"
> -#else
> -/*
> - * While older versions of GCC do not generate incorrect code, they fail to
> - * recognize the parallel nature of these functions, and emit plain ARM code,
> - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
> - */
> -#warning This code requires at least version 4.6 of GCC
>  #endif
>
>  #pragma GCC diagnostic ignored "-Wunused-variable"
> --
> 2.30.0
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-19 21:18   ` Nick Desaulniers
@ 2021-01-19 21:34     ` Arnd Bergmann
  2021-01-19 22:04       ` Nick Desaulniers
  2021-01-20 23:09       ` Nick Desaulniers
  2021-01-20 13:11     ` Adrian Ratiu
  1 sibling, 2 replies; 17+ messages in thread
From: Arnd Bergmann @ 2021-01-19 21:34 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Adrian Ratiu, Arnd Bergmann, Linux ARM, Nathan Chancellor,
	Russell King, Ard Biesheuvel, Arvind Sankar, clang-built-linux,
	Collabora Kernel ML, Linux Kernel Mailing List

On Tue, Jan 19, 2021 at 10:18 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:
>
> On Tue, Jan 19, 2021 at 5:17 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote:
> > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > index b99dd8e1c93f..f9f3601cc2d1 100644
> > --- a/arch/arm/lib/xor-neon.c
> > +++ b/arch/arm/lib/xor-neon.c
> > @@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
> >  #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
> >  #endif
> >
> > +/*
> > + * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
> > + * compiler does not produce vectorized code due to its cost model.
> > + * See: https://github.com/ClangBuiltLinux/linux/issues/503
> > + */
> > +#ifdef CONFIG_CC_IS_CLANG
> > +#warning Clang does not vectorize code in this file.
> > +#endif
>
> Arnd, remind me again why it's a bug that the compiler's cost model
> says it's faster to not produce a vectorized version of these loops?
> I stand by my previous comment: https://bugs.llvm.org/show_bug.cgi?id=40976#c8

The point is that without vectorizing the code, there is no point in building
both the default xor code and a "neon" version that has to save/restore
the neon registers but doesn't actually use them.

          Arnd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-19 13:17 ` [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning Adrian Ratiu
  2021-01-19 21:18   ` Nick Desaulniers
@ 2021-01-19 21:54   ` Nathan Chancellor
  2021-01-20 13:18     ` Adrian Ratiu
  2021-01-19 22:04   ` David Laight
  2 siblings, 1 reply; 17+ messages in thread
From: Nathan Chancellor @ 2021-01-19 21:54 UTC (permalink / raw)
  To: Adrian Ratiu
  Cc: linux-arm-kernel, Nick Desaulniers, Arnd Bergmann, Russell King,
	Ard Biesheuvel, Arvind Sankar, clang-built-linux, kernel,
	Linux Kernel Mailing List

On Tue, Jan 19, 2021 at 03:17:23PM +0200, Adrian Ratiu wrote:
> From: Nathan Chancellor <natechancellor@gmail.com>
> 
> Drop warning because kernel now requires GCC >= v4.9 after
> commit 6ec4476ac825 ("Raise gcc version requirement to 4.9")
> and clarify that -ftree-vectorize now always needs enabling
> for GCC by directly testing the presence of CONFIG_CC_IS_GCC.
> 
> Another reason to remove the warning is that Clang exposes
> itself as GCC < 4.6 so it triggers the warning about GCC
> which doesn't make much sense and misleads Clang users by
> telling them to update GCC.
> 
> Because Clang is now supported by the kernel print a clear
> Clang-specific warning.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/496
> Link: https://github.com/ClangBuiltLinux/linux/issues/503
> Reported-by: Nick Desaulniers <ndesaulniers@google.com>
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>

The commit message looks like it is written by me but I never added a
Clang specific warning. I appreciate wanting to give me credit but when
you change things about my original commit message, please make it
clear that you did the edits, something like:

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
[adrian: Add clang specific warning]
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>

> ---
>  arch/arm/lib/xor-neon.c | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index b99dd8e1c93f..f9f3601cc2d1 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
>  #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
>  #endif
>  
> +/*
> + * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
> + * compiler does not produce vectorized code due to its cost model.
> + * See: https://github.com/ClangBuiltLinux/linux/issues/503
> + */
> +#ifdef CONFIG_CC_IS_CLANG
> +#warning Clang does not vectorize code in this file.
> +#endif

I really do not like this. With the GCC specific warning, the user could
just upgrade their GCC. With this warning, it is basically telling them
don't use clang, in which case, it would just be better to disable this
code altogether. I would rather see:

1. Just don't build this file with clang altogether, which I believe was
   v1's 2/2 patch.

OR

2. Use the pragma:

#pragma clang loop vectorize(enable)

as Nick suggests in v1's 2/2 patch.

Alternatively, __restrict__ sounds like it might be beneficial for both
GCC and clang:

https://lore.kernel.org/lkml/20201112215033.GA438824@rani.riverdale.lan/

>  /*
>   * Pull in the reference implementations while instructing GCC (through
>   * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
>   * NEON instructions.
>   */
> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> +#ifdef CONFIG_CC_IS_GCC
>  #pragma GCC optimize "tree-vectorize"
> -#else
> -/*
> - * While older versions of GCC do not generate incorrect code, they fail to
> - * recognize the parallel nature of these functions, and emit plain ARM code,
> - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
> - */
> -#warning This code requires at least version 4.6 of GCC
>  #endif
>  
>  #pragma GCC diagnostic ignored "-Wunused-variable"
> -- 
> 2.30.0
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-19 21:34     ` Arnd Bergmann
@ 2021-01-19 22:04       ` Nick Desaulniers
  2021-01-19 23:10         ` Nick Desaulniers
  2021-01-20 23:09       ` Nick Desaulniers
  1 sibling, 1 reply; 17+ messages in thread
From: Nick Desaulniers @ 2021-01-19 22:04 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Adrian Ratiu, Arnd Bergmann, Linux ARM, Nathan Chancellor,
	Russell King, Ard Biesheuvel, Arvind Sankar, clang-built-linux,
	Collabora Kernel ML, Linux Kernel Mailing List

On Tue, Jan 19, 2021 at 1:35 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Jan 19, 2021 at 10:18 PM 'Nick Desaulniers' via Clang Built
> Linux <clang-built-linux@googlegroups.com> wrote:
> >
> > On Tue, Jan 19, 2021 at 5:17 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote:
> > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > > index b99dd8e1c93f..f9f3601cc2d1 100644
> > > --- a/arch/arm/lib/xor-neon.c
> > > +++ b/arch/arm/lib/xor-neon.c
> > > @@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
> > >  #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
> > >  #endif
> > >
> > > +/*
> > > + * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
> > > + * compiler does not produce vectorized code due to its cost model.
> > > + * See: https://github.com/ClangBuiltLinux/linux/issues/503
> > > + */
> > > +#ifdef CONFIG_CC_IS_CLANG
> > > +#warning Clang does not vectorize code in this file.
> > > +#endif
> >
> > Arnd, remind me again why it's a bug that the compiler's cost model
> > says it's faster to not produce a vectorized version of these loops?
> > I stand by my previous comment: https://bugs.llvm.org/show_bug.cgi?id=40976#c8
>
> The point is that without vectorizing the code, there is no point in building
> both the default xor code and a "neon" version that has to save/restore
> the neon registers but doesn't actually use them.
>
>           Arnd

Thoughts?  Also, Nathan brings up my previous point about restrict.
This would benefit both GCC and Clang as they would not produce 2
"versions" of the loop; one vectorized if the std::distance between x
& y >= 8, one scalar otherwise.  But the callers would have to ensure
no overlap otherwise UB.

diff --git a/include/asm-generic/xor.h b/include/asm-generic/xor.h
index b62a2a56a4d4..7db16adc7d89 100644
--- a/include/asm-generic/xor.h
+++ b/include/asm-generic/xor.h
@@ -7,12 +7,21 @@

 #include <linux/prefetch.h>

+/* Overrule LLVM's cost model in order to always produce a vectorized loop
+ * version.
+ */
+#if defined(__clang__) && defined(CONFIG_ARM)
+#define FORCE_VECTORIZE _Pragma("clang loop vectorize(enable)")
+#else
+#define CLANG_FORCE_VECTORIZE
+#endif
+
 static void
 xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
 {
        long lines = bytes / (sizeof (long)) / 8;

-       do {
+       FORCE_VECTORIZE do {
                p1[0] ^= p2[0];
                p1[1] ^= p2[1];
                p1[2] ^= p2[2];
@@ -32,7 +41,7 @@ xor_8regs_3(unsigned long bytes, unsigned long *p1,
unsigned long *p2,
 {
        long lines = bytes / (sizeof (long)) / 8;

-       do {
+       FORCE_VECTORIZE do {
                p1[0] ^= p2[0] ^ p3[0];
                p1[1] ^= p2[1] ^ p3[1];
                p1[2] ^= p2[2] ^ p3[2];
@@ -53,7 +62,7 @@ xor_8regs_4(unsigned long bytes, unsigned long *p1,
unsigned long *p2,
 {
        long lines = bytes / (sizeof (long)) / 8;

-       do {
+       FORCE_VECTORIZE do {
                p1[0] ^= p2[0] ^ p3[0] ^ p4[0];
                p1[1] ^= p2[1] ^ p3[1] ^ p4[1];
                p1[2] ^= p2[2] ^ p3[2] ^ p4[2];
@@ -75,7 +84,7 @@ xor_8regs_5(unsigned long bytes, unsigned long *p1,
unsigned long *p2,
 {
        long lines = bytes / (sizeof (long)) / 8;

-       do {
+       FORCE_VECTORIZE do {
                p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0];
                p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1];
                p1[2] ^= p2[2] ^ p3[2] ^ p4[2] ^ p5[2];
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* RE: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-19 13:17 ` [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning Adrian Ratiu
  2021-01-19 21:18   ` Nick Desaulniers
  2021-01-19 21:54   ` Nathan Chancellor
@ 2021-01-19 22:04   ` David Laight
  2021-01-19 22:05     ` Nick Desaulniers
  2 siblings, 1 reply; 17+ messages in thread
From: David Laight @ 2021-01-19 22:04 UTC (permalink / raw)
  To: 'Adrian Ratiu', linux-arm-kernel
  Cc: Nathan Chancellor, Nick Desaulniers, Arnd Bergmann, Russell King,
	Ard Biesheuvel, Arvind Sankar, clang-built-linux, kernel,
	Linux Kernel Mailing List

From: Adrian Ratiu
> Sent: 19 January 2021 13:17
> To: linux-arm-kernel@lists.infradead.org
> 
> Drop warning because kernel now requires GCC >= v4.9 after
> commit 6ec4476ac825 ("Raise gcc version requirement to 4.9")
> and clarify that -ftree-vectorize now always needs enabling
> for GCC by directly testing the presence of CONFIG_CC_IS_GCC.
...
> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> +#ifdef CONFIG_CC_IS_GCC
>  #pragma GCC optimize "tree-vectorize"
> -#else

I thought that #pragma optimise was very likely to remove
a random subset of the command line parameters leading to
incorrect/unexpected code.

As such the extra option needs to passed in as a per source
file compiler option.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-19 22:04   ` David Laight
@ 2021-01-19 22:05     ` Nick Desaulniers
  0 siblings, 0 replies; 17+ messages in thread
From: Nick Desaulniers @ 2021-01-19 22:05 UTC (permalink / raw)
  To: David Laight
  Cc: Adrian Ratiu, linux-arm-kernel, Nathan Chancellor, Arnd Bergmann,
	Russell King, Ard Biesheuvel, Arvind Sankar, clang-built-linux,
	kernel, Linux Kernel Mailing List

On Tue, Jan 19, 2021 at 2:04 PM David Laight <David.Laight@aculab.com> wrote:
>
> From: Adrian Ratiu
> > Sent: 19 January 2021 13:17
> > To: linux-arm-kernel@lists.infradead.org
> >
> > Drop warning because kernel now requires GCC >= v4.9 after
> > commit 6ec4476ac825 ("Raise gcc version requirement to 4.9")
> > and clarify that -ftree-vectorize now always needs enabling
> > for GCC by directly testing the presence of CONFIG_CC_IS_GCC.
> ...
> > -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> > +#ifdef CONFIG_CC_IS_GCC
> >  #pragma GCC optimize "tree-vectorize"
> > -#else
>
> I thought that #pragma optimise was very likely to remove
> a random subset of the command line parameters leading to
> incorrect/unexpected code.
>
> As such the extra option needs to passed in as a per source
> file compiler option.

Yes, patch 2 in the series removes it from .c source, and passes
-ftree-vectorize during Kbuild.

>
>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-19 22:04       ` Nick Desaulniers
@ 2021-01-19 23:10         ` Nick Desaulniers
  0 siblings, 0 replies; 17+ messages in thread
From: Nick Desaulniers @ 2021-01-19 23:10 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Adrian Ratiu, Arnd Bergmann, Linux ARM, Nathan Chancellor,
	Russell King, Ard Biesheuvel, Arvind Sankar, clang-built-linux,
	Collabora Kernel ML, Linux Kernel Mailing List

On Tue, Jan 19, 2021 at 2:04 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Tue, Jan 19, 2021 at 1:35 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > On Tue, Jan 19, 2021 at 10:18 PM 'Nick Desaulniers' via Clang Built
> > Linux <clang-built-linux@googlegroups.com> wrote:
> > >
> > > On Tue, Jan 19, 2021 at 5:17 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote:
> > > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > > > index b99dd8e1c93f..f9f3601cc2d1 100644
> > > > --- a/arch/arm/lib/xor-neon.c
> > > > +++ b/arch/arm/lib/xor-neon.c
> > > > @@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
> > > >  #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
> > > >  #endif
> > > >
> > > > +/*
> > > > + * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
> > > > + * compiler does not produce vectorized code due to its cost model.
> > > > + * See: https://github.com/ClangBuiltLinux/linux/issues/503
> > > > + */
> > > > +#ifdef CONFIG_CC_IS_CLANG
> > > > +#warning Clang does not vectorize code in this file.
> > > > +#endif
> > >
> > > Arnd, remind me again why it's a bug that the compiler's cost model
> > > says it's faster to not produce a vectorized version of these loops?
> > > I stand by my previous comment: https://bugs.llvm.org/show_bug.cgi?id=40976#c8
> >
> > The point is that without vectorizing the code, there is no point in building
> > both the default xor code and a "neon" version that has to save/restore
> > the neon registers but doesn't actually use them.
> >
> >           Arnd
>
> Thoughts?  Also, Nathan brings up my previous point about restrict.
> This would benefit both GCC and Clang as they would not produce 2
> "versions" of the loop; one vectorized if the std::distance between x
> & y >= 8, one scalar otherwise.  But the callers would have to ensure
> no overlap otherwise UB.
>
> diff --git a/include/asm-generic/xor.h b/include/asm-generic/xor.h
> index b62a2a56a4d4..7db16adc7d89 100644
> --- a/include/asm-generic/xor.h
> +++ b/include/asm-generic/xor.h
> @@ -7,12 +7,21 @@
>
>  #include <linux/prefetch.h>
>
> +/* Overrule LLVM's cost model in order to always produce a vectorized loop
> + * version.
> + */
> +#if defined(__clang__) && defined(CONFIG_ARM)
> +#define FORCE_VECTORIZE _Pragma("clang loop vectorize(enable)")
> +#else
> +#define CLANG_FORCE_VECTORIZE

^ err, I had renamed it, but missed this. Should have been
`FORCE_VECTORIZE` but you catch the drift.

> +#endif
> +
>  static void
>  xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
>  {
>         long lines = bytes / (sizeof (long)) / 8;
>
> -       do {
> +       FORCE_VECTORIZE do {
>                 p1[0] ^= p2[0];
>                 p1[1] ^= p2[1];
>                 p1[2] ^= p2[2];
> @@ -32,7 +41,7 @@ xor_8regs_3(unsigned long bytes, unsigned long *p1,
> unsigned long *p2,
>  {
>         long lines = bytes / (sizeof (long)) / 8;
>
> -       do {
> +       FORCE_VECTORIZE do {
>                 p1[0] ^= p2[0] ^ p3[0];
>                 p1[1] ^= p2[1] ^ p3[1];
>                 p1[2] ^= p2[2] ^ p3[2];
> @@ -53,7 +62,7 @@ xor_8regs_4(unsigned long bytes, unsigned long *p1,
> unsigned long *p2,
>  {
>         long lines = bytes / (sizeof (long)) / 8;
>
> -       do {
> +       FORCE_VECTORIZE do {
>                 p1[0] ^= p2[0] ^ p3[0] ^ p4[0];
>                 p1[1] ^= p2[1] ^ p3[1] ^ p4[1];
>                 p1[2] ^= p2[2] ^ p3[2] ^ p4[2];
> @@ -75,7 +84,7 @@ xor_8regs_5(unsigned long bytes, unsigned long *p1,
> unsigned long *p2,
>  {
>         long lines = bytes / (sizeof (long)) / 8;
>
> -       do {
> +       FORCE_VECTORIZE do {
>                 p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0];
>                 p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1];
>                 p1[2] ^= p2[2] ^ p3[2] ^ p4[2] ^ p5[2];
> --
> Thanks,
> ~Nick Desaulniers



-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-19 21:18   ` Nick Desaulniers
  2021-01-19 21:34     ` Arnd Bergmann
@ 2021-01-20 13:11     ` Adrian Ratiu
  1 sibling, 0 replies; 17+ messages in thread
From: Adrian Ratiu @ 2021-01-20 13:11 UTC (permalink / raw)
  To: Nick Desaulniers, Adrian Ratiu, Arnd Bergmann
  Cc: Linux ARM, Nathan Chancellor, Russell King, Ard Biesheuvel,
	Arvind Sankar, clang-built-linux, Collabora Kernel ML,
	Linux Kernel Mailing List

On Tue, 19 Jan 2021, Nick Desaulniers <ndesaulniers@google.com> 
wrote:
> On Tue, Jan 19, 2021 at 5:17 AM Adrian Ratiu 
> <adrian.ratiu@collabora.com> wrote: 
>> 
>> From: Nathan Chancellor <natechancellor@gmail.com> 
>> 
>> Drop warning because kernel now requires GCC >= v4.9 after 
>> commit 6ec4476ac825 ("Raise gcc version requirement to 4.9") 
>> and clarify that -ftree-vectorize now always needs enabling for 
>> GCC by directly testing the presence of CONFIG_CC_IS_GCC. 
>> 
>> Another reason to remove the warning is that Clang exposes 
>> itself as GCC < 4.6 so it triggers the warning about GCC which 
>> doesn't make much sense and misleads Clang users by telling 
>> them to update GCC. 
>> 
>> Because Clang is now supported by the kernel print a clear 
>> Clang-specific warning. 
>> 
>> Link: https://github.com/ClangBuiltLinux/linux/issues/496 Link: 
>> https://github.com/ClangBuiltLinux/linux/issues/503 
>> Reported-by: Nick Desaulniers <ndesaulniers@google.com> 
>> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> 
> 
> This is not the version of the patch I had reviewed; please drop 
> my reviewed-by tag when you change a patch significantly, as 
> otherwise it looks like I approved this patch. 
> 
> Nacked-by: Nick Desaulniers <ndesaulniers@google.com> 
> 

Sorry for not removing the reviewed-by tags from the previous 
versions in this v4. I guess the only way forward with this is to 
actually make clang vectorization work. Also thanks for the patch 
suggestion in the other e-mail!

>> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
>> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
>> ---
>>  arch/arm/lib/xor-neon.c | 18 ++++++++++--------
>>  1 file changed, 10 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
>> index b99dd8e1c93f..f9f3601cc2d1 100644
>> --- a/arch/arm/lib/xor-neon.c
>> +++ b/arch/arm/lib/xor-neon.c
>> @@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
>>  #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
>>  #endif
>>
>> +/*
>> + * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
>> + * compiler does not produce vectorized code due to its cost model.
>> + * See: https://github.com/ClangBuiltLinux/linux/issues/503
>> + */
>> +#ifdef CONFIG_CC_IS_CLANG
>> +#warning Clang does not vectorize code in this file.
>> +#endif
>
> Arnd, remind me again why it's a bug that the compiler's cost model
> says it's faster to not produce a vectorized version of these loops?
> I stand by my previous comment: https://bugs.llvm.org/show_bug.cgi?id=40976#c8
>
>> +
>>  /*
>>   * Pull in the reference implementations while instructing GCC (through
>>   * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
>>   * NEON instructions.
>>   */
>> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
>> +#ifdef CONFIG_CC_IS_GCC
>>  #pragma GCC optimize "tree-vectorize"
>> -#else
>> -/*
>> - * While older versions of GCC do not generate incorrect code, they fail to
>> - * recognize the parallel nature of these functions, and emit plain ARM code,
>> - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
>> - */
>> -#warning This code requires at least version 4.6 of GCC
>>  #endif
>>
>>  #pragma GCC diagnostic ignored "-Wunused-variable"
>> --
>> 2.30.0
>>
>
>
> -- 
> Thanks,
> ~Nick Desaulniers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-19 21:54   ` Nathan Chancellor
@ 2021-01-20 13:18     ` Adrian Ratiu
  0 siblings, 0 replies; 17+ messages in thread
From: Adrian Ratiu @ 2021-01-20 13:18 UTC (permalink / raw)
  To: Nathan Chancellor, Adrian Ratiu
  Cc: linux-arm-kernel, Nick Desaulniers, Arnd Bergmann, Russell King,
	Ard Biesheuvel, Arvind Sankar, clang-built-linux, kernel,
	Linux Kernel Mailing List

On Tue, 19 Jan 2021, Nathan Chancellor <natechancellor@gmail.com> 
wrote:
> On Tue, Jan 19, 2021 at 03:17:23PM +0200, Adrian Ratiu wrote: 
>> From: Nathan Chancellor <natechancellor@gmail.com>  Drop 
>> warning because kernel now requires GCC >= v4.9 after commit 
>> 6ec4476ac825 ("Raise gcc version requirement to 4.9") and 
>> clarify that -ftree-vectorize now always needs enabling for GCC 
>> by directly testing the presence of CONFIG_CC_IS_GCC.   Another 
>> reason to remove the warning is that Clang exposes itself as 
>> GCC < 4.6 so it triggers the warning about GCC which doesn't 
>> make much sense and misleads Clang users by telling them to 
>> update GCC.   Because Clang is now supported by the kernel 
>> print a clear Clang-specific warning.   Link: 
>> https://github.com/ClangBuiltLinux/linux/issues/496 Link: 
>> https://github.com/ClangBuiltLinux/linux/issues/503 
>> Reported-by: Nick Desaulniers <ndesaulniers@google.com> 
>> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> 
>> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> 
>> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> 
> 
> The commit message looks like it is written by me but I never 
> added a Clang specific warning. I appreciate wanting to give me 
> credit but when you change things about my original commit 
> message, please make it clear that you did the edits, something 
> like: 
> 
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> 
> [adrian: Add clang specific warning] Signed-off-by: Adrian Ratiu 
> <adrian.ratiu@collabora.com> 
> 

Thanks for the suggestion. Makes sense. I contemplated adding 
another patch by me on top but thought it was too much 
churn. Sorry if my edits were unclear.

>> --- 
>>  arch/arm/lib/xor-neon.c | 18 ++++++++++-------- 1 file 
>>  changed, 10 insertions(+), 8 deletions(-) 
>>  diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c 
>> index b99dd8e1c93f..f9f3601cc2d1 100644 --- 
>> a/arch/arm/lib/xor-neon.c +++ b/arch/arm/lib/xor-neon.c @@ 
>> -14,20 +14,22 @@ MODULE_LICENSE("GPL"); 
>>  #error You should compile this file with '-march=armv7-a 
>>  -mfloat-abi=softfp -mfpu=neon' #endif  
>> +/* + * TODO: Even though -ftree-vectorize is enabled by 
>> default in Clang, the + * compiler does not produce vectorized 
>> code due to its cost model.  + * See: 
>> https://github.com/ClangBuiltLinux/linux/issues/503 + */ 
>> +#ifdef CONFIG_CC_IS_CLANG +#warning Clang does not vectorize 
>> code in this file.  +#endif 
> 
> I really do not like this. With the GCC specific warning, the 
> user could just upgrade their GCC. With this warning, it is 
> basically telling them don't use clang, in which case, it would 
> just be better to disable this code altogether. I would rather 
> see: 
> 
> 1. Just don't build this file with clang altogether, which I 
> believe was 
>    v1's 2/2 patch. 
> 
> OR 
> 
> 2. Use the pragma: 
> 
> #pragma clang loop vectorize(enable) 
> 
> as Nick suggests in v1's 2/2 patch. 
> 
> Alternatively, __restrict__ sounds like it might be beneficial 
> for both GCC and clang: 
> 
> https://lore.kernel.org/lkml/20201112215033.GA438824@rani.riverdale.lan/ 
> 

Option 1 from v1 got clearly NACKed by Nick a while back so the 
only option gonig forward is to also fix clang vectorization 
together with these changes so the warning becomes unnecessary.

>>  /*
>>   * Pull in the reference implementations while instructing GCC (through
>>   * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
>>   * NEON instructions.
>>   */
>> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
>> +#ifdef CONFIG_CC_IS_GCC
>>  #pragma GCC optimize "tree-vectorize"
>> -#else
>> -/*
>> - * While older versions of GCC do not generate incorrect code, they fail to
>> - * recognize the parallel nature of these functions, and emit plain ARM code,
>> - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
>> - */
>> -#warning This code requires at least version 4.6 of GCC
>>  #endif
>>  
>>  #pragma GCC diagnostic ignored "-Wunused-variable"
>> -- 
>> 2.30.0
>> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-19 21:34     ` Arnd Bergmann
  2021-01-19 22:04       ` Nick Desaulniers
@ 2021-01-20 23:09       ` Nick Desaulniers
  2021-01-20 23:15         ` Nick Desaulniers
  2021-01-21  4:13         ` Arvind Sankar
  1 sibling, 2 replies; 17+ messages in thread
From: Nick Desaulniers @ 2021-01-20 23:09 UTC (permalink / raw)
  To: Arnd Bergmann, Ard Biesheuvel
  Cc: Adrian Ratiu, Arnd Bergmann, Linux ARM, Nathan Chancellor,
	Russell King, Arvind Sankar, clang-built-linux,
	Collabora Kernel ML, Linux Kernel Mailing List

On Tue, Jan 19, 2021 at 1:35 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Tue, Jan 19, 2021 at 10:18 PM 'Nick Desaulniers' via Clang Built
> Linux <clang-built-linux@googlegroups.com> wrote:
> >
> > On Tue, Jan 19, 2021 at 5:17 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote:
> > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > > index b99dd8e1c93f..f9f3601cc2d1 100644
> > > --- a/arch/arm/lib/xor-neon.c
> > > +++ b/arch/arm/lib/xor-neon.c
> > > @@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
> > >  #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
> > >  #endif
> > >
> > > +/*
> > > + * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
> > > + * compiler does not produce vectorized code due to its cost model.
> > > + * See: https://github.com/ClangBuiltLinux/linux/issues/503
> > > + */
> > > +#ifdef CONFIG_CC_IS_CLANG
> > > +#warning Clang does not vectorize code in this file.
> > > +#endif
> >
> > Arnd, remind me again why it's a bug that the compiler's cost model
> > says it's faster to not produce a vectorized version of these loops?
> > I stand by my previous comment: https://bugs.llvm.org/show_bug.cgi?id=40976#c8
>
> The point is that without vectorizing the code, there is no point in building
> both the default xor code and a "neon" version that has to save/restore
> the neon registers but doesn't actually use them.

Doesn't that already happen today with GCC when the pointer arguments
are overlapping?  The loop is "versioned" and may not actually use the
NEON registers at all at runtime for such arguments.
https://godbolt.org/z/q48q8v  See also:
https://bugs.llvm.org/show_bug.cgi?id=40976#c11.  Or am I missing
something?

So I'm thinking if we extend out this pattern to the rest of the
functions, we can actually avoid calls to
kernel_neon_begin()/kernel_neon_end() for cases in which pointers
would be too close to use the vectorized loop version; meaning for GCC
this would be an optimization (don't save neon registers when you're
not going to use them).  I would probably consider moving
include/asm-generic/xor.h somewhere under arch/arm/
perhaps...err...something for the other users of <asm-generic/xor.h>.

diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
index aefddec79286..abd748d317e8 100644
--- a/arch/arm/include/asm/xor.h
+++ b/arch/arm/include/asm/xor.h
@@ -148,12 +148,12 @@ extern struct xor_block_template const
xor_block_neon_inner;
 static void
 xor_neon_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
 {
-       if (in_interrupt()) {
-               xor_arm4regs_2(bytes, p1, p2);
-       } else {
+       if (!in_interrupt() && abs((uintptr_t)p1, (uintptr_t)p2) >= 8) {
                kernel_neon_begin();
                xor_block_neon_inner.do_2(bytes, p1, p2);
                kernel_neon_end();
+       } else {
+               xor_arm4regs_2(bytes, p1, p2);
        }
 }
diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
index b99dd8e1c93f..0e8e474c0523 100644
--- a/arch/arm/lib/xor-neon.c
+++ b/arch/arm/lib/xor-neon.c
@@ -14,22 +14,6 @@ MODULE_LICENSE("GPL");
 #error You should compile this file with '-march=armv7-a
-mfloat-abi=softfp -mfpu=neon'
 #endif

-/*
- * Pull in the reference implementations while instructing GCC (through
- * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
- * NEON instructions.
- */
-#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
-#pragma GCC optimize "tree-vectorize"
-#else
-/*
- * While older versions of GCC do not generate incorrect code, they fail to
- * recognize the parallel nature of these functions, and emit plain ARM code,
- * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
- */
-#warning This code requires at least version 4.6 of GCC
-#endif
-
 #pragma GCC diagnostic ignored "-Wunused-variable"
 #include <asm-generic/xor.h>
diff --git a/include/asm-generic/xor.h b/include/asm-generic/xor.h
index b62a2a56a4d4..69df62095c33 100644
--- a/include/asm-generic/xor.h
+++ b/include/asm-generic/xor.h
@@ -8,7 +8,7 @@
 #include <linux/prefetch.h>

 static void
-xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
+xor_8regs_2(unsigned long bytes, unsigned long * __restrict p1,
unsigned long * __restrict p2)
 {
        long lines = bytes / (sizeof (long)) / 8;


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-20 23:09       ` Nick Desaulniers
@ 2021-01-20 23:15         ` Nick Desaulniers
  2021-01-21  4:13         ` Arvind Sankar
  1 sibling, 0 replies; 17+ messages in thread
From: Nick Desaulniers @ 2021-01-20 23:15 UTC (permalink / raw)
  To: Arnd Bergmann, Ard Biesheuvel
  Cc: Adrian Ratiu, Arnd Bergmann, Linux ARM, Nathan Chancellor,
	Russell King, Arvind Sankar, clang-built-linux,
	Collabora Kernel ML, Linux Kernel Mailing List

On Wed, Jan 20, 2021 at 3:09 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Tue, Jan 19, 2021 at 1:35 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > On Tue, Jan 19, 2021 at 10:18 PM 'Nick Desaulniers' via Clang Built
> > Linux <clang-built-linux@googlegroups.com> wrote:
> > >
> > > On Tue, Jan 19, 2021 at 5:17 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote:
> > > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > > > index b99dd8e1c93f..f9f3601cc2d1 100644
> > > > --- a/arch/arm/lib/xor-neon.c
> > > > +++ b/arch/arm/lib/xor-neon.c
> > > > @@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
> > > >  #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
> > > >  #endif
> > > >
> > > > +/*
> > > > + * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
> > > > + * compiler does not produce vectorized code due to its cost model.
> > > > + * See: https://github.com/ClangBuiltLinux/linux/issues/503
> > > > + */
> > > > +#ifdef CONFIG_CC_IS_CLANG
> > > > +#warning Clang does not vectorize code in this file.
> > > > +#endif
> > >
> > > Arnd, remind me again why it's a bug that the compiler's cost model
> > > says it's faster to not produce a vectorized version of these loops?
> > > I stand by my previous comment: https://bugs.llvm.org/show_bug.cgi?id=40976#c8
> >
> > The point is that without vectorizing the code, there is no point in building
> > both the default xor code and a "neon" version that has to save/restore
> > the neon registers but doesn't actually use them.
>
> Doesn't that already happen today with GCC when the pointer arguments
> are overlapping?  The loop is "versioned" and may not actually use the
> NEON registers at all at runtime for such arguments.
> https://godbolt.org/z/q48q8v  See also:
> https://bugs.llvm.org/show_bug.cgi?id=40976#c11.  Or am I missing
> something?
>
> So I'm thinking if we extend out this pattern to the rest of the
> functions, we can actually avoid calls to
> kernel_neon_begin()/kernel_neon_end() for cases in which pointers
> would be too close to use the vectorized loop version; meaning for GCC
> this would be an optimization (don't save neon registers when you're
> not going to use them).  I would probably consider moving
> include/asm-generic/xor.h somewhere under arch/arm/
> perhaps...err...something for the other users of <asm-generic/xor.h>.
>
> diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
> index aefddec79286..abd748d317e8 100644
> --- a/arch/arm/include/asm/xor.h
> +++ b/arch/arm/include/asm/xor.h
> @@ -148,12 +148,12 @@ extern struct xor_block_template const
> xor_block_neon_inner;
>  static void
>  xor_neon_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
>  {
> -       if (in_interrupt()) {
> -               xor_arm4regs_2(bytes, p1, p2);
> -       } else {
> +       if (!in_interrupt() && abs((uintptr_t)p1, (uintptr_t)p2) >= 8) {
>                 kernel_neon_begin();
>                 xor_block_neon_inner.do_2(bytes, p1, p2);
>                 kernel_neon_end();
> +       } else {
> +               xor_arm4regs_2(bytes, p1, p2);
>         }
>  }
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index b99dd8e1c93f..0e8e474c0523 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -14,22 +14,6 @@ MODULE_LICENSE("GPL");
>  #error You should compile this file with '-march=armv7-a
> -mfloat-abi=softfp -mfpu=neon'
>  #endif
>
> -/*
> - * Pull in the reference implementations while instructing GCC (through
> - * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> - * NEON instructions.
> - */
> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> -#pragma GCC optimize "tree-vectorize"

Err...we need to keep this but use the -f flag with __restrict for GCC
to vectorize:
https://godbolt.org/z/9acnEv

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-20 23:09       ` Nick Desaulniers
  2021-01-20 23:15         ` Nick Desaulniers
@ 2021-01-21  4:13         ` Arvind Sankar
  2021-01-21  4:24           ` Nick Desaulniers
  2021-01-21  8:46           ` Ard Biesheuvel
  1 sibling, 2 replies; 17+ messages in thread
From: Arvind Sankar @ 2021-01-21  4:13 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Arnd Bergmann, Ard Biesheuvel, Adrian Ratiu, Arnd Bergmann,
	Linux ARM, Nathan Chancellor, Russell King, clang-built-linux,
	Collabora Kernel ML, Linux Kernel Mailing List

On Wed, Jan 20, 2021 at 03:09:53PM -0800, Nick Desaulniers wrote:
> On Tue, Jan 19, 2021 at 1:35 PM Arnd Bergmann <arnd@kernel.org> wrote:
> >
> > On Tue, Jan 19, 2021 at 10:18 PM 'Nick Desaulniers' via Clang Built
> > Linux <clang-built-linux@googlegroups.com> wrote:
> > >
> > > On Tue, Jan 19, 2021 at 5:17 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote:
> > > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > > > index b99dd8e1c93f..f9f3601cc2d1 100644
> > > > --- a/arch/arm/lib/xor-neon.c
> > > > +++ b/arch/arm/lib/xor-neon.c
> > > > @@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
> > > >  #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
> > > >  #endif
> > > >
> > > > +/*
> > > > + * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
> > > > + * compiler does not produce vectorized code due to its cost model.
> > > > + * See: https://github.com/ClangBuiltLinux/linux/issues/503
> > > > + */
> > > > +#ifdef CONFIG_CC_IS_CLANG
> > > > +#warning Clang does not vectorize code in this file.
> > > > +#endif
> > >
> > > Arnd, remind me again why it's a bug that the compiler's cost model
> > > says it's faster to not produce a vectorized version of these loops?
> > > I stand by my previous comment: https://bugs.llvm.org/show_bug.cgi?id=40976#c8
> >
> > The point is that without vectorizing the code, there is no point in building
> > both the default xor code and a "neon" version that has to save/restore
> > the neon registers but doesn't actually use them.
> 
> Doesn't that already happen today with GCC when the pointer arguments
> are overlapping?  The loop is "versioned" and may not actually use the
> NEON registers at all at runtime for such arguments.
> https://godbolt.org/z/q48q8v  See also:
> https://bugs.llvm.org/show_bug.cgi?id=40976#c11.  Or am I missing
> something?

The gcc version is at least useful when the arguments _don't_ overlap,
which is presumably most/all the time.

There's no point to building this file if it's not going to produce a
vectorized version at all. The warning seems unnecessary, and it's not
really a compiler bug either -- until we agree on a way to get clang to
produce a vectorized version, the best thing would be for the neon
version to not be built at all with clang. Is that too messy to achieve?

> 
> So I'm thinking if we extend out this pattern to the rest of the
> functions, we can actually avoid calls to
> kernel_neon_begin()/kernel_neon_end() for cases in which pointers
> would be too close to use the vectorized loop version; meaning for GCC
> this would be an optimization (don't save neon registers when you're
> not going to use them).  I would probably consider moving
> include/asm-generic/xor.h somewhere under arch/arm/
> perhaps...err...something for the other users of <asm-generic/xor.h>.

We can't directly do the patch below since there are other users of the
asm-generic/xor.h implementations than just the neon file.  If it's too
much work to check and add __restrict everywhere, I think we'd either
need to copy the code into the xor-neon file, or maybe do some ifdeffery
so __restrict is only used for the neon version.

> 
> diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
> index aefddec79286..abd748d317e8 100644
> --- a/arch/arm/include/asm/xor.h
> +++ b/arch/arm/include/asm/xor.h
> @@ -148,12 +148,12 @@ extern struct xor_block_template const
> xor_block_neon_inner;
>  static void
>  xor_neon_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
>  {
> -       if (in_interrupt()) {
> -               xor_arm4regs_2(bytes, p1, p2);
> -       } else {
> +       if (!in_interrupt() && abs((uintptr_t)p1, (uintptr_t)p2) >= 8) {
>                 kernel_neon_begin();
>                 xor_block_neon_inner.do_2(bytes, p1, p2);
>                 kernel_neon_end();
> +       } else {
> +               xor_arm4regs_2(bytes, p1, p2);
>         }
>  }
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index b99dd8e1c93f..0e8e474c0523 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -14,22 +14,6 @@ MODULE_LICENSE("GPL");
>  #error You should compile this file with '-march=armv7-a
> -mfloat-abi=softfp -mfpu=neon'
>  #endif
> 
> -/*
> - * Pull in the reference implementations while instructing GCC (through
> - * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> - * NEON instructions.
> - */
> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> -#pragma GCC optimize "tree-vectorize"
> -#else
> -/*
> - * While older versions of GCC do not generate incorrect code, they fail to
> - * recognize the parallel nature of these functions, and emit plain ARM code,
> - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
> - */
> -#warning This code requires at least version 4.6 of GCC
> -#endif
> -
>  #pragma GCC diagnostic ignored "-Wunused-variable"
>  #include <asm-generic/xor.h>
> diff --git a/include/asm-generic/xor.h b/include/asm-generic/xor.h
> index b62a2a56a4d4..69df62095c33 100644
> --- a/include/asm-generic/xor.h
> +++ b/include/asm-generic/xor.h
> @@ -8,7 +8,7 @@
>  #include <linux/prefetch.h>
> 
>  static void
> -xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
> +xor_8regs_2(unsigned long bytes, unsigned long * __restrict p1,
> unsigned long * __restrict p2)
>  {
>         long lines = bytes / (sizeof (long)) / 8;
> 
> 
> -- 
> Thanks,
> ~Nick Desaulniers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-21  4:13         ` Arvind Sankar
@ 2021-01-21  4:24           ` Nick Desaulniers
  2021-01-21  8:46           ` Ard Biesheuvel
  1 sibling, 0 replies; 17+ messages in thread
From: Nick Desaulniers @ 2021-01-21  4:24 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Arnd Bergmann, Ard Biesheuvel, Adrian Ratiu, Arnd Bergmann,
	Linux ARM, Nathan Chancellor, Russell King, clang-built-linux,
	Collabora Kernel ML, Linux Kernel Mailing List

On Wed, Jan 20, 2021 at 8:13 PM Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Wed, Jan 20, 2021 at 03:09:53PM -0800, Nick Desaulniers wrote:
> > So I'm thinking if we extend out this pattern to the rest of the
> > functions, we can actually avoid calls to
> > kernel_neon_begin()/kernel_neon_end() for cases in which pointers
> > would be too close to use the vectorized loop version; meaning for GCC
> > this would be an optimization (don't save neon registers when you're
> > not going to use them).  I would probably consider moving
> > include/asm-generic/xor.h somewhere under arch/arm/
> > perhaps...err...something for the other users of <asm-generic/xor.h>.
>
> We can't directly do the patch below since there are other users of the
> asm-generic/xor.h implementations than just the neon file.  If it's too
> much work to check and add __restrict everywhere, I think we'd either
> need to copy the code into the xor-neon file,

Yes; I'd rather copy that code into xor-neon, or someone under
arch/arm/ at least.

> or maybe do some ifdeffery
> so __restrict is only used for the neon version.

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
  2021-01-21  4:13         ` Arvind Sankar
  2021-01-21  4:24           ` Nick Desaulniers
@ 2021-01-21  8:46           ` Ard Biesheuvel
  1 sibling, 0 replies; 17+ messages in thread
From: Ard Biesheuvel @ 2021-01-21  8:46 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Nick Desaulniers, Arnd Bergmann, Adrian Ratiu, Arnd Bergmann,
	Linux ARM, Nathan Chancellor, Russell King, clang-built-linux,
	Collabora Kernel ML, Linux Kernel Mailing List

On Thu, 21 Jan 2021 at 05:13, Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Wed, Jan 20, 2021 at 03:09:53PM -0800, Nick Desaulniers wrote:
> > On Tue, Jan 19, 2021 at 1:35 PM Arnd Bergmann <arnd@kernel.org> wrote:
> > >
> > > On Tue, Jan 19, 2021 at 10:18 PM 'Nick Desaulniers' via Clang Built
> > > Linux <clang-built-linux@googlegroups.com> wrote:
> > > >
> > > > On Tue, Jan 19, 2021 at 5:17 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote:
> > > > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > > > > index b99dd8e1c93f..f9f3601cc2d1 100644
> > > > > --- a/arch/arm/lib/xor-neon.c
> > > > > +++ b/arch/arm/lib/xor-neon.c
> > > > > @@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
> > > > >  #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
> > > > >  #endif
> > > > >
> > > > > +/*
> > > > > + * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
> > > > > + * compiler does not produce vectorized code due to its cost model.
> > > > > + * See: https://github.com/ClangBuiltLinux/linux/issues/503
> > > > > + */
> > > > > +#ifdef CONFIG_CC_IS_CLANG
> > > > > +#warning Clang does not vectorize code in this file.
> > > > > +#endif
> > > >
> > > > Arnd, remind me again why it's a bug that the compiler's cost model
> > > > says it's faster to not produce a vectorized version of these loops?
> > > > I stand by my previous comment: https://bugs.llvm.org/show_bug.cgi?id=40976#c8
> > >
> > > The point is that without vectorizing the code, there is no point in building
> > > both the default xor code and a "neon" version that has to save/restore
> > > the neon registers but doesn't actually use them.
> >
> > Doesn't that already happen today with GCC when the pointer arguments
> > are overlapping?  The loop is "versioned" and may not actually use the
> > NEON registers at all at runtime for such arguments.
> > https://godbolt.org/z/q48q8v  See also:
> > https://bugs.llvm.org/show_bug.cgi?id=40976#c11.  Or am I missing
> > something?
>
> The gcc version is at least useful when the arguments _don't_ overlap,
> which is presumably most/all the time.
>

Indeed

> There's no point to building this file if it's not going to produce a
> vectorized version at all. The warning seems unnecessary, and it's not
> really a compiler bug either -- until we agree on a way to get clang to
> produce a vectorized version, the best thing would be for the neon
> version to not be built at all with clang. Is that too messy to achieve?
>

+1

> >
> > So I'm thinking if we extend out this pattern to the rest of the
> > functions, we can actually avoid calls to
> > kernel_neon_begin()/kernel_neon_end() for cases in which pointers
> > would be too close to use the vectorized loop version; meaning for GCC
> > this would be an optimization (don't save neon registers when you're
> > not going to use them).  I would probably consider moving
> > include/asm-generic/xor.h somewhere under arch/arm/
> > perhaps...err...something for the other users of <asm-generic/xor.h>.
>
> We can't directly do the patch below since there are other users of the
> asm-generic/xor.h implementations than just the neon file.  If it's too
> much work to check and add __restrict everywhere, I think we'd either
> need to copy the code into the xor-neon file, or maybe do some ifdeffery
> so __restrict is only used for the neon version.
>

Don't mess with the code, please. It's really not worth it.

The current implementation works fine with overlapping inputs, but
works better when they don't. I don't see why we would change that.

If Clang cannot be forced to vectorize this code, then just disable it
for Clang - it's not the end of the world, who runs xor_blocks() on a
hot path on 32-bit ARM anyway?



> >
> > diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
> > index aefddec79286..abd748d317e8 100644
> > --- a/arch/arm/include/asm/xor.h
> > +++ b/arch/arm/include/asm/xor.h
> > @@ -148,12 +148,12 @@ extern struct xor_block_template const
> > xor_block_neon_inner;
> >  static void
> >  xor_neon_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
> >  {
> > -       if (in_interrupt()) {
> > -               xor_arm4regs_2(bytes, p1, p2);
> > -       } else {
> > +       if (!in_interrupt() && abs((uintptr_t)p1, (uintptr_t)p2) >= 8) {
> >                 kernel_neon_begin();
> >                 xor_block_neon_inner.do_2(bytes, p1, p2);
> >                 kernel_neon_end();
> > +       } else {
> > +               xor_arm4regs_2(bytes, p1, p2);
> >         }
> >  }
> > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > index b99dd8e1c93f..0e8e474c0523 100644
> > --- a/arch/arm/lib/xor-neon.c
> > +++ b/arch/arm/lib/xor-neon.c
> > @@ -14,22 +14,6 @@ MODULE_LICENSE("GPL");
> >  #error You should compile this file with '-march=armv7-a
> > -mfloat-abi=softfp -mfpu=neon'
> >  #endif
> >
> > -/*
> > - * Pull in the reference implementations while instructing GCC (through
> > - * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> > - * NEON instructions.
> > - */
> > -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> > -#pragma GCC optimize "tree-vectorize"
> > -#else
> > -/*
> > - * While older versions of GCC do not generate incorrect code, they fail to
> > - * recognize the parallel nature of these functions, and emit plain ARM code,
> > - * which is known to be slower than the optimized ARM code in asm-arm/xor.h.
> > - */
> > -#warning This code requires at least version 4.6 of GCC
> > -#endif
> > -
> >  #pragma GCC diagnostic ignored "-Wunused-variable"
> >  #include <asm-generic/xor.h>
> > diff --git a/include/asm-generic/xor.h b/include/asm-generic/xor.h
> > index b62a2a56a4d4..69df62095c33 100644
> > --- a/include/asm-generic/xor.h
> > +++ b/include/asm-generic/xor.h
> > @@ -8,7 +8,7 @@
> >  #include <linux/prefetch.h>
> >
> >  static void
> > -xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
> > +xor_8regs_2(unsigned long bytes, unsigned long * __restrict p1,
> > unsigned long * __restrict p2)
> >  {
> >         long lines = bytes / (sizeof (long)) / 8;
> >
> >
> > --
> > Thanks,
> > ~Nick Desaulniers

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-01-21  8:50 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-19 13:17 [PATCH v4 0/2] xor-neon: Remove GCC warn & pragmas Adrian Ratiu
2021-01-19 13:17 ` [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning Adrian Ratiu
2021-01-19 21:18   ` Nick Desaulniers
2021-01-19 21:34     ` Arnd Bergmann
2021-01-19 22:04       ` Nick Desaulniers
2021-01-19 23:10         ` Nick Desaulniers
2021-01-20 23:09       ` Nick Desaulniers
2021-01-20 23:15         ` Nick Desaulniers
2021-01-21  4:13         ` Arvind Sankar
2021-01-21  4:24           ` Nick Desaulniers
2021-01-21  8:46           ` Ard Biesheuvel
2021-01-20 13:11     ` Adrian Ratiu
2021-01-19 21:54   ` Nathan Chancellor
2021-01-20 13:18     ` Adrian Ratiu
2021-01-19 22:04   ` David Laight
2021-01-19 22:05     ` Nick Desaulniers
2021-01-19 13:17 ` [PATCH v4 2/2] arm: lib: xor-neon: move pragma options to makefile Adrian Ratiu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).