All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ard Biesheuvel <ardb@kernel.org>
To: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>,
	Adrian Ratiu <adrian.ratiu@collabora.com>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Russell King <linux@armlinux.org.uk>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	clang-built-linux <clang-built-linux@googlegroups.com>,
	Nathan Chancellor <natechancellor@gmail.com>,
	kernel@collabora.com,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization
Date: Sun, 8 Nov 2020 21:14:50 +0100	[thread overview]
Message-ID: <CAMj1kXEhkW=twpprrqjRahBvgiyL6kkSOcJPAGkeOUZ_DC5euQ@mail.gmail.com> (raw)
In-Reply-To: <20201108180942.GA226037@rani.riverdale.lan>

On Sun, 8 Nov 2020 at 19:10, Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Sun, Nov 08, 2020 at 12:40:14PM -0500, Arvind Sankar wrote:
> > On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote:
> > > Due to a Clang bug [1] neon autoloop vectorization does not happen or
> > > happens badly with no gains and considering previous GCC experiences
> > > which generated unoptimized code which was worse than the default asm
> > > implementation, it is safer to default clang builds to the known good
> > > generic implementation.
> > >
> > > The kernel currently supports a minimum Clang version of v10.0.1, see
> > > commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1").
> > >
> > > When the bug gets eventually fixed, this commit could be reverted or,
> > > if the minimum clang version bump takes a long time, a warning could
> > > be added for users to upgrade their compilers like was done for GCC.
> > >
> > > [1] https://bugs.llvm.org/show_bug.cgi?id=40976
> > >
> > > Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
> > > ---
> > >  arch/arm/include/asm/xor.h | 3 ++-
> > >  arch/arm/lib/Makefile      | 3 +++
> > >  arch/arm/lib/xor-neon.c    | 4 ++++
> > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
> > > index aefddec79286..49937dafaa71 100644
> > > --- a/arch/arm/include/asm/xor.h
> > > +++ b/arch/arm/include/asm/xor.h
> > > @@ -141,7 +141,8 @@ static struct xor_block_template xor_block_arm4regs = {
> > >             NEON_TEMPLATES;                 \
> > >     } while (0)
> > >
> > > -#ifdef CONFIG_KERNEL_MODE_NEON
> > > +/* disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976 */
> > > +#if defined(CONFIG_KERNEL_MODE_NEON) && !defined(CONFIG_CC_IS_CLANG)
> > >
> > >  extern struct xor_block_template const xor_block_neon_inner;
> > >
> > > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> > > index 6d2ba454f25b..53f9e7dd9714 100644
> > > --- a/arch/arm/lib/Makefile
> > > +++ b/arch/arm/lib/Makefile
> > > @@ -43,8 +43,11 @@ endif
> > >  $(obj)/csumpartialcopy.o:  $(obj)/csumpartialcopygeneric.S
> > >  $(obj)/csumpartialcopyuser.o:      $(obj)/csumpartialcopygeneric.S
> > >
> > > +# disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976
> > > +ifndef CONFIG_CC_IS_CLANG
> > >  ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
> > >    NEON_FLAGS                       := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
> > >    CFLAGS_xor-neon.o                += $(NEON_FLAGS)
> > >    obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
> > >  endif
> > > +endif
> > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > > index e1e76186ec23..84c91c48dfa2 100644
> > > --- a/arch/arm/lib/xor-neon.c
> > > +++ b/arch/arm/lib/xor-neon.c
> > > @@ -18,6 +18,10 @@ MODULE_LICENSE("GPL");
> > >   * Pull in the reference implementations while instructing GCC (through
> > >   * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> > >   * NEON instructions.
> > > +
> > > + * On Clang the loop vectorizer is enabled by default, but due to a bug
> > > + * (https://bugs.llvm.org/show_bug.cgi?id=40976) vectorization is broke
> > > + * so xor-neon is disabled in favor of the default reg implementations.
> > >   */
> > >  #ifdef CONFIG_CC_IS_GCC
> > >  #pragma GCC optimize "tree-vectorize"
> > > --
> > > 2.29.0
> > >
> >
> > It's actually a bad idea to use #pragma GCC optimize. This is basically
> > the same as tagging all the functions with __attribute__((optimize)),
> > which GCC does not recommend for production use, as it _replaces_
> > optimization options rather than appending to them, and has been
> > observed to result in dropping important compiler flags.
> >
> > There've been a few discussions recently around other such cases:
> > https://lore.kernel.org/lkml/20201028171506.15682-1-ardb@kernel.org/
> > https://lore.kernel.org/lkml/20201028081123.GT2628@hirez.programming.kicks-ass.net/
> >
> > For this file, given that it is supposed to use -ftree-vectorize for the
> > whole file anyway, is there any reason it's not just added to CFLAGS via
> > the Makefile? This seems to be the only use of pragma optimize in the
> > kernel.
>
> Eg, this shows that the pragma results in dropping -fno-strict-aliasing.
> https://godbolt.org/z/1nfrKT
>
> The first function does not use vectorization because s and s->a might
> alias.
>

Thanks, Arvind. I wasn't aware of this issue at the time, but I agree
that we should replace the #pragma with a command line option in this
case.

And given that we already set CFLAGS_xor-neon.o in the Makefile,
adding it there would have been more straight-forward to begin with.

WARNING: multiple messages have this Message-ID (diff)
From: Ard Biesheuvel <ardb@kernel.org>
To: Arvind Sankar <nivedita@alum.mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>,
	Adrian Ratiu <adrian.ratiu@collabora.com>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Russell King <linux@armlinux.org.uk>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	clang-built-linux <clang-built-linux@googlegroups.com>,
	Nathan Chancellor <natechancellor@gmail.com>,
	kernel@collabora.com,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization
Date: Sun, 8 Nov 2020 21:14:50 +0100	[thread overview]
Message-ID: <CAMj1kXEhkW=twpprrqjRahBvgiyL6kkSOcJPAGkeOUZ_DC5euQ@mail.gmail.com> (raw)
In-Reply-To: <20201108180942.GA226037@rani.riverdale.lan>

On Sun, 8 Nov 2020 at 19:10, Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Sun, Nov 08, 2020 at 12:40:14PM -0500, Arvind Sankar wrote:
> > On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote:
> > > Due to a Clang bug [1] neon autoloop vectorization does not happen or
> > > happens badly with no gains and considering previous GCC experiences
> > > which generated unoptimized code which was worse than the default asm
> > > implementation, it is safer to default clang builds to the known good
> > > generic implementation.
> > >
> > > The kernel currently supports a minimum Clang version of v10.0.1, see
> > > commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1").
> > >
> > > When the bug gets eventually fixed, this commit could be reverted or,
> > > if the minimum clang version bump takes a long time, a warning could
> > > be added for users to upgrade their compilers like was done for GCC.
> > >
> > > [1] https://bugs.llvm.org/show_bug.cgi?id=40976
> > >
> > > Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
> > > ---
> > >  arch/arm/include/asm/xor.h | 3 ++-
> > >  arch/arm/lib/Makefile      | 3 +++
> > >  arch/arm/lib/xor-neon.c    | 4 ++++
> > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
> > > index aefddec79286..49937dafaa71 100644
> > > --- a/arch/arm/include/asm/xor.h
> > > +++ b/arch/arm/include/asm/xor.h
> > > @@ -141,7 +141,8 @@ static struct xor_block_template xor_block_arm4regs = {
> > >             NEON_TEMPLATES;                 \
> > >     } while (0)
> > >
> > > -#ifdef CONFIG_KERNEL_MODE_NEON
> > > +/* disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976 */
> > > +#if defined(CONFIG_KERNEL_MODE_NEON) && !defined(CONFIG_CC_IS_CLANG)
> > >
> > >  extern struct xor_block_template const xor_block_neon_inner;
> > >
> > > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> > > index 6d2ba454f25b..53f9e7dd9714 100644
> > > --- a/arch/arm/lib/Makefile
> > > +++ b/arch/arm/lib/Makefile
> > > @@ -43,8 +43,11 @@ endif
> > >  $(obj)/csumpartialcopy.o:  $(obj)/csumpartialcopygeneric.S
> > >  $(obj)/csumpartialcopyuser.o:      $(obj)/csumpartialcopygeneric.S
> > >
> > > +# disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976
> > > +ifndef CONFIG_CC_IS_CLANG
> > >  ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
> > >    NEON_FLAGS                       := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
> > >    CFLAGS_xor-neon.o                += $(NEON_FLAGS)
> > >    obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
> > >  endif
> > > +endif
> > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > > index e1e76186ec23..84c91c48dfa2 100644
> > > --- a/arch/arm/lib/xor-neon.c
> > > +++ b/arch/arm/lib/xor-neon.c
> > > @@ -18,6 +18,10 @@ MODULE_LICENSE("GPL");
> > >   * Pull in the reference implementations while instructing GCC (through
> > >   * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> > >   * NEON instructions.
> > > +
> > > + * On Clang the loop vectorizer is enabled by default, but due to a bug
> > > + * (https://bugs.llvm.org/show_bug.cgi?id=40976) vectorization is broke
> > > + * so xor-neon is disabled in favor of the default reg implementations.
> > >   */
> > >  #ifdef CONFIG_CC_IS_GCC
> > >  #pragma GCC optimize "tree-vectorize"
> > > --
> > > 2.29.0
> > >
> >
> > It's actually a bad idea to use #pragma GCC optimize. This is basically
> > the same as tagging all the functions with __attribute__((optimize)),
> > which GCC does not recommend for production use, as it _replaces_
> > optimization options rather than appending to them, and has been
> > observed to result in dropping important compiler flags.
> >
> > There've been a few discussions recently around other such cases:
> > https://lore.kernel.org/lkml/20201028171506.15682-1-ardb@kernel.org/
> > https://lore.kernel.org/lkml/20201028081123.GT2628@hirez.programming.kicks-ass.net/
> >
> > For this file, given that it is supposed to use -ftree-vectorize for the
> > whole file anyway, is there any reason it's not just added to CFLAGS via
> > the Makefile? This seems to be the only use of pragma optimize in the
> > kernel.
>
> Eg, this shows that the pragma results in dropping -fno-strict-aliasing.
> https://godbolt.org/z/1nfrKT
>
> The first function does not use vectorization because s and s->a might
> alias.
>

Thanks, Arvind. I wasn't aware of this issue at the time, but I agree
that we should replace the #pragma with a command line option in this
case.

And given that we already set CFLAGS_xor-neon.o in the Makefile,
adding it there would have been more straight-forward to begin with.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-11-08 20:15 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-06  5:14 [PATCH 0/2] arm: lib: xor-neon: Remove warn & disble neon vect Adrian Ratiu
2020-11-06  5:14 ` Adrian Ratiu
2020-11-06  5:14 ` [PATCH 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning Adrian Ratiu
2020-11-06  5:14   ` Adrian Ratiu
2020-11-06 14:46   ` Arnd Bergmann
2020-11-06 14:46     ` Arnd Bergmann
2020-11-06 18:03     ` Nathan Chancellor
2020-11-06 18:03       ` Nathan Chancellor
2020-11-06 21:46       ` Arnd Bergmann
2020-11-06 21:46         ` Arnd Bergmann
2020-11-06  5:14 ` [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization Adrian Ratiu
2020-11-06  5:14   ` Adrian Ratiu
2020-11-06 10:14   ` Nathan Chancellor
2020-11-06 10:14     ` Nathan Chancellor
2020-11-06 11:50     ` Adrian Ratiu
2020-11-06 11:50       ` Adrian Ratiu
2020-11-06 18:01       ` Nathan Chancellor
2020-11-06 18:01         ` Nathan Chancellor
2020-11-06 19:52       ` Nick Desaulniers
2020-11-06 19:52         ` Nick Desaulniers
2020-11-07 18:07         ` Adrian Ratiu
2020-11-07 18:07           ` Adrian Ratiu
2020-11-09 19:53         ` Adrian Ratiu
2020-11-09 19:53           ` Adrian Ratiu
2020-11-10 21:41           ` Nick Desaulniers
2020-11-10 21:41             ` Nick Desaulniers
2020-11-10 22:15             ` Arvind Sankar
2020-11-10 22:15               ` Arvind Sankar
2020-11-10 22:36               ` Nick Desaulniers
2020-11-10 22:36                 ` Nick Desaulniers
2020-11-10 22:39                 ` Nick Desaulniers
2020-11-10 22:39                   ` Nick Desaulniers
2020-11-10 22:39                   ` Nick Desaulniers
2020-11-10 22:39                     ` Nick Desaulniers
2020-11-10 22:54                     ` Arvind Sankar
2020-11-10 22:54                       ` Arvind Sankar
2020-11-10 23:56             ` Adrian Ratiu
2020-11-10 23:56               ` Adrian Ratiu
2020-11-11  0:18               ` Nick Desaulniers
2020-11-11  0:18                 ` Nick Desaulniers
2020-11-11 14:15                 ` Adrian Ratiu
2020-11-11 14:15                   ` Adrian Ratiu
2020-11-12 21:50                   ` Arvind Sankar
2020-11-12 21:50                     ` Arvind Sankar
2020-11-12 21:55                     ` Nick Desaulniers
2020-11-12 21:55                       ` Nick Desaulniers
2020-11-07 10:22   ` Russell King - ARM Linux admin
2020-11-07 10:22     ` Russell King - ARM Linux admin
2020-11-07 18:12     ` Adrian Ratiu
2020-11-07 18:12       ` Adrian Ratiu
2020-11-08 17:40   ` Arvind Sankar
2020-11-08 17:40     ` Arvind Sankar
2020-11-08 18:09     ` Arvind Sankar
2020-11-08 18:09       ` Arvind Sankar
2020-11-08 20:14       ` Ard Biesheuvel [this message]
2020-11-08 20:14         ` Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAMj1kXEhkW=twpprrqjRahBvgiyL6kkSOcJPAGkeOUZ_DC5euQ@mail.gmail.com' \
    --to=ardb@kernel.org \
    --cc=adrian.ratiu@collabora.com \
    --cc=arnd@arndb.de \
    --cc=clang-built-linux@googlegroups.com \
    --cc=kernel@collabora.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=natechancellor@gmail.com \
    --cc=ndesaulniers@google.com \
    --cc=nivedita@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.