From: Nick Desaulniers <ndesaulniers@google.com> To: Adrian Ratiu <adrian.ratiu@collabora.com> Cc: Nathan Chancellor <natechancellor@gmail.com>, Arnd Bergmann <arnd@arndb.de>, Linux ARM <linux-arm-kernel@lists.infradead.org>, clang-built-linux <clang-built-linux@googlegroups.com>, Russell King <linux@armlinux.org.uk>, LKML <linux-kernel@vger.kernel.org>, Collabora Kernel ML <kernel@collabora.com>, Ard Biesheuvel <ardb@kernel.org> Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization Date: Fri, 6 Nov 2020 11:52:46 -0800 [thread overview] Message-ID: <CAKwvOdkodob0M0r_AK_4nG3atLGMyNENMd6qVAHSPa92Zh7UZA@mail.gmail.com> (raw) In-Reply-To: <87wnyyvh56.fsf@collabora.com> On Fri, Nov 6, 2020 at 3:50 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote: > > Hi Nathan, > > On Fri, 06 Nov 2020, Nathan Chancellor <natechancellor@gmail.com> > wrote: > > + Ard, who wrote this code. > > > > On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote: > >> Due to a Clang bug [1] neon autoloop vectorization does not > >> happen or happens badly with no gains and considering previous > >> GCC experiences which generated unoptimized code which was > >> worse than the default asm implementation, it is safer to > >> default clang builds to the known good generic implementation. > >> The kernel currently supports a minimum Clang version of > >> v10.0.1, see commit 1f7a44f63e6c ("compiler-clang: add build > >> check for clang 10.0.1"). When the bug gets eventually fixed, > >> this commit could be reverted or, if the minimum clang version > >> bump takes a long time, a warning could be added for users to > >> upgrade their compilers like was done for GCC. [1] > >> https://bugs.llvm.org/show_bug.cgi?id=40976 Signed-off-by: > >> Adrian Ratiu <adrian.ratiu@collabora.com> > > > > Thank you for the patch! We are also tracking this here: > > > > https://github.com/ClangBuiltLinux/linux/issues/496 > > > > It was on my TODO to revist getting the warning eliminated, > > which likely would have involved a patch like this as well. > > > > I am curious if it is worth revisting or dusting off Arnd's > > patch in the LLVM bug tracker first. I have not tried it > > personally. If that is not a worthwhile option, I am fine with > > this for now. It would be nice to try and get a fix pinned down > > on the LLVM side at some point but alas, finite amount of > > resources and people :( > > I tested Arnd's kernel patch from the LLVM bugtracker [1], but > with the Clang v10.0.1 I still get warnings like the following > even though the __restrict workaround seems to affect the > generated instructions: > > ./include/asm-generic/xor.h:15:2: remark: the cost-model indicates > that interleaving is not beneficial [-Rpass-missed=loop-vectorize] > ./include/asm-generic/xor.h:11:1: remark: List vectorization was > possible but not beneficial with cost 0 >= 0 > [-Rpass-missed=slp-vectorizer] xor_8regs_2(unsigned long bytes, > unsigned long *__restrict p1, unsigned long *__restrict p2) If it's just a matter of overruling the cost model #pragma clang loop vectorize(enable) will do the trick. Indeed, ``` diff --git a/include/asm-generic/xor.h b/include/asm-generic/xor.h index b62a2a56a4d4..8796955498b7 100644 --- a/include/asm-generic/xor.h +++ b/include/asm-generic/xor.h @@ -12,6 +12,7 @@ xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned long *p2) { long lines = bytes / (sizeof (long)) / 8; +#pragma clang loop vectorize(enable) do { p1[0] ^= p2[0]; p1[1] ^= p2[1]; @@ -32,6 +33,7 @@ xor_8regs_3(unsigned long bytes, unsigned long *p1, unsigned long *p2, { long lines = bytes / (sizeof (long)) / 8; +#pragma clang loop vectorize(enable) do { p1[0] ^= p2[0] ^ p3[0]; p1[1] ^= p2[1] ^ p3[1]; @@ -53,6 +55,7 @@ xor_8regs_4(unsigned long bytes, unsigned long *p1, unsigned long *p2, { long lines = bytes / (sizeof (long)) / 8; +#pragma clang loop vectorize(enable) do { p1[0] ^= p2[0] ^ p3[0] ^ p4[0]; p1[1] ^= p2[1] ^ p3[1] ^ p4[1]; @@ -75,6 +78,7 @@ xor_8regs_5(unsigned long bytes, unsigned long *p1, unsigned long *p2, { long lines = bytes / (sizeof (long)) / 8; +#pragma clang loop vectorize(enable) do { p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0]; p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1]; ``` seems to generate the vectorized code. Why don't we find a way to make those pragma's more toolchain portable, rather than open coding them like I have above rather than this series? -- Thanks, ~Nick Desaulniers
WARNING: multiple messages have this Message-ID (diff)
From: Nick Desaulniers <ndesaulniers@google.com> To: Adrian Ratiu <adrian.ratiu@collabora.com> Cc: Arnd Bergmann <arnd@arndb.de>, LKML <linux-kernel@vger.kernel.org>, Russell King <linux@armlinux.org.uk>, clang-built-linux <clang-built-linux@googlegroups.com>, Nathan Chancellor <natechancellor@gmail.com>, Collabora Kernel ML <kernel@collabora.com>, Ard Biesheuvel <ardb@kernel.org>, Linux ARM <linux-arm-kernel@lists.infradead.org> Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization Date: Fri, 6 Nov 2020 11:52:46 -0800 [thread overview] Message-ID: <CAKwvOdkodob0M0r_AK_4nG3atLGMyNENMd6qVAHSPa92Zh7UZA@mail.gmail.com> (raw) In-Reply-To: <87wnyyvh56.fsf@collabora.com> On Fri, Nov 6, 2020 at 3:50 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote: > > Hi Nathan, > > On Fri, 06 Nov 2020, Nathan Chancellor <natechancellor@gmail.com> > wrote: > > + Ard, who wrote this code. > > > > On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote: > >> Due to a Clang bug [1] neon autoloop vectorization does not > >> happen or happens badly with no gains and considering previous > >> GCC experiences which generated unoptimized code which was > >> worse than the default asm implementation, it is safer to > >> default clang builds to the known good generic implementation. > >> The kernel currently supports a minimum Clang version of > >> v10.0.1, see commit 1f7a44f63e6c ("compiler-clang: add build > >> check for clang 10.0.1"). When the bug gets eventually fixed, > >> this commit could be reverted or, if the minimum clang version > >> bump takes a long time, a warning could be added for users to > >> upgrade their compilers like was done for GCC. [1] > >> https://bugs.llvm.org/show_bug.cgi?id=40976 Signed-off-by: > >> Adrian Ratiu <adrian.ratiu@collabora.com> > > > > Thank you for the patch! We are also tracking this here: > > > > https://github.com/ClangBuiltLinux/linux/issues/496 > > > > It was on my TODO to revist getting the warning eliminated, > > which likely would have involved a patch like this as well. > > > > I am curious if it is worth revisting or dusting off Arnd's > > patch in the LLVM bug tracker first. I have not tried it > > personally. If that is not a worthwhile option, I am fine with > > this for now. It would be nice to try and get a fix pinned down > > on the LLVM side at some point but alas, finite amount of > > resources and people :( > > I tested Arnd's kernel patch from the LLVM bugtracker [1], but > with the Clang v10.0.1 I still get warnings like the following > even though the __restrict workaround seems to affect the > generated instructions: > > ./include/asm-generic/xor.h:15:2: remark: the cost-model indicates > that interleaving is not beneficial [-Rpass-missed=loop-vectorize] > ./include/asm-generic/xor.h:11:1: remark: List vectorization was > possible but not beneficial with cost 0 >= 0 > [-Rpass-missed=slp-vectorizer] xor_8regs_2(unsigned long bytes, > unsigned long *__restrict p1, unsigned long *__restrict p2) If it's just a matter of overruling the cost model #pragma clang loop vectorize(enable) will do the trick. Indeed, ``` diff --git a/include/asm-generic/xor.h b/include/asm-generic/xor.h index b62a2a56a4d4..8796955498b7 100644 --- a/include/asm-generic/xor.h +++ b/include/asm-generic/xor.h @@ -12,6 +12,7 @@ xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned long *p2) { long lines = bytes / (sizeof (long)) / 8; +#pragma clang loop vectorize(enable) do { p1[0] ^= p2[0]; p1[1] ^= p2[1]; @@ -32,6 +33,7 @@ xor_8regs_3(unsigned long bytes, unsigned long *p1, unsigned long *p2, { long lines = bytes / (sizeof (long)) / 8; +#pragma clang loop vectorize(enable) do { p1[0] ^= p2[0] ^ p3[0]; p1[1] ^= p2[1] ^ p3[1]; @@ -53,6 +55,7 @@ xor_8regs_4(unsigned long bytes, unsigned long *p1, unsigned long *p2, { long lines = bytes / (sizeof (long)) / 8; +#pragma clang loop vectorize(enable) do { p1[0] ^= p2[0] ^ p3[0] ^ p4[0]; p1[1] ^= p2[1] ^ p3[1] ^ p4[1]; @@ -75,6 +78,7 @@ xor_8regs_5(unsigned long bytes, unsigned long *p1, unsigned long *p2, { long lines = bytes / (sizeof (long)) / 8; +#pragma clang loop vectorize(enable) do { p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0]; p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1]; ``` seems to generate the vectorized code. Why don't we find a way to make those pragma's more toolchain portable, rather than open coding them like I have above rather than this series? -- Thanks, ~Nick Desaulniers _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2020-11-06 19:53 UTC|newest] Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-11-06 5:14 [PATCH 0/2] arm: lib: xor-neon: Remove warn & disble neon vect Adrian Ratiu 2020-11-06 5:14 ` Adrian Ratiu 2020-11-06 5:14 ` [PATCH 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning Adrian Ratiu 2020-11-06 5:14 ` Adrian Ratiu 2020-11-06 14:46 ` Arnd Bergmann 2020-11-06 14:46 ` Arnd Bergmann 2020-11-06 18:03 ` Nathan Chancellor 2020-11-06 18:03 ` Nathan Chancellor 2020-11-06 21:46 ` Arnd Bergmann 2020-11-06 21:46 ` Arnd Bergmann 2020-11-06 5:14 ` [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization Adrian Ratiu 2020-11-06 5:14 ` Adrian Ratiu 2020-11-06 10:14 ` Nathan Chancellor 2020-11-06 10:14 ` Nathan Chancellor 2020-11-06 11:50 ` Adrian Ratiu 2020-11-06 11:50 ` Adrian Ratiu 2020-11-06 18:01 ` Nathan Chancellor 2020-11-06 18:01 ` Nathan Chancellor 2020-11-06 19:52 ` Nick Desaulniers [this message] 2020-11-06 19:52 ` Nick Desaulniers 2020-11-07 18:07 ` Adrian Ratiu 2020-11-07 18:07 ` Adrian Ratiu 2020-11-09 19:53 ` Adrian Ratiu 2020-11-09 19:53 ` Adrian Ratiu 2020-11-10 21:41 ` Nick Desaulniers 2020-11-10 21:41 ` Nick Desaulniers 2020-11-10 22:15 ` Arvind Sankar 2020-11-10 22:15 ` Arvind Sankar 2020-11-10 22:36 ` Nick Desaulniers 2020-11-10 22:36 ` Nick Desaulniers 2020-11-10 22:39 ` Nick Desaulniers 2020-11-10 22:39 ` Nick Desaulniers 2020-11-10 22:39 ` Nick Desaulniers 2020-11-10 22:39 ` Nick Desaulniers 2020-11-10 22:54 ` Arvind Sankar 2020-11-10 22:54 ` Arvind Sankar 2020-11-10 23:56 ` Adrian Ratiu 2020-11-10 23:56 ` Adrian Ratiu 2020-11-11 0:18 ` Nick Desaulniers 2020-11-11 0:18 ` Nick Desaulniers 2020-11-11 14:15 ` Adrian Ratiu 2020-11-11 14:15 ` Adrian Ratiu 2020-11-12 21:50 ` Arvind Sankar 2020-11-12 21:50 ` Arvind Sankar 2020-11-12 21:55 ` Nick Desaulniers 2020-11-12 21:55 ` Nick Desaulniers 2020-11-07 10:22 ` Russell King - ARM Linux admin 2020-11-07 10:22 ` Russell King - ARM Linux admin 2020-11-07 18:12 ` Adrian Ratiu 2020-11-07 18:12 ` Adrian Ratiu 2020-11-08 17:40 ` Arvind Sankar 2020-11-08 17:40 ` Arvind Sankar 2020-11-08 18:09 ` Arvind Sankar 2020-11-08 18:09 ` Arvind Sankar 2020-11-08 20:14 ` Ard Biesheuvel 2020-11-08 20:14 ` Ard Biesheuvel
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=CAKwvOdkodob0M0r_AK_4nG3atLGMyNENMd6qVAHSPa92Zh7UZA@mail.gmail.com \ --to=ndesaulniers@google.com \ --cc=adrian.ratiu@collabora.com \ --cc=ardb@kernel.org \ --cc=arnd@arndb.de \ --cc=clang-built-linux@googlegroups.com \ --cc=kernel@collabora.com \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux@armlinux.org.uk \ --cc=natechancellor@gmail.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.