From: Nick Desaulniers <ndesaulniers@google.com> To: Adrian Ratiu <adrian.ratiu@collabora.com> Cc: Nathan Chancellor <natechancellor@gmail.com>, Arnd Bergmann <arnd@arndb.de>, Linux ARM <linux-arm-kernel@lists.infradead.org>, clang-built-linux <clang-built-linux@googlegroups.com>, Russell King <linux@armlinux.org.uk>, LKML <linux-kernel@vger.kernel.org>, Collabora Kernel ML <kernel@collabora.com>, Ard Biesheuvel <ardb@kernel.org> Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization Date: Tue, 10 Nov 2020 13:41:17 -0800 [thread overview] Message-ID: <CAKwvOdkm3u83TQDBB-fC0TwKZCFXGh5sAfahKXxA+mnzgDid_w@mail.gmail.com> (raw) In-Reply-To: <871rh2i9xg.fsf@iwork.i-did-not-set--mail-host-address--so-tickle-me> On Mon, Nov 9, 2020 at 11:51 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote: > > On Fri, 06 Nov 2020, Nick Desaulniers <ndesaulniers@google.com> > wrote: > > +#pragma clang loop vectorize(enable) > > do { > > p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0]; p1[1] ^= > > p2[1] ^ p3[1] ^ p4[1] ^ p5[1]; > > ``` seems to generate the vectorized code. > > > > Why don't we find a way to make those pragma's more toolchain > > portable, rather than open coding them like I have above rather > > than this series? > > Hi again Nick, > > How did you verify the above pragmas generate correct vectorized > code? Have you tested this specific use case? I read the disassembly before and after my suggested use of pragmas; look for vld/vstr. You can also add -Rpass-missed=loop-vectorize to CFLAGS_xor-neon.o in arch/arm/lib/Makefile and rebuild arch/arm/lib/xor-neon.o with CONFIG_BTRFS enabled. > > I'm asking because overrulling the cost model might not be enough, > the only thing I can confirm is that the generated code is > changed, but not that it is correct in any way. The object disasm > also looks weird, but I don't have enough knowledge to start > debugging what's happening within LLVM/Clang itself. It doesn't "look weird" to me. The loop is versioned based on a comparison whether the parameters alias or not. There's a non-vectorized version if the parameters are equal or close enough to overlap. There's another version of the loop that's vectorized. If you want just the vectorized version, then you have to mark the parameters as __restrict qualified, then check that all callers are ok with that. > > I also get some new warnings with your code [1], besides the > previously 'vectorization was possible but not beneficial' which > is still present. It is quite funny because these two warnings > seem to contradict themselves. :) From which compiler? ``` $ clang -Wpass-failed=transform-warning -c -x c /dev/null warning: unknown warning option '-Wpass-failed=transform-warning'; did you mean '-Wprofile-instr-missing'? [-Wunknown-warning-option] ``` The pragma is clang specific, hence my recommendation to wrap it in an #ifdef __clang__. > > At this point I do not trust the compiler and am inclined to do Nonsense. > like was done for GCC when it was broken: disable the optimization > and warn users to upgrade after the compiler is fixed and > confirmed to work. > > If you agree I can send a v2 with this and also drop the GCC > pragma as Arvind and Ard suggested. If you resend "this" as in 2/2, I will NACK it. There's nothing wrong with the cost model; it's saying there's little point in generating the vectorized version because you're still going to need a non-vectorized loop version anyways. Claiming there is a compiler bug here is dubious just because the cost models between two compilers differ slightly. Resend the patch removing the warning, remove the GCC pragma, but if you want to change anything here for Clang, use `#pragma clang loop vectorize(enable)` wrapped in an `#ifdef __clang__`. > > Kind regards, > Adrian > > [1] > ./include/asm-generic/xor.h:11:1: warning: loop not vectorized: > the optimizer was unable to perform the requested transformation; > the transformation might be disabled or specified as part of an > unsupported transformation ordering > [-Wpass-failed=transform-warning] xor_8regs_2(unsigned long bytes, > unsigned long *p1, unsigned long *p2) -- Thanks, ~Nick Desaulniers
WARNING: multiple messages have this Message-ID (diff)
From: Nick Desaulniers <ndesaulniers@google.com> To: Adrian Ratiu <adrian.ratiu@collabora.com> Cc: Arnd Bergmann <arnd@arndb.de>, LKML <linux-kernel@vger.kernel.org>, Russell King <linux@armlinux.org.uk>, clang-built-linux <clang-built-linux@googlegroups.com>, Nathan Chancellor <natechancellor@gmail.com>, Collabora Kernel ML <kernel@collabora.com>, Ard Biesheuvel <ardb@kernel.org>, Linux ARM <linux-arm-kernel@lists.infradead.org> Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization Date: Tue, 10 Nov 2020 13:41:17 -0800 [thread overview] Message-ID: <CAKwvOdkm3u83TQDBB-fC0TwKZCFXGh5sAfahKXxA+mnzgDid_w@mail.gmail.com> (raw) In-Reply-To: <871rh2i9xg.fsf@iwork.i-did-not-set--mail-host-address--so-tickle-me> On Mon, Nov 9, 2020 at 11:51 AM Adrian Ratiu <adrian.ratiu@collabora.com> wrote: > > On Fri, 06 Nov 2020, Nick Desaulniers <ndesaulniers@google.com> > wrote: > > +#pragma clang loop vectorize(enable) > > do { > > p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0]; p1[1] ^= > > p2[1] ^ p3[1] ^ p4[1] ^ p5[1]; > > ``` seems to generate the vectorized code. > > > > Why don't we find a way to make those pragma's more toolchain > > portable, rather than open coding them like I have above rather > > than this series? > > Hi again Nick, > > How did you verify the above pragmas generate correct vectorized > code? Have you tested this specific use case? I read the disassembly before and after my suggested use of pragmas; look for vld/vstr. You can also add -Rpass-missed=loop-vectorize to CFLAGS_xor-neon.o in arch/arm/lib/Makefile and rebuild arch/arm/lib/xor-neon.o with CONFIG_BTRFS enabled. > > I'm asking because overrulling the cost model might not be enough, > the only thing I can confirm is that the generated code is > changed, but not that it is correct in any way. The object disasm > also looks weird, but I don't have enough knowledge to start > debugging what's happening within LLVM/Clang itself. It doesn't "look weird" to me. The loop is versioned based on a comparison whether the parameters alias or not. There's a non-vectorized version if the parameters are equal or close enough to overlap. There's another version of the loop that's vectorized. If you want just the vectorized version, then you have to mark the parameters as __restrict qualified, then check that all callers are ok with that. > > I also get some new warnings with your code [1], besides the > previously 'vectorization was possible but not beneficial' which > is still present. It is quite funny because these two warnings > seem to contradict themselves. :) From which compiler? ``` $ clang -Wpass-failed=transform-warning -c -x c /dev/null warning: unknown warning option '-Wpass-failed=transform-warning'; did you mean '-Wprofile-instr-missing'? [-Wunknown-warning-option] ``` The pragma is clang specific, hence my recommendation to wrap it in an #ifdef __clang__. > > At this point I do not trust the compiler and am inclined to do Nonsense. > like was done for GCC when it was broken: disable the optimization > and warn users to upgrade after the compiler is fixed and > confirmed to work. > > If you agree I can send a v2 with this and also drop the GCC > pragma as Arvind and Ard suggested. If you resend "this" as in 2/2, I will NACK it. There's nothing wrong with the cost model; it's saying there's little point in generating the vectorized version because you're still going to need a non-vectorized loop version anyways. Claiming there is a compiler bug here is dubious just because the cost models between two compilers differ slightly. Resend the patch removing the warning, remove the GCC pragma, but if you want to change anything here for Clang, use `#pragma clang loop vectorize(enable)` wrapped in an `#ifdef __clang__`. > > Kind regards, > Adrian > > [1] > ./include/asm-generic/xor.h:11:1: warning: loop not vectorized: > the optimizer was unable to perform the requested transformation; > the transformation might be disabled or specified as part of an > unsupported transformation ordering > [-Wpass-failed=transform-warning] xor_8regs_2(unsigned long bytes, > unsigned long *p1, unsigned long *p2) -- Thanks, ~Nick Desaulniers _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2020-11-10 21:42 UTC|newest] Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-11-06 5:14 [PATCH 0/2] arm: lib: xor-neon: Remove warn & disble neon vect Adrian Ratiu 2020-11-06 5:14 ` Adrian Ratiu 2020-11-06 5:14 ` [PATCH 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning Adrian Ratiu 2020-11-06 5:14 ` Adrian Ratiu 2020-11-06 14:46 ` Arnd Bergmann 2020-11-06 14:46 ` Arnd Bergmann 2020-11-06 18:03 ` Nathan Chancellor 2020-11-06 18:03 ` Nathan Chancellor 2020-11-06 21:46 ` Arnd Bergmann 2020-11-06 21:46 ` Arnd Bergmann 2020-11-06 5:14 ` [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization Adrian Ratiu 2020-11-06 5:14 ` Adrian Ratiu 2020-11-06 10:14 ` Nathan Chancellor 2020-11-06 10:14 ` Nathan Chancellor 2020-11-06 11:50 ` Adrian Ratiu 2020-11-06 11:50 ` Adrian Ratiu 2020-11-06 18:01 ` Nathan Chancellor 2020-11-06 18:01 ` Nathan Chancellor 2020-11-06 19:52 ` Nick Desaulniers 2020-11-06 19:52 ` Nick Desaulniers 2020-11-07 18:07 ` Adrian Ratiu 2020-11-07 18:07 ` Adrian Ratiu 2020-11-09 19:53 ` Adrian Ratiu 2020-11-09 19:53 ` Adrian Ratiu 2020-11-10 21:41 ` Nick Desaulniers [this message] 2020-11-10 21:41 ` Nick Desaulniers 2020-11-10 22:15 ` Arvind Sankar 2020-11-10 22:15 ` Arvind Sankar 2020-11-10 22:36 ` Nick Desaulniers 2020-11-10 22:36 ` Nick Desaulniers 2020-11-10 22:39 ` Nick Desaulniers 2020-11-10 22:39 ` Nick Desaulniers 2020-11-10 22:39 ` Nick Desaulniers 2020-11-10 22:39 ` Nick Desaulniers 2020-11-10 22:54 ` Arvind Sankar 2020-11-10 22:54 ` Arvind Sankar 2020-11-10 23:56 ` Adrian Ratiu 2020-11-10 23:56 ` Adrian Ratiu 2020-11-11 0:18 ` Nick Desaulniers 2020-11-11 0:18 ` Nick Desaulniers 2020-11-11 14:15 ` Adrian Ratiu 2020-11-11 14:15 ` Adrian Ratiu 2020-11-12 21:50 ` Arvind Sankar 2020-11-12 21:50 ` Arvind Sankar 2020-11-12 21:55 ` Nick Desaulniers 2020-11-12 21:55 ` Nick Desaulniers 2020-11-07 10:22 ` Russell King - ARM Linux admin 2020-11-07 10:22 ` Russell King - ARM Linux admin 2020-11-07 18:12 ` Adrian Ratiu 2020-11-07 18:12 ` Adrian Ratiu 2020-11-08 17:40 ` Arvind Sankar 2020-11-08 17:40 ` Arvind Sankar 2020-11-08 18:09 ` Arvind Sankar 2020-11-08 18:09 ` Arvind Sankar 2020-11-08 20:14 ` Ard Biesheuvel 2020-11-08 20:14 ` Ard Biesheuvel
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=CAKwvOdkm3u83TQDBB-fC0TwKZCFXGh5sAfahKXxA+mnzgDid_w@mail.gmail.com \ --to=ndesaulniers@google.com \ --cc=adrian.ratiu@collabora.com \ --cc=ardb@kernel.org \ --cc=arnd@arndb.de \ --cc=clang-built-linux@googlegroups.com \ --cc=kernel@collabora.com \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux@armlinux.org.uk \ --cc=natechancellor@gmail.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.