* [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-29 20:25 ` Nick Desaulniers 0 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-07-29 20:25 UTC (permalink / raw) To: mpe Cc: christophe.leroy, segher, arnd, Nick Desaulniers, Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed what looks like a codegen bug in Clang's handling of `%y` output template with `Z` constraint. This is resulting in panics during boot for 32b powerpc builds w/ Clang, as reported by our CI. Add back the original code that worked behind a preprocessor check for __clang__ until we can fix LLVM. Further, it seems that clang allnoconfig builds are unhappy with `Z`, as reported by 0day bot. This is likely because Clang warns about inline asm constraints when the constraint requires inlining to be semantically valid. Link: https://bugs.llvm.org/show_bug.cgi?id=42762 Link: https://github.com/ClangBuiltLinux/linux/issues/593 Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ Debugged-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: kbuild test robot <lkp@intel.com> Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> --- Alternatively, we could just revert 6c5875843b87. It seems that GCC generates the same code for these functions for out of line versions. But I'm not sure how the inlined code generated would be affected. arch/powerpc/include/asm/cache.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index b3388d95f451..72983da94dce 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -105,6 +105,30 @@ extern void _set_L3CR(unsigned long); #define _set_L3CR(val) do { } while(0) #endif +/* + * Workaround for https://bugs.llvm.org/show_bug.cgi?id=42762. + */ +#ifdef __clang__ +static inline void dcbz(void *addr) +{ + __asm__ __volatile__ ("dcbz 0, %0" : : "r"(addr) : "memory"); +} + +static inline void dcbi(void *addr) +{ + __asm__ __volatile__ ("dcbi 0, %0" : : "r"(addr) : "memory"); +} + +static inline void dcbf(void *addr) +{ + __asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory"); +} + +static inline void dcbst(void *addr) +{ + __asm__ __volatile__ ("dcbst 0, %0" : : "r"(addr) : "memory"); +} +#else static inline void dcbz(void *addr) { __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); @@ -124,6 +148,7 @@ static inline void dcbst(void *addr) { __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory"); } +#endif /* __clang__ */ #endif /* !__ASSEMBLY__ */ #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_CACHE_H */ -- 2.22.0.709.g102302147b-goog ^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-29 20:25 ` Nick Desaulniers 0 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-07-29 20:25 UTC (permalink / raw) To: mpe Cc: arnd, Nick Desaulniers, linux-kernel, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev, kbuild test robot Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed what looks like a codegen bug in Clang's handling of `%y` output template with `Z` constraint. This is resulting in panics during boot for 32b powerpc builds w/ Clang, as reported by our CI. Add back the original code that worked behind a preprocessor check for __clang__ until we can fix LLVM. Further, it seems that clang allnoconfig builds are unhappy with `Z`, as reported by 0day bot. This is likely because Clang warns about inline asm constraints when the constraint requires inlining to be semantically valid. Link: https://bugs.llvm.org/show_bug.cgi?id=42762 Link: https://github.com/ClangBuiltLinux/linux/issues/593 Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ Debugged-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: kbuild test robot <lkp@intel.com> Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> --- Alternatively, we could just revert 6c5875843b87. It seems that GCC generates the same code for these functions for out of line versions. But I'm not sure how the inlined code generated would be affected. arch/powerpc/include/asm/cache.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index b3388d95f451..72983da94dce 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -105,6 +105,30 @@ extern void _set_L3CR(unsigned long); #define _set_L3CR(val) do { } while(0) #endif +/* + * Workaround for https://bugs.llvm.org/show_bug.cgi?id=42762. + */ +#ifdef __clang__ +static inline void dcbz(void *addr) +{ + __asm__ __volatile__ ("dcbz 0, %0" : : "r"(addr) : "memory"); +} + +static inline void dcbi(void *addr) +{ + __asm__ __volatile__ ("dcbi 0, %0" : : "r"(addr) : "memory"); +} + +static inline void dcbf(void *addr) +{ + __asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory"); +} + +static inline void dcbst(void *addr) +{ + __asm__ __volatile__ ("dcbst 0, %0" : : "r"(addr) : "memory"); +} +#else static inline void dcbz(void *addr) { __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); @@ -124,6 +148,7 @@ static inline void dcbst(void *addr) { __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory"); } +#endif /* __clang__ */ #endif /* !__ASSEMBLY__ */ #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_CACHE_H */ -- 2.22.0.709.g102302147b-goog ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-29 20:25 ` Nick Desaulniers @ 2019-07-29 20:32 ` Nathan Chancellor -1 siblings, 0 replies; 52+ messages in thread From: Nathan Chancellor @ 2019-07-29 20:32 UTC (permalink / raw) To: Nick Desaulniers Cc: mpe, christophe.leroy, segher, arnd, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote: > Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed > what looks like a codegen bug in Clang's handling of `%y` output > template with `Z` constraint. This is resulting in panics during boot > for 32b powerpc builds w/ Clang, as reported by our CI. > > Add back the original code that worked behind a preprocessor check for > __clang__ until we can fix LLVM. > > Further, it seems that clang allnoconfig builds are unhappy with `Z`, as > reported by 0day bot. This is likely because Clang warns about inline > asm constraints when the constraint requires inlining to be semantically > valid. > > Link: https://bugs.llvm.org/show_bug.cgi?id=42762 > Link: https://github.com/ClangBuiltLinux/linux/issues/593 > Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ > Debugged-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: kbuild test robot <lkp@intel.com> > Suggested-by: Nathan Chancellor <natechancellor@gmail.com> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> > --- > Alternatively, we could just revert 6c5875843b87. It seems that GCC > generates the same code for these functions for out of line versions. > But I'm not sure how the inlined code generated would be affected. For the record: https://godbolt.org/z/z57VU7 This seems consistent with what Michael found so I don't think a revert is entirely unreasonable. Either way: Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-29 20:32 ` Nathan Chancellor 0 siblings, 0 replies; 52+ messages in thread From: Nathan Chancellor @ 2019-07-29 20:32 UTC (permalink / raw) To: Nick Desaulniers Cc: arnd, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev, kbuild test robot On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote: > Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed > what looks like a codegen bug in Clang's handling of `%y` output > template with `Z` constraint. This is resulting in panics during boot > for 32b powerpc builds w/ Clang, as reported by our CI. > > Add back the original code that worked behind a preprocessor check for > __clang__ until we can fix LLVM. > > Further, it seems that clang allnoconfig builds are unhappy with `Z`, as > reported by 0day bot. This is likely because Clang warns about inline > asm constraints when the constraint requires inlining to be semantically > valid. > > Link: https://bugs.llvm.org/show_bug.cgi?id=42762 > Link: https://github.com/ClangBuiltLinux/linux/issues/593 > Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ > Debugged-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: kbuild test robot <lkp@intel.com> > Suggested-by: Nathan Chancellor <natechancellor@gmail.com> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> > --- > Alternatively, we could just revert 6c5875843b87. It seems that GCC > generates the same code for these functions for out of line versions. > But I'm not sure how the inlined code generated would be affected. For the record: https://godbolt.org/z/z57VU7 This seems consistent with what Michael found so I don't think a revert is entirely unreasonable. Either way: Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-29 20:32 ` Nathan Chancellor @ 2019-07-29 20:45 ` Nick Desaulniers -1 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-07-29 20:45 UTC (permalink / raw) To: Nathan Chancellor Cc: Michael Ellerman, Christophe Leroy, Segher Boessenkool, Arnd Bergmann, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, LKML, clang-built-linux On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor <natechancellor@gmail.com> wrote: > > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote: > > But I'm not sure how the inlined code generated would be affected. > > For the record: > > https://godbolt.org/z/z57VU7 > > This seems consistent with what Michael found so I don't think a revert > is entirely unreasonable. Thanks for debugging/reporting/testing and the Godbolt link which clearly shows that the codegen for out of line versions is no different. The case I can't comment on is what happens when those `static inline` functions get inlined (maybe the original patch improves those cases?). -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-29 20:45 ` Nick Desaulniers 0 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-07-29 20:45 UTC (permalink / raw) To: Nathan Chancellor Cc: Arnd Bergmann, LKML, clang-built-linux, Paul Mackerras, linuxppc-dev, kbuild test robot On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor <natechancellor@gmail.com> wrote: > > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote: > > But I'm not sure how the inlined code generated would be affected. > > For the record: > > https://godbolt.org/z/z57VU7 > > This seems consistent with what Michael found so I don't think a revert > is entirely unreasonable. Thanks for debugging/reporting/testing and the Godbolt link which clearly shows that the codegen for out of line versions is no different. The case I can't comment on is what happens when those `static inline` functions get inlined (maybe the original patch improves those cases?). -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-29 20:45 ` Nick Desaulniers @ 2019-07-29 20:47 ` Nathan Chancellor -1 siblings, 0 replies; 52+ messages in thread From: Nathan Chancellor @ 2019-07-29 20:47 UTC (permalink / raw) To: Nick Desaulniers Cc: Michael Ellerman, Christophe Leroy, Segher Boessenkool, Arnd Bergmann, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, LKML, clang-built-linux On Mon, Jul 29, 2019 at 01:45:35PM -0700, Nick Desaulniers wrote: > On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor > <natechancellor@gmail.com> wrote: > > > > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote: > > > But I'm not sure how the inlined code generated would be affected. > > > > For the record: > > > > https://godbolt.org/z/z57VU7 > > > > This seems consistent with what Michael found so I don't think a revert > > is entirely unreasonable. > > Thanks for debugging/reporting/testing and the Godbolt link which > clearly shows that the codegen for out of line versions is no > different. The case I can't comment on is what happens when those > `static inline` functions get inlined (maybe the original patch > improves those cases?). > -- > Thanks, > ~Nick Desaulniers I'll try to build with various versions of GCC and compare the disassembly of the one problematic location that I found and see what it looks like. Cheers, Nathan ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-29 20:47 ` Nathan Chancellor 0 siblings, 0 replies; 52+ messages in thread From: Nathan Chancellor @ 2019-07-29 20:47 UTC (permalink / raw) To: Nick Desaulniers Cc: Arnd Bergmann, LKML, clang-built-linux, Paul Mackerras, linuxppc-dev, kbuild test robot On Mon, Jul 29, 2019 at 01:45:35PM -0700, Nick Desaulniers wrote: > On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor > <natechancellor@gmail.com> wrote: > > > > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote: > > > But I'm not sure how the inlined code generated would be affected. > > > > For the record: > > > > https://godbolt.org/z/z57VU7 > > > > This seems consistent with what Michael found so I don't think a revert > > is entirely unreasonable. > > Thanks for debugging/reporting/testing and the Godbolt link which > clearly shows that the codegen for out of line versions is no > different. The case I can't comment on is what happens when those > `static inline` functions get inlined (maybe the original patch > improves those cases?). > -- > Thanks, > ~Nick Desaulniers I'll try to build with various versions of GCC and compare the disassembly of the one problematic location that I found and see what it looks like. Cheers, Nathan ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-29 20:47 ` Nathan Chancellor @ 2019-07-29 20:49 ` Nick Desaulniers -1 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-07-29 20:49 UTC (permalink / raw) To: Nathan Chancellor Cc: Michael Ellerman, Christophe Leroy, Segher Boessenkool, Arnd Bergmann, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, LKML, clang-built-linux On Mon, Jul 29, 2019 at 1:47 PM Nathan Chancellor <natechancellor@gmail.com> wrote: > > On Mon, Jul 29, 2019 at 01:45:35PM -0700, Nick Desaulniers wrote: > > On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor > > <natechancellor@gmail.com> wrote: > > > > > > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote: > > > > But I'm not sure how the inlined code generated would be affected. > > > > > > For the record: > > > > > > https://godbolt.org/z/z57VU7 > > > > > > This seems consistent with what Michael found so I don't think a revert > > > is entirely unreasonable. > > > > Thanks for debugging/reporting/testing and the Godbolt link which > > clearly shows that the codegen for out of line versions is no > > different. The case I can't comment on is what happens when those > > `static inline` functions get inlined (maybe the original patch > > improves those cases?). > > -- > > Thanks, > > ~Nick Desaulniers > > I'll try to build with various versions of GCC and compare the > disassembly of the one problematic location that I found and see > what it looks like. Also, guess I should have included the tag: Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers") -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-29 20:49 ` Nick Desaulniers 0 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-07-29 20:49 UTC (permalink / raw) To: Nathan Chancellor Cc: Arnd Bergmann, LKML, clang-built-linux, Paul Mackerras, linuxppc-dev, kbuild test robot On Mon, Jul 29, 2019 at 1:47 PM Nathan Chancellor <natechancellor@gmail.com> wrote: > > On Mon, Jul 29, 2019 at 01:45:35PM -0700, Nick Desaulniers wrote: > > On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor > > <natechancellor@gmail.com> wrote: > > > > > > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote: > > > > But I'm not sure how the inlined code generated would be affected. > > > > > > For the record: > > > > > > https://godbolt.org/z/z57VU7 > > > > > > This seems consistent with what Michael found so I don't think a revert > > > is entirely unreasonable. > > > > Thanks for debugging/reporting/testing and the Godbolt link which > > clearly shows that the codegen for out of line versions is no > > different. The case I can't comment on is what happens when those > > `static inline` functions get inlined (maybe the original patch > > improves those cases?). > > -- > > Thanks, > > ~Nick Desaulniers > > I'll try to build with various versions of GCC and compare the > disassembly of the one problematic location that I found and see > what it looks like. Also, guess I should have included the tag: Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers") -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-29 20:32 ` Nathan Chancellor @ 2019-07-29 21:52 ` Segher Boessenkool -1 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-07-29 21:52 UTC (permalink / raw) To: Nathan Chancellor Cc: Nick Desaulniers, mpe, christophe.leroy, arnd, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote: > For the record: > > https://godbolt.org/z/z57VU7 > > This seems consistent with what Michael found so I don't think a revert > is entirely unreasonable. Try this: https://godbolt.org/z/6_ZfVi This matters in non-trivial loops, for example. But all current cases where such non-trivial loops are done with cache block instructions are actually written in real assembler already, using two registers. Because performance matters. Not that I recommend writing code as critical as memset in C with inline asm :-) Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-29 21:52 ` Segher Boessenkool 0 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-07-29 21:52 UTC (permalink / raw) To: Nathan Chancellor Cc: kbuild test robot, arnd, Nick Desaulniers, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote: > For the record: > > https://godbolt.org/z/z57VU7 > > This seems consistent with what Michael found so I don't think a revert > is entirely unreasonable. Try this: https://godbolt.org/z/6_ZfVi This matters in non-trivial loops, for example. But all current cases where such non-trivial loops are done with cache block instructions are actually written in real assembler already, using two registers. Because performance matters. Not that I recommend writing code as critical as memset in C with inline asm :-) Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-29 21:52 ` Segher Boessenkool @ 2019-07-30 7:34 ` Arnd Bergmann -1 siblings, 0 replies; 52+ messages in thread From: Arnd Bergmann @ 2019-07-30 7:34 UTC (permalink / raw) To: Segher Boessenkool Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman, christophe leroy, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List, clang-built-linux On Mon, Jul 29, 2019 at 11:52 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote: > > For the record: > > > > https://godbolt.org/z/z57VU7 > > > > This seems consistent with what Michael found so I don't think a revert > > is entirely unreasonable. > > Try this: > > https://godbolt.org/z/6_ZfVi > > This matters in non-trivial loops, for example. But all current cases > where such non-trivial loops are done with cache block instructions are > actually written in real assembler already, using two registers. > Because performance matters. Not that I recommend writing code as > critical as memset in C with inline asm :-) Upon a second look, I think the issue is that the "Z" is an input argument when it should be an output. clang decides that it can make a copy of the input and pass that into the inline asm. This is not the most efficient way, but it seems entirely correct according to the constraints. Changing it to an output "=Z" constraint seems to make it work: https://godbolt.org/z/FwEqHf Clang still doesn't use the optimum form, but it passes the correct pointer. Arnd ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-30 7:34 ` Arnd Bergmann 0 siblings, 0 replies; 52+ messages in thread From: Arnd Bergmann @ 2019-07-30 7:34 UTC (permalink / raw) To: Segher Boessenkool Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Mon, Jul 29, 2019 at 11:52 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote: > > For the record: > > > > https://godbolt.org/z/z57VU7 > > > > This seems consistent with what Michael found so I don't think a revert > > is entirely unreasonable. > > Try this: > > https://godbolt.org/z/6_ZfVi > > This matters in non-trivial loops, for example. But all current cases > where such non-trivial loops are done with cache block instructions are > actually written in real assembler already, using two registers. > Because performance matters. Not that I recommend writing code as > critical as memset in C with inline asm :-) Upon a second look, I think the issue is that the "Z" is an input argument when it should be an output. clang decides that it can make a copy of the input and pass that into the inline asm. This is not the most efficient way, but it seems entirely correct according to the constraints. Changing it to an output "=Z" constraint seems to make it work: https://godbolt.org/z/FwEqHf Clang still doesn't use the optimum form, but it passes the correct pointer. Arnd ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-30 7:34 ` Arnd Bergmann @ 2019-07-30 11:17 ` Michael Ellerman -1 siblings, 0 replies; 52+ messages in thread From: Michael Ellerman @ 2019-07-30 11:17 UTC (permalink / raw) To: Arnd Bergmann, Segher Boessenkool Cc: Nathan Chancellor, Nick Desaulniers, christophe leroy, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List, clang-built-linux Arnd Bergmann <arnd@arndb.de> writes: > On Mon, Jul 29, 2019 at 11:52 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: >> On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote: >> > For the record: >> > >> > https://godbolt.org/z/z57VU7 >> > >> > This seems consistent with what Michael found so I don't think a revert >> > is entirely unreasonable. >> >> Try this: >> >> https://godbolt.org/z/6_ZfVi >> >> This matters in non-trivial loops, for example. But all current cases >> where such non-trivial loops are done with cache block instructions are >> actually written in real assembler already, using two registers. >> Because performance matters. Not that I recommend writing code as >> critical as memset in C with inline asm :-) > > Upon a second look, I think the issue is that the "Z" is an input argument > when it should be an output. clang decides that it can make a copy of the > input and pass that into the inline asm. This is not the most efficient > way, but it seems entirely correct according to the constraints. > > Changing it to an output "=Z" constraint seems to make it work: > > https://godbolt.org/z/FwEqHf > > Clang still doesn't use the optimum form, but it passes the correct pointer. Thanks Arnd. This seems like a better solution. I'll drop the revert I have staged. Segher does this look OK to you? Nathan/Nick, are one of you able to test this with your clang CI? cheers ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-30 11:17 ` Michael Ellerman 0 siblings, 0 replies; 52+ messages in thread From: Michael Ellerman @ 2019-07-30 11:17 UTC (permalink / raw) To: Arnd Bergmann, Segher Boessenkool Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev Arnd Bergmann <arnd@arndb.de> writes: > On Mon, Jul 29, 2019 at 11:52 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: >> On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote: >> > For the record: >> > >> > https://godbolt.org/z/z57VU7 >> > >> > This seems consistent with what Michael found so I don't think a revert >> > is entirely unreasonable. >> >> Try this: >> >> https://godbolt.org/z/6_ZfVi >> >> This matters in non-trivial loops, for example. But all current cases >> where such non-trivial loops are done with cache block instructions are >> actually written in real assembler already, using two registers. >> Because performance matters. Not that I recommend writing code as >> critical as memset in C with inline asm :-) > > Upon a second look, I think the issue is that the "Z" is an input argument > when it should be an output. clang decides that it can make a copy of the > input and pass that into the inline asm. This is not the most efficient > way, but it seems entirely correct according to the constraints. > > Changing it to an output "=Z" constraint seems to make it work: > > https://godbolt.org/z/FwEqHf > > Clang still doesn't use the optimum form, but it passes the correct pointer. Thanks Arnd. This seems like a better solution. I'll drop the revert I have staged. Segher does this look OK to you? Nathan/Nick, are one of you able to test this with your clang CI? cheers ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH] powerpc: fix inline asm constraints for dcbz 2019-07-30 11:17 ` Michael Ellerman @ 2019-08-09 18:21 ` Nick Desaulniers -1 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-08-09 18:21 UTC (permalink / raw) To: mpe Cc: christophe.leroy, segher, arnd, Nick Desaulniers, Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux The input parameter is modified, so it should be an output parameter with "=" to make it so that a copy of the input is not made by Clang. Link: https://bugs.llvm.org/show_bug.cgi?id=42762 Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers Link: https://github.com/ClangBuiltLinux/linux/issues/593 Link: https://godbolt.org/z/QwhZXi Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers") Debugged-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: kbuild test robot <lkp@intel.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> --- Green CI run: https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122521976 https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37 arch/powerpc/include/asm/cache.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index b3388d95f451..5a0df6a1b9dc 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -107,22 +107,22 @@ extern void _set_L3CR(unsigned long); static inline void dcbz(void *addr) { - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); } static inline void dcbi(void *addr) { - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); } static inline void dcbf(void *addr) { - __asm__ __volatile__ ("dcbf %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbf %y0" : "=Z"(*(u8 *)addr) :: "memory"); } static inline void dcbst(void *addr) { - __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbst %y0" : "=Z"(*(u8 *)addr) :: "memory"); } #endif /* !__ASSEMBLY__ */ #endif /* __KERNEL__ */ -- 2.23.0.rc1.153.gdeed80330f-goog ^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH] powerpc: fix inline asm constraints for dcbz @ 2019-08-09 18:21 ` Nick Desaulniers 0 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-08-09 18:21 UTC (permalink / raw) To: mpe Cc: arnd, Nick Desaulniers, linux-kernel, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev, kbuild test robot The input parameter is modified, so it should be an output parameter with "=" to make it so that a copy of the input is not made by Clang. Link: https://bugs.llvm.org/show_bug.cgi?id=42762 Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers Link: https://github.com/ClangBuiltLinux/linux/issues/593 Link: https://godbolt.org/z/QwhZXi Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers") Debugged-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: kbuild test robot <lkp@intel.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> --- Green CI run: https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122521976 https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37 arch/powerpc/include/asm/cache.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index b3388d95f451..5a0df6a1b9dc 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -107,22 +107,22 @@ extern void _set_L3CR(unsigned long); static inline void dcbz(void *addr) { - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); } static inline void dcbi(void *addr) { - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); } static inline void dcbf(void *addr) { - __asm__ __volatile__ ("dcbf %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbf %y0" : "=Z"(*(u8 *)addr) :: "memory"); } static inline void dcbst(void *addr) { - __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbst %y0" : "=Z"(*(u8 *)addr) :: "memory"); } #endif /* !__ASSEMBLY__ */ #endif /* __KERNEL__ */ -- 2.23.0.rc1.153.gdeed80330f-goog ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz 2019-08-09 18:21 ` Nick Desaulniers @ 2019-08-09 18:28 ` Arnd Bergmann -1 siblings, 0 replies; 52+ messages in thread From: Arnd Bergmann @ 2019-08-09 18:28 UTC (permalink / raw) To: Nick Desaulniers Cc: Michael Ellerman, christophe leroy, Segher Boessenkool, Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List, clang-built-linux On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > static inline void dcbz(void *addr) > { > - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > } > > static inline void dcbi(void *addr) > { > - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > } I think the result of the discussion was that an output argument only kind-of makes sense for dcbz, but for the others it's really an input, and clang is wrong in the way it handles the "Z" constraint by making a copy, which it doesn't do for "m". I'm not sure whether it's correct to use "m" instead of "Z" here, which would be a better workaround if that works. More importantly though, clang really needs to be fixed to handle "Z" correctly. Arnd ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz @ 2019-08-09 18:28 ` Arnd Bergmann 0 siblings, 0 replies; 52+ messages in thread From: Arnd Bergmann @ 2019-08-09 18:28 UTC (permalink / raw) To: Nick Desaulniers Cc: kbuild test robot, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built Linux <clang-built-linux@googlegroups.com> wrote: > static inline void dcbz(void *addr) > { > - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > } > > static inline void dcbi(void *addr) > { > - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > } I think the result of the discussion was that an output argument only kind-of makes sense for dcbz, but for the others it's really an input, and clang is wrong in the way it handles the "Z" constraint by making a copy, which it doesn't do for "m". I'm not sure whether it's correct to use "m" instead of "Z" here, which would be a better workaround if that works. More importantly though, clang really needs to be fixed to handle "Z" correctly. Arnd ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz 2019-08-09 18:28 ` Arnd Bergmann @ 2019-08-09 20:03 ` Christophe Leroy -1 siblings, 0 replies; 52+ messages in thread From: Christophe Leroy @ 2019-08-09 20:03 UTC (permalink / raw) To: Arnd Bergmann Cc: clang-built-linux, Linux Kernel Mailing List, linuxppc-dev, Paul Mackerras, Benjamin Herrenschmidt, kbuild test robot, Nathan Chancellor, Segher Boessenkool, Michael Ellerman, Nick Desaulniers Arnd Bergmann <arnd@arndb.de> a écrit : > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > Linux <clang-built-linux@googlegroups.com> wrote: > >> static inline void dcbz(void *addr) >> { >> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); >> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); >> } >> >> static inline void dcbi(void *addr) >> { >> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); >> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); >> } > > I think the result of the discussion was that an output argument only kind-of > makes sense for dcbz, but for the others it's really an input, and clang is > wrong in the way it handles the "Z" constraint by making a copy, which it > doesn't do for "m". > > I'm not sure whether it's correct to use "m" instead of "Z" here, which > would be a better workaround if that works. More importantly though, > clang really needs to be fixed to handle "Z" correctly. As the benefit is null, I think the best is probably to reverse my original commit until at least CLang is fixed, as initialy suggested by mpe Christophe ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz @ 2019-08-09 20:03 ` Christophe Leroy 0 siblings, 0 replies; 52+ messages in thread From: Christophe Leroy @ 2019-08-09 20:03 UTC (permalink / raw) To: Arnd Bergmann Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev Arnd Bergmann <arnd@arndb.de> a écrit : > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > Linux <clang-built-linux@googlegroups.com> wrote: > >> static inline void dcbz(void *addr) >> { >> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); >> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); >> } >> >> static inline void dcbi(void *addr) >> { >> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); >> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); >> } > > I think the result of the discussion was that an output argument only kind-of > makes sense for dcbz, but for the others it's really an input, and clang is > wrong in the way it handles the "Z" constraint by making a copy, which it > doesn't do for "m". > > I'm not sure whether it's correct to use "m" instead of "Z" here, which > would be a better workaround if that works. More importantly though, > clang really needs to be fixed to handle "Z" correctly. As the benefit is null, I think the best is probably to reverse my original commit until at least CLang is fixed, as initialy suggested by mpe Christophe ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz 2019-08-09 20:03 ` Christophe Leroy @ 2019-08-09 20:12 ` Arnd Bergmann -1 siblings, 0 replies; 52+ messages in thread From: Arnd Bergmann @ 2019-08-09 20:12 UTC (permalink / raw) To: Christophe Leroy Cc: clang-built-linux, Linux Kernel Mailing List, linuxppc-dev, Paul Mackerras, Benjamin Herrenschmidt, kbuild test robot, Nathan Chancellor, Segher Boessenkool, Michael Ellerman, Nick Desaulniers On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote: > > Arnd Bergmann <arnd@arndb.de> a écrit : > > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > > Linux <clang-built-linux@googlegroups.com> wrote: > > > >> static inline void dcbz(void *addr) > >> { > >> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > >> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > >> > >> static inline void dcbi(void *addr) > >> { > >> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > >> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > > > > I think the result of the discussion was that an output argument only kind-of > > makes sense for dcbz, but for the others it's really an input, and clang is > > wrong in the way it handles the "Z" constraint by making a copy, which it > > doesn't do for "m". > > > > I'm not sure whether it's correct to use "m" instead of "Z" here, which > > would be a better workaround if that works. More importantly though, > > clang really needs to be fixed to handle "Z" correctly. > > As the benefit is null, I think the best is probably to reverse my > original commit until at least CLang is fixed, as initialy suggested > by mpe Yes, makes sense. There is one other use of the "Z" constraint, so on top of the revert, I think it might be helpful if Nick could check if the patch below makes any difference with clang and, if it does, whether the current version is broken. Arnd diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index 23e5d5d16c7e..28b467779328 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size __iomem *addr) \ { \ u##size ret; \ __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \ - : "=r" (ret) : "Z" (*addr) : "memory"); \ + : "=r" (ret) : "m" (*addr) : "memory"); \ return ret; \ } @@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size __iomem *addr) \ static inline void name(volatile u##size __iomem *addr, u##size val) \ { \ __asm__ __volatile__("sync;"#insn" %1,%y0" \ - : "=Z" (*addr) : "r" (val) : "memory"); \ + : "=m" (*addr) : "r" (val) : "memory"); \ mmiowb_set_pending(); \ } ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz @ 2019-08-09 20:12 ` Arnd Bergmann 0 siblings, 0 replies; 52+ messages in thread From: Arnd Bergmann @ 2019-08-09 20:12 UTC (permalink / raw) To: Christophe Leroy Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy <christophe.leroy@c-s.fr> wrote: > > Arnd Bergmann <arnd@arndb.de> a écrit : > > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > > Linux <clang-built-linux@googlegroups.com> wrote: > > > >> static inline void dcbz(void *addr) > >> { > >> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > >> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > >> > >> static inline void dcbi(void *addr) > >> { > >> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > >> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > > > > I think the result of the discussion was that an output argument only kind-of > > makes sense for dcbz, but for the others it's really an input, and clang is > > wrong in the way it handles the "Z" constraint by making a copy, which it > > doesn't do for "m". > > > > I'm not sure whether it's correct to use "m" instead of "Z" here, which > > would be a better workaround if that works. More importantly though, > > clang really needs to be fixed to handle "Z" correctly. > > As the benefit is null, I think the best is probably to reverse my > original commit until at least CLang is fixed, as initialy suggested > by mpe Yes, makes sense. There is one other use of the "Z" constraint, so on top of the revert, I think it might be helpful if Nick could check if the patch below makes any difference with clang and, if it does, whether the current version is broken. Arnd diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h index 23e5d5d16c7e..28b467779328 100644 --- a/arch/powerpc/include/asm/io.h +++ b/arch/powerpc/include/asm/io.h @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size __iomem *addr) \ { \ u##size ret; \ __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \ - : "=r" (ret) : "Z" (*addr) : "memory"); \ + : "=r" (ret) : "m" (*addr) : "memory"); \ return ret; \ } @@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size __iomem *addr) \ static inline void name(volatile u##size __iomem *addr, u##size val) \ { \ __asm__ __volatile__("sync;"#insn" %1,%y0" \ - : "=Z" (*addr) : "r" (val) : "memory"); \ + : "=m" (*addr) : "r" (val) : "memory"); \ mmiowb_set_pending(); \ } ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz 2019-08-09 20:12 ` Arnd Bergmann @ 2019-08-09 22:03 ` Nick Desaulniers -1 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-08-09 22:03 UTC (permalink / raw) To: Arnd Bergmann Cc: Christophe Leroy, clang-built-linux, Linux Kernel Mailing List, linuxppc-dev, Paul Mackerras, Benjamin Herrenschmidt, kbuild test robot, Nathan Chancellor, Segher Boessenkool, Michael Ellerman On Fri, Aug 9, 2019 at 1:13 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy > <christophe.leroy@c-s.fr> wrote: > > > > Arnd Bergmann <arnd@arndb.de> a écrit : > > > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > > > Linux <clang-built-linux@googlegroups.com> wrote: > > > > > >> static inline void dcbz(void *addr) > > >> { > > >> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > > >> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > >> } > > >> > > >> static inline void dcbi(void *addr) > > >> { > > >> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > > >> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > >> } > > > > > > I think the result of the discussion was that an output argument only kind-of > > > makes sense for dcbz, but for the others it's really an input, and clang is > > > wrong in the way it handles the "Z" constraint by making a copy, which it > > > doesn't do for "m". > > > > > > I'm not sure whether it's correct to use "m" instead of "Z" here, which > > > would be a better workaround if that works. More importantly though, > > > clang really needs to be fixed to handle "Z" correctly. > > > > As the benefit is null, I think the best is probably to reverse my > > original commit until at least CLang is fixed, as initialy suggested > > by mpe > > Yes, makes sense. > > There is one other use of the "Z" constraint, so on top of the revert, I > think it might be helpful if Nick could check if the patch below makes > any difference with clang and, if it does, whether the current version > is broken. > > Arnd > > diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h > index 23e5d5d16c7e..28b467779328 100644 > --- a/arch/powerpc/include/asm/io.h > +++ b/arch/powerpc/include/asm/io.h > @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size > __iomem *addr) \ > { \ > u##size ret; \ > __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \ > - : "=r" (ret) : "Z" (*addr) : "memory"); \ > + : "=r" (ret) : "m" (*addr) : "memory"); \ > return ret; \ > } > > @@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size > __iomem *addr) \ > static inline void name(volatile u##size __iomem *addr, u##size val) \ > { \ > __asm__ __volatile__("sync;"#insn" %1,%y0" \ > - : "=Z" (*addr) : "r" (val) : "memory"); \ > + : "=m" (*addr) : "r" (val) : "memory"); \ > mmiowb_set_pending(); \ > } Does not work: https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122654899 https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37 -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz @ 2019-08-09 22:03 ` Nick Desaulniers 0 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-08-09 22:03 UTC (permalink / raw) To: Arnd Bergmann Cc: kbuild test robot, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Fri, Aug 9, 2019 at 1:13 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy > <christophe.leroy@c-s.fr> wrote: > > > > Arnd Bergmann <arnd@arndb.de> a écrit : > > > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > > > Linux <clang-built-linux@googlegroups.com> wrote: > > > > > >> static inline void dcbz(void *addr) > > >> { > > >> - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > > >> + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > >> } > > >> > > >> static inline void dcbi(void *addr) > > >> { > > >> - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > > >> + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > >> } > > > > > > I think the result of the discussion was that an output argument only kind-of > > > makes sense for dcbz, but for the others it's really an input, and clang is > > > wrong in the way it handles the "Z" constraint by making a copy, which it > > > doesn't do for "m". > > > > > > I'm not sure whether it's correct to use "m" instead of "Z" here, which > > > would be a better workaround if that works. More importantly though, > > > clang really needs to be fixed to handle "Z" correctly. > > > > As the benefit is null, I think the best is probably to reverse my > > original commit until at least CLang is fixed, as initialy suggested > > by mpe > > Yes, makes sense. > > There is one other use of the "Z" constraint, so on top of the revert, I > think it might be helpful if Nick could check if the patch below makes > any difference with clang and, if it does, whether the current version > is broken. > > Arnd > > diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h > index 23e5d5d16c7e..28b467779328 100644 > --- a/arch/powerpc/include/asm/io.h > +++ b/arch/powerpc/include/asm/io.h > @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size > __iomem *addr) \ > { \ > u##size ret; \ > __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \ > - : "=r" (ret) : "Z" (*addr) : "memory"); \ > + : "=r" (ret) : "m" (*addr) : "memory"); \ > return ret; \ > } > > @@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size > __iomem *addr) \ > static inline void name(volatile u##size __iomem *addr, u##size val) \ > { \ > __asm__ __volatile__("sync;"#insn" %1,%y0" \ > - : "=Z" (*addr) : "r" (val) : "memory"); \ > + : "=m" (*addr) : "r" (val) : "memory"); \ > mmiowb_set_pending(); \ > } Does not work: https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122654899 https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37 -- Thanks, ~Nick Desaulniers ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz 2019-08-09 20:12 ` Arnd Bergmann @ 2019-08-09 22:10 ` Segher Boessenkool -1 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-08-09 22:10 UTC (permalink / raw) To: Arnd Bergmann Cc: Christophe Leroy, clang-built-linux, Linux Kernel Mailing List, linuxppc-dev, Paul Mackerras, Benjamin Herrenschmidt, kbuild test robot, Nathan Chancellor, Michael Ellerman, Nick Desaulniers On Fri, Aug 09, 2019 at 10:12:56PM +0200, Arnd Bergmann wrote: > @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size > __iomem *addr) \ > { \ > u##size ret; \ > __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \ > - : "=r" (ret) : "Z" (*addr) : "memory"); \ > + : "=r" (ret) : "m" (*addr) : "memory"); \ > return ret; \ > } That will no longer compile something like u8 *p; u16 x = in_le16(p + 12); (you'll get something like "invalid %y value, try using the 'Z' constraint"). So then you remove the %y, but that makes you get something like sync;lhbrx 3,12(3);twi 0,3,0;isync which is completely wrong. Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz @ 2019-08-09 22:10 ` Segher Boessenkool 0 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-08-09 22:10 UTC (permalink / raw) To: Arnd Bergmann Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Fri, Aug 09, 2019 at 10:12:56PM +0200, Arnd Bergmann wrote: > @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size > __iomem *addr) \ > { \ > u##size ret; \ > __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync" \ > - : "=r" (ret) : "Z" (*addr) : "memory"); \ > + : "=r" (ret) : "m" (*addr) : "memory"); \ > return ret; \ > } That will no longer compile something like u8 *p; u16 x = in_le16(p + 12); (you'll get something like "invalid %y value, try using the 'Z' constraint"). So then you remove the %y, but that makes you get something like sync;lhbrx 3,12(3);twi 0,3,0;isync which is completely wrong. Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz 2019-08-09 20:03 ` Christophe Leroy @ 2019-08-09 22:00 ` Segher Boessenkool -1 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-08-09 22:00 UTC (permalink / raw) To: Christophe Leroy Cc: Arnd Bergmann, clang-built-linux, Linux Kernel Mailing List, linuxppc-dev, Paul Mackerras, Benjamin Herrenschmidt, kbuild test robot, Nathan Chancellor, Michael Ellerman, Nick Desaulniers On Fri, Aug 09, 2019 at 10:03:01PM +0200, Christophe Leroy wrote: > Arnd Bergmann <arnd@arndb.de> a écrit : > > >On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > >Linux <clang-built-linux@googlegroups.com> wrote: > > > >> static inline void dcbz(void *addr) > >> { > >>- __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > >>+ __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > >> > >> static inline void dcbi(void *addr) > >> { > >>- __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > >>+ __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > > > >I think the result of the discussion was that an output argument only > >kind-of > >makes sense for dcbz, but for the others it's really an input, and clang is > >wrong in the way it handles the "Z" constraint by making a copy, which it > >doesn't do for "m". > > > >I'm not sure whether it's correct to use "m" instead of "Z" here, which > >would be a better workaround if that works. More importantly though, > >clang really needs to be fixed to handle "Z" correctly. > > As the benefit is null, I think the best is probably to reverse my > original commit until at least CLang is fixed, as initialy suggested > by mpe And what about the other uses of "Z"? Also, if you use C routines (instead of assembler code) for the basic "clear a block" and the like routines, as there have been patches for recently, the benefit is not zero. Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz @ 2019-08-09 22:00 ` Segher Boessenkool 0 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-08-09 22:00 UTC (permalink / raw) To: Christophe Leroy Cc: kbuild test robot, Arnd Bergmann, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Fri, Aug 09, 2019 at 10:03:01PM +0200, Christophe Leroy wrote: > Arnd Bergmann <arnd@arndb.de> a écrit : > > >On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > >Linux <clang-built-linux@googlegroups.com> wrote: > > > >> static inline void dcbz(void *addr) > >> { > >>- __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > >>+ __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > >> > >> static inline void dcbi(void *addr) > >> { > >>- __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > >>+ __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > >> } > > > >I think the result of the discussion was that an output argument only > >kind-of > >makes sense for dcbz, but for the others it's really an input, and clang is > >wrong in the way it handles the "Z" constraint by making a copy, which it > >doesn't do for "m". > > > >I'm not sure whether it's correct to use "m" instead of "Z" here, which > >would be a better workaround if that works. More importantly though, > >clang really needs to be fixed to handle "Z" correctly. > > As the benefit is null, I think the best is probably to reverse my > original commit until at least CLang is fixed, as initialy suggested > by mpe And what about the other uses of "Z"? Also, if you use C routines (instead of assembler code) for the basic "clear a block" and the like routines, as there have been patches for recently, the benefit is not zero. Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v3] Revert "powerpc: slightly improve cache helpers" 2019-08-09 20:03 ` Christophe Leroy @ 2019-08-09 22:03 ` Nick Desaulniers -1 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-08-09 22:03 UTC (permalink / raw) To: mpe Cc: christophe.leroy, segher, arnd, Nick Desaulniers, Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux This reverts commit 6c5875843b87c3adea2beade9d1b8b3d4523900a. Work around Clang bug preventing ppc32 from booting. Link: https://bugs.llvm.org/show_bug.cgi?id=42762 Link: https://github.com/ClangBuiltLinux/linux/issues/593 Debugged-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: kbuild test robot <lkp@intel.com> Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> --- Changes V2 -> V3: * Just revert, as per Christophe. Changes V1 -> V2: * Change to ouput paremeter. arch/powerpc/include/asm/cache.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index b3388d95f451..45e3137ccd71 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -107,22 +107,22 @@ extern void _set_L3CR(unsigned long); static inline void dcbz(void *addr) { - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbz 0, %0" : : "r"(addr) : "memory"); } static inline void dcbi(void *addr) { - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbi 0, %0" : : "r"(addr) : "memory"); } static inline void dcbf(void *addr) { - __asm__ __volatile__ ("dcbf %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory"); } static inline void dcbst(void *addr) { - __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbst 0, %0" : : "r"(addr) : "memory"); } #endif /* !__ASSEMBLY__ */ #endif /* __KERNEL__ */ -- 2.23.0.rc1.153.gdeed80330f-goog ^ permalink raw reply related [flat|nested] 52+ messages in thread
* [PATCH v3] Revert "powerpc: slightly improve cache helpers" @ 2019-08-09 22:03 ` Nick Desaulniers 0 siblings, 0 replies; 52+ messages in thread From: Nick Desaulniers @ 2019-08-09 22:03 UTC (permalink / raw) To: mpe Cc: arnd, Nick Desaulniers, linux-kernel, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev, kbuild test robot This reverts commit 6c5875843b87c3adea2beade9d1b8b3d4523900a. Work around Clang bug preventing ppc32 from booting. Link: https://bugs.llvm.org/show_bug.cgi?id=42762 Link: https://github.com/ClangBuiltLinux/linux/issues/593 Debugged-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: kbuild test robot <lkp@intel.com> Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> --- Changes V2 -> V3: * Just revert, as per Christophe. Changes V1 -> V2: * Change to ouput paremeter. arch/powerpc/include/asm/cache.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index b3388d95f451..45e3137ccd71 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -107,22 +107,22 @@ extern void _set_L3CR(unsigned long); static inline void dcbz(void *addr) { - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbz 0, %0" : : "r"(addr) : "memory"); } static inline void dcbi(void *addr) { - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbi 0, %0" : : "r"(addr) : "memory"); } static inline void dcbf(void *addr) { - __asm__ __volatile__ ("dcbf %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory"); } static inline void dcbst(void *addr) { - __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory"); + __asm__ __volatile__ ("dcbst 0, %0" : : "r"(addr) : "memory"); } #endif /* !__ASSEMBLY__ */ #endif /* __KERNEL__ */ -- 2.23.0.rc1.153.gdeed80330f-goog ^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v3] Revert "powerpc: slightly improve cache helpers" 2019-08-09 22:03 ` Nick Desaulniers @ 2019-08-10 9:09 ` Michael Ellerman -1 siblings, 0 replies; 52+ messages in thread From: Michael Ellerman @ 2019-08-10 9:09 UTC (permalink / raw) To: Nick Desaulniers Cc: christophe.leroy, segher, arnd, Nick Desaulniers, Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux Nick Desaulniers <ndesaulniers@google.com> writes: > This reverts commit 6c5875843b87c3adea2beade9d1b8b3d4523900a. > > Work around Clang bug preventing ppc32 from booting. > > Link: https://bugs.llvm.org/show_bug.cgi?id=42762 > Link: https://github.com/ClangBuiltLinux/linux/issues/593 > Debugged-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: kbuild test robot <lkp@intel.com> > Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> > --- > Changes V2 -> V3: > * Just revert, as per Christophe. > Changes V1 -> V2: > * Change to ouput paremeter. Thanks. I actually already had this revert in my tree since ~10 days ago, but hadn't pushed it yet because the discussion was ongoing. So I'll just use that version, and ask Linus to pull it. cheers ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v3] Revert "powerpc: slightly improve cache helpers" @ 2019-08-10 9:09 ` Michael Ellerman 0 siblings, 0 replies; 52+ messages in thread From: Michael Ellerman @ 2019-08-10 9:09 UTC (permalink / raw) To: Nick Desaulniers Cc: arnd, Nick Desaulniers, linux-kernel, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev, kbuild test robot Nick Desaulniers <ndesaulniers@google.com> writes: > This reverts commit 6c5875843b87c3adea2beade9d1b8b3d4523900a. > > Work around Clang bug preventing ppc32 from booting. > > Link: https://bugs.llvm.org/show_bug.cgi?id=42762 > Link: https://github.com/ClangBuiltLinux/linux/issues/593 > Debugged-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: kbuild test robot <lkp@intel.com> > Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> > --- > Changes V2 -> V3: > * Just revert, as per Christophe. > Changes V1 -> V2: > * Change to ouput paremeter. Thanks. I actually already had this revert in my tree since ~10 days ago, but hadn't pushed it yet because the discussion was ongoing. So I'll just use that version, and ask Linus to pull it. cheers ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz 2019-08-09 18:28 ` Arnd Bergmann @ 2019-08-09 21:55 ` Segher Boessenkool -1 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-08-09 21:55 UTC (permalink / raw) To: Arnd Bergmann Cc: Nick Desaulniers, Michael Ellerman, christophe leroy, Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List, clang-built-linux On Fri, Aug 09, 2019 at 08:28:19PM +0200, Arnd Bergmann wrote: > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > Linux <clang-built-linux@googlegroups.com> wrote: > > > static inline void dcbz(void *addr) > > { > > - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > > + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > } > > > > static inline void dcbi(void *addr) > > { > > - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > > + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > } > > I think the result of the discussion was that an output argument only kind-of > makes sense for dcbz, but for the others it's really an input, and clang is > wrong in the way it handles the "Z" constraint by making a copy, which it > doesn't do for "m". Yes. And clang has probably miscompiled this in all kernels since we have used "Z" for the first time, in 2008 (0f3d6bcd391b). It is not necessarily fatal or at least not easily visible for the I/O accessors: it "just" gets memory ordering wrong slightly (it looks like it does the sync;tw;isync thing around an extra stack access, after it has performed the actual I/O as any other memory load, without any synchronisation). > I'm not sure whether it's correct to use "m" instead of "Z" here, which > would be a better workaround if that works. More importantly though, > clang really needs to be fixed to handle "Z" correctly. "m" allows offset addressing, which these insns do not. That is the same reason you need the "y" output modifier. "m" is wrong here. We have other memory constraints, but do those work with LLVM? Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz @ 2019-08-09 21:55 ` Segher Boessenkool 0 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-08-09 21:55 UTC (permalink / raw) To: Arnd Bergmann Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Fri, Aug 09, 2019 at 08:28:19PM +0200, Arnd Bergmann wrote: > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built > Linux <clang-built-linux@googlegroups.com> wrote: > > > static inline void dcbz(void *addr) > > { > > - __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory"); > > + __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > } > > > > static inline void dcbi(void *addr) > > { > > - __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory"); > > + __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory"); > > } > > I think the result of the discussion was that an output argument only kind-of > makes sense for dcbz, but for the others it's really an input, and clang is > wrong in the way it handles the "Z" constraint by making a copy, which it > doesn't do for "m". Yes. And clang has probably miscompiled this in all kernels since we have used "Z" for the first time, in 2008 (0f3d6bcd391b). It is not necessarily fatal or at least not easily visible for the I/O accessors: it "just" gets memory ordering wrong slightly (it looks like it does the sync;tw;isync thing around an extra stack access, after it has performed the actual I/O as any other memory load, without any synchronisation). > I'm not sure whether it's correct to use "m" instead of "Z" here, which > would be a better workaround if that works. More importantly though, > clang really needs to be fixed to handle "Z" correctly. "m" allows offset addressing, which these insns do not. That is the same reason you need the "y" output modifier. "m" is wrong here. We have other memory constraints, but do those work with LLVM? Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz 2019-08-09 18:21 ` Nick Desaulniers @ 2019-08-09 20:36 ` Nathan Chancellor -1 siblings, 0 replies; 52+ messages in thread From: Nathan Chancellor @ 2019-08-09 20:36 UTC (permalink / raw) To: Nick Desaulniers Cc: mpe, christophe.leroy, segher, arnd, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux [-- Attachment #1: Type: text/plain, Size: 1151 bytes --] On Fri, Aug 09, 2019 at 11:21:05AM -0700, Nick Desaulniers wrote: > The input parameter is modified, so it should be an output parameter > with "=" to make it so that a copy of the input is not made by Clang. > > Link: https://bugs.llvm.org/show_bug.cgi?id=42762 > Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers > Link: https://github.com/ClangBuiltLinux/linux/issues/593 > Link: https://godbolt.org/z/QwhZXi > Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ > Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers") > Debugged-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: kbuild test robot <lkp@intel.com> > Suggested-by: Arnd Bergmann <arnd@arndb.de> > Suggested-by: Nathan Chancellor <natechancellor@gmail.com> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> I applied this patch as well as a revert of the original patch and both clang and GCC appear to generate the same code; I think a straight revert would be better. Crude testing script and the generated files attached. Cheers, Nathan [-- Attachment #2: tmp.bRmcRT0jd0.sh --] [-- Type: application/x-sh, Size: 2707 bytes --] [-- Attachment #3: testing-output.tar.gz --] [-- Type: application/gzip, Size: 16412 bytes --] ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: fix inline asm constraints for dcbz @ 2019-08-09 20:36 ` Nathan Chancellor 0 siblings, 0 replies; 52+ messages in thread From: Nathan Chancellor @ 2019-08-09 20:36 UTC (permalink / raw) To: Nick Desaulniers Cc: arnd, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev, kbuild test robot [-- Attachment #1: Type: text/plain, Size: 1151 bytes --] On Fri, Aug 09, 2019 at 11:21:05AM -0700, Nick Desaulniers wrote: > The input parameter is modified, so it should be an output parameter > with "=" to make it so that a copy of the input is not made by Clang. > > Link: https://bugs.llvm.org/show_bug.cgi?id=42762 > Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers > Link: https://github.com/ClangBuiltLinux/linux/issues/593 > Link: https://godbolt.org/z/QwhZXi > Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ > Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers") > Debugged-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: Nathan Chancellor <natechancellor@gmail.com> > Reported-by: kbuild test robot <lkp@intel.com> > Suggested-by: Arnd Bergmann <arnd@arndb.de> > Suggested-by: Nathan Chancellor <natechancellor@gmail.com> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> I applied this patch as well as a revert of the original patch and both clang and GCC appear to generate the same code; I think a straight revert would be better. Crude testing script and the generated files attached. Cheers, Nathan [-- Attachment #2: tmp.bRmcRT0jd0.sh --] [-- Type: application/x-sh, Size: 2707 bytes --] [-- Attachment #3: testing-output.tar.gz --] [-- Type: application/gzip, Size: 16412 bytes --] ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-30 7:34 ` Arnd Bergmann @ 2019-07-30 13:48 ` Segher Boessenkool -1 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-07-30 13:48 UTC (permalink / raw) To: Arnd Bergmann Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman, christophe leroy, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List, clang-built-linux On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote: > Upon a second look, I think the issue is that the "Z" is an input argument > when it should be an output. clang decides that it can make a copy of the > input and pass that into the inline asm. This is not the most efficient > way, but it seems entirely correct according to the constraints. Most dcb* (and all icb*) do not change the memory pointed to. The memory is an input here, logically as well, and that is obvious. > Changing it to an output "=Z" constraint seems to make it work: > > https://godbolt.org/z/FwEqHf > > Clang still doesn't use the optimum form, but it passes the correct pointer. As I said many times already, LLVM does not seem to treat all asm operands as lvalues. That is a bug. And it is critical for memory operands for example, as should be obvious if you look at at for a few seconds (you pass *that* memory, not a copy of it). The thing you pass has an identity. It's an lvalue. This is true for *all* inline asm operands, not just output operands and memory operands, but it is most obvious there. Or, LLVM might have a bug elsewhere. Either way, the asm is fine, and it has worked fine in GCC since forever. Changing this constraint to be an output constraint would just be obfuscation (we could change *all* operands to *everything* to be inout ("+") constraints, and it won't affect correctness, just the reader's sanity). Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-30 13:48 ` Segher Boessenkool 0 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-07-30 13:48 UTC (permalink / raw) To: Arnd Bergmann Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote: > Upon a second look, I think the issue is that the "Z" is an input argument > when it should be an output. clang decides that it can make a copy of the > input and pass that into the inline asm. This is not the most efficient > way, but it seems entirely correct according to the constraints. Most dcb* (and all icb*) do not change the memory pointed to. The memory is an input here, logically as well, and that is obvious. > Changing it to an output "=Z" constraint seems to make it work: > > https://godbolt.org/z/FwEqHf > > Clang still doesn't use the optimum form, but it passes the correct pointer. As I said many times already, LLVM does not seem to treat all asm operands as lvalues. That is a bug. And it is critical for memory operands for example, as should be obvious if you look at at for a few seconds (you pass *that* memory, not a copy of it). The thing you pass has an identity. It's an lvalue. This is true for *all* inline asm operands, not just output operands and memory operands, but it is most obvious there. Or, LLVM might have a bug elsewhere. Either way, the asm is fine, and it has worked fine in GCC since forever. Changing this constraint to be an output constraint would just be obfuscation (we could change *all* operands to *everything* to be inout ("+") constraints, and it won't affect correctness, just the reader's sanity). Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-30 13:48 ` Segher Boessenkool @ 2019-07-30 14:30 ` Arnd Bergmann -1 siblings, 0 replies; 52+ messages in thread From: Arnd Bergmann @ 2019-07-30 14:30 UTC (permalink / raw) To: Segher Boessenkool Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman, christophe leroy, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List, clang-built-linux On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote: > > Upon a second look, I think the issue is that the "Z" is an input argument > > when it should be an output. clang decides that it can make a copy of the > > input and pass that into the inline asm. This is not the most efficient > > way, but it seems entirely correct according to the constraints. > > Most dcb* (and all icb*) do not change the memory pointed to. The > memory is an input here, logically as well, and that is obvious. Ah, right. I had only thought of dcbz here, but you are right that using an output makes little sense for the others. readl() is another example where powerpc currently uses "Z" for an input, which illustrates this even better. > > Changing it to an output "=Z" constraint seems to make it work: > > > > https://godbolt.org/z/FwEqHf > > > > Clang still doesn't use the optimum form, but it passes the correct pointer. > > As I said many times already, LLVM does not seem to treat all asm > operands as lvalues. That is a bug. And it is critical for memory > operands for example, as should be obvious if you look at at for a few > seconds (you pass *that* memory, not a copy of it). The thing you pass > has an identity. It's an lvalue. This is true for *all* inline asm > operands, not just output operands and memory operands, but it is most > obvious there. From experimentation, I would guess that llvm handles "m" correctly, but not "Z". See https://godbolt.org/z/uqfDx_ for another example. > Or, LLVM might have a bug elsewhere. > > Either way, the asm is fine, and it has worked fine in GCC since > forever. Changing this constraint to be an output constraint would > just be obfuscation (we could change *all* operands to *everything* to > be inout ("+") constraints, and it won't affect correctness, just the > reader's sanity). I would still argue that for dcbz specifically, an output makes more sense than an input, but as you say that does not solve the others. Arnd ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-30 14:30 ` Arnd Bergmann 0 siblings, 0 replies; 52+ messages in thread From: Arnd Bergmann @ 2019-07-30 14:30 UTC (permalink / raw) To: Segher Boessenkool Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote: > > Upon a second look, I think the issue is that the "Z" is an input argument > > when it should be an output. clang decides that it can make a copy of the > > input and pass that into the inline asm. This is not the most efficient > > way, but it seems entirely correct according to the constraints. > > Most dcb* (and all icb*) do not change the memory pointed to. The > memory is an input here, logically as well, and that is obvious. Ah, right. I had only thought of dcbz here, but you are right that using an output makes little sense for the others. readl() is another example where powerpc currently uses "Z" for an input, which illustrates this even better. > > Changing it to an output "=Z" constraint seems to make it work: > > > > https://godbolt.org/z/FwEqHf > > > > Clang still doesn't use the optimum form, but it passes the correct pointer. > > As I said many times already, LLVM does not seem to treat all asm > operands as lvalues. That is a bug. And it is critical for memory > operands for example, as should be obvious if you look at at for a few > seconds (you pass *that* memory, not a copy of it). The thing you pass > has an identity. It's an lvalue. This is true for *all* inline asm > operands, not just output operands and memory operands, but it is most > obvious there. From experimentation, I would guess that llvm handles "m" correctly, but not "Z". See https://godbolt.org/z/uqfDx_ for another example. > Or, LLVM might have a bug elsewhere. > > Either way, the asm is fine, and it has worked fine in GCC since > forever. Changing this constraint to be an output constraint would > just be obfuscation (we could change *all* operands to *everything* to > be inout ("+") constraints, and it won't affect correctness, just the > reader's sanity). I would still argue that for dcbz specifically, an output makes more sense than an input, but as you say that does not solve the others. Arnd ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-30 14:30 ` Arnd Bergmann @ 2019-07-30 16:16 ` Segher Boessenkool -1 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-07-30 16:16 UTC (permalink / raw) To: Arnd Bergmann Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman, christophe leroy, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List, clang-built-linux On Tue, Jul 30, 2019 at 04:30:29PM +0200, Arnd Bergmann wrote: > On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > > > On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote: > > > Upon a second look, I think the issue is that the "Z" is an input argument > > > when it should be an output. clang decides that it can make a copy of the > > > input and pass that into the inline asm. This is not the most efficient > > > way, but it seems entirely correct according to the constraints. > > > > Most dcb* (and all icb*) do not change the memory pointed to. The > > memory is an input here, logically as well, and that is obvious. > > Ah, right. I had only thought of dcbz here, but you are right that using > an output makes little sense for the others. > > readl() is another example where powerpc currently uses "Z" for an > input, which illustrates this even better. in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as well, its (not byte reversing) read will be atomic just fine, so things will still work correctly. The things defined with DEF_MMIO_IN_D (instead of DEF_MMIO_IN_X) do not look like they will work correctly if an update form address is chosen, but that won't happen because the constraint is "m" instead of "m<>", making the %Un pretty useless (it will always be the empty string). > > As I said many times already, LLVM does not seem to treat all asm > > operands as lvalues. That is a bug. And it is critical for memory > > operands for example, as should be obvious if you look at at for a few > > seconds (you pass *that* memory, not a copy of it). The thing you pass > > has an identity. It's an lvalue. This is true for *all* inline asm > > operands, not just output operands and memory operands, but it is most > > obvious there. > > >From experimentation, I would guess that llvm handles "m" correctly, but > not "Z". See https://godbolt.org/z/uqfDx_ for another example. Yeah, it does not treat "Z" as a memory constraint apparently, and it special cases output operands and memory operands to be lvalues, but does not do that for everything else as it should. > > Or, LLVM might have a bug elsewhere. > > > > Either way, the asm is fine, and it has worked fine in GCC since > > forever. Changing this constraint to be an output constraint would > > just be obfuscation (we could change *all* operands to *everything* to > > be inout ("+") constraints, and it won't affect correctness, just the > > reader's sanity). > > I would still argue that for dcbz specifically, an output makes more > sense than an input, but as you say that does not solve the others. An output would be somewhat misleading. dcbz zeroes the whole aligned cache block sized region of memory its operand points into. The kernel dcbz functions do not easily know the cache block size I think, and besides, you want a "memory" clobber anyway, also for the other dcb*, so it won't help anything. Also, the compiler can almost never use the extra info ("affects the aligned 32B or 128B block this points into") usefully anyway; it will usually see it as "can alias pretty much anything". Just use a "memory" clobber :-/ Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-30 16:16 ` Segher Boessenkool 0 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-07-30 16:16 UTC (permalink / raw) To: Arnd Bergmann Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Tue, Jul 30, 2019 at 04:30:29PM +0200, Arnd Bergmann wrote: > On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > > > On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote: > > > Upon a second look, I think the issue is that the "Z" is an input argument > > > when it should be an output. clang decides that it can make a copy of the > > > input and pass that into the inline asm. This is not the most efficient > > > way, but it seems entirely correct according to the constraints. > > > > Most dcb* (and all icb*) do not change the memory pointed to. The > > memory is an input here, logically as well, and that is obvious. > > Ah, right. I had only thought of dcbz here, but you are right that using > an output makes little sense for the others. > > readl() is another example where powerpc currently uses "Z" for an > input, which illustrates this even better. in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as well, its (not byte reversing) read will be atomic just fine, so things will still work correctly. The things defined with DEF_MMIO_IN_D (instead of DEF_MMIO_IN_X) do not look like they will work correctly if an update form address is chosen, but that won't happen because the constraint is "m" instead of "m<>", making the %Un pretty useless (it will always be the empty string). > > As I said many times already, LLVM does not seem to treat all asm > > operands as lvalues. That is a bug. And it is critical for memory > > operands for example, as should be obvious if you look at at for a few > > seconds (you pass *that* memory, not a copy of it). The thing you pass > > has an identity. It's an lvalue. This is true for *all* inline asm > > operands, not just output operands and memory operands, but it is most > > obvious there. > > >From experimentation, I would guess that llvm handles "m" correctly, but > not "Z". See https://godbolt.org/z/uqfDx_ for another example. Yeah, it does not treat "Z" as a memory constraint apparently, and it special cases output operands and memory operands to be lvalues, but does not do that for everything else as it should. > > Or, LLVM might have a bug elsewhere. > > > > Either way, the asm is fine, and it has worked fine in GCC since > > forever. Changing this constraint to be an output constraint would > > just be obfuscation (we could change *all* operands to *everything* to > > be inout ("+") constraints, and it won't affect correctness, just the > > reader's sanity). > > I would still argue that for dcbz specifically, an output makes more > sense than an input, but as you say that does not solve the others. An output would be somewhat misleading. dcbz zeroes the whole aligned cache block sized region of memory its operand points into. The kernel dcbz functions do not easily know the cache block size I think, and besides, you want a "memory" clobber anyway, also for the other dcb*, so it won't help anything. Also, the compiler can almost never use the extra info ("affects the aligned 32B or 128B block this points into") usefully anyway; it will usually see it as "can alias pretty much anything". Just use a "memory" clobber :-/ Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-30 16:16 ` Segher Boessenkool (?) @ 2019-07-30 17:07 ` Segher Boessenkool 2019-07-30 18:24 ` Arnd Bergmann -1 siblings, 1 reply; 52+ messages in thread From: Segher Boessenkool @ 2019-07-30 17:07 UTC (permalink / raw) To: Arnd Bergmann Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Tue, Jul 30, 2019 at 11:16:37AM -0500, Segher Boessenkool wrote: > in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as > well, its (not byte reversing) read will be atomic just fine, so things > will still work correctly. > > The things defined with DEF_MMIO_IN_D (instead of DEF_MMIO_IN_X) do not > look like they will work correctly if an update form address is chosen, > but that won't happen because the constraint is "m" instead of "m<>", > making the %Un pretty useless (it will always be the empty string). Btw, this is true since GCC 4.8; before 4.8, plain "m" *could* have an automodify (autoinc, autodec, etc.) side effect. What is the minimum GCC version required, these days? https://gcc.gnu.org/PR44492 https://gcc.gnu.org/r161328 Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-30 17:07 ` Segher Boessenkool @ 2019-07-30 18:24 ` Arnd Bergmann 0 siblings, 0 replies; 52+ messages in thread From: Arnd Bergmann @ 2019-07-30 18:24 UTC (permalink / raw) To: Segher Boessenkool Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Tue, Jul 30, 2019 at 7:07 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Tue, Jul 30, 2019 at 11:16:37AM -0500, Segher Boessenkool wrote: > > in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as > > well, its (not byte reversing) read will be atomic just fine, so things > > will still work correctly. > > > > The things defined with DEF_MMIO_IN_D (instead of DEF_MMIO_IN_X) do not > > look like they will work correctly if an update form address is chosen, > > but that won't happen because the constraint is "m" instead of "m<>", > > making the %Un pretty useless (it will always be the empty string). > > Btw, this is true since GCC 4.8; before 4.8, plain "m" *could* have an > automodify (autoinc, autodec, etc.) side effect. What is the minimum > GCC version required, these days? gcc-4.6, but an architecture can require a higher version. Arnd ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-30 16:16 ` Segher Boessenkool @ 2019-07-30 18:24 ` Arnd Bergmann -1 siblings, 0 replies; 52+ messages in thread From: Arnd Bergmann @ 2019-07-30 18:24 UTC (permalink / raw) To: Segher Boessenkool Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman, christophe leroy, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List, clang-built-linux On Tue, Jul 30, 2019 at 6:16 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Tue, Jul 30, 2019 at 04:30:29PM +0200, Arnd Bergmann wrote: > > On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool > > <segher@kernel.crashing.org> wrote: > > > > > > On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote: > > > > Upon a second look, I think the issue is that the "Z" is an input argument > > > > when it should be an output. clang decides that it can make a copy of the > > > > input and pass that into the inline asm. This is not the most efficient > > > > way, but it seems entirely correct according to the constraints. > > > > > > Most dcb* (and all icb*) do not change the memory pointed to. The > > > memory is an input here, logically as well, and that is obvious. > > > > Ah, right. I had only thought of dcbz here, but you are right that using > > an output makes little sense for the others. > > > > readl() is another example where powerpc currently uses "Z" for an > > input, which illustrates this even better. > > in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as > well, its (not byte reversing) read will be atomic just fine, so things > will still work correctly. byteorder is fine, the problem I was thinking of is when moving the load/store instructions around the barriers that synchronize with DMA, or turning them into different-size accesses. Changing two consecutive 16-bit mmio reads into an unaligned 32-bit read will rarely have the intended effect ;-) Arnd ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-30 18:24 ` Arnd Bergmann 0 siblings, 0 replies; 52+ messages in thread From: Arnd Bergmann @ 2019-07-30 18:24 UTC (permalink / raw) To: Segher Boessenkool Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Tue, Jul 30, 2019 at 6:16 PM Segher Boessenkool <segher@kernel.crashing.org> wrote: > > On Tue, Jul 30, 2019 at 04:30:29PM +0200, Arnd Bergmann wrote: > > On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool > > <segher@kernel.crashing.org> wrote: > > > > > > On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote: > > > > Upon a second look, I think the issue is that the "Z" is an input argument > > > > when it should be an output. clang decides that it can make a copy of the > > > > input and pass that into the inline asm. This is not the most efficient > > > > way, but it seems entirely correct according to the constraints. > > > > > > Most dcb* (and all icb*) do not change the memory pointed to. The > > > memory is an input here, logically as well, and that is obvious. > > > > Ah, right. I had only thought of dcbz here, but you are right that using > > an output makes little sense for the others. > > > > readl() is another example where powerpc currently uses "Z" for an > > input, which illustrates this even better. > > in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as > well, its (not byte reversing) read will be atomic just fine, so things > will still work correctly. byteorder is fine, the problem I was thinking of is when moving the load/store instructions around the barriers that synchronize with DMA, or turning them into different-size accesses. Changing two consecutive 16-bit mmio reads into an unaligned 32-bit read will rarely have the intended effect ;-) Arnd ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-30 18:24 ` Arnd Bergmann @ 2019-07-30 19:35 ` Segher Boessenkool -1 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-07-30 19:35 UTC (permalink / raw) To: Arnd Bergmann Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman, christophe leroy, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List, clang-built-linux On Tue, Jul 30, 2019 at 08:24:14PM +0200, Arnd Bergmann wrote: > On Tue, Jul 30, 2019 at 6:16 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as > > well, its (not byte reversing) read will be atomic just fine, so things > > will still work correctly. > > byteorder is fine, the problem I was thinking of is when moving the load/store > instructions around the barriers that synchronize with DMA, or turning > them into different-size accesses. Changing two consecutive 16-bit mmio reads > into an unaligned 32-bit read will rarely have the intended effect ;-) Most such barriers will also work on the copy accesses, I think. But yes it depends on exactly how it is written. The {in,out}_{be,le}<N> ones use sync;store for out and sync;load;trap;isync for in, so they should be safe ;-) (Well, almost -- writes to I/O will not necessarily actually happen before other stores, not from these macros alone at least). Should be pretty easy to check what LLVM makes of this? Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-30 19:35 ` Segher Boessenkool 0 siblings, 0 replies; 52+ messages in thread From: Segher Boessenkool @ 2019-07-30 19:35 UTC (permalink / raw) To: Arnd Bergmann Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List, clang-built-linux, Paul Mackerras, Nathan Chancellor, linuxppc-dev On Tue, Jul 30, 2019 at 08:24:14PM +0200, Arnd Bergmann wrote: > On Tue, Jul 30, 2019 at 6:16 PM Segher Boessenkool > <segher@kernel.crashing.org> wrote: > > in_le32 and friends? Yeah, huh. If LLVM copies that to the stack as > > well, its (not byte reversing) read will be atomic just fine, so things > > will still work correctly. > > byteorder is fine, the problem I was thinking of is when moving the load/store > instructions around the barriers that synchronize with DMA, or turning > them into different-size accesses. Changing two consecutive 16-bit mmio reads > into an unaligned 32-bit read will rarely have the intended effect ;-) Most such barriers will also work on the copy accesses, I think. But yes it depends on exactly how it is written. The {in,out}_{be,le}<N> ones use sync;store for out and sync;load;trap;isync for in, so they should be safe ;-) (Well, almost -- writes to I/O will not necessarily actually happen before other stores, not from these macros alone at least). Should be pretty easy to check what LLVM makes of this? Segher ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz 2019-07-29 20:32 ` Nathan Chancellor @ 2019-07-30 5:31 ` Christophe Leroy -1 siblings, 0 replies; 52+ messages in thread From: Christophe Leroy @ 2019-07-30 5:31 UTC (permalink / raw) To: Nathan Chancellor, Nick Desaulniers Cc: mpe, segher, arnd, kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux Le 29/07/2019 à 22:32, Nathan Chancellor a écrit : > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote: >> Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed >> what looks like a codegen bug in Clang's handling of `%y` output >> template with `Z` constraint. This is resulting in panics during boot >> for 32b powerpc builds w/ Clang, as reported by our CI. >> >> Add back the original code that worked behind a preprocessor check for >> __clang__ until we can fix LLVM. >> >> Further, it seems that clang allnoconfig builds are unhappy with `Z`, as >> reported by 0day bot. This is likely because Clang warns about inline >> asm constraints when the constraint requires inlining to be semantically >> valid. >> >> Link: https://bugs.llvm.org/show_bug.cgi?id=42762 >> Link: https://github.com/ClangBuiltLinux/linux/issues/593 >> Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ >> Debugged-by: Nathan Chancellor <natechancellor@gmail.com> >> Reported-by: Nathan Chancellor <natechancellor@gmail.com> >> Reported-by: kbuild test robot <lkp@intel.com> >> Suggested-by: Nathan Chancellor <natechancellor@gmail.com> >> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> >> --- >> Alternatively, we could just revert 6c5875843b87. It seems that GCC >> generates the same code for these functions for out of line versions. >> But I'm not sure how the inlined code generated would be affected. > > For the record: > > https://godbolt.org/z/z57VU7 > > This seems consistent with what Michael found so I don't think a revert > is entirely unreasonable. Your example functions are too simple to show anything. The functions takes only one parameter so of course GCC won't use two registers allthough given the opportunity. Christophe > > Either way: > > Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> > ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz @ 2019-07-30 5:31 ` Christophe Leroy 0 siblings, 0 replies; 52+ messages in thread From: Christophe Leroy @ 2019-07-30 5:31 UTC (permalink / raw) To: Nathan Chancellor, Nick Desaulniers Cc: arnd, linux-kernel, clang-built-linux, Paul Mackerras, linuxppc-dev, kbuild test robot Le 29/07/2019 à 22:32, Nathan Chancellor a écrit : > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote: >> Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed >> what looks like a codegen bug in Clang's handling of `%y` output >> template with `Z` constraint. This is resulting in panics during boot >> for 32b powerpc builds w/ Clang, as reported by our CI. >> >> Add back the original code that worked behind a preprocessor check for >> __clang__ until we can fix LLVM. >> >> Further, it seems that clang allnoconfig builds are unhappy with `Z`, as >> reported by 0day bot. This is likely because Clang warns about inline >> asm constraints when the constraint requires inlining to be semantically >> valid. >> >> Link: https://bugs.llvm.org/show_bug.cgi?id=42762 >> Link: https://github.com/ClangBuiltLinux/linux/issues/593 >> Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/ >> Debugged-by: Nathan Chancellor <natechancellor@gmail.com> >> Reported-by: Nathan Chancellor <natechancellor@gmail.com> >> Reported-by: kbuild test robot <lkp@intel.com> >> Suggested-by: Nathan Chancellor <natechancellor@gmail.com> >> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> >> --- >> Alternatively, we could just revert 6c5875843b87. It seems that GCC >> generates the same code for these functions for out of line versions. >> But I'm not sure how the inlined code generated would be affected. > > For the record: > > https://godbolt.org/z/z57VU7 > > This seems consistent with what Michael found so I don't think a revert > is entirely unreasonable. Your example functions are too simple to show anything. The functions takes only one parameter so of course GCC won't use two registers allthough given the opportunity. Christophe > > Either way: > > Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> > ^ permalink raw reply [flat|nested] 52+ messages in thread
end of thread, other threads:[~2019-08-10 9:11 UTC | newest] Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-07-29 20:25 [PATCH] powerpc: workaround clang codegen bug in dcbz Nick Desaulniers 2019-07-29 20:25 ` Nick Desaulniers 2019-07-29 20:32 ` Nathan Chancellor 2019-07-29 20:32 ` Nathan Chancellor 2019-07-29 20:45 ` Nick Desaulniers 2019-07-29 20:45 ` Nick Desaulniers 2019-07-29 20:47 ` Nathan Chancellor 2019-07-29 20:47 ` Nathan Chancellor 2019-07-29 20:49 ` Nick Desaulniers 2019-07-29 20:49 ` Nick Desaulniers 2019-07-29 21:52 ` Segher Boessenkool 2019-07-29 21:52 ` Segher Boessenkool 2019-07-30 7:34 ` Arnd Bergmann 2019-07-30 7:34 ` Arnd Bergmann 2019-07-30 11:17 ` Michael Ellerman 2019-07-30 11:17 ` Michael Ellerman 2019-08-09 18:21 ` [PATCH] powerpc: fix inline asm constraints for dcbz Nick Desaulniers 2019-08-09 18:21 ` Nick Desaulniers 2019-08-09 18:28 ` Arnd Bergmann 2019-08-09 18:28 ` Arnd Bergmann 2019-08-09 20:03 ` Christophe Leroy 2019-08-09 20:03 ` Christophe Leroy 2019-08-09 20:12 ` Arnd Bergmann 2019-08-09 20:12 ` Arnd Bergmann 2019-08-09 22:03 ` Nick Desaulniers 2019-08-09 22:03 ` Nick Desaulniers 2019-08-09 22:10 ` Segher Boessenkool 2019-08-09 22:10 ` Segher Boessenkool 2019-08-09 22:00 ` Segher Boessenkool 2019-08-09 22:00 ` Segher Boessenkool 2019-08-09 22:03 ` [PATCH v3] Revert "powerpc: slightly improve cache helpers" Nick Desaulniers 2019-08-09 22:03 ` Nick Desaulniers 2019-08-10 9:09 ` Michael Ellerman 2019-08-10 9:09 ` Michael Ellerman 2019-08-09 21:55 ` [PATCH] powerpc: fix inline asm constraints for dcbz Segher Boessenkool 2019-08-09 21:55 ` Segher Boessenkool 2019-08-09 20:36 ` Nathan Chancellor 2019-08-09 20:36 ` Nathan Chancellor 2019-07-30 13:48 ` [PATCH] powerpc: workaround clang codegen bug in dcbz Segher Boessenkool 2019-07-30 13:48 ` Segher Boessenkool 2019-07-30 14:30 ` Arnd Bergmann 2019-07-30 14:30 ` Arnd Bergmann 2019-07-30 16:16 ` Segher Boessenkool 2019-07-30 16:16 ` Segher Boessenkool 2019-07-30 17:07 ` Segher Boessenkool 2019-07-30 18:24 ` Arnd Bergmann 2019-07-30 18:24 ` Arnd Bergmann 2019-07-30 18:24 ` Arnd Bergmann 2019-07-30 19:35 ` Segher Boessenkool 2019-07-30 19:35 ` Segher Boessenkool 2019-07-30 5:31 ` Christophe Leroy 2019-07-30 5:31 ` Christophe Leroy
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.