linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc: workaround clang codegen bug in dcbz
@ 2019-07-29 20:25 Nick Desaulniers
  2019-07-29 20:32 ` Nathan Chancellor
  0 siblings, 1 reply; 27+ messages in thread
From: Nick Desaulniers @ 2019-07-29 20:25 UTC (permalink / raw)
  To: mpe
  Cc: christophe.leroy, segher, arnd, Nick Desaulniers,
	Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux

Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed
what looks like a codegen bug in Clang's handling of `%y` output
template with `Z` constraint. This is resulting in panics during boot
for 32b powerpc builds w/ Clang, as reported by our CI.

Add back the original code that worked behind a preprocessor check for
__clang__ until we can fix LLVM.

Further, it seems that clang allnoconfig builds are unhappy with `Z`, as
reported by 0day bot. This is likely because Clang warns about inline
asm constraints when the constraint requires inlining to be semantically
valid.

Link: https://bugs.llvm.org/show_bug.cgi?id=42762
Link: https://github.com/ClangBuiltLinux/linux/issues/593
Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/
Debugged-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: kbuild test robot <lkp@intel.com>
Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
Alternatively, we could just revert 6c5875843b87. It seems that GCC
generates the same code for these functions for out of line versions.
But I'm not sure how the inlined code generated would be affected.

 arch/powerpc/include/asm/cache.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..72983da94dce 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -105,6 +105,30 @@ extern void _set_L3CR(unsigned long);
 #define _set_L3CR(val)	do { } while(0)
 #endif
 
+/*
+ * Workaround for https://bugs.llvm.org/show_bug.cgi?id=42762.
+ */
+#ifdef __clang__
+static inline void dcbz(void *addr)
+{
+	__asm__ __volatile__ ("dcbz 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void dcbi(void *addr)
+{
+	__asm__ __volatile__ ("dcbi 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void dcbf(void *addr)
+{
+	__asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
+}
+
+static inline void dcbst(void *addr)
+{
+	__asm__ __volatile__ ("dcbst 0, %0" : : "r"(addr) : "memory");
+}
+#else
 static inline void dcbz(void *addr)
 {
 	__asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
@@ -124,6 +148,7 @@ static inline void dcbst(void *addr)
 {
 	__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
 }
+#endif /* __clang__ */
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_CACHE_H */
-- 
2.22.0.709.g102302147b-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-29 20:25 [PATCH] powerpc: workaround clang codegen bug in dcbz Nick Desaulniers
@ 2019-07-29 20:32 ` Nathan Chancellor
  2019-07-29 20:45   ` Nick Desaulniers
                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Nathan Chancellor @ 2019-07-29 20:32 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: mpe, christophe.leroy, segher, arnd, kbuild test robot,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	linux-kernel, clang-built-linux

On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote:
> Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed
> what looks like a codegen bug in Clang's handling of `%y` output
> template with `Z` constraint. This is resulting in panics during boot
> for 32b powerpc builds w/ Clang, as reported by our CI.
> 
> Add back the original code that worked behind a preprocessor check for
> __clang__ until we can fix LLVM.
> 
> Further, it seems that clang allnoconfig builds are unhappy with `Z`, as
> reported by 0day bot. This is likely because Clang warns about inline
> asm constraints when the constraint requires inlining to be semantically
> valid.
> 
> Link: https://bugs.llvm.org/show_bug.cgi?id=42762
> Link: https://github.com/ClangBuiltLinux/linux/issues/593
> Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/
> Debugged-by: Nathan Chancellor <natechancellor@gmail.com>
> Reported-by: Nathan Chancellor <natechancellor@gmail.com>
> Reported-by: kbuild test robot <lkp@intel.com>
> Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
> Alternatively, we could just revert 6c5875843b87. It seems that GCC
> generates the same code for these functions for out of line versions.
> But I'm not sure how the inlined code generated would be affected.

For the record:

https://godbolt.org/z/z57VU7

This seems consistent with what Michael found so I don't think a revert
is entirely unreasonable.

Either way:

Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-29 20:32 ` Nathan Chancellor
@ 2019-07-29 20:45   ` Nick Desaulniers
  2019-07-29 20:47     ` Nathan Chancellor
  2019-07-29 21:52   ` Segher Boessenkool
  2019-07-30  5:31   ` Christophe Leroy
  2 siblings, 1 reply; 27+ messages in thread
From: Nick Desaulniers @ 2019-07-29 20:45 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Michael Ellerman, Christophe Leroy, Segher Boessenkool,
	Arnd Bergmann, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, LKML, clang-built-linux

On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote:
> > But I'm not sure how the inlined code generated would be affected.
>
> For the record:
>
> https://godbolt.org/z/z57VU7
>
> This seems consistent with what Michael found so I don't think a revert
> is entirely unreasonable.

Thanks for debugging/reporting/testing and the Godbolt link which
clearly shows that the codegen for out of line versions is no
different.  The case I can't comment on is what happens when those
`static inline` functions get inlined (maybe the original patch
improves those cases?).
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-29 20:45   ` Nick Desaulniers
@ 2019-07-29 20:47     ` Nathan Chancellor
  2019-07-29 20:49       ` Nick Desaulniers
  0 siblings, 1 reply; 27+ messages in thread
From: Nathan Chancellor @ 2019-07-29 20:47 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Michael Ellerman, Christophe Leroy, Segher Boessenkool,
	Arnd Bergmann, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, LKML, clang-built-linux

On Mon, Jul 29, 2019 at 01:45:35PM -0700, Nick Desaulniers wrote:
> On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor
> <natechancellor@gmail.com> wrote:
> >
> > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote:
> > > But I'm not sure how the inlined code generated would be affected.
> >
> > For the record:
> >
> > https://godbolt.org/z/z57VU7
> >
> > This seems consistent with what Michael found so I don't think a revert
> > is entirely unreasonable.
> 
> Thanks for debugging/reporting/testing and the Godbolt link which
> clearly shows that the codegen for out of line versions is no
> different.  The case I can't comment on is what happens when those
> `static inline` functions get inlined (maybe the original patch
> improves those cases?).
> -- 
> Thanks,
> ~Nick Desaulniers

I'll try to build with various versions of GCC and compare the
disassembly of the one problematic location that I found and see
what it looks like.

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-29 20:47     ` Nathan Chancellor
@ 2019-07-29 20:49       ` Nick Desaulniers
  0 siblings, 0 replies; 27+ messages in thread
From: Nick Desaulniers @ 2019-07-29 20:49 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Michael Ellerman, Christophe Leroy, Segher Boessenkool,
	Arnd Bergmann, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, LKML, clang-built-linux

On Mon, Jul 29, 2019 at 1:47 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> On Mon, Jul 29, 2019 at 01:45:35PM -0700, Nick Desaulniers wrote:
> > On Mon, Jul 29, 2019 at 1:32 PM Nathan Chancellor
> > <natechancellor@gmail.com> wrote:
> > >
> > > On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote:
> > > > But I'm not sure how the inlined code generated would be affected.
> > >
> > > For the record:
> > >
> > > https://godbolt.org/z/z57VU7
> > >
> > > This seems consistent with what Michael found so I don't think a revert
> > > is entirely unreasonable.
> >
> > Thanks for debugging/reporting/testing and the Godbolt link which
> > clearly shows that the codegen for out of line versions is no
> > different.  The case I can't comment on is what happens when those
> > `static inline` functions get inlined (maybe the original patch
> > improves those cases?).
> > --
> > Thanks,
> > ~Nick Desaulniers
>
> I'll try to build with various versions of GCC and compare the
> disassembly of the one problematic location that I found and see
> what it looks like.

Also, guess I should have included the tag:
Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers")
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-29 20:32 ` Nathan Chancellor
  2019-07-29 20:45   ` Nick Desaulniers
@ 2019-07-29 21:52   ` Segher Boessenkool
  2019-07-30  7:34     ` Arnd Bergmann
  2019-07-30  5:31   ` Christophe Leroy
  2 siblings, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2019-07-29 21:52 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Nick Desaulniers, mpe, christophe.leroy, arnd, kbuild test robot,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	linux-kernel, clang-built-linux

On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote:
> For the record:
> 
> https://godbolt.org/z/z57VU7
> 
> This seems consistent with what Michael found so I don't think a revert
> is entirely unreasonable.

Try this:

  https://godbolt.org/z/6_ZfVi

This matters in non-trivial loops, for example.  But all current cases
where such non-trivial loops are done with cache block instructions are
actually written in real assembler already, using two registers.
Because performance matters.  Not that I recommend writing code as
critical as memset in C with inline asm :-)


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-29 20:32 ` Nathan Chancellor
  2019-07-29 20:45   ` Nick Desaulniers
  2019-07-29 21:52   ` Segher Boessenkool
@ 2019-07-30  5:31   ` Christophe Leroy
  2 siblings, 0 replies; 27+ messages in thread
From: Christophe Leroy @ 2019-07-30  5:31 UTC (permalink / raw)
  To: Nathan Chancellor, Nick Desaulniers
  Cc: mpe, segher, arnd, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux



Le 29/07/2019 à 22:32, Nathan Chancellor a écrit :
> On Mon, Jul 29, 2019 at 01:25:41PM -0700, Nick Desaulniers wrote:
>> Commit 6c5875843b87 ("powerpc: slightly improve cache helpers") exposed
>> what looks like a codegen bug in Clang's handling of `%y` output
>> template with `Z` constraint. This is resulting in panics during boot
>> for 32b powerpc builds w/ Clang, as reported by our CI.
>>
>> Add back the original code that worked behind a preprocessor check for
>> __clang__ until we can fix LLVM.
>>
>> Further, it seems that clang allnoconfig builds are unhappy with `Z`, as
>> reported by 0day bot. This is likely because Clang warns about inline
>> asm constraints when the constraint requires inlining to be semantically
>> valid.
>>
>> Link: https://bugs.llvm.org/show_bug.cgi?id=42762
>> Link: https://github.com/ClangBuiltLinux/linux/issues/593
>> Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/
>> Debugged-by: Nathan Chancellor <natechancellor@gmail.com>
>> Reported-by: Nathan Chancellor <natechancellor@gmail.com>
>> Reported-by: kbuild test robot <lkp@intel.com>
>> Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
>> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
>> ---
>> Alternatively, we could just revert 6c5875843b87. It seems that GCC
>> generates the same code for these functions for out of line versions.
>> But I'm not sure how the inlined code generated would be affected.
> 
> For the record:
> 
> https://godbolt.org/z/z57VU7
> 
> This seems consistent with what Michael found so I don't think a revert
> is entirely unreasonable.

Your example functions are too simple to show anything. The functions 
takes only one parameter so of course GCC won't use two registers 
allthough given the opportunity.

Christophe

> 
> Either way:
> 
> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-29 21:52   ` Segher Boessenkool
@ 2019-07-30  7:34     ` Arnd Bergmann
  2019-07-30 11:17       ` Michael Ellerman
  2019-07-30 13:48       ` [PATCH] powerpc: workaround clang codegen bug in dcbz Segher Boessenkool
  0 siblings, 2 replies; 27+ messages in thread
From: Arnd Bergmann @ 2019-07-30  7:34 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman,
	christophe leroy, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List,
	clang-built-linux

On Mon, Jul 29, 2019 at 11:52 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote:
> > For the record:
> >
> > https://godbolt.org/z/z57VU7
> >
> > This seems consistent with what Michael found so I don't think a revert
> > is entirely unreasonable.
>
> Try this:
>
>   https://godbolt.org/z/6_ZfVi
>
> This matters in non-trivial loops, for example.  But all current cases
> where such non-trivial loops are done with cache block instructions are
> actually written in real assembler already, using two registers.
> Because performance matters.  Not that I recommend writing code as
> critical as memset in C with inline asm :-)

Upon a second look, I think the issue is that the "Z" is an input argument
when it should be an output. clang decides that it can make a copy of the
input and pass that into the inline asm. This is not the most efficient
way, but it seems entirely correct according to the constraints.

Changing it to an output "=Z" constraint seems to make it work:

https://godbolt.org/z/FwEqHf

Clang still doesn't use the optimum form, but it passes the correct pointer.

       Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-30  7:34     ` Arnd Bergmann
@ 2019-07-30 11:17       ` Michael Ellerman
  2019-08-09 18:21         ` [PATCH] powerpc: fix inline asm constraints for dcbz Nick Desaulniers
  2019-07-30 13:48       ` [PATCH] powerpc: workaround clang codegen bug in dcbz Segher Boessenkool
  1 sibling, 1 reply; 27+ messages in thread
From: Michael Ellerman @ 2019-07-30 11:17 UTC (permalink / raw)
  To: Arnd Bergmann, Segher Boessenkool
  Cc: Nathan Chancellor, Nick Desaulniers, christophe leroy,
	kbuild test robot, Benjamin Herrenschmidt, Paul Mackerras,
	linuxppc-dev, Linux Kernel Mailing List, clang-built-linux

Arnd Bergmann <arnd@arndb.de> writes:
> On Mon, Jul 29, 2019 at 11:52 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
>> On Mon, Jul 29, 2019 at 01:32:46PM -0700, Nathan Chancellor wrote:
>> > For the record:
>> >
>> > https://godbolt.org/z/z57VU7
>> >
>> > This seems consistent with what Michael found so I don't think a revert
>> > is entirely unreasonable.
>>
>> Try this:
>>
>>   https://godbolt.org/z/6_ZfVi
>>
>> This matters in non-trivial loops, for example.  But all current cases
>> where such non-trivial loops are done with cache block instructions are
>> actually written in real assembler already, using two registers.
>> Because performance matters.  Not that I recommend writing code as
>> critical as memset in C with inline asm :-)
>
> Upon a second look, I think the issue is that the "Z" is an input argument
> when it should be an output. clang decides that it can make a copy of the
> input and pass that into the inline asm. This is not the most efficient
> way, but it seems entirely correct according to the constraints.
>
> Changing it to an output "=Z" constraint seems to make it work:
>
> https://godbolt.org/z/FwEqHf
>
> Clang still doesn't use the optimum form, but it passes the correct pointer.

Thanks Arnd. This seems like a better solution.

I'll drop the revert I have staged.

Segher does this look OK to you?

Nathan/Nick, are one of you able to test this with your clang CI?

cheers

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-30  7:34     ` Arnd Bergmann
  2019-07-30 11:17       ` Michael Ellerman
@ 2019-07-30 13:48       ` Segher Boessenkool
  2019-07-30 14:30         ` Arnd Bergmann
  1 sibling, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2019-07-30 13:48 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman,
	christophe leroy, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List,
	clang-built-linux

On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote:
> Upon a second look, I think the issue is that the "Z" is an input argument
> when it should be an output. clang decides that it can make a copy of the
> input and pass that into the inline asm. This is not the most efficient
> way, but it seems entirely correct according to the constraints.

Most dcb* (and all icb*) do not change the memory pointed to.  The
memory is an input here, logically as well, and that is obvious.

> Changing it to an output "=Z" constraint seems to make it work:
> 
> https://godbolt.org/z/FwEqHf
> 
> Clang still doesn't use the optimum form, but it passes the correct pointer.

As I said many times already, LLVM does not seem to treat all asm
operands as lvalues.  That is a bug.  And it is critical for memory
operands for example, as should be obvious if you look at at for a few
seconds (you pass *that* memory, not a copy of it).  The thing you pass
has an identity.  It's an lvalue.  This is true for *all* inline asm
operands, not just output operands and memory operands, but it is most
obvious there.

Or, LLVM might have a bug elsewhere.

Either way, the asm is fine, and it has worked fine in GCC since
forever.  Changing this constraint to be an output constraint would
just be obfuscation (we could change *all* operands to *everything* to
be inout ("+") constraints, and it won't affect correctness, just the
reader's sanity).


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-30 13:48       ` [PATCH] powerpc: workaround clang codegen bug in dcbz Segher Boessenkool
@ 2019-07-30 14:30         ` Arnd Bergmann
  2019-07-30 16:16           ` Segher Boessenkool
  0 siblings, 1 reply; 27+ messages in thread
From: Arnd Bergmann @ 2019-07-30 14:30 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman,
	christophe leroy, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List,
	clang-built-linux

On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote:
> > Upon a second look, I think the issue is that the "Z" is an input argument
> > when it should be an output. clang decides that it can make a copy of the
> > input and pass that into the inline asm. This is not the most efficient
> > way, but it seems entirely correct according to the constraints.
>
> Most dcb* (and all icb*) do not change the memory pointed to.  The
> memory is an input here, logically as well, and that is obvious.

Ah, right. I had only thought of dcbz here, but you are right that using
an output makes little sense for the others.

readl() is another example where powerpc currently uses "Z" for an
input, which illustrates this even better.

> > Changing it to an output "=Z" constraint seems to make it work:
> >
> > https://godbolt.org/z/FwEqHf
> >
> > Clang still doesn't use the optimum form, but it passes the correct pointer.
>
> As I said many times already, LLVM does not seem to treat all asm
> operands as lvalues.  That is a bug.  And it is critical for memory
> operands for example, as should be obvious if you look at at for a few
> seconds (you pass *that* memory, not a copy of it).  The thing you pass
> has an identity.  It's an lvalue.  This is true for *all* inline asm
> operands, not just output operands and memory operands, but it is most
> obvious there.

From experimentation, I would guess that llvm handles "m" correctly, but
not "Z". See https://godbolt.org/z/uqfDx_ for another example.

> Or, LLVM might have a bug elsewhere.
>
> Either way, the asm is fine, and it has worked fine in GCC since
> forever.  Changing this constraint to be an output constraint would
> just be obfuscation (we could change *all* operands to *everything* to
> be inout ("+") constraints, and it won't affect correctness, just the
> reader's sanity).

I would still argue that for dcbz specifically, an output makes more
sense than an input, but as you say that does not solve the others.

        Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-30 14:30         ` Arnd Bergmann
@ 2019-07-30 16:16           ` Segher Boessenkool
  2019-07-30 17:07             ` Segher Boessenkool
  2019-07-30 18:24             ` Arnd Bergmann
  0 siblings, 2 replies; 27+ messages in thread
From: Segher Boessenkool @ 2019-07-30 16:16 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman,
	christophe leroy, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List,
	clang-built-linux

On Tue, Jul 30, 2019 at 04:30:29PM +0200, Arnd Bergmann wrote:
> On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote:
> > > Upon a second look, I think the issue is that the "Z" is an input argument
> > > when it should be an output. clang decides that it can make a copy of the
> > > input and pass that into the inline asm. This is not the most efficient
> > > way, but it seems entirely correct according to the constraints.
> >
> > Most dcb* (and all icb*) do not change the memory pointed to.  The
> > memory is an input here, logically as well, and that is obvious.
> 
> Ah, right. I had only thought of dcbz here, but you are right that using
> an output makes little sense for the others.
> 
> readl() is another example where powerpc currently uses "Z" for an
> input, which illustrates this even better.

in_le32 and friends?  Yeah, huh.  If LLVM copies that to the stack as
well, its (not byte reversing) read will be atomic just fine, so things
will still work correctly.

The things defined with DEF_MMIO_IN_D (instead of DEF_MMIO_IN_X) do not
look like they will work correctly if an update form address is chosen,
but that won't happen because the constraint is "m" instead of "m<>",
making the %Un pretty useless (it will always be the empty string).

> > As I said many times already, LLVM does not seem to treat all asm
> > operands as lvalues.  That is a bug.  And it is critical for memory
> > operands for example, as should be obvious if you look at at for a few
> > seconds (you pass *that* memory, not a copy of it).  The thing you pass
> > has an identity.  It's an lvalue.  This is true for *all* inline asm
> > operands, not just output operands and memory operands, but it is most
> > obvious there.
> 
> >From experimentation, I would guess that llvm handles "m" correctly, but
> not "Z". See https://godbolt.org/z/uqfDx_ for another example.

Yeah, it does not treat "Z" as a memory constraint apparently, and it
special cases output operands and memory operands to be lvalues, but
does not do that for everything else as it should.

> > Or, LLVM might have a bug elsewhere.
> >
> > Either way, the asm is fine, and it has worked fine in GCC since
> > forever.  Changing this constraint to be an output constraint would
> > just be obfuscation (we could change *all* operands to *everything* to
> > be inout ("+") constraints, and it won't affect correctness, just the
> > reader's sanity).
> 
> I would still argue that for dcbz specifically, an output makes more
> sense than an input, but as you say that does not solve the others.

An output would be somewhat misleading.  dcbz zeroes the whole aligned
cache block sized region of memory its operand points into.  The kernel
dcbz functions do not easily know the cache block size I think, and
besides, you want a "memory" clobber anyway, also for the other dcb*,
so it won't help anything.  Also, the compiler can almost never use the
extra info ("affects the aligned 32B or 128B block this points into")
usefully anyway; it will usually see it as "can alias pretty much
anything".  Just use a "memory" clobber :-/


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-30 16:16           ` Segher Boessenkool
@ 2019-07-30 17:07             ` Segher Boessenkool
  2019-07-30 18:24               ` Arnd Bergmann
  2019-07-30 18:24             ` Arnd Bergmann
  1 sibling, 1 reply; 27+ messages in thread
From: Segher Boessenkool @ 2019-07-30 17:07 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List,
	clang-built-linux, Paul Mackerras, Nathan Chancellor,
	linuxppc-dev

On Tue, Jul 30, 2019 at 11:16:37AM -0500, Segher Boessenkool wrote:
> in_le32 and friends?  Yeah, huh.  If LLVM copies that to the stack as
> well, its (not byte reversing) read will be atomic just fine, so things
> will still work correctly.
> 
> The things defined with DEF_MMIO_IN_D (instead of DEF_MMIO_IN_X) do not
> look like they will work correctly if an update form address is chosen,
> but that won't happen because the constraint is "m" instead of "m<>",
> making the %Un pretty useless (it will always be the empty string).

Btw, this is true since GCC 4.8; before 4.8, plain "m" *could* have an
automodify (autoinc, autodec, etc.) side effect.  What is the minimum
GCC version required, these days?

https://gcc.gnu.org/PR44492
https://gcc.gnu.org/r161328


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-30 16:16           ` Segher Boessenkool
  2019-07-30 17:07             ` Segher Boessenkool
@ 2019-07-30 18:24             ` Arnd Bergmann
  2019-07-30 19:35               ` Segher Boessenkool
  1 sibling, 1 reply; 27+ messages in thread
From: Arnd Bergmann @ 2019-07-30 18:24 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman,
	christophe leroy, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List,
	clang-built-linux

On Tue, Jul 30, 2019 at 6:16 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Tue, Jul 30, 2019 at 04:30:29PM +0200, Arnd Bergmann wrote:
> > On Tue, Jul 30, 2019 at 3:49 PM Segher Boessenkool
> > <segher@kernel.crashing.org> wrote:
> > >
> > > On Tue, Jul 30, 2019 at 09:34:28AM +0200, Arnd Bergmann wrote:
> > > > Upon a second look, I think the issue is that the "Z" is an input argument
> > > > when it should be an output. clang decides that it can make a copy of the
> > > > input and pass that into the inline asm. This is not the most efficient
> > > > way, but it seems entirely correct according to the constraints.
> > >
> > > Most dcb* (and all icb*) do not change the memory pointed to.  The
> > > memory is an input here, logically as well, and that is obvious.
> >
> > Ah, right. I had only thought of dcbz here, but you are right that using
> > an output makes little sense for the others.
> >
> > readl() is another example where powerpc currently uses "Z" for an
> > input, which illustrates this even better.
>
> in_le32 and friends?  Yeah, huh.  If LLVM copies that to the stack as
> well, its (not byte reversing) read will be atomic just fine, so things
> will still work correctly.

byteorder is fine, the problem I was thinking of is when moving the load/store
instructions around the barriers that synchronize with DMA, or turning
them into different-size accesses. Changing two consecutive 16-bit mmio reads
into an unaligned 32-bit read will rarely have the intended effect ;-)

       Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-30 17:07             ` Segher Boessenkool
@ 2019-07-30 18:24               ` Arnd Bergmann
  0 siblings, 0 replies; 27+ messages in thread
From: Arnd Bergmann @ 2019-07-30 18:24 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: kbuild test robot, Nick Desaulniers, Linux Kernel Mailing List,
	clang-built-linux, Paul Mackerras, Nathan Chancellor,
	linuxppc-dev

On Tue, Jul 30, 2019 at 7:07 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Tue, Jul 30, 2019 at 11:16:37AM -0500, Segher Boessenkool wrote:
> > in_le32 and friends?  Yeah, huh.  If LLVM copies that to the stack as
> > well, its (not byte reversing) read will be atomic just fine, so things
> > will still work correctly.
> >
> > The things defined with DEF_MMIO_IN_D (instead of DEF_MMIO_IN_X) do not
> > look like they will work correctly if an update form address is chosen,
> > but that won't happen because the constraint is "m" instead of "m<>",
> > making the %Un pretty useless (it will always be the empty string).
>
> Btw, this is true since GCC 4.8; before 4.8, plain "m" *could* have an
> automodify (autoinc, autodec, etc.) side effect.  What is the minimum
> GCC version required, these days?

gcc-4.6, but an architecture can require a higher version.

         Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: workaround clang codegen bug in dcbz
  2019-07-30 18:24             ` Arnd Bergmann
@ 2019-07-30 19:35               ` Segher Boessenkool
  0 siblings, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2019-07-30 19:35 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nathan Chancellor, Nick Desaulniers, Michael Ellerman,
	christophe leroy, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List,
	clang-built-linux

On Tue, Jul 30, 2019 at 08:24:14PM +0200, Arnd Bergmann wrote:
> On Tue, Jul 30, 2019 at 6:16 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > in_le32 and friends?  Yeah, huh.  If LLVM copies that to the stack as
> > well, its (not byte reversing) read will be atomic just fine, so things
> > will still work correctly.
> 
> byteorder is fine, the problem I was thinking of is when moving the load/store
> instructions around the barriers that synchronize with DMA, or turning
> them into different-size accesses. Changing two consecutive 16-bit mmio reads
> into an unaligned 32-bit read will rarely have the intended effect ;-)

Most such barriers will also work on the copy accesses, I think.  But
yes it depends on exactly how it is written.  The {in,out}_{be,le}<N>
ones use sync;store for out and sync;load;trap;isync for in, so they
should be safe ;-)

(Well, almost -- writes to I/O will not necessarily actually happen
before other stores, not from these macros alone at least).

Should be pretty easy to check what LLVM makes of this?


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] powerpc: fix inline asm constraints for dcbz
  2019-07-30 11:17       ` Michael Ellerman
@ 2019-08-09 18:21         ` Nick Desaulniers
  2019-08-09 18:28           ` Arnd Bergmann
  2019-08-09 20:36           ` Nathan Chancellor
  0 siblings, 2 replies; 27+ messages in thread
From: Nick Desaulniers @ 2019-08-09 18:21 UTC (permalink / raw)
  To: mpe
  Cc: christophe.leroy, segher, arnd, Nick Desaulniers,
	Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux

The input parameter is modified, so it should be an output parameter
with "=" to make it so that a copy of the input is not made by Clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=42762
Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers
Link: https://github.com/ClangBuiltLinux/linux/issues/593
Link: https://godbolt.org/z/QwhZXi
Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/
Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers")
Debugged-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: kbuild test robot <lkp@intel.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
Green CI run:
https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122521976
https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37

 arch/powerpc/include/asm/cache.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..5a0df6a1b9dc 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -107,22 +107,22 @@ extern void _set_L3CR(unsigned long);
 
 static inline void dcbz(void *addr)
 {
-	__asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
 }
 
 static inline void dcbi(void *addr)
 {
-	__asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
 }
 
 static inline void dcbf(void *addr)
 {
-	__asm__ __volatile__ ("dcbf %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbf %y0" : "=Z"(*(u8 *)addr) :: "memory");
 }
 
 static inline void dcbst(void *addr)
 {
-	__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbst %y0" : "=Z"(*(u8 *)addr) :: "memory");
 }
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: fix inline asm constraints for dcbz
  2019-08-09 18:21         ` [PATCH] powerpc: fix inline asm constraints for dcbz Nick Desaulniers
@ 2019-08-09 18:28           ` Arnd Bergmann
  2019-08-09 20:03             ` Christophe Leroy
  2019-08-09 21:55             ` [PATCH] powerpc: fix inline asm constraints for dcbz Segher Boessenkool
  2019-08-09 20:36           ` Nathan Chancellor
  1 sibling, 2 replies; 27+ messages in thread
From: Arnd Bergmann @ 2019-08-09 18:28 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Michael Ellerman, christophe leroy, Segher Boessenkool,
	Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List,
	clang-built-linux

On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
Linux <clang-built-linux@googlegroups.com> wrote:

>  static inline void dcbz(void *addr)
>  {
> -       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
> +       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
>  }
>
>  static inline void dcbi(void *addr)
>  {
> -       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
> +       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
>  }

I think the result of the discussion was that an output argument only kind-of
makes sense for dcbz, but for the others it's really an input, and clang is
wrong in the way it handles the "Z" constraint by making a copy, which it
doesn't do for "m".

I'm not sure whether it's correct to use "m" instead of "Z" here, which
would be a better workaround if that works. More importantly though,
clang really needs to be fixed to handle "Z" correctly.

        Arnd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: fix inline asm constraints for dcbz
  2019-08-09 18:28           ` Arnd Bergmann
@ 2019-08-09 20:03             ` Christophe Leroy
  2019-08-09 20:12               ` Arnd Bergmann
                                 ` (2 more replies)
  2019-08-09 21:55             ` [PATCH] powerpc: fix inline asm constraints for dcbz Segher Boessenkool
  1 sibling, 3 replies; 27+ messages in thread
From: Christophe Leroy @ 2019-08-09 20:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: clang-built-linux, Linux Kernel Mailing List, linuxppc-dev,
	Paul Mackerras, Benjamin Herrenschmidt, kbuild test robot,
	Nathan Chancellor, Segher Boessenkool, Michael Ellerman,
	Nick Desaulniers

Arnd Bergmann <arnd@arndb.de> a écrit :

> On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
> Linux <clang-built-linux@googlegroups.com> wrote:
>
>>  static inline void dcbz(void *addr)
>>  {
>> -       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
>> +       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
>>  }
>>
>>  static inline void dcbi(void *addr)
>>  {
>> -       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
>> +       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
>>  }
>
> I think the result of the discussion was that an output argument only kind-of
> makes sense for dcbz, but for the others it's really an input, and clang is
> wrong in the way it handles the "Z" constraint by making a copy, which it
> doesn't do for "m".
>
> I'm not sure whether it's correct to use "m" instead of "Z" here, which
> would be a better workaround if that works. More importantly though,
> clang really needs to be fixed to handle "Z" correctly.

As the benefit is null, I think the best is probably to reverse my  
original commit until at least CLang is fixed, as initialy suggested  
by mpe

Christophe




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: fix inline asm constraints for dcbz
  2019-08-09 20:03             ` Christophe Leroy
@ 2019-08-09 20:12               ` Arnd Bergmann
  2019-08-09 22:03                 ` Nick Desaulniers
  2019-08-09 22:10                 ` Segher Boessenkool
  2019-08-09 22:00               ` Segher Boessenkool
  2019-08-09 22:03               ` [PATCH v3] Revert "powerpc: slightly improve cache helpers" Nick Desaulniers
  2 siblings, 2 replies; 27+ messages in thread
From: Arnd Bergmann @ 2019-08-09 20:12 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: clang-built-linux, Linux Kernel Mailing List, linuxppc-dev,
	Paul Mackerras, Benjamin Herrenschmidt, kbuild test robot,
	Nathan Chancellor, Segher Boessenkool, Michael Ellerman,
	Nick Desaulniers

On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy
<christophe.leroy@c-s.fr> wrote:
>
> Arnd Bergmann <arnd@arndb.de> a écrit :
> > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
> > Linux <clang-built-linux@googlegroups.com> wrote:
> >
> >>  static inline void dcbz(void *addr)
> >>  {
> >> -       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
> >> +       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
> >>  }
> >>
> >>  static inline void dcbi(void *addr)
> >>  {
> >> -       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
> >> +       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
> >>  }
> >
> > I think the result of the discussion was that an output argument only kind-of
> > makes sense for dcbz, but for the others it's really an input, and clang is
> > wrong in the way it handles the "Z" constraint by making a copy, which it
> > doesn't do for "m".
> >
> > I'm not sure whether it's correct to use "m" instead of "Z" here, which
> > would be a better workaround if that works. More importantly though,
> > clang really needs to be fixed to handle "Z" correctly.
>
> As the benefit is null, I think the best is probably to reverse my
> original commit until at least CLang is fixed, as initialy suggested
> by mpe

Yes, makes sense.

There is one other use of the "Z" constraint, so on top of the revert, I
think it might be helpful if Nick could check if the patch below makes
any difference with clang and, if it does, whether the current version
is broken.

       Arnd

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 23e5d5d16c7e..28b467779328 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size
__iomem *addr)    \
 {                                                                      \
        u##size ret;                                                    \
        __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync"     \
-               : "=r" (ret) : "Z" (*addr) : "memory");                 \
+               : "=r" (ret) : "m" (*addr) : "memory");                 \
        return ret;                                                     \
 }

@@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size
__iomem *addr)    \
 static inline void name(volatile u##size __iomem *addr, u##size val)   \
 {                                                                      \
        __asm__ __volatile__("sync;"#insn" %1,%y0"                      \
-               : "=Z" (*addr) : "r" (val) : "memory");                 \
+               : "=m" (*addr) : "r" (val) : "memory");                 \
        mmiowb_set_pending();                                           \
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: fix inline asm constraints for dcbz
  2019-08-09 18:21         ` [PATCH] powerpc: fix inline asm constraints for dcbz Nick Desaulniers
  2019-08-09 18:28           ` Arnd Bergmann
@ 2019-08-09 20:36           ` Nathan Chancellor
  1 sibling, 0 replies; 27+ messages in thread
From: Nathan Chancellor @ 2019-08-09 20:36 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: mpe, christophe.leroy, segher, arnd, kbuild test robot,
	Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev,
	linux-kernel, clang-built-linux

[-- Attachment #1: Type: text/plain, Size: 1151 bytes --]

On Fri, Aug 09, 2019 at 11:21:05AM -0700, Nick Desaulniers wrote:
> The input parameter is modified, so it should be an output parameter
> with "=" to make it so that a copy of the input is not made by Clang.
> 
> Link: https://bugs.llvm.org/show_bug.cgi?id=42762
> Link: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html#Modifiers
> Link: https://github.com/ClangBuiltLinux/linux/issues/593
> Link: https://godbolt.org/z/QwhZXi
> Link: https://lore.kernel.org/lkml/20190721075846.GA97701@archlinux-threadripper/
> Fixes: 6c5875843b87 ("powerpc: slightly improve cache helpers")
> Debugged-by: Nathan Chancellor <natechancellor@gmail.com>
> Reported-by: Nathan Chancellor <natechancellor@gmail.com>
> Reported-by: kbuild test robot <lkp@intel.com>
> Suggested-by: Arnd Bergmann <arnd@arndb.de>
> Suggested-by: Nathan Chancellor <natechancellor@gmail.com>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

I applied this patch as well as a revert of the original patch and both
clang and GCC appear to generate the same code; I think a straight
revert would be better.

Crude testing script and the generated files attached.

Cheers,
Nathan

[-- Attachment #2: tmp.bRmcRT0jd0.sh --]
[-- Type: application/x-sh, Size: 2707 bytes --]

[-- Attachment #3: testing-output.tar.gz --]
[-- Type: application/gzip, Size: 16412 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: fix inline asm constraints for dcbz
  2019-08-09 18:28           ` Arnd Bergmann
  2019-08-09 20:03             ` Christophe Leroy
@ 2019-08-09 21:55             ` Segher Boessenkool
  1 sibling, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2019-08-09 21:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nick Desaulniers, Michael Ellerman, christophe leroy,
	Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, Linux Kernel Mailing List,
	clang-built-linux

On Fri, Aug 09, 2019 at 08:28:19PM +0200, Arnd Bergmann wrote:
> On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
> Linux <clang-built-linux@googlegroups.com> wrote:
> 
> >  static inline void dcbz(void *addr)
> >  {
> > -       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
> > +       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
> >  }
> >
> >  static inline void dcbi(void *addr)
> >  {
> > -       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
> > +       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
> >  }
> 
> I think the result of the discussion was that an output argument only kind-of
> makes sense for dcbz, but for the others it's really an input, and clang is
> wrong in the way it handles the "Z" constraint by making a copy, which it
> doesn't do for "m".

Yes.  And clang has probably miscompiled this in all kernels since we
have used "Z" for the first time, in 2008 (0f3d6bcd391b).

It is not necessarily fatal or at least not easily visible for the I/O
accessors: it "just" gets memory ordering wrong slightly (it looks like
it does the sync;tw;isync thing around an extra stack access, after it
has performed the actual I/O as any other memory load, without any
synchronisation).

> I'm not sure whether it's correct to use "m" instead of "Z" here, which
> would be a better workaround if that works. More importantly though,
> clang really needs to be fixed to handle "Z" correctly.

"m" allows offset addressing, which these insns do not.  That is the
same reason you need the "y" output modifier.  "m" is wrong here.

We have other memory constraints, but do those work with LLVM?


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: fix inline asm constraints for dcbz
  2019-08-09 20:03             ` Christophe Leroy
  2019-08-09 20:12               ` Arnd Bergmann
@ 2019-08-09 22:00               ` Segher Boessenkool
  2019-08-09 22:03               ` [PATCH v3] Revert "powerpc: slightly improve cache helpers" Nick Desaulniers
  2 siblings, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2019-08-09 22:00 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Arnd Bergmann, clang-built-linux, Linux Kernel Mailing List,
	linuxppc-dev, Paul Mackerras, Benjamin Herrenschmidt,
	kbuild test robot, Nathan Chancellor, Michael Ellerman,
	Nick Desaulniers

On Fri, Aug 09, 2019 at 10:03:01PM +0200, Christophe Leroy wrote:
> Arnd Bergmann <arnd@arndb.de> a écrit :
> 
> >On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
> >Linux <clang-built-linux@googlegroups.com> wrote:
> >
> >> static inline void dcbz(void *addr)
> >> {
> >>-       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
> >>+       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
> >> }
> >>
> >> static inline void dcbi(void *addr)
> >> {
> >>-       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
> >>+       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
> >> }
> >
> >I think the result of the discussion was that an output argument only 
> >kind-of
> >makes sense for dcbz, but for the others it's really an input, and clang is
> >wrong in the way it handles the "Z" constraint by making a copy, which it
> >doesn't do for "m".
> >
> >I'm not sure whether it's correct to use "m" instead of "Z" here, which
> >would be a better workaround if that works. More importantly though,
> >clang really needs to be fixed to handle "Z" correctly.
> 
> As the benefit is null, I think the best is probably to reverse my  
> original commit until at least CLang is fixed, as initialy suggested  
> by mpe

And what about the other uses of "Z"?


Also, if you use C routines (instead of assembler code) for the basic
"clear a block" and the like routines, as there have been patches for
recently, the benefit is not zero.


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: fix inline asm constraints for dcbz
  2019-08-09 20:12               ` Arnd Bergmann
@ 2019-08-09 22:03                 ` Nick Desaulniers
  2019-08-09 22:10                 ` Segher Boessenkool
  1 sibling, 0 replies; 27+ messages in thread
From: Nick Desaulniers @ 2019-08-09 22:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christophe Leroy, clang-built-linux, Linux Kernel Mailing List,
	linuxppc-dev, Paul Mackerras, Benjamin Herrenschmidt,
	kbuild test robot, Nathan Chancellor, Segher Boessenkool,
	Michael Ellerman

On Fri, Aug 9, 2019 at 1:13 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Fri, Aug 9, 2019 at 10:02 PM Christophe Leroy
> <christophe.leroy@c-s.fr> wrote:
> >
> > Arnd Bergmann <arnd@arndb.de> a écrit :
> > > On Fri, Aug 9, 2019 at 8:21 PM 'Nick Desaulniers' via Clang Built
> > > Linux <clang-built-linux@googlegroups.com> wrote:
> > >
> > >>  static inline void dcbz(void *addr)
> > >>  {
> > >> -       __asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
> > >> +       __asm__ __volatile__ ("dcbz %y0" : "=Z"(*(u8 *)addr) :: "memory");
> > >>  }
> > >>
> > >>  static inline void dcbi(void *addr)
> > >>  {
> > >> -       __asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
> > >> +       __asm__ __volatile__ ("dcbi %y0" : "=Z"(*(u8 *)addr) :: "memory");
> > >>  }
> > >
> > > I think the result of the discussion was that an output argument only kind-of
> > > makes sense for dcbz, but for the others it's really an input, and clang is
> > > wrong in the way it handles the "Z" constraint by making a copy, which it
> > > doesn't do for "m".
> > >
> > > I'm not sure whether it's correct to use "m" instead of "Z" here, which
> > > would be a better workaround if that works. More importantly though,
> > > clang really needs to be fixed to handle "Z" correctly.
> >
> > As the benefit is null, I think the best is probably to reverse my
> > original commit until at least CLang is fixed, as initialy suggested
> > by mpe
>
> Yes, makes sense.
>
> There is one other use of the "Z" constraint, so on top of the revert, I
> think it might be helpful if Nick could check if the patch below makes
> any difference with clang and, if it does, whether the current version
> is broken.
>
>        Arnd
>
> diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
> index 23e5d5d16c7e..28b467779328 100644
> --- a/arch/powerpc/include/asm/io.h
> +++ b/arch/powerpc/include/asm/io.h
> @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size
> __iomem *addr)    \
>  {                                                                      \
>         u##size ret;                                                    \
>         __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync"     \
> -               : "=r" (ret) : "Z" (*addr) : "memory");                 \
> +               : "=r" (ret) : "m" (*addr) : "memory");                 \
>         return ret;                                                     \
>  }
>
> @@ -114,7 +114,7 @@ static inline u##size name(const volatile u##size
> __iomem *addr)    \
>  static inline void name(volatile u##size __iomem *addr, u##size val)   \
>  {                                                                      \
>         __asm__ __volatile__("sync;"#insn" %1,%y0"                      \
> -               : "=Z" (*addr) : "r" (val) : "memory");                 \
> +               : "=m" (*addr) : "r" (val) : "memory");                 \
>         mmiowb_set_pending();                                           \
>  }

Does not work:
https://travis-ci.com/ClangBuiltLinux/continuous-integration/builds/122654899
https://github.com/ClangBuiltLinux/continuous-integration/pull/197/files#diff-40bd16e3188587e4d648c30e0c2d6d37

-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3] Revert "powerpc: slightly improve cache helpers"
  2019-08-09 20:03             ` Christophe Leroy
  2019-08-09 20:12               ` Arnd Bergmann
  2019-08-09 22:00               ` Segher Boessenkool
@ 2019-08-09 22:03               ` Nick Desaulniers
  2019-08-10  9:09                 ` Michael Ellerman
  2 siblings, 1 reply; 27+ messages in thread
From: Nick Desaulniers @ 2019-08-09 22:03 UTC (permalink / raw)
  To: mpe
  Cc: christophe.leroy, segher, arnd, Nick Desaulniers,
	Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux

This reverts commit 6c5875843b87c3adea2beade9d1b8b3d4523900a.

Work around Clang bug preventing ppc32 from booting.

Link: https://bugs.llvm.org/show_bug.cgi?id=42762
Link: https://github.com/ClangBuiltLinux/linux/issues/593
Debugged-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: kbuild test robot <lkp@intel.com>
Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
Changes V2 -> V3:
* Just revert, as per Christophe.
Changes V1 -> V2:
* Change to ouput paremeter.


 arch/powerpc/include/asm/cache.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
index b3388d95f451..45e3137ccd71 100644
--- a/arch/powerpc/include/asm/cache.h
+++ b/arch/powerpc/include/asm/cache.h
@@ -107,22 +107,22 @@ extern void _set_L3CR(unsigned long);
 
 static inline void dcbz(void *addr)
 {
-	__asm__ __volatile__ ("dcbz %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbz 0, %0" : : "r"(addr) : "memory");
 }
 
 static inline void dcbi(void *addr)
 {
-	__asm__ __volatile__ ("dcbi %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbi 0, %0" : : "r"(addr) : "memory");
 }
 
 static inline void dcbf(void *addr)
 {
-	__asm__ __volatile__ ("dcbf %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbf 0, %0" : : "r"(addr) : "memory");
 }
 
 static inline void dcbst(void *addr)
 {
-	__asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory");
+	__asm__ __volatile__ ("dcbst 0, %0" : : "r"(addr) : "memory");
 }
 #endif /* !__ASSEMBLY__ */
 #endif /* __KERNEL__ */
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] powerpc: fix inline asm constraints for dcbz
  2019-08-09 20:12               ` Arnd Bergmann
  2019-08-09 22:03                 ` Nick Desaulniers
@ 2019-08-09 22:10                 ` Segher Boessenkool
  1 sibling, 0 replies; 27+ messages in thread
From: Segher Boessenkool @ 2019-08-09 22:10 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Christophe Leroy, clang-built-linux, Linux Kernel Mailing List,
	linuxppc-dev, Paul Mackerras, Benjamin Herrenschmidt,
	kbuild test robot, Nathan Chancellor, Michael Ellerman,
	Nick Desaulniers

On Fri, Aug 09, 2019 at 10:12:56PM +0200, Arnd Bergmann wrote:
> @@ -106,7 +106,7 @@ static inline u##size name(const volatile u##size
> __iomem *addr)    \
>  {                                                                      \
>         u##size ret;                                                    \
>         __asm__ __volatile__("sync;"#insn" %0,%y1;twi 0,%0,0;isync"     \
> -               : "=r" (ret) : "Z" (*addr) : "memory");                 \
> +               : "=r" (ret) : "m" (*addr) : "memory");                 \
>         return ret;                                                     \
>  }

That will no longer compile something like
  u8 *p;
  u16 x = in_le16(p + 12);
(you'll get something like "invalid %y value, try using the 'Z' constraint").

So then you remove the %y, but that makes you get something like
  sync;lhbrx 3,12(3);twi 0,3,0;isync
which is completely wrong.


Segher

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3] Revert "powerpc: slightly improve cache helpers"
  2019-08-09 22:03               ` [PATCH v3] Revert "powerpc: slightly improve cache helpers" Nick Desaulniers
@ 2019-08-10  9:09                 ` Michael Ellerman
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Ellerman @ 2019-08-10  9:09 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: christophe.leroy, segher, arnd, Nick Desaulniers,
	Nathan Chancellor, kbuild test robot, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev, linux-kernel, clang-built-linux

Nick Desaulniers <ndesaulniers@google.com> writes:
> This reverts commit 6c5875843b87c3adea2beade9d1b8b3d4523900a.
>
> Work around Clang bug preventing ppc32 from booting.
>
> Link: https://bugs.llvm.org/show_bug.cgi?id=42762
> Link: https://github.com/ClangBuiltLinux/linux/issues/593
> Debugged-by: Nathan Chancellor <natechancellor@gmail.com>
> Reported-by: Nathan Chancellor <natechancellor@gmail.com>
> Reported-by: kbuild test robot <lkp@intel.com>
> Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
> Changes V2 -> V3:
> * Just revert, as per Christophe.
> Changes V1 -> V2:
> * Change to ouput paremeter.

Thanks. I actually already had this revert in my tree since ~10 days
ago, but hadn't pushed it yet because the discussion was ongoing.

So I'll just use that version, and ask Linus to pull it.

cheers

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-08-10  9:09 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-29 20:25 [PATCH] powerpc: workaround clang codegen bug in dcbz Nick Desaulniers
2019-07-29 20:32 ` Nathan Chancellor
2019-07-29 20:45   ` Nick Desaulniers
2019-07-29 20:47     ` Nathan Chancellor
2019-07-29 20:49       ` Nick Desaulniers
2019-07-29 21:52   ` Segher Boessenkool
2019-07-30  7:34     ` Arnd Bergmann
2019-07-30 11:17       ` Michael Ellerman
2019-08-09 18:21         ` [PATCH] powerpc: fix inline asm constraints for dcbz Nick Desaulniers
2019-08-09 18:28           ` Arnd Bergmann
2019-08-09 20:03             ` Christophe Leroy
2019-08-09 20:12               ` Arnd Bergmann
2019-08-09 22:03                 ` Nick Desaulniers
2019-08-09 22:10                 ` Segher Boessenkool
2019-08-09 22:00               ` Segher Boessenkool
2019-08-09 22:03               ` [PATCH v3] Revert "powerpc: slightly improve cache helpers" Nick Desaulniers
2019-08-10  9:09                 ` Michael Ellerman
2019-08-09 21:55             ` [PATCH] powerpc: fix inline asm constraints for dcbz Segher Boessenkool
2019-08-09 20:36           ` Nathan Chancellor
2019-07-30 13:48       ` [PATCH] powerpc: workaround clang codegen bug in dcbz Segher Boessenkool
2019-07-30 14:30         ` Arnd Bergmann
2019-07-30 16:16           ` Segher Boessenkool
2019-07-30 17:07             ` Segher Boessenkool
2019-07-30 18:24               ` Arnd Bergmann
2019-07-30 18:24             ` Arnd Bergmann
2019-07-30 19:35               ` Segher Boessenkool
2019-07-30  5:31   ` Christophe Leroy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).