All of lore.kernel.org
 help / color / mirror / Atom feed
* More annoying code generation by clang
@ 2024-04-04 22:53 Linus Torvalds
  2024-04-06 10:56 ` Ingo Molnar
  2024-04-08  8:49 ` Peter Zijlstra
  0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2024-04-04 22:53 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 3725 bytes --]

So this doesn't really matter in any real life situation, but it
really grated on me.

Clang has this nasty habit of taking our nice asm constraints, and
turning them into worst-case garbage. It's been reported a couple of
times where we use "g" to tell the compiler that pretty much any
source to the asm works, and then clang takes that to mean "I will
take that to use 'memory'" even when that makes no sense what-so-ever.

See for example

    https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/

where I was ranting about clang just doing pointlessly stupid things.

However, I found a case where yes, clang does pointlessly stupid
things, but it's at least _partly_ our fault, and gcc can't generate
optimal code either.

We have this fairly critical code in __fget_files_rcu() to look up a
'struct file *' from an fd, and it does this:

                /* Mask is a 0 for invalid fd's, ~0 for valid ones */
                nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

and clang makes a *horrid* mess of it, generating this code:

        movl    %edi, %r14d
        movq    32(%rbx), %rdx
        movl    (%rdx), %eax
        movq    %rax, 8(%rsp)
        cmpq    8(%rsp), %r14
        sbbq    %rcx, %rcx

which is just crazy. Notice how it does that "move rax to stack, then
do the compare against the stack", instead of just using %rax.

In fact, that function shouldn't have a stack frame at all, and the
only reason it is generated is because of this whole oddity.

All clang's fault, right?

Yeah, mostly. But it turns out that what really messes with clangs
little head is that the x86 array_index_mask_nospec() function is
being a bit annoying.

This is what we do:

  static __always_inline unsigned long
array_index_mask_nospec(unsigned long index,
                unsigned long size)
  {
        unsigned long mask;

        asm volatile ("cmp %1,%2; sbb %0,%0;"
                        :"=r" (mask)
                        :"g"(size),"r" (index)
                        :"cc");
        return mask;
  }

and look at the use again:

        nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

here all the values are actually 'unsigned int'. So what happens is
that clang can't just use the fdt->max_fds value *directly* from
memory, because it needs to be expanded from 32-bit to 64-bit because
we've made our array_index_mask_nospec() function only work on 64-bit
'unsigned long' values.

So it turns out that by massaging this a bit, and making it just be a
macro - so that the asm can decide that "I can do this in 32-bit" - I
can get clang to generate much better code.

Clang still absolutely hates the "g" constraint, so to get clang to
really get this right I have to use "ir" instead of "g". Which is
wrong. Because gcc does this right, and could use the memory op
directly. But even gcc cannot do that with our *current* function,
because of that "the memory value is 32-bit, we require a 64-bit
value"

Anyway, I can get gcc to generate the right code:

        movq    32(%r13), %rdx
        cmp (%rdx),%ebx
        sbb %esi,%esi

which is basically the right code for the six crazy instructions clang
generates. And if I make the "g" be "ir", I can get clang to generate

        movq    32(%rdi), %rcx
        movl    (%rcx), %eax
        cmpl    %eax, %esi
        sbbl    %esi, %esi

which is the same thing, but with that (pointless) load to a register.

And now clang doesn't generate that stack frame at all.

Anyway, this was a long email to explain the odd attached patch.

Comments? Note that this patch is *entirely* untested, I have done
this purely by looking at the code generation in fs/file.c.

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1122 bytes --]

 arch/x86/include/asm/barrier.h | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 66e57c010392..6159d2cbbfde 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -33,20 +33,15 @@
  * Returns:
  *     0 - (index < size)
  */
-static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
-		unsigned long size)
-{
-	unsigned long mask;
-
-	asm volatile ("cmp %1,%2; sbb %0,%0;"
-			:"=r" (mask)
-			:"g"(size),"r" (index)
-			:"cc");
-	return mask;
-}
-
-/* Override the default implementation from linux/nospec.h. */
-#define array_index_mask_nospec array_index_mask_nospec
+#define array_index_mask_nospec(idx,sz) ({	\
+	typeof((idx)+(sz)) __idx = (idx);	\
+	typeof(__idx) __sz = (sz);		\
+	typeof(__idx) __mask;			\
+	asm volatile ("cmp %1,%2; sbb %0,%0"	\
+			:"=r" (__mask)		\
+			:"ir"(__sz),"r" (__idx)	\
+			:"cc");			\
+	__mask; })
 
 /* Prevent speculative execution past this barrier. */
 #define barrier_nospec() asm volatile("lfence":::"memory")

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: More annoying code generation by clang
  2024-04-04 22:53 More annoying code generation by clang Linus Torvalds
@ 2024-04-06 10:56 ` Ingo Molnar
  2024-04-06 12:30   ` Uros Bizjak
  2024-04-08  8:49 ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2024-04-06 10:56 UTC (permalink / raw)
  To: Linus Torvalds, Nick Desaulniers, Nathan Chancellor, Uros Bizjak
  Cc: Thomas Gleixner, Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List


[ I've Cc:-ed a few more people who might be interested in this. ]
[ Copy of Linus's email below. ]

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> So this doesn't really matter in any real life situation, but it
> really grated on me.
> 
> Clang has this nasty habit of taking our nice asm constraints, and
> turning them into worst-case garbage. It's been reported a couple of
> times where we use "g" to tell the compiler that pretty much any
> source to the asm works, and then clang takes that to mean "I will
> take that to use 'memory'" even when that makes no sense what-so-ever.
> 
> See for example
> 
>     https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/
> 
> where I was ranting about clang just doing pointlessly stupid things.
> 
> However, I found a case where yes, clang does pointlessly stupid
> things, but it's at least _partly_ our fault, and gcc can't generate
> optimal code either.
> 
> We have this fairly critical code in __fget_files_rcu() to look up a
> 'struct file *' from an fd, and it does this:
> 
>                 /* Mask is a 0 for invalid fd's, ~0 for valid ones */
>                 nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> 
> and clang makes a *horrid* mess of it, generating this code:
> 
>         movl    %edi, %r14d
>         movq    32(%rbx), %rdx
>         movl    (%rdx), %eax
>         movq    %rax, 8(%rsp)
>         cmpq    8(%rsp), %r14
>         sbbq    %rcx, %rcx
> 
> which is just crazy. Notice how it does that "move rax to stack, then
> do the compare against the stack", instead of just using %rax.
> 
> In fact, that function shouldn't have a stack frame at all, and the
> only reason it is generated is because of this whole oddity.
> 
> All clang's fault, right?
> 
> Yeah, mostly. But it turns out that what really messes with clangs
> little head is that the x86 array_index_mask_nospec() function is
> being a bit annoying.
> 
> This is what we do:
> 
>   static __always_inline unsigned long
> array_index_mask_nospec(unsigned long index,
>                 unsigned long size)
>   {
>         unsigned long mask;
> 
>         asm volatile ("cmp %1,%2; sbb %0,%0;"
>                         :"=r" (mask)
>                         :"g"(size),"r" (index)
>                         :"cc");
>         return mask;
>   }
> 
> and look at the use again:
> 
>         nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> 
> here all the values are actually 'unsigned int'. So what happens is
> that clang can't just use the fdt->max_fds value *directly* from
> memory, because it needs to be expanded from 32-bit to 64-bit because
> we've made our array_index_mask_nospec() function only work on 64-bit
> 'unsigned long' values.
> 
> So it turns out that by massaging this a bit, and making it just be a
> macro - so that the asm can decide that "I can do this in 32-bit" - I
> can get clang to generate much better code.
> 
> Clang still absolutely hates the "g" constraint, so to get clang to
> really get this right I have to use "ir" instead of "g". Which is
> wrong. Because gcc does this right, and could use the memory op
> directly. But even gcc cannot do that with our *current* function,
> because of that "the memory value is 32-bit, we require a 64-bit
> value"
> 
> Anyway, I can get gcc to generate the right code:
> 
>         movq    32(%r13), %rdx
>         cmp (%rdx),%ebx
>         sbb %esi,%esi
> 
> which is basically the right code for the six crazy instructions clang
> generates. And if I make the "g" be "ir", I can get clang to generate
> 
>         movq    32(%rdi), %rcx
>         movl    (%rcx), %eax
>         cmpl    %eax, %esi
>         sbbl    %esi, %esi
> 
> which is the same thing, but with that (pointless) load to a register.
> 
> And now clang doesn't generate that stack frame at all.
> 
> Anyway, this was a long email to explain the odd attached patch.
> 
> Comments? Note that this patch is *entirely* untested, I have done
> this purely by looking at the code generation in fs/file.c.
> 
>                 Linus

>  arch/x86/include/asm/barrier.h | 23 +++++++++--------------
>  1 file changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 66e57c010392..6159d2cbbfde 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -33,20 +33,15 @@
>   * Returns:
>   *     0 - (index < size)
>   */
> -static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
> -		unsigned long size)
> -{
> -	unsigned long mask;
> -
> -	asm volatile ("cmp %1,%2; sbb %0,%0;"
> -			:"=r" (mask)
> -			:"g"(size),"r" (index)
> -			:"cc");
> -	return mask;
> -}
> -
> -/* Override the default implementation from linux/nospec.h. */
> -#define array_index_mask_nospec array_index_mask_nospec
> +#define array_index_mask_nospec(idx,sz) ({	\
> +	typeof((idx)+(sz)) __idx = (idx);	\
> +	typeof(__idx) __sz = (sz);		\
> +	typeof(__idx) __mask;			\
> +	asm volatile ("cmp %1,%2; sbb %0,%0"	\
> +			:"=r" (__mask)		\
> +			:"ir"(__sz),"r" (__idx)	\
> +			:"cc");			\
> +	__mask; })
>  
>  /* Prevent speculative execution past this barrier. */
>  #define barrier_nospec() asm volatile("lfence":::"memory")


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: More annoying code generation by clang
  2024-04-06 10:56 ` Ingo Molnar
@ 2024-04-06 12:30   ` Uros Bizjak
  2024-04-06 15:39     ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2024-04-06 12:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Nick Desaulniers, Nathan Chancellor,
	Thomas Gleixner, Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List

On Sat, Apr 6, 2024 at 12:56 PM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> [ I've Cc:-ed a few more people who might be interested in this. ]
> [ Copy of Linus's email below. ]
>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > So this doesn't really matter in any real life situation, but it
> > really grated on me.
> >
> > Clang has this nasty habit of taking our nice asm constraints, and
> > turning them into worst-case garbage. It's been reported a couple of
> > times where we use "g" to tell the compiler that pretty much any
> > source to the asm works, and then clang takes that to mean "I will
> > take that to use 'memory'" even when that makes no sense what-so-ever.
> >
> > See for example
> >
> >     https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/
> >
> > where I was ranting about clang just doing pointlessly stupid things.
> >
> > However, I found a case where yes, clang does pointlessly stupid
> > things, but it's at least _partly_ our fault, and gcc can't generate
> > optimal code either.
> >
> > We have this fairly critical code in __fget_files_rcu() to look up a
> > 'struct file *' from an fd, and it does this:
> >
> >                 /* Mask is a 0 for invalid fd's, ~0 for valid ones */
> >                 nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> >
> > and clang makes a *horrid* mess of it, generating this code:
> >
> >         movl    %edi, %r14d
> >         movq    32(%rbx), %rdx
> >         movl    (%rdx), %eax
> >         movq    %rax, 8(%rsp)
> >         cmpq    8(%rsp), %r14
> >         sbbq    %rcx, %rcx
> >
> > which is just crazy. Notice how it does that "move rax to stack, then
> > do the compare against the stack", instead of just using %rax.
> >
> > In fact, that function shouldn't have a stack frame at all, and the
> > only reason it is generated is because of this whole oddity.
> >
> > All clang's fault, right?
> >
> > Yeah, mostly. But it turns out that what really messes with clangs
> > little head is that the x86 array_index_mask_nospec() function is
> > being a bit annoying.
> >
> > This is what we do:
> >
> >   static __always_inline unsigned long
> > array_index_mask_nospec(unsigned long index,
> >                 unsigned long size)
> >   {
> >         unsigned long mask;
> >
> >         asm volatile ("cmp %1,%2; sbb %0,%0;"
> >                         :"=r" (mask)
> >                         :"g"(size),"r" (index)
> >                         :"cc");
> >         return mask;
> >   }
> >
> > and look at the use again:
> >
> >         nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> >
> > here all the values are actually 'unsigned int'. So what happens is
> > that clang can't just use the fdt->max_fds value *directly* from
> > memory, because it needs to be expanded from 32-bit to 64-bit because
> > we've made our array_index_mask_nospec() function only work on 64-bit
> > 'unsigned long' values.
> >
> > So it turns out that by massaging this a bit, and making it just be a
> > macro - so that the asm can decide that "I can do this in 32-bit" - I
> > can get clang to generate much better code.
> >
> > Clang still absolutely hates the "g" constraint, so to get clang to
> > really get this right I have to use "ir" instead of "g". Which is
> > wrong. Because gcc does this right, and could use the memory op
> > directly. But even gcc cannot do that with our *current* function,
> > because of that "the memory value is 32-bit, we require a 64-bit
> > value"
> >
> > Anyway, I can get gcc to generate the right code:
> >
> >         movq    32(%r13), %rdx
> >         cmp (%rdx),%ebx
> >         sbb %esi,%esi
> >
> > which is basically the right code for the six crazy instructions clang
> > generates. And if I make the "g" be "ir", I can get clang to generate
> >
> >         movq    32(%rdi), %rcx
> >         movl    (%rcx), %eax
> >         cmpl    %eax, %esi
> >         sbbl    %esi, %esi
> >
> > which is the same thing, but with that (pointless) load to a register.
> >
> > And now clang doesn't generate that stack frame at all.
> >
> > Anyway, this was a long email to explain the odd attached patch.
> >
> > Comments? Note that this patch is *entirely* untested, I have done
> > this purely by looking at the code generation in fs/file.c.

FYI, please note that gcc-12 is able to synthesize carry-flag compares
on its own:

--cut here--
unsigned long foo (unsigned long index, unsigned long size)
{
    return (index <= size) ? 0 : -1;
}

extern unsigned int size;

unsigned long bar (unsigned int index)
{
    return (index <= size) ? 0 : -1;
}
--cut here--

gcc-12 -O2:

foo:
        cmpq    %rdi, %rsi
        sbbq    %rax, %rax
        ret
bar:
        cmpl    %edi, size(%rip)
        sbbq    %rax, %rax
        ret

Uros.

> >
> >                 Linus
>
> >  arch/x86/include/asm/barrier.h | 23 +++++++++--------------
> >  1 file changed, 9 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> > index 66e57c010392..6159d2cbbfde 100644
> > --- a/arch/x86/include/asm/barrier.h
> > +++ b/arch/x86/include/asm/barrier.h
> > @@ -33,20 +33,15 @@
> >   * Returns:
> >   *     0 - (index < size)
> >   */
> > -static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
> > -             unsigned long size)
> > -{
> > -     unsigned long mask;
> > -
> > -     asm volatile ("cmp %1,%2; sbb %0,%0;"
> > -                     :"=r" (mask)
> > -                     :"g"(size),"r" (index)
> > -                     :"cc");
> > -     return mask;
> > -}
> > -
> > -/* Override the default implementation from linux/nospec.h. */
> > -#define array_index_mask_nospec array_index_mask_nospec
> > +#define array_index_mask_nospec(idx,sz) ({   \
> > +     typeof((idx)+(sz)) __idx = (idx);       \
> > +     typeof(__idx) __sz = (sz);              \
> > +     typeof(__idx) __mask;                   \
> > +     asm volatile ("cmp %1,%2; sbb %0,%0"    \
> > +                     :"=r" (__mask)          \
> > +                     :"ir"(__sz),"r" (__idx) \
> > +                     :"cc");                 \
> > +     __mask; })
> >
> >  /* Prevent speculative execution past this barrier. */
> >  #define barrier_nospec() asm volatile("lfence":::"memory")
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: More annoying code generation by clang
  2024-04-06 12:30   ` Uros Bizjak
@ 2024-04-06 15:39     ` Linus Torvalds
  2024-04-06 16:04       ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2024-04-06 15:39 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Ingo Molnar, Nick Desaulniers, Nathan Chancellor,
	Thomas Gleixner, Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List

On Sat, 6 Apr 2024 at 05:30, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> FYI, please note that gcc-12 is able to synthesize carry-flag compares
> on its own:

Oh, gcc has been able to do that for much longer than that. It's a
idiomatic i386 pattern, and gcc has generated it for as long as I can
remember.

HOWEVER.

There's a big difference between "able to" and "GUARANTEED to".

Because this code actually requires a data-depencency and not a
control dependency as a correctness issue because of Spectre-v1.

So while I know very well that gcc _can_ do it, I also know very well
that there are absolutely no guarantees that gcc won't use a
conditional branch instead.

So this code is needs to generate good code because it's actually
important code that shows up in benchmarks, but this code also needs
to generate a very _particular_ pattern of code, and it's not good
enough that gcc may "happen" to generate that pattern of code.

Thus the inline asm.

               Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: More annoying code generation by clang
  2024-04-06 15:39     ` Linus Torvalds
@ 2024-04-06 16:04       ` Linus Torvalds
  0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2024-04-06 16:04 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Ingo Molnar, Nick Desaulniers, Nathan Chancellor,
	Thomas Gleixner, Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List

On Sat, 6 Apr 2024 at 08:39, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Because this code actually requires a data-depencency and not a
> control dependency as a correctness issue because of Spectre-v1.

Just to clarify: our comments in this code are maybe a bit odd,
because our comments are not about the Spectre-v1 issue (which
predates a rewrite) and more about the odd RCU pattern and conditional
avoidance we use here:

        unsigned long nospec_mask;

        /* Mask is a 0 for invalid fd's, ~0 for valid ones */
        nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

        /*
         * fdentry points to the 'fd' offset, or fdt->fd[0].
         * Loading from fdt->fd[0] is always safe, because the
         * array always exists.
         */
        fdentry = fdt->fd + (fd & nospec_mask);

        /* Do the load, then mask any invalid result */
        file = rcu_dereference_raw(*fdentry);

where *normally* (if RCU wasn't an issue) we'd just write this as

        file = fdt->fd[array_index_nospec(fd, fdt->max_fds)];

where the key part is that "nospec" array indexing that will not
speculatively access the array past the "max_fds".

IOW, the code naively would want to do just

        if (fd < fdt->max_fds) {
                 file = fdt->fd[fd];
                ...

but we need to make sure that it can't be fooled into using a branch
mispredict and use a user-controlled index ("fd") to speculatively
access the array with an arbitrary index and then leak unrelated data
through some side channel (mostly cache access).

And while the normal pattern doesn't expose the mask generation and
just hides that mask in that simpler "array_index_nospec()" macro,
this code actually ends up using the same mask *twice*, because it
will later end up doing this hack:

        file = (void *)(nospec_mask & (unsigned long)file);
        if (unlikely(!file))
                return NULL;

to have just one single conditional at the end (ie we may have loaded
a non-NULL file pointer from fdt->fd[0] because an invalid index got
masked down to a zero index, and the second masking will mask away
that pointer and make it NULL because we're bad people and we know
that NULL is "bitpattern 0" and we care about the code working, not
about some unreal "NULL could be anything else" thing.

End result: this code that is just a few lines long and has more
comments than code, and generates only a handful of instruction is
fairly subtle but also fairly important both for hardware security
issues and for performance.

See commit 253ca8678d30 ("Improve __fget_files_rcu() code generation
(and thus __fget_light())" that actually started doing this "use mask
twice", and realize that that commit is what this performance
regression report is talking about:

    https://lore.kernel.org/all/ZWQ+LEcfFFi4YOAU@xsang-OptiPlex-9020/

ie that whole "use masks and avoid doing the obvious thing" may be a
bit subtle, but it's what turned a 2.9% performance regression into a
3.4% improvement.

(Ok, those performance numbers are on just one random microbenchmark
and don't really matter, so take that with a pinch of salt, but if you
care about a _lot_ of random benchmarks, eventually you get good
performance overall).

Anyway, hopefully that explains the dual issue here: we care about
performance, but we also have to use a specific instruction pattern,
and can't just hope for the best.

                    Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: More annoying code generation by clang
  2024-04-04 22:53 More annoying code generation by clang Linus Torvalds
  2024-04-06 10:56 ` Ingo Molnar
@ 2024-04-08  8:49 ` Peter Zijlstra
  2024-04-08 13:41   ` H. Peter Anvin
  2024-04-08 18:32   ` Linus Torvalds
  1 sibling, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2024-04-08  8:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Thomas Gleixner, Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List

On Thu, Apr 04, 2024 at 03:53:49PM -0700, Linus Torvalds wrote:

> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 66e57c010392..6159d2cbbfde 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -33,20 +33,15 @@
>   * Returns:
>   *     0 - (index < size)
>   */
> +#define array_index_mask_nospec(idx,sz) ({	\
> +	typeof((idx)+(sz)) __idx = (idx);	\
> +	typeof(__idx) __sz = (sz);		\
> +	typeof(__idx) __mask;			\
> +	asm volatile ("cmp %1,%2; sbb %0,%0"	\
> +			:"=r" (__mask)		\
> +			:"ir"(__sz),"r" (__idx)	\
> +			:"cc");			\
> +	__mask; })

Should this not carry a comment about the "ir" constraint wanting to be
"g" except for clang being daft?

(I really wish clang would go fix this, it keeps coming up time and
again).

>  
>  /* Prevent speculative execution past this barrier. */
>  #define barrier_nospec() asm volatile("lfence":::"memory")


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: More annoying code generation by clang
  2024-04-08  8:49 ` Peter Zijlstra
@ 2024-04-08 13:41   ` H. Peter Anvin
  2024-04-08 18:32   ` Linus Torvalds
  1 sibling, 0 replies; 13+ messages in thread
From: H. Peter Anvin @ 2024-04-08 13:41 UTC (permalink / raw)
  To: Peter Zijlstra, Linus Torvalds
  Cc: Ingo Molnar, Thomas Gleixner, the arch/x86 maintainers,
	Linux Kernel Mailing List

On April 8, 2024 1:49:34 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:
>On Thu, Apr 04, 2024 at 03:53:49PM -0700, Linus Torvalds wrote:
>
>> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
>> index 66e57c010392..6159d2cbbfde 100644
>> --- a/arch/x86/include/asm/barrier.h
>> +++ b/arch/x86/include/asm/barrier.h
>> @@ -33,20 +33,15 @@
>>   * Returns:
>>   *     0 - (index < size)
>>   */
>> +#define array_index_mask_nospec(idx,sz) ({	\
>> +	typeof((idx)+(sz)) __idx = (idx);	\
>> +	typeof(__idx) __sz = (sz);		\
>> +	typeof(__idx) __mask;			\
>> +	asm volatile ("cmp %1,%2; sbb %0,%0"	\
>> +			:"=r" (__mask)		\
>> +			:"ir"(__sz),"r" (__idx)	\
>> +			:"cc");			\
>> +	__mask; })
>
>Should this not carry a comment about the "ir" constraint wanting to be
>"g" except for clang being daft?
>
>(I really wish clang would go fix this, it keeps coming up time and
>again).
>
>>  
>>  /* Prevent speculative execution past this barrier. */
>>  #define barrier_nospec() asm volatile("lfence":::"memory")
>

If the only reason for "ir" as opposed to "g" (= "irm") is clang then it really needs to be called out. Or better yet, don't do anything and let the clang people actually fix their code generation.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: More annoying code generation by clang
  2024-04-08  8:49 ` Peter Zijlstra
  2024-04-08 13:41   ` H. Peter Anvin
@ 2024-04-08 18:32   ` Linus Torvalds
  2024-04-08 19:42     ` Linus Torvalds
  1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2024-04-08 18:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List

On Mon, 8 Apr 2024 at 01:49, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Should this not carry a comment about the "ir" constraint wanting to be
> "g" except for clang being daft?

Yeah. Except I think I'll do something like

  /* Clang messes up "g" as an asm source */
  #define ASM_SOURCE_G "ir"

in <linux/compiler-clang.h>, and

  #ifndef ASM_SOURCE_G
    #define ASM_SOURCE_G "g"
  #endif

in linux/compiler.h.

> (I really wish clang would go fix this, it keeps coming up time and
> again).

It's been reported long ago, it seems to be hard to fix.

I suspect the issue is that the inline asm format is fairly closely
related to the gcc machine descriptions (look at the machine
descriptor files in gcc, and if you can ignore the horrid LISP-style
syntax you see how close they are).

And clang has a different model and needs to "translate" things, and
that one doesn't translate.

It's not like we don't have workarounds for gcc bugs in this area too
(eg "asm_goto_output()", née "asm_volatile_goto()").

There was another bug in my patch, though: the output mask should
always be "unsigned long", not tied to the input type.

              Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: More annoying code generation by clang
  2024-04-08 18:32   ` Linus Torvalds
@ 2024-04-08 19:42     ` Linus Torvalds
  2024-04-09 10:45       ` Peter Zijlstra
  2024-04-10  8:11       ` David Laight
  0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2024-04-08 19:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1904 bytes --]

On Mon, 8 Apr 2024 at 11:32, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> It's been reported long ago, it seems to be hard to fix.
>
> I suspect the issue is that the inline asm format is fairly closely
> related to the gcc machine descriptions (look at the machine
> descriptor files in gcc, and if you can ignore the horrid LISP-style
> syntax you see how close they are).

Actually, one of the github issues pages has more of an explanation
(and yes, it's tied to impedance issues between the inline asm syntax
and how clang works):

      https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442

so I wrote more of a commit log and did that "ASM_SOURCE_G" thing
(except I decided to call it "input" instead of "source", since that's
the standard inline asm language).

This version also has that output size fixed, and the commit message
talks about it.

This does *not* fix other inline asms to use "ASM_INPUT_G/RM".

I think it's mainly some of the bitop code that people have noticed
before - fls and variable_ffs() and friends.

I suspect clang is more common in the arm64 world than it is for
x86-64 kernel developers, and arm64 inline asm basically never uses
"rm" or "g" since arm64 doesn't have instructions that take either a
register or a memory operand.

Anyway, with gcc this generates

        cmp (%rdx),%ebx; sbb %rax,%rax  # _7->max_fds, fd, __mask

IOW, it uses the memory location for "max_fds". It couldn't do that
before, because it used to think that it always had to do the compare
in 64 bits, and the memory location is only 32-bit.

With clang, this generates

        movl    (%rcx), %eax
        cmpl    %eax, %edi
        sbbq    %rdi, %rdi

which has that extra register use, but is at least much better than
what it used to generate with crazy "load into register, spill to
stack, then compare against stack contents".

               Linus

[-- Attachment #2: 0001-x86-improve-array_index_mask_nospec-code-generation.patch --]
[-- Type: text/x-patch, Size: 4554 bytes --]

From 7779d285040bab685296da2cd0afe9d2d7b58969 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 8 Apr 2024 11:38:30 -0700
Subject: [PATCH] x86: improve array_index_mask_nospec() code generation

Don't force the inputs to be 'unsigned long', when the comparison can
easily be done in 32-bit if that's more appropriate.

Note that while we can look at the inputs to choose an appropriate size
for the compare instruction, the output is fixed at 'unsigned long'.
That's not technically optimal either, since a 32-bit 'sbbl' would often
be sufficient.

But for the outgoing mask we don't know how the mask ends up being used
(ie we have uses thathave an incoming 32-bit array index, but end up
using the mask for other things).  That said, it only costs the extra
REX prefix to always generate the 64-bit mask.

[ A 'sbbl' also always technically generates a 64-bit mask, but with the
  upper 32 bits clear: that's fine for when the incoming index that will
  be masked is already 32-bit, but not if you use the mask to mask a
  pointer afterwards, like the file table lookup does ]

Also, work around clang problems with asm constraints that have multiple
possibilities, particularly "g" and "rm".  Clang seems to turn inputs
like that into the most generic form, which is the memory input - but to
make matters worse, clang won't even use a possible original memory
location, but will spill the value to stack, and use the stack for the
asm input.

See

  https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442

for some explanation of why clang has this strange behavior, but the end
result is that "g" and "rm" really end up generating horrid code.

Link: https://github.com/llvm/llvm-project/issues/20571
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/x86/include/asm/barrier.h | 24 ++++++++++--------------
 include/linux/compiler-clang.h | 12 ++++++++++++
 include/linux/compiler_types.h |  9 +++++++++
 3 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 66e57c010392..234fd892e39e 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -33,20 +33,16 @@
  * Returns:
  *     0 - (index < size)
  */
-static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
-		unsigned long size)
-{
-	unsigned long mask;
-
-	asm volatile ("cmp %1,%2; sbb %0,%0;"
-			:"=r" (mask)
-			:"g"(size),"r" (index)
-			:"cc");
-	return mask;
-}
-
-/* Override the default implementation from linux/nospec.h. */
-#define array_index_mask_nospec array_index_mask_nospec
+#define array_index_mask_nospec(idx,sz) ({	\
+	typeof((idx)+(sz)) __idx = (idx);	\
+	typeof(__idx) __sz = (sz);		\
+	unsigned long __mask;			\
+	asm volatile ("cmp %1,%2; sbb %0,%0"	\
+			:"=r" (__mask)		\
+			:ASM_INPUT_G (__sz),	\
+			 "r" (__idx)		\
+			:"cc");			\
+	__mask; })
 
 /* Prevent speculative execution past this barrier. */
 #define barrier_nospec() asm volatile("lfence":::"memory")
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 49feac0162a5..0dee061fd7a6 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -118,3 +118,15 @@
 
 #define __diag_ignore_all(option, comment) \
 	__diag_clang(13, ignore, option)
+
+/*
+ * clang has horrible behavior with "g" or "rm" constraints for asm
+ * inputs, turning them into something worse than "m". Avoid using
+ * constraints with multiple possible uses (but "ir" seems to be ok):
+ *
+ *	https://github.com/llvm/llvm-project/issues/20571
+ *	https://github.com/llvm/llvm-project/issues/30873
+ *	https://github.com/llvm/llvm-project/issues/34837
+ */
+#define ASM_INPUT_G "ir"
+#define ASM_INPUT_RM "r"
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 2abaa3a825a9..e53acd310545 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -380,6 +380,15 @@ struct ftrace_likely_data {
 #define asm_goto_output(x...) asm volatile goto(x)
 #endif
 
+/*
+ * Clang has trouble with constraints with multiple
+ * alternative behaviors (mainly "g" and "rm").
+ */
+#ifndef ASM_INPUT_G
+  #define ASM_INPUT_G "g"
+  #define ASM_INPUT_RM "rm"
+#endif
+
 #ifdef CONFIG_CC_HAS_ASM_INLINE
 #define asm_inline asm __inline
 #else
-- 
2.44.0.330.g4d18c88175


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: More annoying code generation by clang
  2024-04-08 19:42     ` Linus Torvalds
@ 2024-04-09 10:45       ` Peter Zijlstra
  2024-04-09 15:37         ` Nick Desaulniers
  2024-04-10  8:11       ` David Laight
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2024-04-09 10:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Thomas Gleixner, Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	Nick Desaulniers

On Mon, Apr 08, 2024 at 12:42:31PM -0700, Linus Torvalds wrote:

> Actually, one of the github issues pages has more of an explanation
> (and yes, it's tied to impedance issues between the inline asm syntax
> and how clang works):
> 
>       https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442
> 

So that same issue seems to suggest Nick is actually working on this and
got stuff merged. Nick, what is the status of your efforts and should we
indeed do the below as Linus suggests or should he upgrade his compiler?

> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 49feac0162a5..0dee061fd7a6 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -118,3 +118,15 @@
>  
>  #define __diag_ignore_all(option, comment) \
>  	__diag_clang(13, ignore, option)
> +
> +/*
> + * clang has horrible behavior with "g" or "rm" constraints for asm
> + * inputs, turning them into something worse than "m". Avoid using
> + * constraints with multiple possible uses (but "ir" seems to be ok):
> + *
> + *	https://github.com/llvm/llvm-project/issues/20571
> + *	https://github.com/llvm/llvm-project/issues/30873
> + *	https://github.com/llvm/llvm-project/issues/34837
> + */
> +#define ASM_INPUT_G "ir"
> +#define ASM_INPUT_RM "r"
> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index 2abaa3a825a9..e53acd310545 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -380,6 +380,15 @@ struct ftrace_likely_data {
>  #define asm_goto_output(x...) asm volatile goto(x)
>  #endif
>  
> +/*
> + * Clang has trouble with constraints with multiple
> + * alternative behaviors (mainly "g" and "rm").
> + */
> +#ifndef ASM_INPUT_G
> +  #define ASM_INPUT_G "g"
> +  #define ASM_INPUT_RM "rm"
> +#endif
> +
>  #ifdef CONFIG_CC_HAS_ASM_INLINE
>  #define asm_inline asm __inline
>  #else
> -- 
> 2.44.0.330.g4d18c88175
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: More annoying code generation by clang
  2024-04-09 10:45       ` Peter Zijlstra
@ 2024-04-09 15:37         ` Nick Desaulniers
  2024-04-10 19:23           ` Bill Wendling
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Desaulniers @ 2024-04-09 15:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List,
	clang-built-linux, Kees Cook

On Tue, Apr 9, 2024 at 3:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Apr 08, 2024 at 12:42:31PM -0700, Linus Torvalds wrote:
>
> > Actually, one of the github issues pages has more of an explanation
> > (and yes, it's tied to impedance issues between the inline asm syntax
> > and how clang works):
> >
> >       https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442
> >
>
> So that same issue seems to suggest Nick is actually working on this and
> got stuff merged. Nick, what is the status of your efforts and should we
> indeed do the below as Linus suggests or should he upgrade his compiler?

Sorry, I'm no longer working on the issue.  I should mark that as such.

The feature got hung up on rewriting one of the register allocation
frameworks in llvm.
https://github.com/llvm/llvm-project/pull/74344

I have a new set of responsibilities at work so I probably wont be
working on that issue any time soon.

>
> > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> > index 49feac0162a5..0dee061fd7a6 100644
> > --- a/include/linux/compiler-clang.h
> > +++ b/include/linux/compiler-clang.h
> > @@ -118,3 +118,15 @@
> >
> >  #define __diag_ignore_all(option, comment) \
> >       __diag_clang(13, ignore, option)
> > +
> > +/*
> > + * clang has horrible behavior with "g" or "rm" constraints for asm
> > + * inputs, turning them into something worse than "m". Avoid using
> > + * constraints with multiple possible uses (but "ir" seems to be ok):
> > + *
> > + *   https://github.com/llvm/llvm-project/issues/20571
> > + *   https://github.com/llvm/llvm-project/issues/30873
> > + *   https://github.com/llvm/llvm-project/issues/34837

20571 is the cannonical bug for this.

> > + */
> > +#define ASM_INPUT_G "ir"
> > +#define ASM_INPUT_RM "r"
> > diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> > index 2abaa3a825a9..e53acd310545 100644
> > --- a/include/linux/compiler_types.h
> > +++ b/include/linux/compiler_types.h
> > @@ -380,6 +380,15 @@ struct ftrace_likely_data {
> >  #define asm_goto_output(x...) asm volatile goto(x)
> >  #endif
> >
> > +/*
> > + * Clang has trouble with constraints with multiple
> > + * alternative behaviors (mainly "g" and "rm").
> > + */
> > +#ifndef ASM_INPUT_G
> > +  #define ASM_INPUT_G "g"
> > +  #define ASM_INPUT_RM "rm"
> > +#endif
> > +
> >  #ifdef CONFIG_CC_HAS_ASM_INLINE
> >  #define asm_inline asm __inline
> >  #else
> > --
> > 2.44.0.330.g4d18c88175
> >
>


-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: More annoying code generation by clang
  2024-04-08 19:42     ` Linus Torvalds
  2024-04-09 10:45       ` Peter Zijlstra
@ 2024-04-10  8:11       ` David Laight
  1 sibling, 0 replies; 13+ messages in thread
From: David Laight @ 2024-04-10  8:11 UTC (permalink / raw)
  To: 'Linus Torvalds', Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List

From: Linus Torvalds
> Sent: 08 April 2024 20:43
...
> I think it's mainly some of the bitop code that people have noticed
> before - fls and variable_ffs() and friends.
> 
> I suspect clang is more common in the arm64 world than it is for
> x86-64 kernel developers, and arm64 inline asm basically never uses
> "rm" or "g" since arm64 doesn't have instructions that take either a
> register or a memory operand.
> 
> Anyway, with gcc this generates
> 
>         cmp (%rdx),%ebx; sbb %rax,%rax  # _7->max_fds, fd, __mask
> 
> IOW, it uses the memory location for "max_fds". It couldn't do that
> before, because it used to think that it always had to do the compare
> in 64 bits, and the memory location is only 32-bit.
> 
> With clang, this generates
> 
>         movl    (%rcx), %eax
>         cmpl    %eax, %edi
>         sbbq    %rdi, %rdi
> 
> which has that extra register use, but is at least much better than
> what it used to generate with crazy "load into register, spill to
> stack, then compare against stack contents".

Provided the compiler can find a register I doubt the extra
instruction makes much difference.
The 'cmp (%rdx),%ebx)' ends up being 2 u-ops the same as
the movl/cmpl pair.
Instruction decode and retirement aren't often bottlenecks on recent cpu.
So I suspect the main difference is cache footprint.

Trying to measure the difference is probably impossible...

You'll probably get a bigger difference by changing a lot of
function results and parameters to 'unsigned long' to remove
all the zero-extending that happens.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: More annoying code generation by clang
  2024-04-09 15:37         ` Nick Desaulniers
@ 2024-04-10 19:23           ` Bill Wendling
  0 siblings, 0 replies; 13+ messages in thread
From: Bill Wendling @ 2024-04-10 19:23 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Peter Zijlstra, Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	Peter Anvin, the arch/x86 maintainers, Linux Kernel Mailing List,
	clang-built-linux, Kees Cook

On Tue, Apr 9, 2024 at 8:37 AM Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Tue, Apr 9, 2024 at 3:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Apr 08, 2024 at 12:42:31PM -0700, Linus Torvalds wrote:
> >
> > > Actually, one of the github issues pages has more of an explanation
> > > (and yes, it's tied to impedance issues between the inline asm syntax
> > > and how clang works):
> > >
> > >       https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442
> > >
> >
> > So that same issue seems to suggest Nick is actually working on this and
> > got stuff merged. Nick, what is the status of your efforts and should we
> > indeed do the below as Linus suggests or should he upgrade his compiler?
>
> Sorry, I'm no longer working on the issue.  I should mark that as such.
>
> The feature got hung up on rewriting one of the register allocation
> frameworks in llvm.
> https://github.com/llvm/llvm-project/pull/74344
>
> I have a new set of responsibilities at work so I probably wont be
> working on that issue any time soon.
>
I plan on picking this up. Could you please create a WIP branch on
GitHub with your current work?

-bw

> >
> > > diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> > > index 49feac0162a5..0dee061fd7a6 100644
> > > --- a/include/linux/compiler-clang.h
> > > +++ b/include/linux/compiler-clang.h
> > > @@ -118,3 +118,15 @@
> > >
> > >  #define __diag_ignore_all(option, comment) \
> > >       __diag_clang(13, ignore, option)
> > > +
> > > +/*
> > > + * clang has horrible behavior with "g" or "rm" constraints for asm
> > > + * inputs, turning them into something worse than "m". Avoid using
> > > + * constraints with multiple possible uses (but "ir" seems to be ok):
> > > + *
> > > + *   https://github.com/llvm/llvm-project/issues/20571
> > > + *   https://github.com/llvm/llvm-project/issues/30873
> > > + *   https://github.com/llvm/llvm-project/issues/34837
>
> 20571 is the cannonical bug for this.
>
> > > + */
> > > +#define ASM_INPUT_G "ir"
> > > +#define ASM_INPUT_RM "r"
> > > diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> > > index 2abaa3a825a9..e53acd310545 100644
> > > --- a/include/linux/compiler_types.h
> > > +++ b/include/linux/compiler_types.h
> > > @@ -380,6 +380,15 @@ struct ftrace_likely_data {
> > >  #define asm_goto_output(x...) asm volatile goto(x)
> > >  #endif
> > >
> > > +/*
> > > + * Clang has trouble with constraints with multiple
> > > + * alternative behaviors (mainly "g" and "rm").
> > > + */
> > > +#ifndef ASM_INPUT_G
> > > +  #define ASM_INPUT_G "g"
> > > +  #define ASM_INPUT_RM "rm"
> > > +#endif
> > > +
> > >  #ifdef CONFIG_CC_HAS_ASM_INLINE
> > >  #define asm_inline asm __inline
> > >  #else
> > > --
> > > 2.44.0.330.g4d18c88175
> > >
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-04-10 19:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-04 22:53 More annoying code generation by clang Linus Torvalds
2024-04-06 10:56 ` Ingo Molnar
2024-04-06 12:30   ` Uros Bizjak
2024-04-06 15:39     ` Linus Torvalds
2024-04-06 16:04       ` Linus Torvalds
2024-04-08  8:49 ` Peter Zijlstra
2024-04-08 13:41   ` H. Peter Anvin
2024-04-08 18:32   ` Linus Torvalds
2024-04-08 19:42     ` Linus Torvalds
2024-04-09 10:45       ` Peter Zijlstra
2024-04-09 15:37         ` Nick Desaulniers
2024-04-10 19:23           ` Bill Wendling
2024-04-10  8:11       ` David Laight

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.