On Mon, 8 Apr 2024 at 11:32, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> It's been reported long ago, it seems to be hard to fix.
>
> I suspect the issue is that the inline asm format is fairly closely
> related to the gcc machine descriptions (look at the machine
> descriptor files in gcc, and if you can ignore the horrid LISP-style
> syntax you see how close they are).

Actually, one of the GitHub issue pages has more of an explanation
(and yes, it's tied to an impedance mismatch between the inline asm
syntax and how clang works):

      https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442

so I wrote more of a commit log and did that "ASM_SOURCE_G" thing
(except I decided to call it "input" instead of "source", since that's
the standard inline asm language).

This version also has that output size fixed, and the commit message
talks about it.

This does *not* fix other inline asms to use "ASM_INPUT_G/RM".

I think it's mainly some of the bitop code that people have noticed
before - fls and variable_ffs() and friends.

I suspect clang is more common in the arm64 world than it is for
x86-64 kernel developers, and arm64 inline asm basically never uses
"rm" or "g" since arm64 doesn't have instructions that take either a
register or a memory operand.

Anyway, with gcc this generates

        cmp (%rdx),%ebx; sbb %rax,%rax  # _7->max_fds, fd, __mask

IOW, it uses the memory location for "max_fds". It couldn't do that
before, because it used to think that it always had to do the compare
in 64 bits, and the memory location is only 32-bit.

With clang, this generates

        movl    (%rcx), %eax
        cmpl    %eax, %edi
        sbbq    %rdi, %rdi

which has that extra register use, but is at least much better than
what it used to generate with crazy "load into register, spill to
stack, then compare against stack contents".

               Linus