On Thu, Oct 12, 2023 at 8:01 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Thu, Oct 12, 2023 at 7:47 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Thu, 12 Oct 2023 at 10:10, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > >
> > > The fix seems to be a simple one-liner, ie just
> > >
> > > -       asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]")       \
> > > +       asm(__pcpu_op2_##size(op, __percpu_arg(a[var]), "%[val]")       \
> >
> > Nope. That doesn't work at all.
> >
> > It turns out that we're not the only ones that didn't know about the
> > 'a' modifier.
> >
> > clang has also never heard of it in this context, and the above
> > one-liner results in an endless sea of errors, with
> >
> >      error: invalid operand in inline asm: 'movq %gs:${1:a}, $0'
> >
> > Looking around, I think it's X86AsmPrinter::PrintAsmOperand() that is
> > supposed to handle these things, and while it does have some handling
> > for 'a', the comment around it says
> >
> >     case 'a': // This is an address.  Currently only 'i' and 'r' are expected.
> >
> > and I think our use ends up just confusing the heck out of clang. Of
> > course, clang also does this:
> >
> >     case 'P': // This is the operand of a call, treat specially.
> >         PrintPCRelImm(MI, OpNo, O);
> >         return false;
> >
> > so clang *already* generates those 'current' accesses as PCrelative, and I see
> >
> >         movq    %gs:pcpu_hot(%rip), %r13
> >
> > in the generated code.
> >
> > End result: clang actually generates what we want just using 'P', and
> > the whole "P vs a" is only a gcc thing.
>
> Ugh, this isn't exactly following Clang's claim that "In general,
> Clang is highly compatible with the GCC inline assembly extensions,
> allowing the same set of constraints, modifiers and operands as GCC
> inline assembly."

For added fun I obtained some old clang:

$ clang --version
clang version 11.0.0 (Fedora 11.0.0-3.fc33)

and tried to compile this:

int m;
__seg_gs int n;

void foo (void)
{
  asm ("# %a0 %a1" :: "p" (&m), "p" (&n));
  asm ("# %P0 %P1" :: "p" (&m), "p" (&n));
}

clang-11:

       # m n
       # m n

clang-11 -fpie:

       # m(%rip) n(%rip)
       # m n

clang-11 -m32:

       # m n
       # m n

gcc:

       # m(%rip) n(%rip)
       # m n

gcc -fpie:

       # m(%rip) n(%rip)
       # m n

gcc -m32:

       # m n
       # m n

Please find attached a patch that should bring some order to this
issue. The patch includes two demonstration sites: the generated code
for mem_encrypt_identity.c does not change, while the change in
percpu.h brings the expected 4kB code size reduction.

Uros.