On Thu, Oct 12, 2023 at 8:01 PM Uros Bizjak wrote:
>
> On Thu, Oct 12, 2023 at 7:47 PM Linus Torvalds
> wrote:
> >
> > On Thu, 12 Oct 2023 at 10:10, Linus Torvalds
> > wrote:
> > >
> > > The fix seems to be a simple one-liner, ie just
> > >
> > > -     asm(__pcpu_op2_##size(op, __percpu_arg(P[var]), "%[val]")       \
> > > +     asm(__pcpu_op2_##size(op, __percpu_arg(a[var]), "%[val]")       \
> >
> > Nope. That doesn't work at all.
> >
> > It turns out that we're not the only ones that didn't know about the
> > 'a' modifier.
> >
> > clang has also never heard of it in this context, and the above
> > one-liner results in an endless sea of errors, with
> >
> >      error: invalid operand in inline asm: 'movq %gs:${1:a}, $0'
> >
> > Looking around, I think it's X86AsmPrinter::PrintAsmOperand() that is
> > supposed to handle these things, and while it does have some handling
> > for 'a', the comment around it says
> >
> >    case 'a': // This is an address. Currently only 'i' and 'r' are expected.
> >
> > and I think our use ends up just confusing the heck out of clang. Of
> > course, clang also does this:
> >
> >    case 'P': // This is the operand of a call, treat specially.
> >      PrintPCRelImm(MI, OpNo, O);
> >      return false;
> >
> > so clang *already* generates those 'current' accesses as PCrelative, and I see
> >
> >         movq    %gs:pcpu_hot(%rip), %r13
> >
> > in the generated code.
> >
> > End result: clang actually generates what we want just using 'P', and
> > the whole "P vs a" is only a gcc thing.
>
> Ugh, this isn't exactly following Clang's claim that "In general,
> Clang is highly compatible with the GCC inline assembly extensions,
> allowing the same set of constraints, modifiers and operands as GCC
> inline assembly."
For added fun I obtained some old clang:

$ clang --version
clang version 11.0.0 (Fedora 11.0.0-3.fc33)

and tried to compile this:

int m;
__seg_gs int n;

void foo (void)
{
  asm ("# %a0 %a1" :: "p" (&m), "p" (&n));
  asm ("# %P0 %P1" :: "p" (&m), "p" (&n));
}

clang-11:
        # m n
        # m n

clang-11 -fpie:
        # m(%rip) n(%rip)
        # m n

clang-11 -m32:
        # m n
        # m n

gcc:
        # m(%rip) n(%rip)
        # m n

gcc -fpie:
        # m(%rip) n(%rip)
        # m n

gcc -m32:
        # m n
        # m n

Please find attached a patch that should bring some order to this issue.

The patch includes two demonstration sites: the generated code for
mem_encrypt_identity.c does not change, while the change in percpu.h
brings the expected 4kB code size reduction.

Uros.