linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/acrn: Improve ACRN hypercalls
@ 2022-08-04 18:03 Uros Bizjak
  2022-08-04 18:41 ` Dave Hansen
  2022-08-04 18:52 ` Sean Christopherson
  0 siblings, 2 replies; 7+ messages in thread
From: Uros Bizjak @ 2022-08-04 18:03 UTC (permalink / raw)
  To: x86, linux-kernel
  Cc: Uros Bizjak, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

As explained in section 6.47.5.2, "Specifying Registers for Local Variables"
of the GCC info documentation, the correct way to specify register for
input operands when calling Extended 'asm' is to define a local register
variable and associate it with a specified register:

	register unsigned long r8 asm ("r8") = hcall_id;

Use the above approach instead of explicit MOV to R8 at the beginning
of the asm. The relaxed assignment allows compiler to optimize and
shrink drivers/virt/acrn.o for 181 bytes:

   text    data     bss     dec     hex filename
   4284     208       0    4492    118c hsm-new.o
   4465     208       0    4673    1241 hsm-old.o

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
 arch/x86/include/asm/acrn.h | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/acrn.h b/arch/x86/include/asm/acrn.h
index e003a01b7c67..601867085b95 100644
--- a/arch/x86/include/asm/acrn.h
+++ b/arch/x86/include/asm/acrn.h
@@ -29,19 +29,16 @@ static inline u32 acrn_cpuid_base(void)
  *   - Hypercall number is passed in R8 register.
  *   - Up to 2 arguments are passed in RDI, RSI.
  *   - Return value will be placed in RAX.
- *
- * Because GCC doesn't support R8 register as direct register constraints, use
- * supported constraint as input with a explicit MOV to R8 in beginning of asm.
  */
 static inline long acrn_hypercall0(unsigned long hcall_id)
 {
 	long result;
 
-	asm volatile("movl %1, %%r8d\n\t"
-		     "vmcall\n\t"
+	register unsigned long r8 asm ("r8") = hcall_id;
+	asm volatile("vmcall"
 		     : "=a" (result)
-		     : "g" (hcall_id)
-		     : "r8", "memory");
+		     : "r" (r8)
+		     : "memory");
 
 	return result;
 }
@@ -51,11 +48,11 @@ static inline long acrn_hypercall1(unsigned long hcall_id,
 {
 	long result;
 
-	asm volatile("movl %1, %%r8d\n\t"
-		     "vmcall\n\t"
+	register unsigned long r8 asm ("r8") = hcall_id;
+	asm volatile("vmcall"
 		     : "=a" (result)
-		     : "g" (hcall_id), "D" (param1)
-		     : "r8", "memory");
+		     : "r" (r8), "D" (param1)
+		     : "memory");
 
 	return result;
 }
@@ -66,11 +63,11 @@ static inline long acrn_hypercall2(unsigned long hcall_id,
 {
 	long result;
 
-	asm volatile("movl %1, %%r8d\n\t"
-		     "vmcall\n\t"
+	register unsigned long r8 asm ("r8") = hcall_id;
+	asm volatile("vmcall"
 		     : "=a" (result)
-		     : "g" (hcall_id), "D" (param1), "S" (param2)
-		     : "r8", "memory");
+		     : "r" (r8), "D" (param1), "S" (param2)
+		     : "memory");
 
 	return result;
 }
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/acrn: Improve ACRN hypercalls
  2022-08-04 18:03 [PATCH] x86/acrn: Improve ACRN hypercalls Uros Bizjak
@ 2022-08-04 18:41 ` Dave Hansen
  2022-08-04 18:56   ` Uros Bizjak
  2022-08-04 18:52 ` Sean Christopherson
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2022-08-04 18:41 UTC (permalink / raw)
  To: Uros Bizjak, x86, linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin

On 8/4/22 11:03, Uros Bizjak wrote:
> As explained in section 6.47.5.2, "Specifying Registers for Local Variables"
> of the GCC info documentation, the correct way to specify register for
> input operands when calling Extended 'asm' is to define a local register
> variable and associate it with a specified register:
> 
> 	register unsigned long r8 asm ("r8") = hcall_id;

IIRC, that's what the ACRN folks proposed first.  But, it's more fragile
because you can't, for instance, put a printk() in that function between
the variable definition and assebly.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/acrn: Improve ACRN hypercalls
  2022-08-04 18:03 [PATCH] x86/acrn: Improve ACRN hypercalls Uros Bizjak
  2022-08-04 18:41 ` Dave Hansen
@ 2022-08-04 18:52 ` Sean Christopherson
  2022-08-04 19:05   ` Uros Bizjak
  1 sibling, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2022-08-04 18:52 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Thu, Aug 04, 2022, Uros Bizjak wrote:
> As explained in section 6.47.5.2, "Specifying Registers for Local Variables"
> of the GCC info documentation, the correct way to specify register for
> input operands when calling Extended 'asm' is to define a local register
> variable and associate it with a specified register:
> 
> 	register unsigned long r8 asm ("r8") = hcall_id;

Using "register" for input operands is subtly dangerous.  The compiler (at least
some versions of GCC) isn't smart enough to realize that it must not clobber the
register between setting the register and passing it into asm().  Or maybe that's
the designed/intended behavior, but if so, IMO it's a terrible design.

And since kernel style is to put declarations at the beginning of functions, it's
quite easy for a developer/debugger to mess this up (speaking from multiple
experiences).

E.g. adding a pr_warn_ratelimited() after the register declaration generates code
that doesn't even load r8.

Dump of assembler code for function fake_acrn_example:
Address range 0xffffffff8105a1e0 to 0xffffffff8105a216:
   0xffffffff8105a1e0 <+0>:	call   0xffffffff8104a8c0 <__fentry__>
   0xffffffff8105a1e5 <+5>:	push   %rbp
   0xffffffff8105a1e6 <+6>:	mov    $0xffffffff81e08b40,%rsi
   0xffffffff8105a1ed <+13>:	mov    %rsp,%rbp
   0xffffffff8105a1f0 <+16>:	push   %r12
   0xffffffff8105a1f2 <+18>:	movslq %edi,%r12
   0xffffffff8105a1f5 <+21>:	mov    $0xffffffff822229e0,%rdi
   0xffffffff8105a1fc <+28>:	call   0xffffffff81545e30 <___ratelimit>
   0xffffffff8105a201 <+33>:	test   %eax,%eax
   0xffffffff8105a203 <+35>:	jne    0xffffffff81843bf0 <fake_acrn_example.cold>
   0xffffffff8105a209 <+41>:	vmcall
   0xffffffff8105a20c <+44>:	mov    -0x8(%rbp),%r12
   0xffffffff8105a210 <+48>:	leave
   0xffffffff8105a211 <+49>:	jmp    0xffffffff81c01a40 <__x86_return_thunk>


diff --git a/arch/x86/include/asm/acrn.h b/arch/x86/include/asm/acrn.h
index 601867085b95..adc867a4a3d6 100644
--- a/arch/x86/include/asm/acrn.h
+++ b/arch/x86/include/asm/acrn.h
@@ -35,6 +35,9 @@ static inline long acrn_hypercall0(unsigned long hcall_id)
        long result;

        register unsigned long r8 asm ("r8") = hcall_id;
+
+       pr_warn_ratelimited("acrn: issuing hypercall = %ld\n", hcall_id);
+
        asm volatile("vmcall"
                     : "=a" (result)
                     : "r" (r8)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fad8faa29d04..1b6e13a8bb9d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -33,6 +33,7 @@
 #include <asm/kvm_para.h>              /* kvm_handle_async_pf          */
 #include <asm/vdso.h>                  /* fixup_vdso_exception()       */
 #include <asm/irq_stack.h>
+#include <asm/acrn.h>

 #define CREATE_TRACE_POINTS
 #include <asm/trace/exceptions.h>
@@ -1542,3 +1543,9 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)

        irqentry_exit(regs, state);
 }
+
+
+long fake_acrn_example(int hcall)
+{
+       return acrn_hypercall0(hcall);
+}

Compile tested, but the below appears to generate sane code, e.g. reloads r8 when
given:

long fake_acrn_example(long hcall)
{
	pr_warn_ratelimited("acrn: attempting hcall2 for %ld\n", hcall);
	if (acrn_hypercall2(hcall, 0, 0))
		return -EINVAL;

	pr_warn_ratelimited("acrn: attempting hcall1 for %ld\n", hcall);
	if (acrn_hypercall1(hcall, 0))
		return -EINVAL;

	pr_warn_ratelimited("acrn: attempting hcall0 for %ld\n", hcall);
	return acrn_hypercall0(hcall);
}

---
From: Uros Bizjak <ubizjak@gmail.com>
Date: Thu, 4 Aug 2022 20:03:58 +0200
Subject: [PATCH] x86/acrn: Use "register" input constraint to set r8 for ACRN
 hypercalls

As explained in section 6.47.5.2, "Specifying Registers for Local Variables"
of the GCC info documentation, the correct way to specify register for
input operands when calling Extended 'asm' is to define a local register
variable and associate it with a specified register:

	register unsigned long r8 asm ("r8") = hcall_id;

Use the above approach instead of explicit MOV to R8 at the beginning
of the asm. The relaxed assignment allows compiler to optimize and
shrink drivers/virt/acrn.o for 181 bytes:

   text    data     bss     dec     hex filename
   4284     208       0    4492    118c hsm-new.o
   4465     208       0    4673    1241 hsm-old.o

Wrap the register approach in a macro as using "register" for inputs is
subtly dangerous.  If the compiler detects that the register (r8) would
be clobbered between setting it (declaration) and consuming it (asm), the
compiler is apparently allowed to ignore the input.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/acrn.h | 50 +++++++++++++++----------------------
 1 file changed, 20 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/acrn.h b/arch/x86/include/asm/acrn.h
index e003a01b7c67..0817415f6d74 100644
--- a/arch/x86/include/asm/acrn.h
+++ b/arch/x86/include/asm/acrn.h
@@ -21,6 +21,23 @@ static inline u32 acrn_cpuid_base(void)
 	return 0;
 }

+/*
+  * Do not put code between the r8 register declaration and its usage in asm(),
+  * the compiler is allowed to ignore the register input if r8 would be
+  * clobbered before the asm() blob, e.g. by a function call.
+  */
+#define ACRN_HYPERCALL(id, params...)			\
+({							\
+	long result;					\
+	register unsigned long r8 asm ("r8") = id;	\
+							\
+	asm volatile("vmcall"				\
+		     : "=a" (result)			\
+		     : "r" (r8), ##params		\
+		     : "memory");			\
+	result;						\
+})
+
 /*
  * Hypercalls for ACRN
  *
@@ -29,50 +46,23 @@ static inline u32 acrn_cpuid_base(void)
  *   - Hypercall number is passed in R8 register.
  *   - Up to 2 arguments are passed in RDI, RSI.
  *   - Return value will be placed in RAX.
- *
- * Because GCC doesn't support R8 register as direct register constraints, use
- * supported constraint as input with a explicit MOV to R8 in beginning of asm.
  */
 static inline long acrn_hypercall0(unsigned long hcall_id)
 {
-	long result;
-
-	asm volatile("movl %1, %%r8d\n\t"
-		     "vmcall\n\t"
-		     : "=a" (result)
-		     : "g" (hcall_id)
-		     : "r8", "memory");
-
-	return result;
+	return ACRN_HYPERCALL(hcall_id);
 }

 static inline long acrn_hypercall1(unsigned long hcall_id,
 				   unsigned long param1)
 {
-	long result;
-
-	asm volatile("movl %1, %%r8d\n\t"
-		     "vmcall\n\t"
-		     : "=a" (result)
-		     : "g" (hcall_id), "D" (param1)
-		     : "r8", "memory");
-
-	return result;
+	return ACRN_HYPERCALL(hcall_id, "D" (param1));
 }

 static inline long acrn_hypercall2(unsigned long hcall_id,
 				   unsigned long param1,
 				   unsigned long param2)
 {
-	long result;
-
-	asm volatile("movl %1, %%r8d\n\t"
-		     "vmcall\n\t"
-		     : "=a" (result)
-		     : "g" (hcall_id), "D" (param1), "S" (param2)
-		     : "r8", "memory");
-
-	return result;
+	return ACRN_HYPERCALL(hcall_id, "D" (param1), "S" (param2));
 }

 #endif /* _ASM_X86_ACRN_H */

base-commit: 91dd4df51571bbea4d31e265cfbd59a85e0876be
--


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/acrn: Improve ACRN hypercalls
  2022-08-04 18:41 ` Dave Hansen
@ 2022-08-04 18:56   ` Uros Bizjak
  2022-08-04 19:03     ` Dave Hansen
  0 siblings, 1 reply; 7+ messages in thread
From: Uros Bizjak @ 2022-08-04 18:56 UTC (permalink / raw)
  To: Dave Hansen
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Thu, Aug 4, 2022 at 8:41 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 8/4/22 11:03, Uros Bizjak wrote:
> > As explained in section 6.47.5.2, "Specifying Registers for Local Variables"
> > of the GCC info documentation, the correct way to specify register for
> > input operands when calling Extended 'asm' is to define a local register
> > variable and associate it with a specified register:
> >
> >       register unsigned long r8 asm ("r8") = hcall_id;
>
> IIRC, that's what the ACRN folks proposed first.  But, it's more fragile
> because you can't, for instance, put a printk() in that function between
> the variable definition and assebly.

Yes, this is also stated in the documentation. The solution is to:

--quote--
    register int *p1 asm ("r0") = ...;
    register int *p2 asm ("r1") = ...;
    register int *result asm ("r0");
    asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2));

_Warning:_ In the above example, be aware that a register (for example
'r0') can be call-clobbered by subsequent code, including function calls
and library calls for arithmetic operators on other variables (for
example the initialization of 'p2').  In this case, use temporary
variables for expressions between the register assignments:

    int t1 = ...;
    register int *p1 asm ("r0") = ...;
    register int *p2 asm ("r1") = t1;
    register int *result asm ("r0");
    asm ("sysint" : "=r" (result) : "0" (p1), "r" (p2));
--/quote--

The %r8 is not preserved across function calls, so your statement
above is correct. But as long as there is no function call *between*
the variable definition and the assembly, the approach with the local
register variable works without any problems. It is even something GCC
itself has used in its library for years.

Uros.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/acrn: Improve ACRN hypercalls
  2022-08-04 18:56   ` Uros Bizjak
@ 2022-08-04 19:03     ` Dave Hansen
  2022-08-08 18:59       ` H. Peter Anvin
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Hansen @ 2022-08-04 19:03 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On 8/4/22 11:56, Uros Bizjak wrote:
> The %r8 is not preserved across function calls, so your statement
> above is correct. But as long as there is no function call *between*
> the variable definition and the assembly, the approach with the local
> register variable works without any problems. It is even something GCC
> itself has used in its library for years.

I'm glad it's workout out for GCC.  But, the kernel is not GCC.  I
specifically asked for the ACRN code to be the way that it is today and
your argument is not making me reconsider it in the slightest.

So, thanks for the patch, but I don't think we should apply it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/acrn: Improve ACRN hypercalls
  2022-08-04 18:52 ` Sean Christopherson
@ 2022-08-04 19:05   ` Uros Bizjak
  0 siblings, 0 replies; 7+ messages in thread
From: Uros Bizjak @ 2022-08-04 19:05 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin

On Thu, Aug 4, 2022 at 8:52 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Aug 04, 2022, Uros Bizjak wrote:
> > As explained in section 6.47.5.2, "Specifying Registers for Local Variables"
> > of the GCC info documentation, the correct way to specify register for
> > input operands when calling Extended 'asm' is to define a local register
> > variable and associate it with a specified register:
> >
> >       register unsigned long r8 asm ("r8") = hcall_id;
>
> Using "register" for input operands is subtly dangerous.  The compiler (at least
> some versions of GCC) isn't smart enough to realize that it must not clobber the
> register between setting the register and passing it into asm().  Or maybe that's
> the designed/intended behavior, but if so, IMO it's a terrible design.
>
> And since kernel style is to put declarations at the beginning of functions, it's
> quite easy for a developer/debugger to mess this up (speaking from multiple
> experiences).

All you said there is true, and these cases are even covered in the
GCC documentation (please see section 6.47.5.2, "Specifying Registers
for Local Variables"). But in this particular case, we have a static
inline function, which encapsulates low-level access, so we *can* add
the definition just above the asm.

Uros.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] x86/acrn: Improve ACRN hypercalls
  2022-08-04 19:03     ` Dave Hansen
@ 2022-08-08 18:59       ` H. Peter Anvin
  0 siblings, 0 replies; 7+ messages in thread
From: H. Peter Anvin @ 2022-08-08 18:59 UTC (permalink / raw)
  To: Dave Hansen, Uros Bizjak
  Cc: x86, linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen

On August 4, 2022 12:03:56 PM PDT, Dave Hansen <dave.hansen@intel.com> 
wrote:
>On 8/4/22 11:56, Uros Bizjak wrote:
>> The %r8 is not preserved across function calls, so your statement
>> above is correct. But as long as there is no function call *between*
>> the variable definition and the assembly, the approach with the local
>> register variable works without any problems. It is even something GCC
>> itself has used in its library for years.
>
>I'm glad it's workout out for GCC.  But, the kernel is not GCC.  I
>specifically asked for the ACRN code to be the way that it is today and
>your argument is not making me reconsider it in the slightest.
>
>So, thanks for the patch, but I don't think we should apply it.

Well, this is universally used, so it isn't going to break. x86 is kind 
of unique in that it often doesn't need it, because it has predicates 
for so many of the specific registers, but even x86 needs it for 
inlining system calls in glibc, for example.

There is a way to do it right, which is to wrap the assignments (which 
don't have to be the declarations, but customarily are) and asm() 
statements in a block (e.g. an inline function, but can also just be a 
plain old statement block.)



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-08-08 19:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-04 18:03 [PATCH] x86/acrn: Improve ACRN hypercalls Uros Bizjak
2022-08-04 18:41 ` Dave Hansen
2022-08-04 18:56   ` Uros Bizjak
2022-08-04 19:03     ` Dave Hansen
2022-08-08 18:59       ` H. Peter Anvin
2022-08-04 18:52 ` Sean Christopherson
2022-08-04 19:05   ` Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).