* [PATCH] KVM: x86: add HC_VMM_CUSTOM hypercall
@ 2022-04-21 16:51 Peter Oskolkov
  2022-04-21 17:14 ` Paolo Bonzini
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Oskolkov @ 2022-04-21 16:51 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel
  Cc: kvm, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86, H . Peter Anvin, linux-kernel, Paul Turner, Peter Oskolkov,
	Peter Oskolkov

Allow kvm-based VMMs to request KVM to pass a custom vmcall
from the guest to the VMM in the host.

Quite often, operating systems research projects and/or specialized
paravirtualized workloads would benefit from an extra-low-overhead,
extra-low-latency guest-host communication channel.

With cloud-hypervisor modified to handle the new hypercall (simply
return the sum of the received arguments), the following function in
guest _userspace_ completes, on average, in 2.5 microseconds (walltime)
on a relatively modern Intel Xeon processor:

	uint64_t hypercall_custom_vmm(uint64_t a0, uint64_t a1,
					uint64_t a2, uint64_t a3)
	{
		uint64_t ret;

		asm volatile(
			"movq   $13, %%rax \n\t"  // hypercall nr.
			"movq %[a0], %%rbx \n\t"  // a0
			"movq %[a1], %%rcx \n\t"  // a1
			"movq %[a2], %%rdx \n\t"  // a2
			"movq %[a3], %%rsi \n\t"  // a3
			"vmcall            \n\t"
			"movq %%rax, %[ret] \n\t" // ret
			: [ret] "=r"(ret)
			: [a0] "r"(a0), [a1] "r"(a1), [a2] "r"(a2), [a3] "r"(a3)
			: "rax", "rbx", "rcx", "rdx", "rsi"
		);

		return ret;
	}
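
For context, a minimal sketch of what the VMM side could look like
(hypothetical plain-C VMM run loop; the actual cloud-hypervisor change is
in Rust and not shown here). It assumes KVM_CAP_EXIT_HYPERCALL was enabled
with the (1 << KVM_HC_VMM_CUSTOM) bit at VM setup:

	#include <linux/kvm.h>

	#define KVM_HC_VMM_CUSTOM 13	/* must match kvm_para.h */

	/* Called after ioctl(vcpu_fd, KVM_RUN, 0) returns. */
	static void handle_hypercall_exit(struct kvm_run *run)
	{
		if (run->exit_reason != KVM_EXIT_HYPERCALL ||
		    run->hypercall.nr != KVM_HC_VMM_CUSTOM)
			return;

		/*
		 * "Return the sum of the received arguments": KVM's
		 * complete_hypercall_exit() copies .ret into guest RAX
		 * on the next KVM_RUN.
		 */
		run->hypercall.ret = run->hypercall.args[0] +
				     run->hypercall.args[1] +
				     run->hypercall.args[2] +
				     run->hypercall.args[3];
	}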

Signed-off-by: Peter Oskolkov <posk@google.com>
---
 arch/x86/kvm/x86.c            | 28 ++++++++++++++++++++++++++--
 include/uapi/linux/kvm_para.h |  1 +
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab336f7c82e4..343971128da7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -108,7 +108,8 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
 
 static u64 __read_mostly cr4_reserved_bits = CR4_RESERVED_BITS;
 
-#define KVM_EXIT_HYPERCALL_VALID_MASK (1 << KVM_HC_MAP_GPA_RANGE)
+#define KVM_EXIT_HYPERCALL_VALID_MASK  ((1 << KVM_HC_MAP_GPA_RANGE) | \
+					(1 << KVM_HC_VMM_CUSTOM))
 
 #define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
 
@@ -9207,10 +9208,16 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
 	return kvm_skip_emulated_instruction(vcpu);
 }
 
+static int kvm_allow_hypercall_from_userspace(int nr)
+{
+	return nr == KVM_HC_VMM_CUSTOM;
+}
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
 	int op_64_bit;
+	int cpl;
 
 	if (kvm_xen_hypercall_enabled(vcpu->kvm))
 		return kvm_xen_hypercall(vcpu);
@@ -9235,7 +9242,8 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		a3 &= 0xFFFFFFFF;
 	}
 
-	if (static_call(kvm_x86_get_cpl)(vcpu) != 0) {
+	cpl = static_call(kvm_x86_get_cpl)(vcpu);
+	if (cpl != 0 && !kvm_allow_hypercall_from_userspace(nr)) {
 		ret = -KVM_EPERM;
 		goto out;
 	}
@@ -9294,6 +9302,22 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
 		return 0;
 	}
+	case KVM_HC_VMM_CUSTOM:
+		ret = -KVM_ENOSYS;
+		if (!(vcpu->kvm->arch.hypercall_exit_enabled & (1 << KVM_HC_VMM_CUSTOM)))
+			break;
+
+		vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
+		vcpu->run->hypercall.nr       = KVM_HC_VMM_CUSTOM;
+		vcpu->run->hypercall.args[0]  = a0;
+		vcpu->run->hypercall.args[1]  = a1;
+		vcpu->run->hypercall.args[2]  = a2;
+		vcpu->run->hypercall.args[3]  = a3;
+		vcpu->run->hypercall.args[4]  = 0;
+		vcpu->run->hypercall.args[5]  = cpl;
+		vcpu->run->hypercall.longmode = op_64_bit;
+		vcpu->arch.complete_userspace_io = complete_hypercall_exit;
+		return 0;
 	default:
 		ret = -KVM_ENOSYS;
 		break;
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 960c7e93d1a9..8caab28c9025 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -30,6 +30,7 @@
 #define KVM_HC_SEND_IPI		10
 #define KVM_HC_SCHED_YIELD		11
 #define KVM_HC_MAP_GPA_RANGE		12
+#define KVM_HC_VMM_CUSTOM		13
 
 /*
  * hypercalls use architecture specific

base-commit: 150866cd0ec871c765181d145aa0912628289c8a
-- 
2.36.0.rc2.479.g8af0fa9b8e-goog



* Re: [PATCH] KVM: x86: add HC_VMM_CUSTOM hypercall
  2022-04-21 16:51 [PATCH] KVM: x86: add HC_VMM_CUSTOM hypercall Peter Oskolkov
@ 2022-04-21 17:14 ` Paolo Bonzini
  2022-04-21 18:02   ` Peter Oskolkov
  2022-04-28 19:13   ` Peter Oskolkov
  0 siblings, 2 replies; 5+ messages in thread
From: Paolo Bonzini @ 2022-04-21 17:14 UTC (permalink / raw)
  To: Peter Oskolkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel
  Cc: kvm, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86, H . Peter Anvin, linux-kernel, Paul Turner, Peter Oskolkov

On 4/21/22 18:51, Peter Oskolkov wrote:
> Allow kvm-based VMMs to request KVM to pass a custom vmcall
> from the guest to the VMM in the host.
> 
> Quite often, operating systems research projects and/or specialized
> paravirtualized workloads would benefit from an extra-low-overhead,
> extra-low-latency guest-host communication channel.

You can use a memory page and an I/O port.  It should be as fast as a 
hypercall.  You can even change it to use ioeventfd if an asynchronous 
channel is enough, and then it's going to be less than 1 us latency.
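
For reference, a minimal sketch of registering such a PIO ioeventfd,
assuming a C VMM with an open vm_fd (the port number below is arbitrary
and just has to match the port the guest kicks):

#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define KICK_PORT 0xf0	/* hypothetical; guest must outb to this port */

static int register_kick(int vm_fd)
{
	int efd = eventfd(0, EFD_NONBLOCK);
	struct kvm_ioeventfd io = {
		.addr  = KICK_PORT,
		.len   = 1,
		.fd    = efd,
		.flags = KVM_IOEVENTFD_FLAG_PIO,
	};

	if (efd < 0 || ioctl(vm_fd, KVM_IOEVENTFD, &io))
		return -1;

	return efd;	/* poll()/read() this from the VMM's I/O thread */
}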

Paolo

> With cloud-hypervisor modified to handle the new hypercall (simply
> return the sum of the received arguments), the following function in
> guest _userspace_ completes, on average, in 2.5 microseconds (walltime)
> on a relatively modern Intel Xeon processor:



* Re: [PATCH] KVM: x86: add HC_VMM_CUSTOM hypercall
  2022-04-21 17:14 ` Paolo Bonzini
@ 2022-04-21 18:02   ` Peter Oskolkov
  2022-04-28 19:13   ` Peter Oskolkov
  1 sibling, 0 replies; 5+ messages in thread
From: Peter Oskolkov @ 2022-04-21 18:02 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, linux-kernel, Paul Turner,
	Peter Oskolkov

On Thu, Apr 21, 2022 at 10:14 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 4/21/22 18:51, Peter Oskolkov wrote:
> > Allow kvm-based VMMs to request KVM to pass a custom vmcall
> > from the guest to the VMM in the host.
> >
> > Quite often, operating systems research projects and/or specialized
> > paravirtualized workloads would benefit from an extra-low-overhead,
> > extra-low-latency guest-host communication channel.
>
> You can use a memory page and an I/O port.  It should be as fast as a
> hypercall.  You can even change it to use ioeventfd if an asynchronous
> channel is enough, and then it's going to be less than 1 us latency.

Thank you for the suggestion. Let me try that.

Thanks,
Peter

[...]


* Re: [PATCH] KVM: x86: add HC_VMM_CUSTOM hypercall
  2022-04-21 17:14 ` Paolo Bonzini
  2022-04-21 18:02   ` Peter Oskolkov
@ 2022-04-28 19:13   ` Peter Oskolkov
  2022-04-28 22:21     ` Sean Christopherson
  1 sibling, 1 reply; 5+ messages in thread
From: Peter Oskolkov @ 2022-04-28 19:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, linux-kernel, Paul Turner,
	Peter Oskolkov

On Thu, Apr 21, 2022 at 10:14 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 4/21/22 18:51, Peter Oskolkov wrote:
> > Allow kvm-based VMMs to request KVM to pass a custom vmcall
> > from the guest to the VMM in the host.
> >
> > Quite often, operating systems research projects and/or specialized
> > paravirtualized workloads would benefit from an extra-low-overhead,
> > extra-low-latency guest-host communication channel.
>
> You can use a memory page and an I/O port.  It should be as fast as a
> hypercall.  You can even change it to use ioeventfd if an asynchronous
> channel is enough, and then it's going to be less than 1 us latency.

So this function:

uint8_t hyperchannel_ping(uint8_t arg)
{
        uint8_t inb;
        uint16_t port = PORT;

        asm volatile(
                "outb %[arg], %[port]  \n\t"  // write arg (al -> dx)
                "inb  %[port], %[inb]  \n\t"  // read  res (dx -> al)
                : [inb] "=a"(inb)
                : [arg] "a"(arg), [port] "Nd"(port)
        );
        return inb;
}

takes about 5.5usec vs 2.5usec for a vmcall on the same
hardware/kernel/etc. I've also tried AF_VSOCK, and a roundtrip there
is 30-50usec.

The main problem with port I/O vs. a vmcall is that with port I/O a
second VM exit is needed to return any result to the guest. Am I
missing something?

I'll try now using ioeventfd, but I suspect that building a
synchronous request/response channel on top of it will not match a
direct vmcall in terms of latency.

Are there any other alternatives I should look at?

Thanks,
Peter

>
> Paolo
>
> > With cloud-hypervisor modified to handle the new hypercall (simply
> > return the sum of the received arguments), the following function in
> > guest _userspace_ completes, on average, in 2.5 microseconds (walltime)
> > on a relatively modern Intel Xeon processor:
>


* Re: [PATCH] KVM: x86: add HC_VMM_CUSTOM hypercall
  2022-04-28 19:13   ` Peter Oskolkov
@ 2022-04-28 22:21     ` Sean Christopherson
  0 siblings, 0 replies; 5+ messages in thread
From: Sean Christopherson @ 2022-04-28 22:21 UTC (permalink / raw)
  To: Peter Oskolkov
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H . Peter Anvin, linux-kernel, Paul Turner,
	Peter Oskolkov

On Thu, Apr 28, 2022, Peter Oskolkov wrote:
> On Thu, Apr 21, 2022 at 10:14 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 4/21/22 18:51, Peter Oskolkov wrote:
> > > Allow kvm-based VMMs to request KVM to pass a custom vmcall
> > > from the guest to the VMM in the host.
> > >
> > > Quite often, operating systems research projects and/or specialized
> > > paravirtualized workloads would benefit from an extra-low-overhead,
> > > extra-low-latency guest-host communication channel.
> >
> > You can use a memory page and an I/O port.  It should be as fast as a
> > hypercall.  You can even change it to use ioeventfd if an asynchronous
> > channel is enough, and then it's going to be less than 1 us latency.
> 
> So this function:
> 
> uint8_t hyperchannel_ping(uint8_t arg)
> {
>         uint8_t inb;
>         uint16_t port = PORT;
> 
>         asm volatile(
>                 "outb %[arg], %[port]  \n\t"  // write arg (al -> dx)
>                 "inb  %[port], %[inb]  \n\t"  // read  res (dx -> al)
>                 : [inb] "=a"(inb)
>                 : [arg] "a"(arg), [port] "Nd"(port)
>         );
>         return inb;
> }
> 
> takes about 5.5usec vs 2.5usec for a vmcall on the same
> hardware/kernel/etc. I've also tried AF_VSOCK, and a roundtrip there
> is 30-50usec.
> 
> The main problem of port I/O vs a vmcall is that with port I/O a
> second VM exit is needed to return any result to the guest. Am I
> missing something?

The intent of the port I/O approach is that it's just a kick; the actual data
payload is delivered via a different memory channel.

  0. guest/host establish a memory channel, e.g. the guest announces its address to the host at boot
  1. guest writes parameters to the memory channel
  2. guest does port I/O to let the host know there's work to be done
  3. KVM exits to the host
  4. host does the work, fills memory with the response
  5. host does KVM_RUN to re-enter the guest
  6. KVM runs the guest
  7. guest reads the response from memory

This is what Paolo meant by "memory page".
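
A minimal guest-side sketch of that synchronous flow (hypothetical layout;
the shared page and the kick port are whatever guest and host agreed on in
step 0, and the port is *not* wired to an ioeventfd here, so the outb exits
to the host):

#include <stdint.h>

struct hyperchannel {			/* lives in the shared page */
	volatile uint64_t args[4];	/* step 1: request parameters   */
	volatile uint64_t ret;		/* step 4: host writes response */
};

static uint64_t hyperchannel_call(struct hyperchannel *ch, uint16_t port,
				  uint64_t a0, uint64_t a1,
				  uint64_t a2, uint64_t a3)
{
	ch->args[0] = a0;		/* step 1: fill in the request */
	ch->args[1] = a1;
	ch->args[2] = a2;
	ch->args[3] = a3;

	/* step 2: one outb as the doorbell; the byte written is ignored */
	asm volatile("outb %0, %1" : : "a"((uint8_t)0), "Nd"(port) : "memory");

	/* steps 3-6 happen in KVM and the host VMM; the guest resumes
	 * right after the outb with the response already in place */
	return ch->ret;			/* step 7 */
}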

Using an ioeventfd avoids the overhead of #3 and #5.  Instead of exiting to
userspace, KVM signals the ioeventfd to wake the userspace I/O thread and immediately
resumes the guest.  The catch is that if you want a synchronous response, the guest
will have to wait for the host I/O thread to service the request, at which point the
benefits of avoiding the exit to userspace are largely lost.

Things like virtio-net (and presumably other virtio devices?) take advantage of
ioeventfd by using a ring buffer, e.g. put a Tx payload in the buffer, kick the
host and move on.
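
A stripped-down sketch of that pattern (not the actual virtio ring layout;
a hypothetical single-producer ring sitting in guest/host shared memory):

#include <stdint.h>

#define RING_SIZE 256			/* power of two */

struct ring {
	uint32_t head;			/* advanced by the guest (producer) */
	uint32_t tail;			/* advanced by the host (consumer)  */
	uint64_t slots[RING_SIZE];
};

/* Queue a payload and kick; returns -1 if the host hasn't caught up. */
static int ring_push(struct ring *r, uint64_t payload, uint16_t kick_port)
{
	uint32_t head = __atomic_load_n(&r->head, __ATOMIC_RELAXED);
	uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);

	if (head - tail == RING_SIZE)
		return -1;		/* ring is full */

	r->slots[head % RING_SIZE] = payload;
	__atomic_store_n(&r->head, head + 1, __ATOMIC_RELEASE);

	/* the ioeventfd kick: wakes the host I/O thread, guest keeps going */
	asm volatile("outb %0, %1" : : "a"((uint8_t)0), "Nd"(kick_port));
	return 0;
}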

