linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 1/3] kvm: x86: Introduce hypercall x86 ops for handling hypercall not in cpl0
       [not found] <cover.1631188011.git.houwenlong93@linux.alibaba.com>
@ 2021-09-09 11:55 ` Hou Wenlong
  2021-09-09 16:39   ` Yu Zhang
  2021-09-09 11:55 ` [PATCH v2 2/3] kvm: x86: Refactor kvm_emulate_hypercall() to no skip instruction Hou Wenlong
  2021-09-09 11:55 ` [PATCH v2 3/3] kvm: x86: Emulate hypercall instead of fixing hypercall instruction Hou Wenlong
  2 siblings, 1 reply; 7+ messages in thread
From: Hou Wenlong @ 2021-09-09 11:55 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Jan Kiszka, Avi Kivity,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

Per Intel's SDM, executing VMCALL at CPL3 in non VMX operation triggers
a #UD, and executing it at CPL3 in VMX root operation triggers a #GP.
So the hypervisor can inject these exceptions for a guest running at
CPL3 to match bare-metal behavior.

Per AMD's APM, VMMCALL has no CPL check, but executing it on the host
raises a #UD, so injecting a #UD is the suitable choice for the
hypervisor.
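
A minimal guest-side sketch of the intended CPL3 behavior (illustration
only, not part of this patch; it assumes a Linux guest where #UD surfaces
as SIGILL and #GP as SIGSEGV, and uses VMCALL, so substitute VMMCALL on
AMD):

  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>

  static void on_sigill(int sig)  { puts("got SIGILL (#UD)");  exit(0); }
  static void on_sigsegv(int sig) { puts("got SIGSEGV (#GP)"); exit(0); }

  int main(void)
  {
  	unsigned long nr = 0;	/* hypercall nr, irrelevant for the fault test */

  	signal(SIGILL, on_sigill);
  	signal(SIGSEGV, on_sigsegv);

  	/* At CPL3 this should now fault instead of returning -KVM_EPERM. */
  	asm volatile("vmcall" : "+a"(nr) : : "memory");

  	printf("hypercall did not fault, rax = %ld\n", (long)nr);
  	return 1;
  }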

Fixes: 07708c4af1346 ("KVM: x86: Disallow hypercalls for guest callers in rings > 0")
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
---
 arch/x86/include/asm/kvm-x86-ops.h | 1 +
 arch/x86/include/asm/kvm_host.h    | 1 +
 arch/x86/kvm/svm/svm.c             | 6 ++++++
 arch/x86/kvm/vmx/vmx.c             | 9 +++++++++
 arch/x86/kvm/x86.c                 | 6 +++---
 5 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index cefe1d81e2e8..00a8b8c80cb0 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -60,6 +60,7 @@ KVM_X86_OP_NULL(update_emulated_instruction)
 KVM_X86_OP(set_interrupt_shadow)
 KVM_X86_OP(get_interrupt_shadow)
 KVM_X86_OP(patch_hypercall)
+KVM_X86_OP(handle_hypercall_fail)
 KVM_X86_OP(set_irq)
 KVM_X86_OP(set_nmi)
 KVM_X86_OP(queue_exception)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8f48a7ec577..3548c8047820 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1369,6 +1369,7 @@ struct kvm_x86_ops {
 	u32 (*get_interrupt_shadow)(struct kvm_vcpu *vcpu);
 	void (*patch_hypercall)(struct kvm_vcpu *vcpu,
 				unsigned char *hypercall_addr);
+	void (*handle_hypercall_fail)(struct kvm_vcpu *vcpu);
 	void (*set_irq)(struct kvm_vcpu *vcpu);
 	void (*set_nmi)(struct kvm_vcpu *vcpu);
 	void (*queue_exception)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 1a70e11f0487..a8048e5b2aff 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3944,6 +3944,11 @@ svm_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
 	hypercall[2] = 0xd9;
 }
 
+static void svm_handle_hypercall_fail(struct kvm_vcpu *vcpu)
+{
+	kvm_queue_exception(vcpu, UD_VECTOR);
+}
+
 static int __init svm_check_processor_compat(void)
 {
 	return 0;
@@ -4563,6 +4568,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.set_interrupt_shadow = svm_set_interrupt_shadow,
 	.get_interrupt_shadow = svm_get_interrupt_shadow,
 	.patch_hypercall = svm_patch_hypercall,
+	.handle_hypercall_fail = svm_handle_hypercall_fail,
 	.set_irq = svm_set_irq,
 	.set_nmi = svm_inject_nmi,
 	.queue_exception = svm_queue_exception,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0c2c0d5ae873..3bd66eb46309 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4921,6 +4921,14 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
 	hypercall[2] = 0xc1;
 }
 
+static void vmx_handle_hypercall_fail(struct kvm_vcpu *vcpu)
+{
+	if (to_vmx(vcpu)->nested.vmxon)
+		kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
+	else
+		kvm_queue_exception(vcpu, UD_VECTOR);
+}
+
 /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
 static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
 {
@@ -7606,6 +7614,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.set_interrupt_shadow = vmx_set_interrupt_shadow,
 	.get_interrupt_shadow = vmx_get_interrupt_shadow,
 	.patch_hypercall = vmx_patch_hypercall,
+	.handle_hypercall_fail = vmx_handle_hypercall_fail,
 	.set_irq = vmx_inject_irq,
 	.set_nmi = vmx_inject_nmi,
 	.queue_exception = vmx_queue_exception,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 28ef14155726..4e2836b94a01 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8665,8 +8665,8 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	}
 
 	if (static_call(kvm_x86_get_cpl)(vcpu) != 0) {
-		ret = -KVM_EPERM;
-		goto out;
+		static_call(kvm_x86_handle_hypercall_fail)(vcpu);
+		return 1;
 	}
 
 	ret = -KVM_ENOSYS;
@@ -8727,7 +8727,7 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		ret = -KVM_ENOSYS;
 		break;
 	}
-out:
+
 	if (!op_64_bit)
 		ret = (u32)ret;
 	kvm_rax_write(vcpu, ret);
-- 
2.31.1



* [PATCH v2 2/3] kvm: x86: Refactor kvm_emulate_hypercall() to no skip instruction
       [not found] <cover.1631188011.git.houwenlong93@linux.alibaba.com>
  2021-09-09 11:55 ` [PATCH v2 1/3] kvm: x86: Introduce hypercall x86 ops for handling hypercall not in cpl0 Hou Wenlong
@ 2021-09-09 11:55 ` Hou Wenlong
  2021-09-09 11:55 ` [PATCH v2 3/3] kvm: x86: Emulate hypercall instead of fixing hypercall instruction Hou Wenlong
  2 siblings, 0 replies; 7+ messages in thread
From: Hou Wenlong @ 2021-09-09 11:55 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

Refactor kvm_emulate_hypercall() so that the hypercall handling does
not skip the instruction; the non-skipping helper can then be used in
the next patch to emulate hypercalls during instruction emulation.

Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
---
 arch/x86/kvm/x86.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4e2836b94a01..b8d799e1c57c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8636,17 +8636,11 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
 	return kvm_skip_emulated_instruction(vcpu);
 }
 
-int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
+static int kvm_emulate_hypercall_noskip(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
 	int op_64_bit;
 
-	if (kvm_xen_hypercall_enabled(vcpu->kvm))
-		return kvm_xen_hypercall(vcpu);
-
-	if (kvm_hv_hypercall_enabled(vcpu))
-		return kvm_hv_hypercall(vcpu);
-
 	nr = kvm_rax_read(vcpu);
 	a0 = kvm_rbx_read(vcpu);
 	a1 = kvm_rcx_read(vcpu);
@@ -8664,11 +8658,6 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 		a3 &= 0xFFFFFFFF;
 	}
 
-	if (static_call(kvm_x86_get_cpl)(vcpu) != 0) {
-		static_call(kvm_x86_handle_hypercall_fail)(vcpu);
-		return 1;
-	}
-
 	ret = -KVM_ENOSYS;
 
 	switch (nr) {
@@ -8733,7 +8722,28 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	kvm_rax_write(vcpu, ret);
 
 	++vcpu->stat.hypercalls;
-	return kvm_skip_emulated_instruction(vcpu);
+	return 1;
+}
+
+int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	if (kvm_xen_hypercall_enabled(vcpu->kvm))
+		return kvm_xen_hypercall(vcpu);
+
+	if (kvm_hv_hypercall_enabled(vcpu))
+		return kvm_hv_hypercall(vcpu);
+
+	if (static_call(kvm_x86_get_cpl)(vcpu) != 0) {
+		static_call(kvm_x86_handle_hypercall_fail)(vcpu);
+		return 1;
+	}
+
+	ret = kvm_emulate_hypercall_noskip(vcpu);
+	if (ret)
+		return kvm_skip_emulated_instruction(vcpu);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
 
-- 
2.31.1



* [PATCH v2 3/3] kvm: x86: Emulate hypercall instead of fixing hypercall instruction
       [not found] <cover.1631188011.git.houwenlong93@linux.alibaba.com>
  2021-09-09 11:55 ` [PATCH v2 1/3] kvm: x86: Introduce hypercall x86 ops for handling hypercall not in cpl0 Hou Wenlong
  2021-09-09 11:55 ` [PATCH v2 2/3] kvm: x86: Refactor kvm_emulate_hypercall() to no skip instruction Hou Wenlong
@ 2021-09-09 11:55 ` Hou Wenlong
  2021-09-16 16:00   ` Sean Christopherson
  2 siblings, 1 reply; 7+ messages in thread
From: Hou Wenlong @ 2021-09-09 11:55 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

It is the guest's responsibility to use the right hypercall instruction
for the platform; the hypervisor can emulate the "wrong" instruction as
a hypercall instead of patching the guest's code.

Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
---
 arch/x86/kvm/emulate.c     | 20 +++++++++-----------
 arch/x86/kvm/kvm_emulate.h |  2 +-
 arch/x86/kvm/x86.c         | 17 ++++++++---------
 3 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2837110e66ed..671008a4ee20 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3732,13 +3732,11 @@ static int em_clts(struct x86_emulate_ctxt *ctxt)
 
 static int em_hypercall(struct x86_emulate_ctxt *ctxt)
 {
-	int rc = ctxt->ops->fix_hypercall(ctxt);
+	int rc = ctxt->ops->hypercall(ctxt);
 
 	if (rc != X86EMUL_CONTINUE)
 		return rc;
 
-	/* Let the processor re-execute the fixed hypercall */
-	ctxt->_eip = ctxt->eip;
 	/* Disable writeback. */
 	ctxt->dst.type = OP_NONE;
 	return X86EMUL_CONTINUE;
@@ -4298,14 +4296,14 @@ static const struct opcode group7_rm2[] = {
 };
 
 static const struct opcode group7_rm3[] = {
-	DIP(SrcNone | Prot | Priv,		vmrun,		check_svme_pa),
-	II(SrcNone  | Prot | EmulateOnUD,	em_hypercall,	vmmcall),
-	DIP(SrcNone | Prot | Priv,		vmload,		check_svme_pa),
-	DIP(SrcNone | Prot | Priv,		vmsave,		check_svme_pa),
-	DIP(SrcNone | Prot | Priv,		stgi,		check_svme),
-	DIP(SrcNone | Prot | Priv,		clgi,		check_svme),
-	DIP(SrcNone | Prot | Priv,		skinit,		check_svme),
-	DIP(SrcNone | Prot | Priv,		invlpga,	check_svme),
+	DIP(SrcNone | Prot | Priv,			vmrun,		check_svme_pa),
+	II(SrcNone  | Prot | Priv | EmulateOnUD,	em_hypercall,	vmmcall),
+	DIP(SrcNone | Prot | Priv,			vmload,		check_svme_pa),
+	DIP(SrcNone | Prot | Priv,			vmsave,		check_svme_pa),
+	DIP(SrcNone | Prot | Priv,			stgi,		check_svme),
+	DIP(SrcNone | Prot | Priv,			clgi,		check_svme),
+	DIP(SrcNone | Prot | Priv,			skinit,		check_svme),
+	DIP(SrcNone | Prot | Priv,			invlpga,	check_svme),
 };
 
 static const struct opcode group7_rm7[] = {
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 68b420289d7e..b090ec0688a6 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -216,7 +216,7 @@ struct x86_emulate_ops {
 	int (*read_pmc)(struct x86_emulate_ctxt *ctxt, u32 pmc, u64 *pdata);
 	void (*halt)(struct x86_emulate_ctxt *ctxt);
 	void (*wbinvd)(struct x86_emulate_ctxt *ctxt);
-	int (*fix_hypercall)(struct x86_emulate_ctxt *ctxt);
+	int (*hypercall)(struct x86_emulate_ctxt *ctxt);
 	int (*intercept)(struct x86_emulate_ctxt *ctxt,
 			 struct x86_instruction_info *info,
 			 enum x86_intercept_stage stage);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b8d799e1c57c..aee3b08a1d85 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -329,7 +329,7 @@ static struct kmem_cache *kvm_alloc_emulator_cache(void)
 					  size - useroffset, NULL);
 }
 
-static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
+static int emulator_hypercall(struct x86_emulate_ctxt *ctxt);
 
 static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
 {
@@ -7352,7 +7352,7 @@ static const struct x86_emulate_ops emulate_ops = {
 	.read_pmc            = emulator_read_pmc,
 	.halt                = emulator_halt,
 	.wbinvd              = emulator_wbinvd,
-	.fix_hypercall       = emulator_fix_hypercall,
+	.hypercall           = emulator_hypercall,
 	.intercept           = emulator_intercept,
 	.get_cpuid           = emulator_get_cpuid,
 	.guest_has_long_mode = emulator_guest_has_long_mode,
@@ -8747,16 +8747,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
 
-static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
+static int emulator_hypercall(struct x86_emulate_ctxt *ctxt)
 {
+	int ret;
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
-	char instruction[3];
-	unsigned long rip = kvm_rip_read(vcpu);
-
-	static_call(kvm_x86_patch_hypercall)(vcpu, instruction);
 
-	return emulator_write_emulated(ctxt, rip, instruction, 3,
-		&ctxt->exception);
+	ret = kvm_emulate_hypercall_noskip(vcpu);
+	if (ret)
+		return X86EMUL_CONTINUE;
+	return X86EMUL_IO_NEEDED;
 }
 
 static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu)
-- 
2.31.1



* Re: [PATCH v2 1/3] kvm: x86: Introduce hypercall x86 ops for handling hypercall not in cpl0
  2021-09-09 11:55 ` [PATCH v2 1/3] kvm: x86: Introduce hypercall x86 ops for handling hypercall not in cpl0 Hou Wenlong
@ 2021-09-09 16:39   ` Yu Zhang
  2021-09-09 17:09     ` Sean Christopherson
  0 siblings, 1 reply; 7+ messages in thread
From: Yu Zhang @ 2021-09-09 16:39 UTC (permalink / raw)
  To: Hou Wenlong
  Cc: kvm, Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Jan Kiszka, Avi Kivity,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Thu, Sep 09, 2021 at 07:55:23PM +0800, Hou Wenlong wrote:
> Per Intel's SDM, executing VMCALL at CPL3 in non VMX operation triggers
> a #UD, and executing it at CPL3 in VMX root operation triggers a #GP.

Are you sure? IIRC, vmcall will always cause a VM exit as long as the
CPU is in non-root mode (regardless of the CPL).

Also, could you please explain why skipping the vmcall would cause an
exception in the host? Thanks!

B.R.
Yu



* Re: [PATCH v2 1/3] kvm: x86: Introduce hypercall x86 ops for handling hypercall not in cpl0
  2021-09-09 16:39   ` Yu Zhang
@ 2021-09-09 17:09     ` Sean Christopherson
  2021-09-10  1:53       ` Yu Zhang
  0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2021-09-09 17:09 UTC (permalink / raw)
  To: Yu Zhang
  Cc: Hou Wenlong, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Jan Kiszka, Avi Kivity,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Fri, Sep 10, 2021, Yu Zhang wrote:
> On Thu, Sep 09, 2021 at 07:55:23PM +0800, Hou Wenlong wrote:
> > Per Intel's SDM, executing VMCALL at CPL3 in non VMX operation triggers
> > a #UD, and executing it at CPL3 in VMX root operation triggers a #GP.
> 
> Are you sure? IIRC, vmcall will always cause a VM exit as long as the
> CPU is in non-root mode (regardless of the CPL).

Correct, VMCALL unconditionally causes VM-Exit in non-root mode, but Hou is
referring to the first fault condition of "non VMX operation".  The intent of the
patch is to emulate hardware behavior for CPL>0: if L1 is not in VMX operation,
a.k.a. not post-VMXON, then #UD, else #GP (because VMCALL #GPs at CPL>0 in VMX
root).

On one hand, I agree with Hou's logic; injecting #UD/#GP is architecturally
correct if KVM is emulating a bare metal environment for the guest.  On the
other hand, that contradicts with KVM _not_ injecting #UD for guest CPL0, i.e.
KVM is clearly not emulating a bare metal environment.

In the end, this would represent an ABI change for guest CPL>0.  While it's highly
unlikely that such a change would cause problems, maintaining the current behavior
is the safe option unless there's strong motivation for changing the guest ABI.

And injecting #UD/#GP would also mean KVM would again have to change its ABI if
there is a future hypercall KVM wants to allow at CPL>0.  Again, that's unlikely,
but again I don't see sufficient justification.


* Re: [PATCH v2 1/3] kvm: x86: Introduce hypercall x86 ops for handling hypercall not in cpl0
  2021-09-09 17:09     ` Sean Christopherson
@ 2021-09-10  1:53       ` Yu Zhang
  0 siblings, 0 replies; 7+ messages in thread
From: Yu Zhang @ 2021-09-10  1:53 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Hou Wenlong, kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, Jan Kiszka, Avi Kivity,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Thu, Sep 09, 2021 at 05:09:11PM +0000, Sean Christopherson wrote:
> On Fri, Sep 10, 2021, Yu Zhang wrote:
> > On Thu, Sep 09, 2021 at 07:55:23PM +0800, Hou Wenlong wrote:
> > > Per Intel's SDM, executing VMCALL at CPL3 in non VMX operation triggers
> > > a #UD, and executing it at CPL3 in VMX root operation triggers a #GP.
> > 
> > Are you sure? IIRC, vmcall will always cause a VM exit as long as the
> > CPU is in non-root mode (regardless of the CPL).
> 
> Correct, VMCALL unconditionally causes VM-Exit in non-root mode, but Hou is
> referring to the first fault condition of "non VMX operation".  The intent of the
> patch is to emulate hardware behavior for CPL>0: if L1 is not in VMX operation,
> a.k.a. not post-VMXON, then #UD, else #GP (because VMCALL #GPs at CPL>0 in VMX
> root).

Oh, I see. It's to make the virtualized world look more like real
hardware. But like you said, that's not KVM's target. And doing that
could cause more problems - a PV guest expects the VMCALL to succeed,
regardless of whether it has VMX capability or whether VMX is enabled.

Thanks for the explanation.

B.R.
Yu


* Re: [PATCH v2 3/3] kvm: x86: Emulate hypercall instead of fixing hypercall instruction
  2021-09-09 11:55 ` [PATCH v2 3/3] kvm: x86: Emulate hypercall instead of fixing hypercall instruction Hou Wenlong
@ 2021-09-16 16:00   ` Sean Christopherson
  0 siblings, 0 replies; 7+ messages in thread
From: Sean Christopherson @ 2021-09-16 16:00 UTC (permalink / raw)
  To: Hou Wenlong
  Cc: kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT),
	H. Peter Anvin, open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Thu, Sep 09, 2021, Hou Wenlong wrote:
> It is guest's resposibility to use right instruction for hypercall,
> hypervisor could emulate wrong instruction instead of modifying
> guest's instruction.
> 
> Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
> ---
> @@ -8747,16 +8747,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>  }
>  EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
>  
> -static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
> +static int emulator_hypercall(struct x86_emulate_ctxt *ctxt)
>  {
> +	int ret;
>  	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
> -	char instruction[3];
> -	unsigned long rip = kvm_rip_read(vcpu);
> -
> -	static_call(kvm_x86_patch_hypercall)(vcpu, instruction);
>  
> -	return emulator_write_emulated(ctxt, rip, instruction, 3,
> -		&ctxt->exception);
> +	ret = kvm_emulate_hypercall_noskip(vcpu);

I have mixed feelings on calling out of the emulator to do all this work.  On one
hand, I think it's a somewhat silly, arbitrary boundary.  On the other hand, reading
and writing GPRs directly means the emulation context holds stale data, especially
in the case where the hypercall triggers an exit to userspace.

> +	if (ret)
> +		return X86EMUL_CONTINUE;
> +	return X86EMUL_IO_NEEDED;

Unfortunately, simply returning X86EMUL_IO_NEEDED is not sufficient to handle the
KVM_HC_MAP_GPA_RANGE case.  x86_emulate_instruction() doesn't directly act on
X86EMUL_IO_NEEDED, because x86_emulate_insn() doesn't return X86EMUL_*, it returns
EMULATION_FAILED, EMULATION_OK, etc...  x86_emulate_instruction() instead looks
for other signals to differentiate between exception injection, PIO, MMIO, etc...

The IO_NEEDED path would also need to provide an alternative complete_userspace_io
callback, otherwise complete_hypercall_exit() will attempt to skip the instruction
using e.g. vmcs.VM_EXIT_INSTRUCTION_LEN and likely send the guest into the weeds.
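
For illustration, such an alternative callback might look roughly like the
below (a sketch only, not a proposed patch; complete_hypercall_exit_noskip
is a made-up name, and how the emulator path would register it via
vcpu->arch.complete_userspace_io and resume emulation afterwards is glossed
over):

  static int complete_hypercall_exit_noskip(struct kvm_vcpu *vcpu)
  {
  	u64 ret = vcpu->run->hypercall.ret;

  	/*
  	 * Mirror complete_hypercall_exit(), but leave RIP alone: the
  	 * instruction skip must come from the emulator's own writeback,
  	 * not from vmcs.VM_EXIT_INSTRUCTION_LEN.
  	 */
  	if (!is_64_bit_mode(vcpu))
  		ret = (u32)ret;
  	kvm_rax_write(vcpu, ret);
  	++vcpu->stat.hypercalls;
  	return 1;
  }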

In the prior patch, having kvm_emulate_hypercall_noskip() skip the CPL check is
definitely wrong, and skipping Xen/Hyper-V hypercall support is odd, though arguably
correct since Xen/Hyper-V hypercalls should never hit this path.

All of the above are solvable problems, but there is a non-trivial cost in doing
so, especially looking ahead to TDX support, which also needs/wants to split
kvm_emulate_hypercall() but in slightly different ways[*].

I 100% agree that patching the hypercall instruction is awful.  There are myriad
fatal issues with the current approach:

  1. Patches using an emulated guest write, which will fail if RIP is not mapped
     writable, and even injects a #PF into the guest on failure.

  2. Doesn't ensure the write is "atomic", e.g. a hypercall that splits a page
     boundary will be handled as two separate writes, which means that a partial,
     corrupted instruction can be observed by a separate vCPU.
 
  3. Doesn't serialize other CPU cores after updating the code stream.

  4. Completely fails to account for the case where KVM is emulating due to invalid
     guest state with unrestricted_guest=0.  Patching and retrying the instruction
     will result in the vCPU getting stuck in an infinite loop.

But, the "support" is _so_ awful, especially #1, that there's practically zero chance
that a modern guest kernel can rely on KVM to patch the guest.  E.g. patching fails
on modern Linux due to kernel code being mapped NX (barring admin override).  This
was addressed in the Linux guest back in 2014 by commit c1118b3602c2 ("x86: kvm: use
alternatives for VMCALL vs. VMMCALL if kernel text is read-only").
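
For reference, the guest-side fix boils down to something like this (the
shape of KVM_HYPERCALL in the guest's arch/x86/include/asm/kvm_para.h after
that commit, quoted from memory; some kernel versions spell the two
alternatives as raw byte encodings instead of mnemonics):

  /* Pick the native instruction at patch time instead of relying on the
   * hypervisor to rewrite guest text: VMMCALL on AMD, VMCALL otherwise.
   */
  #define KVM_HYPERCALL \
  	ALTERNATIVE("vmcall", "vmmcall", X86_FEATURE_VMMCALL)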

In other words, odds are very good that only old Linux guest kernels rely on KVM's
patching.  Because of that, my preference is to keep the patching, do our best to
make it suck a little less, and aim to completely remove the patching entirely at
some point in the future.

For #1, inject a #UD instead of #PF if the write fails.  The guest will still likely
die, but the failure signature is much friendlier to debuggers.

For #2 and #3, do nothing as fixing those is non-trivial.

For #4, inject a #UD if the hypercall instruction is already the "right" instruction.
I.e. retroactively define KVM's ABI to be that KVM hypercalls have undefined behavior
in Big Real Mode and other modes that trigger invalid guest state (since the hypercalls
will still work if unrestricted_guest=1).  This can't be an ABI breakage since it does
not work today and can't possibly have ever worked in the past.
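
Purely as an illustration of remedies #1 and #4 (this is not the pending
patches mentioned below, and whether overwriting ctxt->exception like this
is the right way to turn the failure into a #UD is an assumption):

  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
  {
  	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
  	char instruction[3], cur[3];
  	unsigned long rip = kvm_rip_read(vcpu);

  	static_call(kvm_x86_patch_hypercall)(vcpu, instruction);

  	/*
  	 * #4: if the guest already uses the native instruction, emulation
  	 * wasn't triggered by the "wrong" hypercall insn (e.g. invalid
  	 * guest state with unrestricted_guest=0); patching would rewrite
  	 * identical bytes and loop forever, so inject #UD instead.
  	 */
  	if (emulator_read_emulated(ctxt, rip, cur, sizeof(cur),
  				   &ctxt->exception) == X86EMUL_CONTINUE &&
  	    !memcmp(instruction, cur, sizeof(cur)))
  		goto inject_ud;

  	/*
  	 * #1: if the write fails, e.g. RIP maps to read-only kernel text,
  	 * inject #UD rather than reflecting a confusing #PF.
  	 */
  	if (emulator_write_emulated(ctxt, rip, instruction, 3,
  				    &ctxt->exception) != X86EMUL_CONTINUE)
  		goto inject_ud;

  	return X86EMUL_CONTINUE;

  inject_ud:
  	ctxt->exception.vector = UD_VECTOR;
  	ctxt->exception.error_code_valid = false;
  	return X86EMUL_PROPAGATE_FAULT;
  }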

I have mostly-complete patches to do the above (I ran afoul of the NX thing); I'll hopefully get
them out next week.

[*] https://lkml.kernel.org/r/9e1e66787c50232391e20cb4b3d1c5b249e3f910.1625186503.git.isaku.yamahata@intel.com


