kvm.vger.kernel.org archive mirror
* [PATCH 0/2] KVM: some fixes about RDMSR/WRMSR instruction emulation
@ 2021-10-22  2:59 Hou Wenlong
  2021-10-22  2:59 ` [PATCH 1/2] KVM: VMX: fix instruction skipping when handling UD exception Hou Wenlong
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Hou Wenlong @ 2021-10-22  2:59 UTC (permalink / raw)
  To: kvm

When the KVM_CAP_X86_USER_SPACE_MSR capability is enabled, userspace can
control MSR accesses. In the normal scenario, RDMSR/WRMSR is intercepted
directly, but when kvm.force_emulation_prefix is enabled, RDMSR/WRMSR
with the kvm prefix triggers a #UD and is handled through instruction
emulation. If the MSR access is filtered, em_rdmsr()/em_wrmsr() returns
X86EMUL_IO_NEEDED, but that result is ignored by
x86_emulate_instruction(). The guest then continues execution, but RIP
has already been updated in handle_ud() to point at the RDMSR/WRMSR, so
the RDMSR/WRMSR is intercepted and the guest finally exits to userspace
by accident rather than by design. This behaviour costs two VM exits and
wastes one instruction emulation.

After letting x86_emulate_instruction() return 0 for RDMSR/WRMSR
emulation, if it needs to exit to userspace, its complete_userspace_io
callback calls kvm_skip_emulated_instruction() to skip the instruction.
But for VMX, VM_EXIT_INSTRUCTION_LEN in the VMCS is invalid for #UD, so
it can't be used to update RIP; kvm_emulate_instruction() should be used
instead. As for SVM, nRIP in the VMCB is 0 for #UD, so
kvm_emulate_instruction() is used. For nested SVM I'm not sure, since
svm_check_intercept() would change nRIP.
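To make the two flows concrete, here is a rough sketch (pseudocode,
hedged against the actual handlers; exact call paths may differ):

```
/* Today, with a filtered MSR and the kvm prefix: */
"kvm" prefix + RDMSR
  -> #UD VM exit; handle_ud() updates RIP to point at the RDMSR
  -> emulator: em_rdmsr() returns X86EMUL_IO_NEEDED
  -> x86_emulate_instruction() ignores it and returns 1
  -> guest resumes and executes the bare RDMSR
  -> RDMSR VM exit -> kvm_emulate_rdmsr() -> exit to userspace  /* 2nd exit */

/* Intended flow after this series: */
"kvm" prefix + RDMSR
  -> #UD VM exit; emulation sees the filtered MSR
  -> x86_emulate_instruction() returns 0 -> exit to userspace   /* only exit */
  -> complete_userspace_io skips the RDMSR via kvm_emulate_instruction()
```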

Hou Wenlong (2):
  KVM: VMX: fix instruction skipping when handling UD exception
  KVM: X86: Exit to userspace if RDMSR/WRMSR emulation returns
    X86EMUL_IO_NEEDED

 arch/x86/kvm/vmx/vmx.c | 4 ++--
 arch/x86/kvm/vmx/vmx.h | 9 +++++++++
 arch/x86/kvm/x86.c     | 4 +++-
 3 files changed, 14 insertions(+), 3 deletions(-)

--
2.31.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] KVM: VMX: fix instruction skipping when handling UD exception
  2021-10-22  2:59 [PATCH 0/2] KVM: some fixes about RDMSR/WRMSR instruction emulation Hou Wenlong
@ 2021-10-22  2:59 ` Hou Wenlong
  2021-10-26 16:37   ` Sean Christopherson
  2021-10-22  2:59 ` [PATCH 2/2] KVM: X86: Exit to userspace if RDMSR/WRMSR emulation returns X86EMUL_IO_NEEDED Hou Wenlong
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Hou Wenlong @ 2021-10-22  2:59 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

When kvm.force_emulation_prefix is enabled, an instruction with the
kvm prefix triggers a #UD exception and goes through instruction
emulation. The emulation may need to exit to userspace due to
userspace I/O, and the complete_userspace_io callback may skip the
instruction; e.g. MSR access emulation exits to userspace if
userspace wants to know about the MSR fault. However,
VM_EXIT_INSTRUCTION_LEN in the VMCS is invalid in this case, so
kvm_emulate_instruction() should be used to skip the instruction.

Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
---
 arch/x86/kvm/vmx/vmx.c | 4 ++--
 arch/x86/kvm/vmx/vmx.h | 9 +++++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1c8b2b6e7ed9..01049d65da26 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1501,8 +1501,8 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	 * (namely Hyper-V) don't set it due to it being undefined behavior,
 	 * i.e. we end up advancing IP with some random value.
 	 */
-	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
-	    exit_reason.basic != EXIT_REASON_EPT_MISCONFIG) {
+	if (!is_ud_exit(vcpu) && (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
+	    exit_reason.basic != EXIT_REASON_EPT_MISCONFIG)) {
 		instr_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
 
 		/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 592217fd7d92..e7a7f580acd1 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -481,6 +481,15 @@ static inline u32 vmx_get_intr_info(struct kvm_vcpu *vcpu)
 	return vmx->exit_intr_info;
 }
 
+static inline bool is_ud_exit(struct kvm_vcpu *vcpu)
+{
+	union vmx_exit_reason exit_reason = to_vmx(vcpu)->exit_reason;
+	u32 intr_info = vmx_get_intr_info(vcpu);
+
+	return exit_reason.basic == EXIT_REASON_EXCEPTION_NMI &&
+	       is_invalid_opcode(intr_info);
+}
+
 struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags);
 void free_vmcs(struct vmcs *vmcs);
 int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs);
-- 
2.31.1



* [PATCH 2/2] KVM: X86: Exit to userspace if RDMSR/WRMSR emulation returns X86EMUL_IO_NEEDED
  2021-10-22  2:59 [PATCH 0/2] KVM: some fixes about RDMSR/WRMSR instruction emulation Hou Wenlong
  2021-10-22  2:59 ` [PATCH 1/2] KVM: VMX: fix instruction skipping when handling UD exception Hou Wenlong
@ 2021-10-22  2:59 ` Hou Wenlong
  2021-10-22  9:46 ` [PATCH 0/2] KVM: some fixes about RDMSR/WRMSR instruction emulation Paolo Bonzini
  2021-11-02  9:15 ` [PATCH v2 0/4] KVM: x86: some fixes about msr access emulation Hou Wenlong
  3 siblings, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2021-10-22  2:59 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Alexander Graf,
	linux-kernel

em_rdmsr()/em_wrmsr() return X86EMUL_IO_NEEDED if the MSR access needs
to exit to userspace. However, x86_emulate_insn() doesn't propagate
X86EMUL_* values, so x86_emulate_instruction() doesn't act directly on
X86EMUL_IO_NEEDED; it instead looks for other signals to differentiate
between PIO, MMIO, etc. As a result, RDMSR/WRMSR emulation never exits
to userspace today.

The userspace_msr_exit_test testcase in selftests exercises RDMSR/WRMSR
emulation with kvm.force_emulation_prefix enabled, and it passes, but
only by accident: x86_emulate_instruction() returns 1 and the guest
continues, with RIP already updated to point at the RDMSR/WRMSR. The
guest then executes the RDMSR/WRMSR and finally exits to userspace via
kvm_emulate_rdmsr()/kvm_emulate_wrmsr(). In this situation the
instruction emulation had no effect, yet the userspace exit information
had already been filled in, which is inappropriate.

Since the X86EMUL_IO_NEEDED path provides a complete_userspace_io
callback, x86_emulate_instruction() should return 0 if the callback is
not NULL. RDMSR/WRMSR instruction emulation can then exit to userspace
and skip the extra RDMSR/WRMSR execution and interception.

Fixes: 1ae099540e8c7 ("KVM: x86: Allow deflecting unknown MSR accesses to user space")
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
---
 arch/x86/kvm/x86.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3ea4f6ef2474..4e0dc5b06d03 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7950,7 +7950,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			writeback = false;
 		r = 0;
 		vcpu->arch.complete_userspace_io = complete_emulated_mmio;
-	} else if (r == EMULATION_RESTART)
+	} else if (vcpu->arch.complete_userspace_io)
+		r = 0;
+	else if (r == EMULATION_RESTART)
 		goto restart;
 	else
 		r = 1;
-- 
2.31.1



* Re: [PATCH 0/2] KVM: some fixes about RDMSR/WRMSR instruction emulation
  2021-10-22  2:59 [PATCH 0/2] KVM: some fixes about RDMSR/WRMSR instruction emulation Hou Wenlong
  2021-10-22  2:59 ` [PATCH 1/2] KVM: VMX: fix instruction skipping when handling UD exception Hou Wenlong
  2021-10-22  2:59 ` [PATCH 2/2] KVM: X86: Exit to userspace if RDMSR/WRMSR emulation returns X86EMUL_IO_NEEDED Hou Wenlong
@ 2021-10-22  9:46 ` Paolo Bonzini
  2021-10-22 15:10   ` Hou Wenlong
  2021-11-02  9:15 ` [PATCH v2 0/4] KVM: x86: some fixes about msr access emulation Hou Wenlong
  3 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2021-10-22  9:46 UTC (permalink / raw)
  To: Hou Wenlong, kvm

On 22/10/21 04:59, Hou Wenlong wrote:
> When KVM_CAP_X86_USER_SPACE_MSR cap is enabled, userspace can control
> MSR accesses. In normal scenario, RDMSR/WRMSR can be interceped, but
> when kvm.force_emulation_prefix is enabled, RDMSR/WRMSR with kvm prefix
> would trigger an UD and cause instruction emulation. If MSR accesses is
> filtered, em_rdmsr()/em_wrmsr() returns X86EMUL_IO_NEEDED, but it is
> ignored by x86_emulate_instruction(). Then guest continues execution,
> but RIP has been updated to point to RDMSR/WRMSR in handle_ud(), so
> RDMSR/WRMSR can be interceped and guest exits to userspace finnaly by
> mistake. Such behaviour leads to two vm exits and wastes one instruction
> emulation.
> 
> After let x86_emulate_instruction() returns 0 for RDMSR/WRMSR emulation,
> if it needs to exit to userspace, its complete_userspace_io callback
> would call kvm_skip_instruction() to skip instruction. But for vmx,
> VMX_EXIT_INSTRUCTION_LEN in vmcs is invalid for UD, it can't be used to
> update RIP, kvm_emulate_instruction() should be used instead. As for
> svm, nRIP in vmcb is 0 for UD, so kvm_emulate_instruction() is used.
> But for nested svm, I'm not sure, since svm_check_intercept() would
> change nRIP.

Hi, can you provide a testcase for this bug using the 
tools/testing/selftests/kvm framework?

Thanks,

Paolo



* Re: [PATCH 0/2] KVM: some fixes about RDMSR/WRMSR instruction emulation
  2021-10-22  9:46 ` [PATCH 0/2] KVM: some fixes about RDMSR/WRMSR instruction emulation Paolo Bonzini
@ 2021-10-22 15:10   ` Hou Wenlong
  0 siblings, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2021-10-22 15:10 UTC (permalink / raw)
  To: Paolo Bonzini, kvm

On Fri, Oct 22, 2021 at 11:46:17AM +0200, Paolo Bonzini wrote:
> On 22/10/21 04:59, Hou Wenlong wrote:
> >When KVM_CAP_X86_USER_SPACE_MSR cap is enabled, userspace can control
> >MSR accesses. In normal scenario, RDMSR/WRMSR can be interceped, but
> >when kvm.force_emulation_prefix is enabled, RDMSR/WRMSR with kvm prefix
> >would trigger an UD and cause instruction emulation. If MSR accesses is
> >filtered, em_rdmsr()/em_wrmsr() returns X86EMUL_IO_NEEDED, but it is
> >ignored by x86_emulate_instruction(). Then guest continues execution,
> >but RIP has been updated to point to RDMSR/WRMSR in handle_ud(), so
> >RDMSR/WRMSR can be interceped and guest exits to userspace finnaly by
> >mistake. Such behaviour leads to two vm exits and wastes one instruction
> >emulation.
> >
> >After let x86_emulate_instruction() returns 0 for RDMSR/WRMSR emulation,
> >if it needs to exit to userspace, its complete_userspace_io callback
> >would call kvm_skip_instruction() to skip instruction. But for vmx,
> >VMX_EXIT_INSTRUCTION_LEN in vmcs is invalid for UD, it can't be used to
> >update RIP, kvm_emulate_instruction() should be used instead. As for
> >svm, nRIP in vmcb is 0 for UD, so kvm_emulate_instruction() is used.
> >But for nested svm, I'm not sure, since svm_check_intercept() would
> >change nRIP.
> 
> Hi, can you provide a testcase for this bug using the
> tools/testing/selftests/kvm framework?
> 
> Thanks,
> 
> Paolo
Hi Paolo,

There is already a testcase in kvm selftests
(test_msr_filter_allow() in tools/testing/selftests/kvm/x86/userspace_msr_exit_test.c),
which is mentioned in Patch 2.

In that testcase, MSR access emulation is tested with
kvm.force_emulation_prefix enabled, and it passes. But I think the
logic may not be right. As I explained in Patch 2,
x86_emulate_instruction() ignores X86EMUL_IO_NEEDED, so the guest
continues execution, but RIP has already been updated in handle_ud()
to point at the RDMSR/WRMSR. The RDMSR/WRMSR is then intercepted and
the guest exits to userspace later. Although the final result seems
right, the instruction emulation in the first VM exit is wasted.
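For reference, the guest side of that test issues the access roughly
like this (a sketch that only runs inside a guest, so it is not
standalone-executable; KVM_FEP is assumed to match the selftest's
forced-emulation prefix macro):

```c
/* "ud2" followed by the bytes 'k','v','m' -- the prefix that makes KVM
 * emulate the following instruction when kvm.force_emulation_prefix is
 * enabled (assumed definition, see the selftest headers). */
#define KVM_FEP "ud2; .byte 'k', 'v', 'm';"

static uint64_t guest_rdmsr_fep(uint32_t msr)
{
	uint32_t lo, hi;

	/* KVM's #UD handler emulates this RDMSR; if the MSR is filtered,
	 * the emulation should bounce to userspace. */
	__asm__ __volatile__(KVM_FEP "rdmsr"
			     : "=a"(lo), "=d"(hi) : "c"(msr));
	return ((uint64_t)hi << 32) | lo;
}
```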



* Re: [PATCH 1/2] KVM: VMX: fix instruction skipping when handling UD exception
  2021-10-22  2:59 ` [PATCH 1/2] KVM: VMX: fix instruction skipping when handling UD exception Hou Wenlong
@ 2021-10-26 16:37   ` Sean Christopherson
  2021-10-27  7:00     ` Hou Wenlong
  2021-10-29 10:57     ` Hou Wenlong
  0 siblings, 2 replies; 15+ messages in thread
From: Sean Christopherson @ 2021-10-26 16:37 UTC (permalink / raw)
  To: Hou Wenlong
  Cc: kvm, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel

On Fri, Oct 22, 2021, Hou Wenlong wrote:
> When kvm.force_emulation_prefix is enabled, instruction with
> kvm prefix would trigger an UD exception and do instruction
> emulation. The emulation may need to exit to userspace due
> to userspace io, and the complete_userspace_io callback may
> skip instruction, i.e. MSR accesses emulation would exit to
> userspace if userspace wanted to know about the MSR fault.
> However, VM_EXIT_INSTRUCTION_LEN in vmcs is invalid now, it
> should use kvm_emulate_instruction() to skip instruction.
> 
> Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
> ---
>  arch/x86/kvm/vmx/vmx.c | 4 ++--
>  arch/x86/kvm/vmx/vmx.h | 9 +++++++++
>  2 files changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 1c8b2b6e7ed9..01049d65da26 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1501,8 +1501,8 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
>  	 * (namely Hyper-V) don't set it due to it being undefined behavior,
>  	 * i.e. we end up advancing IP with some random value.
>  	 */
> -	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
> -	    exit_reason.basic != EXIT_REASON_EPT_MISCONFIG) {
> +	if (!is_ud_exit(vcpu) && (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||

This is incomplete and is just a workaround for the underlying bug.  The same
mess can occur if the emulator triggers an exit to userspace during "normal"
emulation, e.g. if unrestricted guest is disabled and the guest accesses an MSR
while in Big RM.  In that case, there's no #UD to key off of.

The correct way to fix this is to attach a different callback when the MSR access
comes from the emulator.  I'd say rename the existing complete_emulated_{rd,wr}msr()
callbacks to complete_fast_{rd,wr}msr() to match the port I/O nomenclature.

Something like this (which also has some opportunistic simplification of the
error handling in kvm_emulate_{rd,wr}msr()).

---
 arch/x86/kvm/x86.c | 82 +++++++++++++++++++++++++---------------------
 1 file changed, 45 insertions(+), 37 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac83d873d65b..7ff5b8d58ca3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1814,18 +1814,44 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
 }
 EXPORT_SYMBOL_GPL(kvm_set_msr);

-static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
+static void __complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
 {
-	int err = vcpu->run->msr.error;
-	if (!err) {
+	if (!vcpu->run->msr.error) {
 		kvm_rax_write(vcpu, (u32)vcpu->run->msr.data);
 		kvm_rdx_write(vcpu, vcpu->run->msr.data >> 32);
 	}
+}

-	return static_call(kvm_x86_complete_emulated_msr)(vcpu, err);
+static int complete_emulated_msr_access(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->run->msr.error) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	return kvm_emulate_instruction(vcpu, EMULTYPE_SKIP);
+}
+
+static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
+{
+	__complete_emulated_rdmsr(vcpu);
+
+	return complete_emulated_msr_access(vcpu);
 }

 static int complete_emulated_wrmsr(struct kvm_vcpu *vcpu)
+{
+	return complete_emulated_msr_access(vcpu);
+}
+
+static int complete_fast_rdmsr(struct kvm_vcpu *vcpu)
+{
+	__complete_emulated_rdmsr(vcpu);
+
+	return static_call(kvm_x86_complete_emulated_msr)(vcpu, vcpu->run->msr.error);
+}
+
+static int complete_fast_wrmsr(struct kvm_vcpu *vcpu)
 {
 	return static_call(kvm_x86_complete_emulated_msr)(vcpu, vcpu->run->msr.error);
 }
@@ -1864,18 +1890,6 @@ static int kvm_msr_user_space(struct kvm_vcpu *vcpu, u32 index,
 	return 1;
 }

-static int kvm_get_msr_user_space(struct kvm_vcpu *vcpu, u32 index, int r)
-{
-	return kvm_msr_user_space(vcpu, index, KVM_EXIT_X86_RDMSR, 0,
-				   complete_emulated_rdmsr, r);
-}
-
-static int kvm_set_msr_user_space(struct kvm_vcpu *vcpu, u32 index, u64 data, int r)
-{
-	return kvm_msr_user_space(vcpu, index, KVM_EXIT_X86_WRMSR, data,
-				   complete_emulated_wrmsr, r);
-}
-
 int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
 {
 	u32 ecx = kvm_rcx_read(vcpu);
@@ -1883,19 +1897,15 @@ int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
 	int r;

 	r = kvm_get_msr(vcpu, ecx, &data);
-
-	/* MSR read failed? See if we should ask user space */
-	if (r && kvm_get_msr_user_space(vcpu, ecx, r)) {
-		/* Bounce to user space */
-		return 0;
-	}
-
 	if (!r) {
 		trace_kvm_msr_read(ecx, data);

 		kvm_rax_write(vcpu, data & -1u);
 		kvm_rdx_write(vcpu, (data >> 32) & -1u);
 	} else {
+		if (kvm_msr_user_space(vcpu, ecx, KVM_EXIT_X86_RDMSR, 0,
+				       complete_fast_rdmsr, r))
+			return 0;
 		trace_kvm_msr_read_ex(ecx);
 	}

@@ -1910,20 +1920,16 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
 	int r;

 	r = kvm_set_msr(vcpu, ecx, data);
-
-	/* MSR write failed? See if we should ask user space */
-	if (r && kvm_set_msr_user_space(vcpu, ecx, data, r))
-		/* Bounce to user space */
-		return 0;
-
-	/* Signal all other negative errors to userspace */
-	if (r < 0)
-		return r;
-
-	if (!r)
+	if (!r) {
 		trace_kvm_msr_write(ecx, data);
-	else
+	} else {
+		if (kvm_msr_user_space(vcpu, ecx, KVM_EXIT_X86_WRMSR, data,
+				       complete_fast_wrmsr, r))
+			return 0;
+		if (r < 0)
+			return r;
 		trace_kvm_msr_write_ex(ecx, data);
+	}

 	return static_call(kvm_x86_complete_emulated_msr)(vcpu, r);
 }
@@ -7387,7 +7393,8 @@ static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,

 	r = kvm_get_msr(vcpu, msr_index, pdata);

-	if (r && kvm_get_msr_user_space(vcpu, msr_index, r)) {
+	if (r && kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_RDMSR, 0,
+				    complete_emulated_rdmsr, r)) {
 		/* Bounce to user space */
 		return X86EMUL_IO_NEEDED;
 	}
@@ -7403,7 +7410,8 @@ static int emulator_set_msr(struct x86_emulate_ctxt *ctxt,

 	r = kvm_set_msr(vcpu, msr_index, data);

-	if (r && kvm_set_msr_user_space(vcpu, msr_index, data, r)) {
+	if (r && kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_WRMSR, data,
+				    complete_emulated_wrmsr, r)) {
 		/* Bounce to user space */
 		return X86EMUL_IO_NEEDED;
 	}
--


* Re: [PATCH 1/2] KVM: VMX: fix instruction skipping when handling UD exception
  2021-10-26 16:37   ` Sean Christopherson
@ 2021-10-27  7:00     ` Hou Wenlong
  2021-10-29 10:57     ` Hou Wenlong
  1 sibling, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2021-10-27  7:00 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

On Tue, Oct 26, 2021 at 04:37:50PM +0000, Sean Christopherson wrote:
> On Fri, Oct 22, 2021, Hou Wenlong wrote:
> > When kvm.force_emulation_prefix is enabled, instruction with
> > kvm prefix would trigger an UD exception and do instruction
> > emulation. The emulation may need to exit to userspace due
> > to userspace io, and the complete_userspace_io callback may
> > skip instruction, i.e. MSR accesses emulation would exit to
> > userspace if userspace wanted to know about the MSR fault.
> > However, VM_EXIT_INSTRUCTION_LEN in vmcs is invalid now, it
> > should use kvm_emulate_instruction() to skip instruction.
> > 
> > Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 4 ++--
> >  arch/x86/kvm/vmx/vmx.h | 9 +++++++++
> >  2 files changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 1c8b2b6e7ed9..01049d65da26 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1501,8 +1501,8 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
> >  	 * (namely Hyper-V) don't set it due to it being undefined behavior,
> >  	 * i.e. we end up advancing IP with some random value.
> >  	 */
> > -	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
> > -	    exit_reason.basic != EXIT_REASON_EPT_MISCONFIG) {
> > +	if (!is_ud_exit(vcpu) && (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
> 
> This is incomplete and is just a workaround for the underlying bug.  The same
> mess can occur if the emulator triggers an exit to userspace during "normal"
> emulation, e.g. if unrestricted guest is disabled and the guest accesses an MSR
> while in Big RM.  In that case, there's no #UD to key off of.
> 
> The correct way to fix this is to attach a different callback when the MSR access
> comes from the emulator.  I'd say rename the existing complete_emulated_{rd,wr}msr()
> callbacks to complete_fast_{rd,wr}msr() to match the port I/O nomenclature.
> 
> Something like this (which also has some opportunistic simplification of the
> error handling in kvm_emulate_{rd,wr}msr()).
> 
> ---
>  arch/x86/kvm/x86.c | 82 +++++++++++++++++++++++++---------------------
>  1 file changed, 45 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ac83d873d65b..7ff5b8d58ca3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1814,18 +1814,44 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_msr);
> 
> -static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
> +static void __complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
>  {
> -	int err = vcpu->run->msr.error;
> -	if (!err) {
> +	if (!vcpu->run->msr.error) {
>  		kvm_rax_write(vcpu, (u32)vcpu->run->msr.data);
>  		kvm_rdx_write(vcpu, vcpu->run->msr.data >> 32);
>  	}
> +}
> 
> -	return static_call(kvm_x86_complete_emulated_msr)(vcpu, err);
> +static int complete_emulated_msr_access(struct kvm_vcpu *vcpu)
> +{
> +	if (vcpu->run->msr.error) {
> +		kvm_inject_gp(vcpu, 0);
> +		return 1;
> +	}
> +
> +	return kvm_emulate_instruction(vcpu, EMULTYPE_SKIP);
This loses the single-step check after skipping the instruction. It
seems we should skip the single-step check in x86_emulate_instruction()
too for MSR accesses. I don't understand the `writeback` variable in
x86_emulate_instruction(); does it mean the emulation was interrupted
and some regs in the emulation context haven't been written back to the
vcpu context? Then in this case, should I set it to false in Patch 2
(although no regs have been changed)?

> +}
> +
> [...]

* Re: [PATCH 1/2] KVM: VMX: fix instruction skipping when handling UD exception
  2021-10-26 16:37   ` Sean Christopherson
  2021-10-27  7:00     ` Hou Wenlong
@ 2021-10-29 10:57     ` Hou Wenlong
  2021-11-01 17:03       ` Sean Christopherson
  1 sibling, 1 reply; 15+ messages in thread
From: Hou Wenlong @ 2021-10-29 10:57 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm

On Tue, Oct 26, 2021 at 04:37:50PM +0000, Sean Christopherson wrote:
> On Fri, Oct 22, 2021, Hou Wenlong wrote:
> > When kvm.force_emulation_prefix is enabled, instruction with
> > kvm prefix would trigger an UD exception and do instruction
> > emulation. The emulation may need to exit to userspace due
> > to userspace io, and the complete_userspace_io callback may
> > skip instruction, i.e. MSR accesses emulation would exit to
> > userspace if userspace wanted to know about the MSR fault.
> > However, VM_EXIT_INSTRUCTION_LEN in vmcs is invalid now, it
> > should use kvm_emulate_instruction() to skip instruction.
> > 
> > Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
> > ---
> >  arch/x86/kvm/vmx/vmx.c | 4 ++--
> >  arch/x86/kvm/vmx/vmx.h | 9 +++++++++
> >  2 files changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 1c8b2b6e7ed9..01049d65da26 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1501,8 +1501,8 @@ static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
> >  	 * (namely Hyper-V) don't set it due to it being undefined behavior,
> >  	 * i.e. we end up advancing IP with some random value.
> >  	 */
> > -	if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
> > -	    exit_reason.basic != EXIT_REASON_EPT_MISCONFIG) {
> > +	if (!is_ud_exit(vcpu) && (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
> 
> This is incomplete and is just a workaround for the underlying bug.  The same
> mess can occur if the emulator triggers an exit to userspace during "normal"
> emulation, e.g. if unrestricted guest is disabled and the guest accesses an MSR
> while in Big RM.  In that case, there's no #UD to key off of.
> 
> The correct way to fix this is to attach a different callback when the MSR access
> comes from the emulator.  I'd say rename the existing complete_emulated_{rd,wr}msr()
> callbacks to complete_fast_{rd,wr}msr() to match the port I/O nomenclature.
> 
> Something like this (which also has some opportunistic simplification of the
> error handling in kvm_emulate_{rd,wr}msr()).
> 
> [...]
Hi Sean,

The note in x86_emulate_instruction() for EMULTYPE_SKIP says that the
caller is responsible for updating interruptibility state and
injecting single-step #DBs. And the vendor callbacks for
kvm_skip_emulated_instruction() also do some special things, e.g.
I found that an sev_es guest just skips the RIP update. So it may be
more appropriate to add a parameter to the skip_emulated_instruction()
callback, which forces the use of x86_skip_instruction() if the
instruction length is invalid.

Thanks


* Re: [PATCH 1/2] KVM: VMX: fix instruction skipping when handling UD exception
  2021-10-29 10:57     ` Hou Wenlong
@ 2021-11-01 17:03       ` Sean Christopherson
  0 siblings, 0 replies; 15+ messages in thread
From: Sean Christopherson @ 2021-11-01 17:03 UTC (permalink / raw)
  To: Hou Wenlong; +Cc: kvm

On Fri, Oct 29, 2021, Hou Wenlong wrote:
> On Tue, Oct 26, 2021 at 04:37:50PM +0000, Sean Christopherson wrote:
> > On Fri, Oct 22, 2021, Hou Wenlong wrote:
> > +static int complete_emulated_msr_access(struct kvm_vcpu *vcpu)
> > +{
> > +	if (vcpu->run->msr.error) {
> > +		kvm_inject_gp(vcpu, 0);
> > +		return 1;
> > +	}
> > +
> > +	return kvm_emulate_instruction(vcpu, EMULTYPE_SKIP);
> > +}

...

> The note in x86_emulate_instruction() for EMULTYPE_SKIP says that the
> caller is responsible for updating interruptibility state and
> injecting single-step #DBs.

Urgh, yes.  And that note also very clearly states it's for use only by the vendor
callbacks for exactly that reason.

> And vendor callbacks for kvm_skip_emulated_instruction() also do some special
> things,

Luckily, the emulator also does (almost) all those special things.

> e.g. I found that an sev_es guest just skips the RIP update.

Emulation is impossible with sev_es because KVM can't decode the guest code stream,
so that particular wrinkle is out of scope.

> So it may be more appropriate to add a parameter to the skip_emulated_instruction()
> callback, which forces the use of x86_skip_instruction() if the instruction length
> is invalid.

I really don't like the idea of routing this through kvm_skip_emulated_instruction(),
anything originating from the emulator ideally would be handled within the emulator
when possible, especially since we know that KVM is going to end up in the emulator
anyways.

The best idea I can come up with is to add a new emulation type to pair with _SKIP
to handle completion of user exits.  In theory it should be a tiny code change to
add a branch inside the EMULTYPE_SKIP path.

On a related topic, I think EMULTYPE_SKIP fails to handle wrapping of EIP when
the guest has a flat code segment.

So this:

From e3511669c40e4d074fb19f43256fc5da8634af14 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 1 Nov 2021 09:52:35 -0700
Subject: [PATCH] KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with
 flat code seg

Truncate the new EIP to a 32-bit value when handling EMULTYPE_SKIP as the
decode phase does not truncate _eip.  Wrapping the 32-bit boundary is
legal if and only if CS is a flat code segment, but that check is
implicitly handled in the form of limit checks in the decode phase.

Opportunistically prepare for a future fix by storing the result of any
truncation in "eip" instead of "_eip".

Fixes: 1957aa63be53 ("KVM: VMX: Handle single-step #DB for EMULTYPE_SKIP on EPT misconfig")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac83d873d65b..3d7fc5c21ceb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8124,7 +8124,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * updating interruptibility state and injecting single-step #DBs.
 	 */
 	if (emulation_type & EMULTYPE_SKIP) {
-		kvm_rip_write(vcpu, ctxt->_eip);
+		if (ctxt->mode != X86EMUL_MODE_PROT64)
+			ctxt->eip = (u32)ctxt->_eip;
+		else
+			ctxt->eip = ctxt->_eip;
+
+		kvm_rip_write(vcpu, ctxt->eip);
 		if (ctxt->eflags & X86_EFLAGS_RF)
 			kvm_set_rflags(vcpu, ctxt->eflags & ~X86_EFLAGS_RF);
 		return 1;
--


followed by the rework with complete_emulated_msr_access() doing
"EMULTYPE_SKIP | EMULTYPE_COMPLETE_USER_EXIT" with this as the functional change
in the emulator:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3d7fc5c21ceb..13d4758810d1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8118,10 +8118,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
                return 1;
        }

+
        /*
-        * Note, EMULTYPE_SKIP is intended for use *only* by vendor callbacks
-        * for kvm_skip_emulated_instruction().  The caller is responsible for
-        * updating interruptibility state and injecting single-step #DBs.
+        * EMULTYPE_SKIP without EMULTYPE_COMPLETE_USER_EXIT is intended for
+        * use *only* by vendor callbacks for kvm_skip_emulated_instruction().
+        * The caller is responsible for updating interruptibility state and
+        * injecting single-step #DBs.
         */
        if (emulation_type & EMULTYPE_SKIP) {
                if (ctxt->mode != X86EMUL_MODE_PROT64)
@@ -8129,6 +8131,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
                else
                        ctxt->eip = ctxt->_eip;

+               if (emulation_type & EMULTYPE_COMPLETE_USER_EXIT)
+                       goto writeback;
+
                kvm_rip_write(vcpu, ctxt->eip);
                if (ctxt->eflags & X86_EFLAGS_RF)
                        kvm_set_rflags(vcpu, ctxt->eflags & ~X86_EFLAGS_RF);
@@ -8198,6 +8203,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
        else
                r = 1;

+writeback:
        if (writeback) {
                unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
                toggle_interruptibility(vcpu, ctxt->interruptibility);


* [PATCH v2 0/4] KVM: x86: some fixes about msr access emulation
  2021-10-22  2:59 [PATCH 0/2] KVM: some fixes about RDMSR/WRMSR instruction emulation Hou Wenlong
                   ` (2 preceding siblings ...)
  2021-10-22  9:46 ` [PATCH 0/2] KVM: some fixes about RDMSR/WRMSR instruction emulation Paolo Bonzini
@ 2021-11-02  9:15 ` Hou Wenlong
  2021-11-02  9:15   ` [PATCH v2 1/4] KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code seg Hou Wenlong
                     ` (3 more replies)
  3 siblings, 4 replies; 15+ messages in thread
From: Hou Wenlong @ 2021-11-02  9:15 UTC (permalink / raw)
  To: kvm

When the KVM_CAP_X86_USER_SPACE_MSR cap is enabled, userspace can control
MSR accesses. In the normal scenario, RDMSR/WRMSR can be intercepted, but
when kvm.force_emulation_prefix is enabled, RDMSR/WRMSR with the kvm prefix
triggers a #UD and causes instruction emulation. If the MSR access is
filtered, em_rdmsr()/em_wrmsr() return X86EMUL_IO_NEEDED, but that is
ignored by x86_emulate_instruction(). The guest then continues execution,
but RIP has been updated to point at RDMSR/WRMSR in handle_ud(), so
RDMSR/WRMSR is intercepted and the guest finally exits to userspace by
mistake. Such behaviour leads to two VM exits and wastes one instruction
emulation.

After letting x86_emulate_instruction() return 0 for RDMSR/WRMSR emulation,
if it needs to exit to userspace, its complete_userspace_io callback
would call kvm_skip_emulated_instruction() to skip the instruction. But for
VMX, VM_EXIT_INSTRUCTION_LEN in the VMCS is invalid for a #UD, so it can't
be used to update RIP; kvm_emulate_instruction() should be used instead. As
for SVM, nRIP in the VMCB is 0 for a #UD, so kvm_emulate_instruction() is
used. But I'm not sure about nested SVM, since svm_check_intercept() would
change nRIP.

Changes from v1:
	As Sean suggested, fix the problem within the emulator
	instead of routing through the vendor callback.
	Add a new emulation type to handle completion of user exits.
	Attach a different callback for MSR access emulation in the
	emulator.

Hou Wenlong (3):
  KVM: x86: Add an emulation type to handle completion of user exits
  KVM: x86: Use different callback if msr access comes from the emulator
  KVM: x86: Exit to userspace if RDMSR/WRMSR emulation returns
    X86EMUL_IO_NEEDED

Sean Christopherson (1):
  KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code
    seg

 arch/x86/include/asm/kvm_host.h |   8 ++-
 arch/x86/kvm/x86.c              | 108 ++++++++++++++++++++------------
 2 files changed, 76 insertions(+), 40 deletions(-)

--
2.31.1



* [PATCH v2 1/4] KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code seg
  2021-11-02  9:15 ` [PATCH v2 0/4] KVM: x86: some fixes about msr access emulation Hou Wenlong
@ 2021-11-02  9:15   ` Hou Wenlong
  2021-11-02  9:15   ` [PATCH v2 2/4] KVM: x86: Add an emulation type to handle completion of user exits Hou Wenlong
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2021-11-02  9:15 UTC (permalink / raw)
  To: kvm
  Cc: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

From: Sean Christopherson <seanjc@google.com>

Truncate the new EIP to a 32-bit value when handling EMULTYPE_SKIP as the
decode phase does not truncate _eip.  Wrapping the 32-bit boundary is
legal if and only if CS is a flat code segment, but that check is
implicitly handled in the form of limit checks in the decode phase.

Opportunistically prepare for a future fix by storing the result of any
truncation in "eip" instead of "_eip".

Fixes: 1957aa63be53 ("KVM: VMX: Handle single-step #DB for EMULTYPE_SKIP on EPT misconfig")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac83d873d65b..3d7fc5c21ceb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8124,7 +8124,12 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	 * updating interruptibility state and injecting single-step #DBs.
 	 */
 	if (emulation_type & EMULTYPE_SKIP) {
-		kvm_rip_write(vcpu, ctxt->_eip);
+		if (ctxt->mode != X86EMUL_MODE_PROT64)
+			ctxt->eip = (u32)ctxt->_eip;
+		else
+			ctxt->eip = ctxt->_eip;
+
+		kvm_rip_write(vcpu, ctxt->eip);
 		if (ctxt->eflags & X86_EFLAGS_RF)
 			kvm_set_rflags(vcpu, ctxt->eflags & ~X86_EFLAGS_RF);
 		return 1;
-- 
2.31.1



* [PATCH v2 2/4] KVM: x86: Add an emulation type to handle completion of user exits
  2021-11-02  9:15 ` [PATCH v2 0/4] KVM: x86: some fixes about msr access emulation Hou Wenlong
  2021-11-02  9:15   ` [PATCH v2 1/4] KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code seg Hou Wenlong
@ 2021-11-02  9:15   ` Hou Wenlong
  2021-11-02  9:15   ` [PATCH v2 3/4] KVM: x86: Use different callback if msr access comes from the emulator Hou Wenlong
  2021-11-02  9:15   ` [PATCH v2 4/4] KVM: x86: Exit to userspace if RDMSR/WRMSR emulation returns X86EMUL_IO_NEEDED Hou Wenlong
  3 siblings, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2021-11-02  9:15 UTC (permalink / raw)
  To: kvm
  Cc: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

The next patch will use kvm_emulate_instruction() with
EMULTYPE_SKIP in the complete_userspace_io callback to fix a
problem in MSR access emulation. However, EMULTYPE_SKIP
only updates RIP; other things, like updating the interruptibility
state and injecting single-step #DBs, would have to be done in the
callback. Since the emulator already does those things after
x86_emulate_insn(), add a new emulation type to pair with
EMULTYPE_SKIP so that they are done within the emulator for
completion of user exits.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
---
 arch/x86/include/asm/kvm_host.h |  8 +++++++-
 arch/x86/kvm/x86.c              | 13 ++++++++++---
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 88fce6ab4bbd..b99e62b39097 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1635,7 +1635,8 @@ extern u64 kvm_mce_cap_supported;
  *
  * EMULTYPE_SKIP - Set when emulating solely to skip an instruction, i.e. to
  *		   decode the instruction length.  For use *only* by
- *		   kvm_x86_ops.skip_emulated_instruction() implementations.
+ *		   kvm_x86_ops.skip_emulated_instruction() implementations if
+ *		   EMULTYPE_COMPLETE_USER_EXIT is not set.
  *
  * EMULTYPE_ALLOW_RETRY_PF - Set when the emulator should resume the guest to
  *			     retry native execution under certain conditions,
@@ -1655,6 +1656,10 @@ extern u64 kvm_mce_cap_supported;
  *
  * EMULTYPE_PF - Set when emulating MMIO by way of an intercepted #PF, in which
  *		 case the CR2/GPA value pass on the stack is valid.
+ *
+ * EMULTYPE_COMPLETE_USER_EXIT - Set when the emulator should update interruptibility
+ *				 state and inject single-step #DBs after skipping
+ *				 an instruction (after completing userspace I/O).
  */
 #define EMULTYPE_NO_DECODE	    (1 << 0)
 #define EMULTYPE_TRAP_UD	    (1 << 1)
@@ -1663,6 +1668,7 @@ extern u64 kvm_mce_cap_supported;
 #define EMULTYPE_TRAP_UD_FORCED	    (1 << 4)
 #define EMULTYPE_VMWARE_GP	    (1 << 5)
 #define EMULTYPE_PF		    (1 << 6)
+#define EMULTYPE_COMPLETE_USER_EXIT (1 << 7)
 
 int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
 int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3d7fc5c21ceb..a961b49c8c44 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8119,9 +8119,10 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	}
 
 	/*
-	 * Note, EMULTYPE_SKIP is intended for use *only* by vendor callbacks
-	 * for kvm_skip_emulated_instruction().  The caller is responsible for
-	 * updating interruptibility state and injecting single-step #DBs.
+	 * EMULTYPE_SKIP without EMULTYPE_COMPLETE_USER_EXIT is intended for
+	 * use *only* by vendor callbacks for kvm_skip_emulated_instruction().
+	 * The caller is responsible for updating interruptibility state and
+	 * injecting single-step #DBs.
 	 */
 	if (emulation_type & EMULTYPE_SKIP) {
 		if (ctxt->mode != X86EMUL_MODE_PROT64)
@@ -8129,6 +8130,11 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		else
 			ctxt->eip = ctxt->_eip;
 
+		if (emulation_type & EMULTYPE_COMPLETE_USER_EXIT) {
+			r = 1;
+			goto writeback;
+		}
+
 		kvm_rip_write(vcpu, ctxt->eip);
 		if (ctxt->eflags & X86_EFLAGS_RF)
 			kvm_set_rflags(vcpu, ctxt->eflags & ~X86_EFLAGS_RF);
@@ -8198,6 +8204,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	else
 		r = 1;
 
+writeback:
 	if (writeback) {
 		unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
 		toggle_interruptibility(vcpu, ctxt->interruptibility);
-- 
2.31.1



* [PATCH v2 3/4] KVM: x86: Use different callback if msr access comes from the emulator
  2021-11-02  9:15 ` [PATCH v2 0/4] KVM: x86: some fixes about msr access emulation Hou Wenlong
  2021-11-02  9:15   ` [PATCH v2 1/4] KVM: x86: Handle 32-bit wrap of EIP for EMULTYPE_SKIP with flat code seg Hou Wenlong
  2021-11-02  9:15   ` [PATCH v2 2/4] KVM: x86: Add an emulation type to handle completion of user exits Hou Wenlong
@ 2021-11-02  9:15   ` Hou Wenlong
  2021-11-26 17:39     ` Paolo Bonzini
  2021-11-02  9:15   ` [PATCH v2 4/4] KVM: x86: Exit to userspace if RDMSR/WRMSR emulation returns X86EMUL_IO_NEEDED Hou Wenlong
  3 siblings, 1 reply; 15+ messages in thread
From: Hou Wenlong @ 2021-11-02  9:15 UTC (permalink / raw)
  To: kvm
  Cc: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, linux-kernel

If an MSR access triggers an exit to userspace, the
complete_userspace_io callback skips the instruction via the vendor
callback for kvm_skip_emulated_instruction(). However, when the MSR
access comes from the emulator, e.g. if kvm.force_emulation_prefix
is enabled and the guest uses RDMSR/WRMSR with the kvm prefix,
VM_EXIT_INSTRUCTION_LEN in the VMCS is invalid and
kvm_emulate_instruction() should be used to skip the instruction
instead.

As Sean noted, unlike the previous case, there's no #UD if
unrestricted guest is disabled and the guest accesses an MSR in
Big RM. So the correct way to fix this is to attach a different
callback when the MSR access comes from the emulator.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
---
 arch/x86/kvm/x86.c | 85 +++++++++++++++++++++++++++-------------------
 1 file changed, 50 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a961b49c8c44..5eadf5ddba3e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -116,6 +116,7 @@ static void enter_smm(struct kvm_vcpu *vcpu);
 static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
 static void store_regs(struct kvm_vcpu *vcpu);
 static int sync_regs(struct kvm_vcpu *vcpu);
+static int kvm_vcpu_do_singlestep(struct kvm_vcpu *vcpu);
 
 static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
 static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2);
@@ -1814,18 +1815,45 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
 }
 EXPORT_SYMBOL_GPL(kvm_set_msr);
 
-static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
+static void __complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
 {
-	int err = vcpu->run->msr.error;
-	if (!err) {
+	if (!vcpu->run->msr.error) {
 		kvm_rax_write(vcpu, (u32)vcpu->run->msr.data);
 		kvm_rdx_write(vcpu, vcpu->run->msr.data >> 32);
 	}
+}
+
+static int complete_emulated_msr_access(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->run->msr.error) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	return kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
+				       EMULTYPE_COMPLETE_USER_EXIT);
+}
+
+static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
+{
+	__complete_emulated_rdmsr(vcpu);
 
-	return static_call(kvm_x86_complete_emulated_msr)(vcpu, err);
+	return complete_emulated_msr_access(vcpu);
 }
 
 static int complete_emulated_wrmsr(struct kvm_vcpu *vcpu)
+{
+	return complete_emulated_msr_access(vcpu);
+}
+
+static int complete_fast_rdmsr(struct kvm_vcpu *vcpu)
+{
+	__complete_emulated_rdmsr(vcpu);
+
+	return static_call(kvm_x86_complete_emulated_msr)(vcpu, vcpu->run->msr.error);
+}
+
+static int complete_fast_wrmsr(struct kvm_vcpu *vcpu)
 {
 	return static_call(kvm_x86_complete_emulated_msr)(vcpu, vcpu->run->msr.error);
 }
@@ -1864,18 +1892,6 @@ static int kvm_msr_user_space(struct kvm_vcpu *vcpu, u32 index,
 	return 1;
 }
 
-static int kvm_get_msr_user_space(struct kvm_vcpu *vcpu, u32 index, int r)
-{
-	return kvm_msr_user_space(vcpu, index, KVM_EXIT_X86_RDMSR, 0,
-				   complete_emulated_rdmsr, r);
-}
-
-static int kvm_set_msr_user_space(struct kvm_vcpu *vcpu, u32 index, u64 data, int r)
-{
-	return kvm_msr_user_space(vcpu, index, KVM_EXIT_X86_WRMSR, data,
-				   complete_emulated_wrmsr, r);
-}
-
 int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
 {
 	u32 ecx = kvm_rcx_read(vcpu);
@@ -1884,18 +1900,16 @@ int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
 
 	r = kvm_get_msr(vcpu, ecx, &data);
 
-	/* MSR read failed? See if we should ask user space */
-	if (r && kvm_get_msr_user_space(vcpu, ecx, r)) {
-		/* Bounce to user space */
-		return 0;
-	}
-
 	if (!r) {
 		trace_kvm_msr_read(ecx, data);
 
 		kvm_rax_write(vcpu, data & -1u);
 		kvm_rdx_write(vcpu, (data >> 32) & -1u);
 	} else {
+		/* MSR read failed? See if we should ask user space */
+		if (kvm_msr_user_space(vcpu, ecx, KVM_EXIT_X86_RDMSR, 0,
+				       complete_fast_rdmsr, r))
+			return 0;
 		trace_kvm_msr_read_ex(ecx);
 	}
 
@@ -1911,19 +1925,18 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
 
 	r = kvm_set_msr(vcpu, ecx, data);
 
-	/* MSR write failed? See if we should ask user space */
-	if (r && kvm_set_msr_user_space(vcpu, ecx, data, r))
-		/* Bounce to user space */
-		return 0;
-
-	/* Signal all other negative errors to userspace */
-	if (r < 0)
-		return r;
-
-	if (!r)
+	if (!r) {
 		trace_kvm_msr_write(ecx, data);
-	else
+	} else {
+		/* MSR write failed? See if we should ask user space */
+		if (kvm_msr_user_space(vcpu, ecx, KVM_EXIT_X86_WRMSR, data,
+				       complete_fast_wrmsr, r))
+			return 0;
+		/* Signal all other negative errors to userspace */
+		if (r < 0)
+			return r;
 		trace_kvm_msr_write_ex(ecx, data);
+	}
 
 	return static_call(kvm_x86_complete_emulated_msr)(vcpu, r);
 }
@@ -7387,7 +7400,8 @@ static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
 
 	r = kvm_get_msr(vcpu, msr_index, pdata);
 
-	if (r && kvm_get_msr_user_space(vcpu, msr_index, r)) {
+	if (r && kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_RDMSR, 0,
+				    complete_emulated_rdmsr, r)) {
 		/* Bounce to user space */
 		return X86EMUL_IO_NEEDED;
 	}
@@ -7403,7 +7417,8 @@ static int emulator_set_msr(struct x86_emulate_ctxt *ctxt,
 
 	r = kvm_set_msr(vcpu, msr_index, data);
 
-	if (r && kvm_set_msr_user_space(vcpu, msr_index, data, r)) {
+	if (r && kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_WRMSR, data,
+				    complete_emulated_wrmsr, r)) {
 		/* Bounce to user space */
 		return X86EMUL_IO_NEEDED;
 	}
-- 
2.31.1



* [PATCH v2 4/4] KVM: x86: Exit to userspace if RDMSR/WRMSR emulation returns X86EMUL_IO_NEEDED
  2021-11-02  9:15 ` [PATCH v2 0/4] KVM: x86: some fixes about msr access emulation Hou Wenlong
                     ` (2 preceding siblings ...)
  2021-11-02  9:15   ` [PATCH v2 3/4] KVM: x86: Use different callback if msr access comes from the emulator Hou Wenlong
@ 2021-11-02  9:15   ` Hou Wenlong
  3 siblings, 0 replies; 15+ messages in thread
From: Hou Wenlong @ 2021-11-02  9:15 UTC (permalink / raw)
  To: kvm
  Cc: Paolo Bonzini, Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, H. Peter Anvin, Alexander Graf,
	linux-kernel

In em_{rd,wr}msr(), X86EMUL_IO_NEEDED is returned if the MSR access
needs to exit to userspace. However, x86_emulate_insn() doesn't return
X86EMUL_*, so x86_emulate_instruction() doesn't directly act on
X86EMUL_IO_NEEDED; it instead looks for other signals to differentiate
between PIO, MMIO, etc... So RDMSR/WRMSR emulation won't exit to
userspace now.

The userspace_msr_exit_test testcase in selftests had tested RDMSR/WRMSR
emulation with kvm.force_emulation_prefix enabled and it passed. That is
because x86_emulate_instruction() returns 1 and the guest continues,
but RIP has been updated to point at RDMSR/WRMSR. The guest then
executes RDMSR/WRMSR and finally exits to userspace via
kvm_emulate_{rd,wr}msr(). In that situation, the instruction emulation
didn't take effect but the userspace exit information had been filled
in, which was inappropriate.

Since the X86EMUL_IO_NEEDED path provides a complete_userspace_io
callback, x86_emulate_instruction() should return 0 if the callback is
not NULL. Then RDMSR/WRMSR instruction emulation can exit to userspace
and skip the RDMSR/WRMSR execution and interception.

Fixes: 1ae099540e8c7 ("KVM: x86: Allow deflecting unknown MSR accesses to user space")
Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
---
 arch/x86/kvm/x86.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5eadf5ddba3e..f5573f010a8e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8214,6 +8214,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			writeback = false;
 		r = 0;
 		vcpu->arch.complete_userspace_io = complete_emulated_mmio;
+	} else if (vcpu->arch.complete_userspace_io) {
+		writeback = false;
+		r = 0;
 	} else if (r == EMULATION_RESTART)
 		goto restart;
 	else
-- 
2.31.1



* Re: [PATCH v2 3/4] KVM: x86: Use different callback if msr access comes from the emulator
  2021-11-02  9:15   ` [PATCH v2 3/4] KVM: x86: Use different callback if msr access comes from the emulator Hou Wenlong
@ 2021-11-26 17:39     ` Paolo Bonzini
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2021-11-26 17:39 UTC (permalink / raw)
  To: Hou Wenlong, kvm
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel

On 11/2/21 10:15, Hou Wenlong wrote:
> If an MSR access triggers an exit to userspace, the
> complete_userspace_io callback skips the instruction via the vendor
> callback for kvm_skip_emulated_instruction(). However, when the MSR
> access comes from the emulator, e.g. if kvm.force_emulation_prefix
> is enabled and the guest uses RDMSR/WRMSR with the kvm prefix,
> VM_EXIT_INSTRUCTION_LEN in the VMCS is invalid and
> kvm_emulate_instruction() should be used to skip the instruction
> instead.
> 
> As Sean noted, unlike the previous case, there's no #UD if
> unrestricted guest is disabled and the guest accesses an MSR in
> Big RM. So the correct way to fix this is to attach a different
> callback when the msr access comes from the emulator.
> 
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
> ---

Queued with a small tweak: complete_emulated_msr_access is a version
of kvm_complete_insn_gp for emulated instructions, so call it
complete_emulated_insn_gp and give it an err argument.

Also I renamed __complete_emulated_rdmsr to complete_userspace_rdmsr,
since it also applies to the "fast" case.

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e651ff56b4ad..3928c96d28be 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -711,6 +711,17 @@ int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
  }
  EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
  
+static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
+{
+	if (err) {
+		kvm_inject_gp(vcpu, 0);
+		return 1;
+	}
+
+	return kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
+				       EMULTYPE_COMPLETE_USER_EXIT);
+}
+
  void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
  {
  	++vcpu->stat.pf_guest;
@@ -1816,7 +1827,7 @@ int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
  }
  EXPORT_SYMBOL_GPL(kvm_set_msr);
  
-static void __complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
+static void complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
  {
  	if (!vcpu->run->msr.error) {
  		kvm_rax_write(vcpu, (u32)vcpu->run->msr.data);
@@ -1826,37 +1837,24 @@ static void __complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
  
  static int complete_emulated_msr_access(struct kvm_vcpu *vcpu)
  {
-	if (vcpu->run->msr.error) {
-		kvm_inject_gp(vcpu, 0);
-		return 1;
-	}
-
-	return kvm_emulate_instruction(vcpu, EMULTYPE_NO_DECODE | EMULTYPE_SKIP |
-				       EMULTYPE_COMPLETE_USER_EXIT);
+	return complete_emulated_insn_gp(vcpu, vcpu->run->msr.error);
  }
  
  static int complete_emulated_rdmsr(struct kvm_vcpu *vcpu)
  {
-	__complete_emulated_rdmsr(vcpu);
-
+	complete_userspace_rdmsr(vcpu);
  	return complete_emulated_msr_access(vcpu);
  }
  
-static int complete_emulated_wrmsr(struct kvm_vcpu *vcpu)
+static int complete_fast_msr_access(struct kvm_vcpu *vcpu)
  {
-	return complete_emulated_msr_access(vcpu);
+	return static_call(kvm_x86_complete_emulated_msr)(vcpu, vcpu->run->msr.error);
  }
  
  static int complete_fast_rdmsr(struct kvm_vcpu *vcpu)
  {
-	__complete_emulated_rdmsr(vcpu);
-
-	return static_call(kvm_x86_complete_emulated_msr)(vcpu, vcpu->run->msr.error);
-}
-
-static int complete_fast_wrmsr(struct kvm_vcpu *vcpu)
-{
-	return static_call(kvm_x86_complete_emulated_msr)(vcpu, vcpu->run->msr.error);
+	complete_userspace_rdmsr(vcpu);
+	return complete_fast_msr_access(vcpu);
  }
  
  static u64 kvm_msr_reason(int r)
@@ -1931,7 +1929,7 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
  	} else {
  		/* MSR write failed? See if we should ask user space */
  		if (kvm_msr_user_space(vcpu, ecx, KVM_EXIT_X86_WRMSR, data,
-				       complete_fast_wrmsr, r))
+				       complete_fast_msr_access, r))
  			return 0;
  		/* Signal all other negative errors to userspace */
  		if (r < 0)
@@ -7429,7 +7427,7 @@ static int emulator_set_msr(struct x86_emulate_ctxt *ctxt,
  	r = kvm_set_msr(vcpu, msr_index, data);
  
  	if (r && kvm_msr_user_space(vcpu, msr_index, KVM_EXIT_X86_WRMSR, data,
-				    complete_emulated_wrmsr, r)) {
+				    complete_emulated_msr_access, r)) {
  		/* Bounce to user space */
  		return X86EMUL_IO_NEEDED;
  	}


