* [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection
@ 2017-06-29  3:01 Wanpeng Li
  2017-06-29  3:01 ` [PATCH v7 1/4] KVM: x86: Simple kvm_x86_ops->queue_exception parameter Wanpeng Li
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Wanpeng Li @ 2017-06-29  3:01 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

 INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
       Not tainted 4.12.0-rc4+ #8
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 gnome-terminal- D    0  1734   1015 0x00000000
 Call Trace:
  __schedule+0x3cd/0xb30
  schedule+0x40/0x90
  kvm_async_pf_task_wait+0x1cc/0x270
  ? __vfs_read+0x37/0x150
  ? prepare_to_swait+0x22/0x70
  do_async_page_fault+0x77/0xb0
  ? do_async_page_fault+0x77/0xb0
  async_page_fault+0x28/0x30

This is triggered by running both Win7 and Win2016 guests on L1 KVM
simultaneously and then putting memory pressure on L1. I can observe this
hang on L1 when at least ~70% of the swap area on L0 is occupied.

This happens because an async PF that should be injected into L1 is
injected into L2 instead. The L2 guest starts receiving page faults with a
bogus %cr2 (actually the APF token from the host), and the L1 guest starts
accumulating tasks stuck in D state in kvm_async_pf_task_wait() because the
corresponding PAGE_READY async PFs are missing.

This patchset fixes it following Radim's proposal to "force a nested VM exit
from nested_vmx_check_exception if the injected #PF is async_pf and handle
the #PF VM exit in L1" (https://www.spinics.net/lists/kvm/msg142498.html).
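
A minimal sketch of the routing rule described above, written as a
standalone C model rather than KVM code. The names route_exception(),
enum route, INJECT_INTO_L2 and VMEXIT_TO_L1 are hypothetical; only
PF_VECTOR (14, the x86 #PF vector) and the idea of L1's exception bitmap
come from the real code. The point it illustrates: an async #PF meant for
L1 has to cause a nested VM exit even when L1 does not intercept #PF
itself (for instance when L1 runs L2 with EPT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PF_VECTOR 14 /* x86 page-fault exception vector */

/* Hypothetical outcome of routing a pending exception while L2 runs. */
enum route { INJECT_INTO_L2, VMEXIT_TO_L1 };

/*
 * Simplified model of the proposed rule: the exception goes to L1 as a
 * nested VM exit if L1 intercepts that vector via its exception bitmap,
 * or if it is a #PF carrying an async PF token meant for L1 (nested_apf).
 * Assumes nr < 32.
 */
static enum route route_exception(uint32_t l1_exception_bitmap,
                                  unsigned int nr, bool nested_apf)
{
    bool l1_intercepts = l1_exception_bitmap & (1u << nr);
    bool async_pf_for_l1 = (nr == PF_VECTOR) && nested_apf;

    return (l1_intercepts || async_pf_for_l1) ? VMEXIT_TO_L1 : INJECT_INTO_L2;
}
```

Without the async_pf_for_l1 term, an L1 that does not intercept #PF would
see the host's APF token delivered straight into L2, which is the failure
described above.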

v6 -> v7:
 * drop KVM_GET/PUT_VCPU_EVENTS stuff for nested_apf

v5 -> v6:
 * move vcpu_svm's apf_reason to vcpu->arch.apf.host_apf_reason
 * introduce function kvm_handle_page_fault() to be used by both VMX and SVM
 * introduce the SVM code posted by Paolo
 * introduce nested_apf
 * improve how MSR_KVM_ASYNC_PF_EN is set

v4 -> v5:
 * utilize wrmsr_safe for MSR_KVM_ASYNC_PF_EN

v3 -> v4:
 * reuse pad field in kvm_vcpu_events for async_page_fault
 * update kvm_vcpu_events API documentation
 * change async_page_fault type in vcpu->arch.exception from bool to u8

v2 -> v3:
 * add the flag to the userspace interface (KVM_GET/PUT_VCPU_EVENTS)

v1 -> v2:
 * remove the nr parameter of nested_vmx_check_exception
 * construct a simple special VM-exit information field for async PF
 * introduce nested_apf_token to vcpu->arch.apf to avoid changing the CR2
   visible in the L2 guest
 * avoid passing an APF directed at L1 into L2 if there is an L3
   at the moment

Wanpeng Li (4):
  KVM: x86: Simple kvm_x86_ops->queue_exception parameter
  KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
  KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  KVM: async_pf: Let host know whether the guest support delivery async_pf as #PF vmexit

 Documentation/virtual/kvm/msr.txt    |  5 ++--
 arch/x86/include/asm/kvm_emulate.h   |  1 +
 arch/x86/include/asm/kvm_host.h      |  8 +++--
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kernel/kvm.c                |  7 ++++-
 arch/x86/kvm/mmu.c                   | 35 +++++++++++++++++++++-
 arch/x86/kvm/mmu.h                   |  2 ++
 arch/x86/kvm/svm.c                   | 58 ++++++++++++------------------------
 arch/x86/kvm/vmx.c                   | 39 +++++++++++++++---------
 arch/x86/kvm/x86.c                   | 19 +++++++-----
 10 files changed, 108 insertions(+), 67 deletions(-)

-- 
2.7.4

* [PATCH v7 1/4] KVM: x86: Simple kvm_x86_ops->queue_exception parameter
  2017-06-29  3:01 [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection Wanpeng Li
@ 2017-06-29  3:01 ` Wanpeng Li
  2017-06-29  3:01 ` [PATCH v7 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler Wanpeng Li
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2017-06-29  3:01 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

This patch removes all arguments except the first from
kvm_x86_ops->queue_exception, since the implementations can extract them
from vcpu->arch.exception themselves. Do the same in
nested_{vmx,svm}_check_exception.
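
To illustrate the single-argument callback shape, here is a simplified,
self-contained model: the exception details live in per-vCPU state and the
callback reads them back, as svm_queue_exception() and
vmx_queue_exception() do after this patch. struct vcpu, struct
pending_exception, struct x86_ops and demo_queue_exception() are
hypothetical stand-ins; the INTR_INFO_* / INTR_TYPE_* bit positions follow
the VMX interruption-information format, and soft exceptions (which the
real VMX code handles separately) are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTR_TYPE_HARD_EXCEPTION    (3u << 8)  /* interruption type 3 */
#define INTR_INFO_DELIVER_CODE_MASK (1u << 11) /* push an error code */
#define INTR_INFO_VALID_MASK        (1u << 31) /* field is valid */

/* Hypothetical, trimmed-down stand-ins for the KVM structures. */
struct pending_exception {          /* models vcpu->arch.exception */
    unsigned int nr;
    bool has_error_code;
    uint32_t error_code;
    bool reinject;
};

struct vcpu {
    struct pending_exception exception;
    uint32_t injected_intr_info;    /* what would be written to the VMCS */
    uint32_t injected_error_code;
};

/* Single-argument callback: everything is read from vcpu state. */
struct x86_ops {
    void (*queue_exception)(struct vcpu *vcpu);
};

static void demo_queue_exception(struct vcpu *vcpu)
{
    unsigned int nr = vcpu->exception.nr;
    bool has_error_code = vcpu->exception.has_error_code;
    uint32_t intr_info = nr | INTR_INFO_VALID_MASK;

    if (has_error_code) {
        vcpu->injected_error_code = vcpu->exception.error_code;
        intr_info |= INTR_INFO_DELIVER_CODE_MASK;
    }
    intr_info |= INTR_TYPE_HARD_EXCEPTION;
    vcpu->injected_intr_info = intr_info;
}

static const struct x86_ops demo_ops = {
    .queue_exception = demo_queue_exception,
};
```

Because the callers already fill vcpu->arch.exception before injecting,
passing the same values again as arguments was redundant; the reduced
signature also lets later patches add fields (such as nested_apf) without
touching every callback.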

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/asm/kvm_host.h | 4 +---
 arch/x86/kvm/svm.c              | 8 +++++---
 arch/x86/kvm/vmx.c              | 8 +++++---
 arch/x86/kvm/x86.c              | 5 +----
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 695605e..1f01bfb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -948,9 +948,7 @@ struct kvm_x86_ops {
 				unsigned char *hypercall_addr);
 	void (*set_irq)(struct kvm_vcpu *vcpu);
 	void (*set_nmi)(struct kvm_vcpu *vcpu);
-	void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
-				bool has_error_code, u32 error_code,
-				bool reinject);
+	void (*queue_exception)(struct kvm_vcpu *vcpu);
 	void (*cancel_injection)(struct kvm_vcpu *vcpu);
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
 	int (*nmi_allowed)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ba9891a..e1f8e89 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -631,11 +631,13 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	svm_set_interrupt_shadow(vcpu, 0);
 }
 
-static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
-				bool has_error_code, u32 error_code,
-				bool reinject)
+static void svm_queue_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned nr = vcpu->arch.exception.nr;
+	bool has_error_code = vcpu->arch.exception.has_error_code;
+	bool reinject = vcpu->arch.exception.reinject;
+	u32 error_code = vcpu->arch.exception.error_code;
 
 	/*
 	 * If we are within a nested VM we'd better #VMEXIT and let the guest
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ca5d2b9..df825bb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2431,11 +2431,13 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
 	return 1;
 }
 
-static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
-				bool has_error_code, u32 error_code,
-				bool reinject)
+static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned nr = vcpu->arch.exception.nr;
+	bool has_error_code = vcpu->arch.exception.has_error_code;
+	bool reinject = vcpu->arch.exception.reinject;
+	u32 error_code = vcpu->arch.exception.error_code;
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
 	if (!reinject && is_guest_mode(vcpu) &&
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0e846f0..7511c0a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6347,10 +6347,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 			kvm_update_dr7(vcpu);
 		}
 
-		kvm_x86_ops->queue_exception(vcpu, vcpu->arch.exception.nr,
-					  vcpu->arch.exception.has_error_code,
-					  vcpu->arch.exception.error_code,
-					  vcpu->arch.exception.reinject);
+		kvm_x86_ops->queue_exception(vcpu);
 		return 0;
 	}
 
-- 
2.7.4

* [PATCH v7 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
  2017-06-29  3:01 [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection Wanpeng Li
  2017-06-29  3:01 ` [PATCH v7 1/4] KVM: x86: Simple kvm_x86_ops->queue_exception parameter Wanpeng Li
@ 2017-06-29  3:01 ` Wanpeng Li
  2017-07-12 21:44   ` Radim Krčmář
  2017-06-29  3:02 ` [PATCH v7 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf Wanpeng Li
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2017-06-29  3:01 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

This patch adds the L1 guest handler for async page fault #PF vmexits:
such a #PF is converted into a vmexit from L2 to L1, which L1 then handles
like an ordinary async page fault.
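
A sketch of the dispatch that kvm_handle_page_fault() performs, reduced to
a pure classification function so it can be checked in isolation. The
enum pf_action values, struct apf_state and classify_pf_exit() are
hypothetical; the two reason codes are the values assumed for
KVM_PV_REASON_PAGE_NOT_PRESENT and KVM_PV_REASON_PAGE_READY in the KVM
paravirt header. As in the patch, the stored reason is cleared only on the
two async-PF paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reason codes assumed to match asm/kvm_para.h. */
#define KVM_PV_REASON_PAGE_NOT_PRESENT 1
#define KVM_PV_REASON_PAGE_READY       2

/* Hypothetical labels for the three branches of the handler. */
enum pf_action { HANDLE_SHADOW_PF, WAIT_FOR_PAGE, WAKE_WAITER };

struct apf_state {
    uint32_t host_apf_reason; /* models vcpu->arch.apf.host_apf_reason */
};

/*
 * Zero reason: an ordinary #PF exit, handled by the shadow MMU.
 * PAGE_NOT_PRESENT: the faulting task should sleep on the APF token.
 * PAGE_READY: wake the task that was waiting on the token.
 * The non-zero reasons are reset to 0 once consumed.
 */
static enum pf_action classify_pf_exit(struct apf_state *apf)
{
    switch (apf->host_apf_reason) {
    case KVM_PV_REASON_PAGE_NOT_PRESENT:
        apf->host_apf_reason = 0;
        return WAIT_FOR_PAGE;
    case KVM_PV_REASON_PAGE_READY:
        apf->host_apf_reason = 0;
        return WAKE_WAITER;
    default:
        return HANDLE_SHADOW_PF;
    }
}
```

In the patch the reason itself is latched right after the exit
(kvm_read_and_reset_pf_reason() in svm_vcpu_run() and
vmx_complete_atomic_exit()), which is why VMX and SVM can share one
handler.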

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/mmu.c              | 33 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/mmu.h              |  2 ++
 arch/x86/kvm/svm.c              | 36 +++++-------------------------------
 arch/x86/kvm/vmx.c              | 12 +++++-------
 5 files changed, 46 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1f01bfb..e20d8a8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -645,6 +645,7 @@ struct kvm_vcpu_arch {
 		u64 msr_val;
 		u32 id;
 		bool send_user_only;
+		u32 host_apf_reason;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cb82259..4a7dc00 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -46,6 +46,7 @@
 #include <asm/io.h>
 #include <asm/vmx.h>
 #include <asm/kvm_page_track.h>
+#include "trace.h"
 
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
@@ -3736,6 +3737,38 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 	return false;
 }
 
+int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
+				u64 fault_address)
+{
+	int r = 1;
+
+	switch (vcpu->arch.apf.host_apf_reason) {
+	default:
+		/* TDP won't cause page fault directly */
+		WARN_ON_ONCE(tdp_enabled);
+		trace_kvm_page_fault(fault_address, error_code);
+
+		if (kvm_event_needs_reinjection(vcpu))
+			kvm_mmu_unprotect_page_virt(vcpu, fault_address);
+		r = kvm_mmu_page_fault(vcpu, fault_address, error_code, NULL, 0);
+		break;
+	case KVM_PV_REASON_PAGE_NOT_PRESENT:
+		vcpu->arch.apf.host_apf_reason = 0;
+		local_irq_disable();
+		kvm_async_pf_task_wait(fault_address);
+		local_irq_enable();
+		break;
+	case KVM_PV_REASON_PAGE_READY:
+		vcpu->arch.apf.host_apf_reason = 0;
+		local_irq_disable();
+		kvm_async_pf_task_wake(fault_address);
+		local_irq_enable();
+		break;
+	}
+	return r;
+}
+EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
+
 static bool
 check_hugepage_cache_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, int level)
 {
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 330bf3a..2ae88f0 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -77,6 +77,8 @@ void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu);
 void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 			     bool accessed_dirty);
 bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
+int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
+				u64 fault_address);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e1f8e89..8f263bf 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -192,7 +192,6 @@ struct vcpu_svm {
 
 	unsigned int3_injected;
 	unsigned long int3_rip;
-	u32 apf_reason;
 
 	/* cached guest cpuid flags for faster access */
 	bool nrips_enabled	: 1;
@@ -2071,34 +2070,9 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
 static int pf_interception(struct vcpu_svm *svm)
 {
 	u64 fault_address = svm->vmcb->control.exit_info_2;
-	u64 error_code;
-	int r = 1;
+	u64 error_code = svm->vmcb->control.exit_info_1;
 
-	switch (svm->apf_reason) {
-	default:
-		error_code = svm->vmcb->control.exit_info_1;
-
-		trace_kvm_page_fault(fault_address, error_code);
-		if (!npt_enabled && kvm_event_needs_reinjection(&svm->vcpu))
-			kvm_mmu_unprotect_page_virt(&svm->vcpu, fault_address);
-		r = kvm_mmu_page_fault(&svm->vcpu, fault_address, error_code,
-			svm->vmcb->control.insn_bytes,
-			svm->vmcb->control.insn_len);
-		break;
-	case KVM_PV_REASON_PAGE_NOT_PRESENT:
-		svm->apf_reason = 0;
-		local_irq_disable();
-		kvm_async_pf_task_wait(fault_address);
-		local_irq_enable();
-		break;
-	case KVM_PV_REASON_PAGE_READY:
-		svm->apf_reason = 0;
-		local_irq_disable();
-		kvm_async_pf_task_wake(fault_address);
-		local_irq_enable();
-		break;
-	}
-	return r;
+	return kvm_handle_page_fault(&svm->vcpu, error_code, fault_address);
 }
 
 static int db_interception(struct vcpu_svm *svm)
@@ -2551,7 +2525,7 @@ static int nested_svm_exit_special(struct vcpu_svm *svm)
 		break;
 	case SVM_EXIT_EXCP_BASE + PF_VECTOR:
 		/* When we're shadowing, trap PFs, but not async PF */
-		if (!npt_enabled && svm->apf_reason == 0)
+		if (!npt_enabled && svm->vcpu.arch.apf.host_apf_reason == 0)
 			return NESTED_EXIT_HOST;
 		break;
 	default:
@@ -2594,7 +2568,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 			vmexit = NESTED_EXIT_DONE;
 		/* async page fault always cause vmexit */
 		else if ((exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR) &&
-			 svm->apf_reason != 0)
+			 svm->vcpu.arch.apf.host_apf_reason != 0)
 			vmexit = NESTED_EXIT_DONE;
 		break;
 	}
@@ -4891,7 +4865,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	/* if exit due to PF check for async PF */
 	if (svm->vmcb->control.exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR)
-		svm->apf_reason = kvm_read_and_reset_pf_reason();
+		svm->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
 
 	if (npt_enabled) {
 		vcpu->arch.regs_avail &= ~(1 << VCPU_EXREG_PDPTR);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index df825bb..d20f794 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5648,14 +5648,8 @@ static int handle_exception(struct kvm_vcpu *vcpu)
 	}
 
 	if (is_page_fault(intr_info)) {
-		/* EPT won't cause page fault directly */
-		BUG_ON(enable_ept);
 		cr2 = vmcs_readl(EXIT_QUALIFICATION);
-		trace_kvm_page_fault(cr2, error_code);
-
-		if (kvm_event_needs_reinjection(vcpu))
-			kvm_mmu_unprotect_page_virt(vcpu, cr2);
-		return kvm_mmu_page_fault(vcpu, cr2, error_code, NULL, 0);
+		return kvm_handle_page_fault(vcpu, error_code, cr2);
 	}
 
 	ex_no = intr_info & INTR_INFO_VECTOR_MASK;
@@ -8602,6 +8596,10 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 	vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
 	exit_intr_info = vmx->exit_intr_info;
 
+	/* if exit due to PF check for async PF */
+	if (is_page_fault(exit_intr_info))
+		vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
+
 	/* Handle machine checks before interrupts are enabled */
 	if (is_machine_check(exit_intr_info))
 		kvm_machine_check();
-- 
2.7.4

* [PATCH v7 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-29  3:01 [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection Wanpeng Li
  2017-06-29  3:01 ` [PATCH v7 1/4] KVM: x86: Simple kvm_x86_ops->queue_exception parameter Wanpeng Li
  2017-06-29  3:01 ` [PATCH v7 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler Wanpeng Li
@ 2017-06-29  3:02 ` Wanpeng Li
  2017-06-29  3:02 ` [PATCH v7 4/4] KVM: async_pf: Let host know whether the guest support delivery async_pf as #PF vmexit Wanpeng Li
  2017-06-29  7:13 ` [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection Wanpeng Li
  4 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2017-06-29  3:02 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

Add a nested_apf field to vcpu->arch.exception to identify an async page
fault, and construct the expected VM-exit information fields. Force a
nested VM exit from nested_vmx_check_exception() if the injected #PF is an
async page fault.
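
The bookkeeping in kvm_inject_page_fault() can be modeled as below: when L2
is running and the fault is an async PF, the address (really the APF
token) goes into nested_apf_token instead of CR2, so the CR2 visible to L2
is left alone; nested_vmx_check_exception() later hands that token to L1
as the exit qualification. struct fault_info, struct vcpu_model and
model_inject_page_fault() are hypothetical simplifications, and both
address fields use uint64_t here for portability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct fault_info {               /* subset of struct x86_exception */
    uint64_t address;             /* CR2, or the APF token for async PFs */
    bool async_page_fault;
};

struct vcpu_model {
    bool guest_mode;              /* is_guest_mode(vcpu): L2 is running */
    bool nested_apf;              /* vcpu->arch.exception.nested_apf */
    uint64_t nested_apf_token;    /* vcpu->arch.apf.nested_apf_token */
    uint64_t cr2;                 /* vcpu->arch.cr2 */
};

static void model_inject_page_fault(struct vcpu_model *v,
                                    const struct fault_info *fault)
{
    /* Only an async PF that arrives while L2 runs is redirected to L1. */
    v->nested_apf = v->guest_mode && fault->async_page_fault;
    if (v->nested_apf)
        v->nested_apf_token = fault->address; /* CR2 stays untouched */
    else
        v->cr2 = fault->address;
}
```

Keeping the token out of CR2 is what prevents the "bogus %cr2" symptom
from the cover letter.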

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/asm/kvm_emulate.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/svm.c                 | 16 ++++++++++------
 arch/x86/kvm/vmx.c                 | 17 ++++++++++++++---
 arch/x86/kvm/x86.c                 |  9 ++++++++-
 5 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 722d0e5..fde36f1 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -23,6 +23,7 @@ struct x86_exception {
 	u16 error_code;
 	bool nested_page_fault;
 	u64 address; /* cr2 or nested page fault gpa */
+	u8 async_page_fault;
 };
 
 /*
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e20d8a8..71aef4b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -545,6 +545,7 @@ struct kvm_vcpu_arch {
 		bool reinject;
 		u8 nr;
 		u32 error_code;
+		u8 nested_apf;
 	} exception;
 
 	struct kvm_queued_interrupt {
@@ -646,6 +647,7 @@ struct kvm_vcpu_arch {
 		u32 id;
 		bool send_user_only;
 		u32 host_apf_reason;
+		unsigned long nested_apf_token;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8f263bf..49cdb8e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2367,15 +2367,19 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
 	if (!is_guest_mode(&svm->vcpu))
 		return 0;
 
+	vmexit = nested_svm_intercept(svm);
+	if (vmexit != NESTED_EXIT_DONE)
+		return 0;
+
 	svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
 	svm->vmcb->control.exit_code_hi = 0;
 	svm->vmcb->control.exit_info_1 = error_code;
-	svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
-
-	vmexit = nested_svm_intercept(svm);
-	if (vmexit == NESTED_EXIT_DONE)
-		svm->nested.exit_required = true;
+	if (svm->vcpu.arch.exception.nested_apf)
+		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
+	else
+		svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
 
+	svm->nested.exit_required = true;
 	return vmexit;
 }
 
@@ -2568,7 +2572,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
 			vmexit = NESTED_EXIT_DONE;
 		/* async page fault always cause vmexit */
 		else if ((exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR) &&
-			 svm->vcpu.arch.apf.host_apf_reason != 0)
+			 svm->vcpu.arch.exception.nested_apf != 0)
 			vmexit = NESTED_EXIT_DONE;
 		break;
 	}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d20f794..8724ea6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2418,13 +2418,24 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
  * KVM wants to inject page-faults which it got to the guest. This function
  * checks whether in a nested guest, we need to inject them to L1 or L2.
  */
-static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
+static int nested_vmx_check_exception(struct kvm_vcpu *vcpu)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	unsigned int nr = vcpu->arch.exception.nr;
 
-	if (!(vmcs12->exception_bitmap & (1u << nr)))
+	if (!((vmcs12->exception_bitmap & (1u << nr)) ||
+		(nr == PF_VECTOR && vcpu->arch.exception.nested_apf)))
 		return 0;
 
+	if (vcpu->arch.exception.nested_apf) {
+		vmcs_write32(VM_EXIT_INTR_ERROR_CODE, vcpu->arch.exception.error_code);
+		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
+			PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
+			INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK,
+			vcpu->arch.apf.nested_apf_token);
+		return 1;
+	}
+
 	nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
 			  vmcs_read32(VM_EXIT_INTR_INFO),
 			  vmcs_readl(EXIT_QUALIFICATION));
@@ -2441,7 +2452,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
 	if (!reinject && is_guest_mode(vcpu) &&
-	    nested_vmx_check_exception(vcpu, nr))
+	    nested_vmx_check_exception(vcpu))
 		return;
 
 	if (has_error_code) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7511c0a..2a4520f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -452,7 +452,12 @@ EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 {
 	++vcpu->stat.pf_guest;
-	vcpu->arch.cr2 = fault->address;
+	vcpu->arch.exception.nested_apf =
+		is_guest_mode(vcpu) && fault->async_page_fault;
+	if (vcpu->arch.exception.nested_apf)
+		vcpu->arch.apf.nested_apf_token = fault->address;
+	else
+		vcpu->arch.cr2 = fault->address;
 	kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code);
 }
 EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
@@ -8573,6 +8578,7 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 		fault.error_code = 0;
 		fault.nested_page_fault = false;
 		fault.address = work->arch.token;
+		fault.async_page_fault = true;
 		kvm_inject_page_fault(vcpu, &fault);
 	}
 }
@@ -8595,6 +8601,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 		fault.error_code = 0;
 		fault.nested_page_fault = false;
 		fault.address = work->arch.token;
+		fault.async_page_fault = true;
 		kvm_inject_page_fault(vcpu, &fault);
 	}
 	vcpu->arch.apf.halted = false;
-- 
2.7.4

* [PATCH v7 4/4] KVM: async_pf: Let host know whether the guest support delivery async_pf as #PF vmexit
  2017-06-29  3:01 [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection Wanpeng Li
                   ` (2 preceding siblings ...)
  2017-06-29  3:02 ` [PATCH v7 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf Wanpeng Li
@ 2017-06-29  3:02 ` Wanpeng Li
  2017-06-29  7:13 ` [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection Wanpeng Li
  4 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2017-06-29  3:02 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

Add another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1, async
page faults are delivered to L1 as #PF vmexits; if bit 2 is 0,
kvm_can_do_async_pf() returns false while the vCPU is in guest mode.
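
A sketch of the MSR_KVM_ASYNC_PF_EN encoding after this patch, covering
both sides: the host's reserved-bit check (mask 0x38 for bits 3-5,
previously 0x3c for bits 2-5) and the guest's fallback of retrying without
bit 2 when an older host rejects it. The flag values match the kvm_para.h
hunk; parse_async_pf_msr(), guest_enable_async_pf() and split_msr() are
hypothetical helpers, and the older host's #GP is modeled simply as a
reserved-bit hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits of MSR_KVM_ASYNC_PF_EN, as defined by this patch. */
#define KVM_ASYNC_PF_ENABLED               (1ULL << 0)
#define KVM_ASYNC_PF_SEND_ALWAYS           (1ULL << 1)
#define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT (1ULL << 2)

/* Host side, modeled on kvm_pv_enable_async_pf(): reject reserved bits
 * (the caller passes 0x38 for a patched host, 0x3c for an older one). */
static bool parse_async_pf_msr(uint64_t data, uint64_t reserved_mask,
                               uint64_t *gpa, bool *delivery_as_pf_vmexit)
{
    if (data & reserved_mask)
        return false;                         /* host would inject #GP */
    *gpa = data & ~0x3fULL;                   /* 64-byte aligned area */
    *delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
    return true;
}

/* Guest side, modeled on kvm_guest_cpu_init(): try with bit 2 first and
 * fall back to the plain value if the host refuses the write. */
static uint64_t guest_enable_async_pf(uint64_t pa, uint64_t host_reserved)
{
    uint64_t gpa;
    bool vmexit;
    uint64_t val = pa | KVM_ASYNC_PF_ENABLED |
                   KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;

    if (parse_async_pf_msr(val, host_reserved, &gpa, &vmexit))
        return val;
    return pa | KVM_ASYNC_PF_ENABLED;
}

/* wrmsr_safe() takes the value as low and high 32-bit halves. */
static void split_msr(uint64_t val, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)(val & 0xffffffffu);
    *hi = (uint32_t)(val >> 32);
}
```

The fallback is what keeps a new L1 guest working on an older L0 that
still treats bit 2 as reserved.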

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 Documentation/virtual/kvm/msr.txt    | 5 +++--
 arch/x86/include/asm/kvm_host.h      | 1 +
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kernel/kvm.c                | 7 ++++++-
 arch/x86/kvm/mmu.c                   | 2 +-
 arch/x86/kvm/vmx.c                   | 2 +-
 arch/x86/kvm/x86.c                   | 5 +++--
 7 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 0a9ea51..1ebecc1 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -166,10 +166,11 @@ MSR_KVM_SYSTEM_TIME: 0x12
 MSR_KVM_ASYNC_PF_EN: 0x4b564d02
 	data: Bits 63-6 hold 64-byte aligned physical address of a
 	64 byte memory area which must be in guest RAM and must be
-	zeroed. Bits 5-2 are reserved and should be zero. Bit 0 is 1
+	zeroed. Bits 5-3 are reserved and should be zero. Bit 0 is 1
 	when asynchronous page faults are enabled on the vcpu 0 when
 	disabled. Bit 1 is 1 if asynchronous page faults can be injected
-	when vcpu is in cpl == 0.
+	when vcpu is in cpl == 0. Bit 2 is 1 if asynchronous page faults
+	are delivered to L1 as #PF vmexits.
 
 	First 4 byte of 64 byte memory location will be written to by
 	the hypervisor at the time of asynchronous page fault (APF)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 71aef4b..a981ab8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -648,6 +648,7 @@ struct kvm_vcpu_arch {
 		bool send_user_only;
 		u32 host_apf_reason;
 		unsigned long nested_apf_token;
+		bool delivery_as_pf_vmexit;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index cff0bb6..a965e5b 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -67,6 +67,7 @@ struct kvm_clock_pairing {
 
 #define KVM_ASYNC_PF_ENABLED			(1 << 0)
 #define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
+#define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
 
 /* Operations for KVM_HC_MMU_OP */
 #define KVM_MMU_OP_WRITE_PTE            1
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 43e10d6..71c17a5 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -330,7 +330,12 @@ static void kvm_guest_cpu_init(void)
 #ifdef CONFIG_PREEMPT
 		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
 #endif
-		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);
+		pa |= KVM_ASYNC_PF_ENABLED;
+
+		/* Async page fault support for L1 hypervisor is optional */
+		if (wrmsr_safe(MSR_KVM_ASYNC_PF_EN,
+			(pa | KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT) & 0xffffffff, pa >> 32) < 0)
+			wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
 		__this_cpu_write(apf_reason.enabled, 1);
 		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
 		       smp_processor_id());
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4a7dc00..fb8c35f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3705,7 +3705,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
 		     kvm_event_needs_reinjection(vcpu)))
 		return false;
 
-	if (is_guest_mode(vcpu))
+	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
 		return false;
 
 	return kvm_x86_ops->interrupt_allowed(vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8724ea6..4f616db 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8001,7 +8001,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
 		if (is_nmi(intr_info))
 			return false;
 		else if (is_page_fault(intr_info))
-			return enable_ept;
+			return !vmx->vcpu.arch.apf.host_apf_reason && enable_ept;
 		else if (is_no_device(intr_info) &&
 			 !(vmcs12->guest_cr0 & X86_CR0_TS))
 			return false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2a4520f..fdeeb66 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2065,8 +2065,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
 	gpa_t gpa = data & ~0x3f;
 
-	/* Bits 2:5 are reserved, Should be zero */
-	if (data & 0x3c)
+	/* Bits 3:5 are reserved, Should be zero */
+	if (data & 0x38)
 		return 1;
 
 	vcpu->arch.apf.msr_val = data;
@@ -2082,6 +2082,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 		return 1;
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
+	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
 	kvm_async_pf_wakeup_all(vcpu);
 	return 0;
 }
-- 
2.7.4

* Re: [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection
  2017-06-29  3:01 [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection Wanpeng Li
                   ` (3 preceding siblings ...)
  2017-06-29  3:02 ` [PATCH v7 4/4] KVM: async_pf: Let host know whether the guest support delivery async_pf as #PF vmexit Wanpeng Li
@ 2017-06-29  7:13 ` Wanpeng Li
  2017-06-29 12:15   ` Paolo Bonzini
  4 siblings, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2017-06-29  7:13 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

2017-06-29 11:01 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
>  INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
>        Not tainted 4.12.0-rc4+ #8
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  gnome-terminal- D    0  1734   1015 0x00000000
>  Call Trace:
>   __schedule+0x3cd/0xb30
>   schedule+0x40/0x90
>   kvm_async_pf_task_wait+0x1cc/0x270
>   ? __vfs_read+0x37/0x150
>   ? prepare_to_swait+0x22/0x70
>   do_async_page_fault+0x77/0xb0
>   ? do_async_page_fault+0x77/0xb0
>   async_page_fault+0x28/0x30
>
> This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
> and then gives stress to memory on L1, I can observed this hang on L1 when
> at least ~70% swap area is occupied on L0.
>
> This is due to async pf was injected to L2 which should be injected to L1,
> L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
> actually), and L1 guest starts accumulating tasks stuck in D state in
> kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.
>
> This patchset fixes it according to Radim's proposal "force a nested VM exit
> from nested_vmx_check_exception if the injected #PF is async_pf and handle
> the #PF VM exit in L1". https://www.spinics.net/lists/kvm/msg142498.html

Note: the patchset depends on commit d4912215d10 ("KVM: nVMX: Fix
exception injection") and commit 9bc1f09f6fa76fd ("KVM: async_pf: avoid
async pf injection when in guest mode") on the master branch.

Regards,
Wanpeng Li

>
> v6 -> v7:
>  * drop KVM_GET/PUT_VCPU_EVENTS stuff for nested_apf
>
> v5 -> v6:
>  * move vcpu_svm's apf_reason to vcpu->arch.apf.host_apf_reason
>  * introduce function kvm_handle_page_fault() to be used by both VMX/SVM
>  * introduce svm's codes posted by Paolo
>  * introduce nested_apf
>  * better set MSR_KVM_ASYNC_PF_EN
>
> v4 -> v5:
>  * utilize wrmsr_safe for MSR_KVM_ASYNC_PF_EN
>
> v3 -> v4:
>  * reuse pad field in kvm_vcpu_events for async_page_fault
>  * update kvm_vcpu_events API documentations
>  * change async_page_fault type in vcpu->arch.exception from bool to u8
>
> v2 -> v3:
>  * add the flag to the userspace interface(KVM_GET/PUT_VCPU_EVENTS)
>
> v1 -> v2:
>  * remove nested_vmx_check_exception nr parameter
>  * construct a simple special vm-exit information field for async pf
>  * introduce nested_apf_token to vcpu->arch.apf to avoid change the CR2
>    visible in L2 guest
>  * avoid pass the apf directed towards it (L1) into L2 if there is L3
>    at the moment
>
> Wanpeng Li (4):
>   KVM: x86: Simple kvm_x86_ops->queue_exception parameter
>   KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
>   KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
>   KVM: async_pf: Let host know whether the guest support delivery async_pf as #PF vmexit
>
>  Documentation/virtual/kvm/msr.txt    |  5 ++--
>  arch/x86/include/asm/kvm_emulate.h   |  1 +
>  arch/x86/include/asm/kvm_host.h      |  8 +++--
>  arch/x86/include/uapi/asm/kvm_para.h |  1 +
>  arch/x86/kernel/kvm.c                |  7 ++++-
>  arch/x86/kvm/mmu.c                   | 35 +++++++++++++++++++++-
>  arch/x86/kvm/mmu.h                   |  2 ++
>  arch/x86/kvm/svm.c                   | 58 ++++++++++++------------------------
>  arch/x86/kvm/vmx.c                   | 39 +++++++++++++++---------
>  arch/x86/kvm/x86.c                   | 19 +++++++-----
>  10 files changed, 108 insertions(+), 67 deletions(-)
>
> --
> 2.7.4
>

* Re: [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection
  2017-06-29  7:13 ` [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection Wanpeng Li
@ 2017-06-29 12:15   ` Paolo Bonzini
  0 siblings, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2017-06-29 12:15 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm; +Cc: Radim Krčmář, Wanpeng Li



On 29/06/2017 09:13, Wanpeng Li wrote:
>> This patchset fixes it according to Radim's proposal "force a nested VM exit
>> from nested_vmx_check_exception if the injected #PF is async_pf and handle
>> the #PF VM exit in L1". https://www.spinics.net/lists/kvm/msg142498.html
>
> Note: The patchset depends on commit d4912215d10 ("KVM: nVMX: Fix
> exception injection") and commit 9bc1f09f6fa76fd (KVM: async_pf: avoid
> async pf injection when in guest mode) on the master branch.

Thanks for reminding me.  I'll wait for the merge window before applying
it, but it looks good.

Paolo

* Re: [PATCH v7 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
  2017-06-29  3:01 ` [PATCH v7 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler Wanpeng Li
@ 2017-07-12 21:44   ` Radim Krčmář
  2017-07-13  1:34     ` Wanpeng Li
  2017-07-13 15:29     ` Radim Krčmář
  0 siblings, 2 replies; 11+ messages in thread
From: Radim Krčmář @ 2017-07-12 21:44 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-28 20:01-0700, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> This patch adds the L1 guest async page fault #PF vmexit handler. Such a
> #PF is converted into a #PF vmexit from L2 to L1, which L1 then handles
> like an ordinary async page fault.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
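The conversion described in the commit message boils down to a routing decision for an async page fault. The sketch below is a simplified, hypothetical model: `route_async_pf` and the enum names are not KVM identifiers, it ignores the other conditions KVM checks before doing an async page fault at all, and `l1_accepts_pf_vmexit` stands for L1 having told the host (patch 4/4) that it can take async_pf as a #PF vmexit. The fallback case assumes the fault is simply handled synchronously, as after the earlier commit that avoids async pf injection in guest mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for an async page fault raised while a vCPU runs. */
enum apf_route {
	APF_INJECT_INTO_L1,        /* L1 itself is running: deliver the apf #PF to L1 */
	APF_NESTED_VMEXIT_TO_L1,   /* L2 is running: force an L2->L1 #PF vmexit */
	APF_HANDLE_SYNCHRONOUSLY,  /* L2 is running and L1 did not opt in: no apf */
};

/*
 * in_guest_mode:         the vCPU is currently running L2.
 * l1_accepts_pf_vmexit:  L1 has told the host it can take async_pf as a
 *                        #PF vmexit (assumed to be what patch 4/4 negotiates).
 * No route injects the apf #PF into L2: that was the bug in which L2 saw a
 * bogus %cr2 (the apf token) and L1 never saw the matching PAGE_READY.
 */
static enum apf_route route_async_pf(bool in_guest_mode, bool l1_accepts_pf_vmexit)
{
	if (!in_guest_mode)
		return APF_INJECT_INTO_L1;
	if (l1_accepts_pf_vmexit)
		return APF_NESTED_VMEXIT_TO_L1;
	return APF_HANDLE_SYNCHRONOUSLY;
}
```

Both guest-mode branches keep the async #PF away from L2; only the opt-in decides whether L1 gets to park the task itself.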

This patch breaks SVM, so I've taken the series off kvm/queue for now;
I'll look into it tomorrow.  The error is:

 BUG: unable to handle kernel paging request at ffffffffc0735ad2
 IP: report_bug+0x94/0x120
 PGD 43e14067 
 P4D 43e14067 
 PUD 43e16067 
 PMD 2164bf067 
 PTE 80000002181fc161

 Oops: 0003 [#1] SMP
 Modules linked in: kvm_amd(OE) kvm(OE) irqbypass(E) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables sunrpc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm ppdev joydev parport_serial parport_pc snd_timer parport k10temp sky2 snd shpchp sp5100_tco acpi_cpufreq wmi soundcore i2c_piix4 amdkfd amd_iommu_v2 radeon i2c_algo_bit drm_kms_helper uas serio_raw usb_storage ttm pata_atiixp drm ata_generic pata_acpi pata_jmicron [last unloaded: irqbypass]
 CPU: 3 PID: 1868 Comm: CPU 0/KVM Tainted: G           OE   4.12.0+ #1
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080014  03/07/2008
 task: ffff8bcbe3f1b140 task.stack: ffffabb481970000
 RIP: 0010:report_bug+0x94/0x120
 RSP: 0018:ffffabb481973a70 EFLAGS: 00010202
 RAX: 0000000000000907 RBX: ffffabb481973bd8 RCX: ffffffffc0735ac8
 RDX: 0000000000000001 RSI: 0000000000000ed0 RDI: 0000000000000001
 RBP: ffffabb481973a90 R08: 0000000000000001 R09: 7f9f279200000000
 R10: ffffabb4819739d0 R11: 0000000000000000 R12: ffffffffc07023d0
 R13: ffffffffc0733078 R14: 0000000000000004 R15: ffffabb481973bd8
 FS:  0000000000000000(0000) GS:ffff8bcbe7400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffc0735ad2 CR3: 00000002189d7000 CR4: 00000000000006e0
 Call Trace:
  ? kvm_handle_page_fault+0x1f0/0x200 [kvm]
  fixup_bug+0x2e/0x50
  do_trap+0x119/0x150
  do_error_trap+0xa3/0x160
  ? kvm_handle_page_fault+0x1f0/0x200 [kvm]
  ? trace_hardirqs_off_thunk+0x1a/0x1c
  do_invalid_op+0x20/0x30
  invalid_op+0x1e/0x30
 RIP: 0010:kvm_handle_page_fault+0x1f0/0x200 [kvm]
 RSP: 0018:ffffabb481973c80 EFLAGS: 00010202
 RAX: 0000000000000000 RBX: ffff8bcbd7550000 RCX: 0000000000000000
 RDX: 00000000fffffff0 RSI: 0000000000000014 RDI: ffff8bcbd7550000
 RBP: ffffabb481973ca0 R08: 0000000000000001 R09: 27624b3d00000000
 R10: ffffabb481973ca8 R11: ffff8bcbe3fb25f0 R12: 00000000fffffff0
 R13: 0000000000000014 R14: ffff8bcbd7550000 R15: ffff8bcbd7550000
  pf_interception+0x20/0x30 [kvm_amd]
  handle_exit+0x213/0xbb0 [kvm_amd]
  kvm_arch_vcpu_ioctl_run+0x7f1/0x1ae0 [kvm]
  kvm_vcpu_ioctl+0x2ac/0x6f0 [kvm]
  ? kvm_vcpu_ioctl+0x2ac/0x6f0 [kvm]
  ? sched_clock+0x9/0x10
  ? debug_lockdep_rcu_enabled+0x1d/0x30
  do_vfs_ioctl+0xa6/0x6c0
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x1f/0xbe
 RIP: 0033:0x7fabf6d815c7
 RSP: 002b:00007fabe87e77c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007fabf6d815c7
 RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000010
 RBP: 000055a7cb502fe0 R08: 000055a7cb51e410 R09: 000055a7cb509390
 R10: 000055a7cdb01000 R11: 0000000000000246 R12: 000055a7cdace0a6
 R13: 0000000000000000 R14: 00007fac00621000 R15: 000055a7cdace000
 Code: 74 59 0f b7 41 0a 4c 63 69 04 0f b7 71 08 89 c7 49 01 cd 83 e7 01 a8 02 74 15 66 85 ff 74 10 a8 04 ba 01 00 00 00 75 26 83 c8 04 <66> 89 41 0a 66 85 ff 74 49 0f b6 49 0b 4c 89 e2 45 31 c9 49 89 
 RIP: report_bug+0x94/0x120 RSP: ffffabb481973a70
 CR2: ffffffffc0735ad2
 ---[ end trace aec3a1f15664a4af ]---
 BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:33
 in_atomic(): 0, irqs_disabled(): 1, pid: 1868, name: CPU 0/KVM
 INFO: lockdep is turned off.
 irq event stamp: 1868
 hardirqs last  enabled at (1867): [<ffffffffa398eaab>] restore_regs_and_iret+0x0/0x1d
 hardirqs last disabled at (1868): [<ffffffffa398f7dc>] error_entry+0x7c/0xd0
 softirqs last  enabled at (1834): [<ffffffffa3992f62>] __do_softirq+0x382/0x4ed
 softirqs last disabled at (1817): [<ffffffffa30b9a2f>] irq_exit+0x10f/0x120
 CPU: 3 PID: 1868 Comm: CPU 0/KVM Tainted: G      D    OE   4.12.0+ #1
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080014  03/07/2008
 Call Trace:
  dump_stack+0x8e/0xcd
  ___might_sleep+0x164/0x250
  __might_sleep+0x4a/0x80
  exit_signals+0x33/0x240
  do_exit+0xb4/0xd20
  ? SyS_ioctl+0x79/0x90
  rewind_stack_do_exit+0x17/0x20
 RIP: 0033:0x7fabf6d815c7
 RSP: 002b:00007fabe87e77c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007fabf6d815c7
 RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000010
 RBP: 000055a7cb502fe0 R08: 000055a7cb51e410 R09: 000055a7cb509390
 R10: 000055a7cdb01000 R11: 0000000000000246 R12: 000055a7cdace0a6
 R13: 0000000000000000 R14: 00007fac00621000 R15: 000055a7cdace000

* Re: [PATCH v7 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
  2017-07-12 21:44   ` Radim Krčmář
@ 2017-07-13  1:34     ` Wanpeng Li
  2017-07-13 15:29     ` Radim Krčmář
  1 sibling, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2017-07-13  1:34 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-07-13 5:44 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-28 20:01-0700, Wanpeng Li:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> This patch adds the L1 guest async page fault #PF vmexit handler. Such a
>> #PF is converted into a #PF vmexit from L2 to L1, which L1 then handles
>> like an ordinary async page fault.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>
> This patch breaks SVM, so I've taken the series off kvm/queue for now;
> I'll look into it tomorrow.

Thanks for the help. :)

Regards,
Wanpeng Li

* Re: [PATCH v7 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
  2017-07-12 21:44   ` Radim Krčmář
  2017-07-13  1:34     ` Wanpeng Li
@ 2017-07-13 15:29     ` Radim Krčmář
  2017-07-14  1:40       ` Wanpeng Li
  1 sibling, 1 reply; 11+ messages in thread
From: Radim Krčmář @ 2017-07-13 15:29 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-07-12 23:44+0200, Radim Krčmář:
> 2017-06-28 20:01-0700, Wanpeng Li:
> > From: Wanpeng Li <wanpeng.li@hotmail.com>
> > 
> > This patch adds the L1 guest async page fault #PF vmexit handler. Such a
> > #PF is converted into a #PF vmexit from L2 to L1, which L1 then handles
> > like an ordinary async page fault.
> > 
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> > ---
> 
> This patch breaks SVM, so I've taken the series off kvm/queue for now;

The error is triggered by 'WARN_ON_ONCE(tdp_enabled);', because
pf_interception() handles both cases.  (The bizarre part is that it
doesn't warn.)

I think this hunk on top of the patch would be good.  It makes the
WARN_ON_ONCE specific to VMX and also preserves the parameters that SVM
had before.

(Passes basic tests, haven't done the nested async_pf test yet.)
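A minimal userspace model of the default (non-async) branch of the refactored kvm_handle_page_fault() makes the new parameters concrete. Everything named `model_*` is hypothetical, and the KVM calls are replaced by stubs that only record what they were given. One assumption to flag: in the hunk below, the default branch still calls kvm_mmu_page_fault(..., NULL, 0), so insn/insn_len are accepted but not yet passed on; the model forwards them, reading "preserves the parameters that SVM had before" as meaning SVM's instruction bytes should reach the MMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reason codes as defined in arch/x86/include/uapi/asm/kvm_para.h. */
#define KVM_PV_REASON_PAGE_NOT_PRESENT 1
#define KVM_PV_REASON_PAGE_READY       2

/* Hypothetical stand-in for the bits of vCPU state the hunk touches. */
struct model_vcpu {
	uint32_t host_apf_reason;      /* 0: ordinary #PF, no async-pf reason */
	bool event_needs_reinjection;  /* models kvm_event_needs_reinjection() */
	/* Side effects recorded by the stubs below. */
	bool unprotected;
	const char *mmu_insn;
	int mmu_insn_len;
};

/* Stub for kvm_mmu_unprotect_page_virt(): just records the call. */
static void model_unprotect_page(struct model_vcpu *v)
{
	v->unprotected = true;
}

/* Stub for kvm_mmu_page_fault(): records the instruction bytes it was given. */
static int model_mmu_page_fault(struct model_vcpu *v, const char *insn, int insn_len)
{
	assert(insn_len >= 0 && (insn_len == 0 || insn != NULL));
	v->mmu_insn = insn;
	v->mmu_insn_len = insn_len;
	return 1;
}

/* Model of the refactored handler; only the default branch is spelled out. */
static int model_handle_page_fault(struct model_vcpu *v, const char *insn,
				   int insn_len, bool need_unprotect)
{
	switch (v->host_apf_reason) {
	case KVM_PV_REASON_PAGE_NOT_PRESENT:
	case KVM_PV_REASON_PAGE_READY:
		/* Async-pf wait/wake paths are left out of this model. */
		return 1;
	default:
		/* SVM passes !npt_enabled here, VMX passes true. */
		if (need_unprotect && v->event_needs_reinjection)
			model_unprotect_page(v);
		/* Assumed intent: forward insn/insn_len instead of NULL, 0. */
		return model_mmu_page_fault(v, insn, insn_len);
	}
}
```

With SVM's arguments on an NPT host (instruction bytes plus need_unprotect = false), the unprotect step is skipped; VMX passes NULL, 0 and true, so its behaviour matches the old code.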


diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f37c0307dcb0..338cb4c8cbb9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3782,17 +3782,16 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 }
 
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
-				u64 fault_address)
+				u64 fault_address, char *insn, int insn_len,
+				bool need_unprotect)
 {
 	int r = 1;
 
 	switch (vcpu->arch.apf.host_apf_reason) {
 	default:
-		/* TDP won't cause page fault directly */
-		WARN_ON_ONCE(tdp_enabled);
 		trace_kvm_page_fault(fault_address, error_code);
 
-		if (kvm_event_needs_reinjection(vcpu))
+		if (need_unprotect && kvm_event_needs_reinjection(vcpu))
 			kvm_mmu_unprotect_page_virt(vcpu, fault_address);
 		r = kvm_mmu_page_fault(vcpu, fault_address, error_code, NULL, 0);
 		break;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 270d9adaa039..d7d248a000dd 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -78,7 +78,8 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 			     bool accessed_dirty);
 bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
-				u64 fault_address);
+				u64 fault_address, char *insn, int insn_len,
+				bool need_unprotect);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 659b610c4711..fb23497cf915 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2123,7 +2123,9 @@ static int pf_interception(struct vcpu_svm *svm)
 	u64 fault_address = svm->vmcb->control.exit_info_2;
 	u64 error_code = svm->vmcb->control.exit_info_1;
 
-	return kvm_handle_page_fault(&svm->vcpu, error_code, fault_address);
+	return kvm_handle_page_fault(&svm->vcpu, error_code, fault_address,
+			svm->vmcb->control.insn_bytes,
+			svm->vmcb->control.insn_len, !npt_enabled);
 }
 
 static int db_interception(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ab33eace4f66..2e8cfb2f1371 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5699,7 +5699,10 @@ static int handle_exception(struct kvm_vcpu *vcpu)
 
 	if (is_page_fault(intr_info)) {
 		cr2 = vmcs_readl(EXIT_QUALIFICATION);
-		return kvm_handle_page_fault(vcpu, error_code, cr2);
+		/* TDP won't cause page fault directly */
+		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_reason && tdp_enabled);
+		return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0,
+				true);
 	}
 
 	ex_no = intr_info & INTR_INFO_VECTOR_MASK;

* Re: [PATCH v7 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
  2017-07-13 15:29     ` Radim Krčmář
@ 2017-07-14  1:40       ` Wanpeng Li
  0 siblings, 0 replies; 11+ messages in thread
From: Wanpeng Li @ 2017-07-14  1:40 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-07-13 23:29 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-07-12 23:44+0200, Radim Krčmář:
>> 2017-06-28 20:01-0700, Wanpeng Li:
>> > From: Wanpeng Li <wanpeng.li@hotmail.com>
>> >
>> > This patch adds the L1 guest async page fault #PF vmexit handler. Such a
>> > #PF is converted into a #PF vmexit from L2 to L1, which L1 then handles
>> > like an ordinary async page fault.
>> >
>> > Cc: Paolo Bonzini <pbonzini@redhat.com>
>> > Cc: Radim Krčmář <rkrcmar@redhat.com>
>> > Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> > ---
>>
>> This patch breaks SVM, so I've taken the series off kvm/queue for now;
>
> The error is triggered by 'WARN_ON_ONCE(tdp_enabled);', because
> pf_interception() handles both cases.  (The bizarre part is that it
> doesn't warn.)
>
> I think this hunk on top of the patch would be good.  It makes the
> WARN_ON_ONCE specific to VMX and also preserves the parameters that SVM
> had before.

Thanks Radim! The work is really appreciated. :)

I replaced tdp_enabled with enable_ept in VMX, since otherwise there is a
warning: "tdp_enabled" in [kvm-intel.ko] undefined! Btw, I just sent
out v8; I hope both v8 and the vm86 stuff can make it in before the
merge window closes. :)
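With enable_ept in place of tdp_enabled, the VMX-side check reduces to a small predicate. The sketch below is hypothetical: `enable_ept` is an ordinary argument standing in for kvm-intel's own EPT setting, and a `host_apf_reason` of 0 means "no async-pf reason recorded". It also records the need_unprotect value each vendor passes in Radim's hunk; SVM gets no warning at all, since, as Radim notes, pf_interception() handles both cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate for the VMX-side WARN_ON_ONCE. Per the hunk's
 * comment, TDP won't cause a page fault directly, so with EPT enabled a
 * #PF exit is only expected when it carries an async-pf reason.
 */
static bool vmx_pf_should_warn(uint32_t host_apf_reason, bool enable_ept)
{
	return host_apf_reason == 0 && enable_ept;
}

/* need_unprotect as passed by SVM's pf_interception() in the hunk. */
static bool svm_need_unprotect(bool npt_enabled)
{
	return !npt_enabled;
}

/* need_unprotect as passed by VMX's handle_exception() in the hunk. */
static bool vmx_need_unprotect(void)
{
	return true;
}
```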

Regards,
Wanpeng Li

>
> (Passes basic tests, haven't done the nested async_pf test yet.)

Thread overview: 11+ messages
2017-06-29  3:01 [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection Wanpeng Li
2017-06-29  3:01 ` [PATCH v7 1/4] KVM: x86: Simple kvm_x86_ops->queue_exception parameter Wanpeng Li
2017-06-29  3:01 ` [PATCH v7 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler Wanpeng Li
2017-07-12 21:44   ` Radim Krčmář
2017-07-13  1:34     ` Wanpeng Li
2017-07-13 15:29     ` Radim Krčmář
2017-07-14  1:40       ` Wanpeng Li
2017-06-29  3:02 ` [PATCH v7 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf Wanpeng Li
2017-06-29  3:02 ` [PATCH v7 4/4] KVM: async_pf: Let host know whether the guest support delivery async_pf as #PF vmexit Wanpeng Li
2017-06-29  7:13 ` [PATCH v7 0/4] KVM: async_pf: Fix async pf exception injection Wanpeng Li
2017-06-29 12:15   ` Paolo Bonzini
