* [PATCH 0/4] KVM: async_pf: Fix async_pf exception injection
@ 2017-06-13  6:08 Wanpeng Li
  2017-06-13  6:08 ` [PATCH 1/4] KVM: x86: Simplify kvm_x86_ops->queue_exception parameter Wanpeng Li
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Wanpeng Li @ 2017-06-13  6:08 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

 INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
       Not tainted 4.12.0-rc4+ #8
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 gnome-terminal- D    0  1734   1015 0x00000000
 Call Trace:
  __schedule+0x3cd/0xb30
  schedule+0x40/0x90
  kvm_async_pf_task_wait+0x1cc/0x270
  ? __vfs_read+0x37/0x150
  ? prepare_to_swait+0x22/0x70
  do_async_page_fault+0x77/0xb0
  ? do_async_page_fault+0x77/0xb0
  async_page_fault+0x28/0x30

This is triggered by running both win7 and win2016 on L1 KVM simultaneously and 
then putting stress on memory in L1; I can observe this hang on L1 once at least 
~70% of the swap area on L0 is occupied.

This is because an async_pf that should have been injected into L1 was injected 
into L2 instead: the L2 guest starts receiving page faults with a bogus %cr2 
(actually the async_pf token from the host), and the L1 guest starts accumulating 
tasks stuck in D state in kvm_async_pf_task_wait() since the PAGE_READY async_pfs 
never arrive.
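
For reference, the L1 guest-side handler seen in the trace above dispatches on
the per-cpu apf_reason word; roughly (a paraphrased sketch of
arch/x86/kernel/kvm.c from this era, details may differ between kernel versions):

  dotraplinkage void
  do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
  {
  	switch (kvm_read_and_reset_pf_reason()) {
  	default:
  		/* reason word was 0: treat it as an ordinary #PF */
  		trace_do_page_fault(regs, error_code);
  		break;
  	case KVM_PV_REASON_PAGE_NOT_PRESENT:
  		/* host is still paging the page in: park this task */
  		kvm_async_pf_task_wait((u32)read_cr2());
  		break;
  	case KVM_PV_REASON_PAGE_READY:
  		/* host finished: wake the task parked on this token */
  		kvm_async_pf_task_wake((u32)read_cr2());
  		break;
  	}
  }

In the buggy case the #PF goes to L2, which only sees a spurious page fault with
the token in %cr2, while the tasks parked in kvm_async_pf_task_wait() in L1
never receive the PAGE_READY wakeup.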

This patchset fixes it according to Radim's proposal "force a nested VM exit 
from nested_vmx_check_exception if the injected #PF is async_pf and handle 
the #PF VM exit in L1". https://www.spinics.net/lists/kvm/msg142498.html

Note: The patchset barely touches SVM since I don't have an AMD CPU to verify
the modification.

Wanpeng Li (4):
  KVM: x86: Simplify kvm_x86_ops->queue_exception parameter
  KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
  KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  KVM: async_pf: Let host know whether the guest supports delivery of async_pf as #PF vmexit

 Documentation/virtual/kvm/msr.txt    |  5 +--
 arch/x86/include/asm/kvm_emulate.h   |  1 +
 arch/x86/include/asm/kvm_host.h      |  6 ++--
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kernel/kvm.c                |  1 +
 arch/x86/kvm/mmu.c                   |  2 +-
 arch/x86/kvm/svm.c                   |  8 +++--
 arch/x86/kvm/vmx.c                   | 63 +++++++++++++++++++++++++++++-------
 arch/x86/kvm/x86.c                   | 13 ++++----
 9 files changed, 73 insertions(+), 27 deletions(-)

-- 
2.7.4

* [PATCH 1/4] KVM: x86: Simplify kvm_x86_ops->queue_exception parameter
  2017-06-13  6:08 [PATCH 0/4] KVM: async_pf: Fix async_pf exception injection Wanpeng Li
@ 2017-06-13  6:08 ` Wanpeng Li
  2017-06-13  6:08 ` [PATCH 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler Wanpeng Li
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2017-06-13  6:08 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

This patch removes all arguments except the first from kvm_x86_ops->queue_exception, 
since the implementations can extract the remaining arguments from 
vcpu->arch.exception themselves; do the same in nested_{vmx,svm}_check_exception.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/asm/kvm_host.h | 4 +---
 arch/x86/kvm/svm.c              | 8 +++++---
 arch/x86/kvm/vmx.c              | 8 +++++---
 arch/x86/kvm/x86.c              | 5 +----
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 695605e..1f01bfb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -948,9 +948,7 @@ struct kvm_x86_ops {
 				unsigned char *hypercall_addr);
 	void (*set_irq)(struct kvm_vcpu *vcpu);
 	void (*set_nmi)(struct kvm_vcpu *vcpu);
-	void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
-				bool has_error_code, u32 error_code,
-				bool reinject);
+	void (*queue_exception)(struct kvm_vcpu *vcpu);
 	void (*cancel_injection)(struct kvm_vcpu *vcpu);
 	int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
 	int (*nmi_allowed)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ba9891a..e1f8e89 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -631,11 +631,13 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 	svm_set_interrupt_shadow(vcpu, 0);
 }
 
-static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
-				bool has_error_code, u32 error_code,
-				bool reinject)
+static void svm_queue_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	unsigned int nr = vcpu->arch.exception.nr;
+	bool has_error_code = vcpu->arch.exception.has_error_code;
+	bool reinject = vcpu->arch.exception.reinject;
+	u32 error_code = vcpu->arch.exception.error_code;
 
 	/*
 	 * If we are within a nested VM we'd better #VMEXIT and let the guest
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ca5d2b9..df825bb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2431,11 +2431,13 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
 	return 1;
 }
 
-static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
-				bool has_error_code, u32 error_code,
-				bool reinject)
+static void vmx_queue_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned int nr = vcpu->arch.exception.nr;
+	bool has_error_code = vcpu->arch.exception.has_error_code;
+	bool reinject = vcpu->arch.exception.reinject;
+	u32 error_code = vcpu->arch.exception.error_code;
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
 	if (!reinject && is_guest_mode(vcpu) &&
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 87d3cb9..1b28a31 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6345,10 +6345,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
 			kvm_update_dr7(vcpu);
 		}
 
-		kvm_x86_ops->queue_exception(vcpu, vcpu->arch.exception.nr,
-					  vcpu->arch.exception.has_error_code,
-					  vcpu->arch.exception.error_code,
-					  vcpu->arch.exception.reinject);
+		kvm_x86_ops->queue_exception(vcpu);
 		return 0;
 	}
 
-- 
2.7.4

* [PATCH 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
  2017-06-13  6:08 [PATCH 0/4] KVM: async_pf: Fix async_pf exception injection Wanpeng Li
  2017-06-13  6:08 ` [PATCH 1/4] KVM: x86: Simplify kvm_x86_ops->queue_exception parameter Wanpeng Li
@ 2017-06-13  6:08 ` Wanpeng Li
  2017-06-13  6:08 ` [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf Wanpeng Li
  2017-06-13  6:08 ` [PATCH 4/4] KVM: async_pf: Let host know whether the guest supports delivery of async_pf as #PF vmexit Wanpeng Li
  3 siblings, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2017-06-13  6:08 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

This patch adds the L1 guest async page fault #PF vmexit handler: such a #PF is 
converted into a vmexit from L2 to L1, which L1 then handles like an ordinary 
async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index df825bb..f533cc1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -616,6 +616,7 @@ struct vcpu_vmx {
 	bool emulation_required;
 
 	u32 exit_reason;
+	u32 apf_reason;
 
 	/* Posted interrupt descriptor */
 	struct pi_desc pi_desc;
@@ -5648,14 +5649,31 @@ static int handle_exception(struct kvm_vcpu *vcpu)
 	}
 
 	if (is_page_fault(intr_info)) {
-		/* EPT won't cause page fault directly */
-		BUG_ON(enable_ept);
 		cr2 = vmcs_readl(EXIT_QUALIFICATION);
-		trace_kvm_page_fault(cr2, error_code);
+		switch (vmx->apf_reason) {
+		default:
+			/* EPT won't cause page fault directly */
+			BUG_ON(enable_ept);
+			trace_kvm_page_fault(cr2, error_code);
 
-		if (kvm_event_needs_reinjection(vcpu))
-			kvm_mmu_unprotect_page_virt(vcpu, cr2);
-		return kvm_mmu_page_fault(vcpu, cr2, error_code, NULL, 0);
+			if (kvm_event_needs_reinjection(vcpu))
+				kvm_mmu_unprotect_page_virt(vcpu, cr2);
+			return kvm_mmu_page_fault(vcpu, cr2, error_code, NULL, 0);
+			break;
+		case KVM_PV_REASON_PAGE_NOT_PRESENT:
+			vmx->apf_reason = 0;
+			local_irq_disable();
+			kvm_async_pf_task_wait(cr2);
+			local_irq_enable();
+			break;
+		case KVM_PV_REASON_PAGE_READY:
+			vmx->apf_reason = 0;
+			local_irq_disable();
+			kvm_async_pf_task_wake(cr2);
+			local_irq_enable();
+			break;
+		}
+		return 0;
 	}
 
 	ex_no = intr_info & INTR_INFO_VECTOR_MASK;
@@ -8602,6 +8620,10 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
 	vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
 	exit_intr_info = vmx->exit_intr_info;
 
+	/* if exit due to PF check for async PF */
+	if (is_page_fault(exit_intr_info))
+		vmx->apf_reason = kvm_read_and_reset_pf_reason();
+
 	/* Handle machine checks before interrupts are enabled */
 	if (is_machine_check(exit_intr_info))
 		kvm_machine_check();
-- 
2.7.4

* [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-13  6:08 [PATCH 0/4] KVM: async_pf: Fix async_pf exception injection Wanpeng Li
  2017-06-13  6:08 ` [PATCH 1/4] KVM: x86: Simplify kvm_x86_ops->queue_exception parameter Wanpeng Li
  2017-06-13  6:08 ` [PATCH 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler Wanpeng Li
@ 2017-06-13  6:08 ` Wanpeng Li
  2017-06-13 18:55   ` Radim Krčmář
  2017-06-13  6:08 ` [PATCH 4/4] KVM: async_pf: Let host know whether the guest supports delivery of async_pf as #PF vmexit Wanpeng Li
  3 siblings, 1 reply; 17+ messages in thread
From: Wanpeng Li @ 2017-06-13  6:08 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

Add an async_page_fault field to vcpu->arch.exception to identify an async 
page fault, and construct the expected vm-exit information fields. Force 
a nested VM exit from nested_vmx_check_exception() if the injected #PF 
is an async page fault.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 arch/x86/include/asm/kvm_emulate.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/vmx.c                 | 21 ++++++++++++++++++---
 arch/x86/kvm/x86.c                 |  3 +++
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 0559626..b5bcad9 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -23,6 +23,7 @@ struct x86_exception {
 	u16 error_code;
 	bool nested_page_fault;
 	u64 address; /* cr2 or nested page fault gpa */
+	bool async_page_fault;
 };
 
 /*
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1f01bfb..d30576a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -545,6 +545,7 @@ struct kvm_vcpu_arch {
 		bool reinject;
 		u8 nr;
 		u32 error_code;
+		bool async_page_fault;
 	} exception;
 
 	struct kvm_queued_interrupt {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f533cc1..da81e48 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2422,13 +2422,28 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
 static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+	u32 intr_info = 0;
+	unsigned long exit_qualification = 0;
 
-	if (!(vmcs12->exception_bitmap & (1u << nr)))
+	if (!((vmcs12->exception_bitmap & (1u << nr)) ||
+		(nr == PF_VECTOR && vcpu->arch.exception.async_page_fault)))
 		return 0;
 
+	intr_info = nr | INTR_INFO_VALID_MASK;
+	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
+	if (vcpu->arch.exception.has_error_code) {
+		vmcs_write32(VM_EXIT_INTR_ERROR_CODE, vcpu->arch.exception.error_code);
+		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
+	}
+	if (kvm_exception_is_soft(nr))
+		intr_info |= INTR_TYPE_SOFT_EXCEPTION;
+	else
+		intr_info |= INTR_TYPE_HARD_EXCEPTION;
+	if (nr == PF_VECTOR && vcpu->arch.exception.async_page_fault)
+		exit_qualification = vcpu->arch.cr2;
+
 	nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
-			  vmcs_read32(VM_EXIT_INTR_INFO),
-			  vmcs_readl(EXIT_QUALIFICATION));
+			intr_info, exit_qualification);
 	return 1;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b28a31..d7f1a49 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -453,6 +453,7 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
 {
 	++vcpu->stat.pf_guest;
 	vcpu->arch.cr2 = fault->address;
+	vcpu->arch.exception.async_page_fault = fault->async_page_fault;
 	kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code);
 }
 EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
@@ -8571,6 +8572,7 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 		fault.error_code = 0;
 		fault.nested_page_fault = false;
 		fault.address = work->arch.token;
+		fault.async_page_fault = true;
 		kvm_inject_page_fault(vcpu, &fault);
 	}
 }
@@ -8593,6 +8595,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 		fault.error_code = 0;
 		fault.nested_page_fault = false;
 		fault.address = work->arch.token;
+		fault.async_page_fault = true;
 		kvm_inject_page_fault(vcpu, &fault);
 	}
 	vcpu->arch.apf.halted = false;
-- 
2.7.4

* [PATCH 4/4] KVM: async_pf: Let host know whether the guest supports delivery of async_pf as #PF vmexit
  2017-06-13  6:08 [PATCH 0/4] KVM: async_pf: Fix async_pf exception injection Wanpeng Li
                   ` (2 preceding siblings ...)
  2017-06-13  6:08 ` [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf Wanpeng Li
@ 2017-06-13  6:08 ` Wanpeng Li
  2017-06-13 18:19   ` Radim Krčmář
  3 siblings, 1 reply; 17+ messages in thread
From: Wanpeng Li @ 2017-06-13  6:08 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: Paolo Bonzini, Radim Krčmář, Wanpeng Li

From: Wanpeng Li <wanpeng.li@hotmail.com>

Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1, async 
page faults are delivered to L1 as #PF vmexits; if bit 2 is 0, kvm_can_do_async_pf 
returns 0 if in guest mode.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
---
 Documentation/virtual/kvm/msr.txt    | 5 +++--
 arch/x86/include/asm/kvm_host.h      | 1 +
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kernel/kvm.c                | 1 +
 arch/x86/kvm/mmu.c                   | 2 +-
 arch/x86/kvm/x86.c                   | 5 +++--
 6 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 0a9ea51..1ebecc1 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -166,10 +166,11 @@ MSR_KVM_SYSTEM_TIME: 0x12
 MSR_KVM_ASYNC_PF_EN: 0x4b564d02
 	data: Bits 63-6 hold 64-byte aligned physical address of a
 	64 byte memory area which must be in guest RAM and must be
-	zeroed. Bits 5-2 are reserved and should be zero. Bit 0 is 1
+	zeroed. Bits 5-3 are reserved and should be zero. Bit 0 is 1
 	when asynchronous page faults are enabled on the vcpu 0 when
 	disabled. Bit 1 is 1 if asynchronous page faults can be injected
-	when vcpu is in cpl == 0.
+	when vcpu is in cpl == 0. Bit 2 is 1 if asynchronous page faults
+	are delivered to L1 as #PF vmexits.
 
 	First 4 byte of 64 byte memory location will be written to by
 	the hypervisor at the time of asynchronous page fault (APF)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d30576a..2766b2c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -646,6 +646,7 @@ struct kvm_vcpu_arch {
 		u64 msr_val;
 		u32 id;
 		bool send_user_only;
+		bool delivery_as_pf_vmexit;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index cff0bb6..a965e5b 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -67,6 +67,7 @@ struct kvm_clock_pairing {
 
 #define KVM_ASYNC_PF_ENABLED			(1 << 0)
 #define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
+#define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
 
 /* Operations for KVM_HC_MMU_OP */
 #define KVM_MMU_OP_WRITE_PTE            1
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 43e10d6..1e29a77 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -330,6 +330,7 @@ static void kvm_guest_cpu_init(void)
 #ifdef CONFIG_PREEMPT
 		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
 #endif
+		pa |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
 		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);
 		__this_cpu_write(apf_reason.enabled, 1);
 		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cb82259..c49aecd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3704,7 +3704,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
 		     kvm_event_needs_reinjection(vcpu)))
 		return false;
 
-	if (is_guest_mode(vcpu))
+	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
 		return false;
 
 	return kvm_x86_ops->interrupt_allowed(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7f1a49..74cb944 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2061,8 +2061,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
 	gpa_t gpa = data & ~0x3f;
 
-	/* Bits 2:5 are reserved, Should be zero */
-	if (data & 0x3c)
+	/* Bits 3:5 are reserved, Should be zero */
+	if (data & 0x38)
 		return 1;
 
 	vcpu->arch.apf.msr_val = data;
@@ -2078,6 +2078,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 		return 1;
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
+	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
 	kvm_async_pf_wakeup_all(vcpu);
 	return 0;
 }
-- 
2.7.4

* Re: [PATCH 4/4] KVM: async_pf: Let host know whether the guest supports delivery of async_pf as #PF vmexit
  2017-06-13  6:08 ` [PATCH 4/4] KVM: async_pf: Let host know whether the guest supports delivery of async_pf as #PF vmexit Wanpeng Li
@ 2017-06-13 18:19   ` Radim Krčmář
  2017-06-14  1:01     ` Wanpeng Li
  0 siblings, 1 reply; 17+ messages in thread
From: Radim Krčmář @ 2017-06-13 18:19 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-12 23:08-0700, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1, async 
> page faults are delivered to L1 as #PF vmexits; if bit 2 is 0, kvm_can_do_async_pf 
> returns 0 if in guest mode.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---

I think KVM (L1) should also do something like

  diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
  index dd274db9bf77..c15a9f178e60 100644
  --- a/arch/x86/kvm/vmx.c
  +++ b/arch/x86/kvm/vmx.c
  @@ -7991,7 +7991,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
   		if (is_nmi(intr_info))
   			return false;
   		else if (is_page_fault(intr_info))
  -			return enable_ept;
  +			return !vmx->apf_reason && enable_ept;
   		else if (is_no_device(intr_info) &&
   			 !(vmcs12->guest_cr0 & X86_CR0_TS))
   			return false;

so it doesn't pass the APF directed towards it (L1) into L2 if there is
an L3 running at the moment.

* Re: [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-13  6:08 ` [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf Wanpeng Li
@ 2017-06-13 18:55   ` Radim Krčmář
  2017-06-14  1:07     ` Wanpeng Li
  0 siblings, 1 reply; 17+ messages in thread
From: Radim Krčmář @ 2017-06-13 18:55 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-12 23:08-0700, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> Add an async_page_fault field to vcpu->arch.exception to identify an async 
> page fault, and constructs the expected vm-exit information fields. Force 
> a nested VM exit from nested_vmx_check_exception() if the injected #PF 
> is async page fault.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -2422,13 +2422,28 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>  static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)

This function could use the same treatment as vmx_queue_exception(), so
we are not mixing 'nr' with 'vcpu->arch.exception.*'.

>  {
>  	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
> +	u32 intr_info = 0;
> +	unsigned long exit_qualification = 0;
>  
> -	if (!(vmcs12->exception_bitmap & (1u << nr)))
> +	if (!((vmcs12->exception_bitmap & (1u << nr)) ||
> +		(nr == PF_VECTOR && vcpu->arch.exception.async_page_fault)))
>  		return 0;
>  
> +	intr_info = nr | INTR_INFO_VALID_MASK;
> +	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);

This part still uses EXIT_QUALIFICATION(), which means it is not general
and I think it would be nicer to just do a simple special case on top:

	if (vcpu->arch.exception.async_page_fault) {
		vmcs_write32(VM_EXIT_INTR_ERROR_CODE, vcpu->arch.exception.error_code);
		nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
		                  PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
		                  INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK,
		                  vcpu->arch.cr2);
		return 1;
	}

Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
exits;  isn't this going to change the CR2 visible in L2 guest after a
nested VM entry?

Btw. nested_vmx_check_exception() didn't support emulated exceptions at
all (it only passed through ones we got from hardware), or have I missed
something?

Thanks.

* Re: [PATCH 4/4] KVM: async_pf: Let host know whether the guest supports delivery of async_pf as #PF vmexit
  2017-06-13 18:19   ` Radim Krčmář
@ 2017-06-14  1:01     ` Wanpeng Li
  0 siblings, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2017-06-14  1:01 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-14 2:19 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-12 23:08-0700, Wanpeng Li:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1, async
>> page faults are delivered to L1 as #PF vmexits; if bit 2 is 0, kvm_can_do_async_pf
>> returns 0 if in guest mode.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>
> I think KVM (L1) should also do something like
>
>   diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>   index dd274db9bf77..c15a9f178e60 100644
>   --- a/arch/x86/kvm/vmx.c
>   +++ b/arch/x86/kvm/vmx.c
>   @@ -7991,7 +7991,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
>                 if (is_nmi(intr_info))
>                         return false;
>                 else if (is_page_fault(intr_info))
>   -                     return enable_ept;
>   +                     return !vmx->apf_reason && enable_ept;
>                 else if (is_no_device(intr_info) &&
>                          !(vmcs12->guest_cr0 & X86_CR0_TS))
>                         return false;
>
> so it doesn't pass the APF directed towards it (L1) into L2 if there is
> L3 at the moment.

Agreed. I will do this in v2.

Regards,
Wanpeng Li

* Re: [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-13 18:55   ` Radim Krčmář
@ 2017-06-14  1:07     ` Wanpeng Li
  2017-06-14 12:52       ` Radim Krčmář
  0 siblings, 1 reply; 17+ messages in thread
From: Wanpeng Li @ 2017-06-14  1:07 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-12 23:08-0700, Wanpeng Li:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> Add an async_page_fault field to vcpu->arch.exception to identify an async
>> page fault, and constructs the expected vm-exit information fields. Force
>> a nested VM exit from nested_vmx_check_exception() if the injected #PF
>> is async page fault.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -2422,13 +2422,28 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
>>  static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
>
> This function could use the same treatment as vmx_queue_exception(), so
> we are not mixing 'nr' with 'vcpu->arch.exception.*'.
>
>>  {
>>       struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>> +     u32 intr_info = 0;
>> +     unsigned long exit_qualification = 0;
>>
>> -     if (!(vmcs12->exception_bitmap & (1u << nr)))
>> +     if (!((vmcs12->exception_bitmap & (1u << nr)) ||
>> +             (nr == PF_VECTOR && vcpu->arch.exception.async_page_fault)))
>>               return 0;
>>
>> +     intr_info = nr | INTR_INFO_VALID_MASK;
>> +     exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>
> This part still uses EXIT_QUALIFICATION(), which means it is not general
> and I think it would be nicer to just do simple special case on top:
>
>         if (vcpu->arch.exception.async_page_fault) {
>                 vmcs_write32(VM_EXIT_INTR_ERROR_CODE, vcpu->arch.exception.error_code);
>                 nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
>                                   PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
>                                   INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK,
>                                   vcpu->arch.cr2);
>                 return 1;
>         }

Good point.

>
> Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
> exits;  isn't this going to change the CR2 visible in L2 guest after a
> nested VM entry?

Sorry, I don't fully understand the question. As you know, this
vcpu->arch.cr2, which includes the token, is set before async pf injection,
and L1 will intercept it from EXIT_QUALIFICATION during the nested vmexit;
why would it change the CR2 visible in the L2 guest after a nested VM entry?

>
> Btw. nested_vmx_check_exception() didn't support emulated exceptions at
> all (it only passed through ones we got from hardware),

I think so.

Regards,
Wanpeng Li

* Re: [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-14  1:07     ` Wanpeng Li
@ 2017-06-14 12:52       ` Radim Krčmář
  2017-06-14 13:02         ` Wanpeng Li
  0 siblings, 1 reply; 17+ messages in thread
From: Radim Krčmář @ 2017-06-14 12:52 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-14 09:07+0800, Wanpeng Li:
> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
> > exits;  isn't this going to change the CR2 visible in L2 guest after a
> > nested VM entry?
> 
> Sorry, I don't fully understand the question. As you know this
> vcpu->arch.cr2 which includes token is set before async pf injection,

Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.

> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,

Right, so we do not need to have the token in CR2, because L1 is not
going to look at it.

> why it can change the CR2 visible in L2 guest after a nested VM entry?

Sorry, the situation is too convoluted to be expressed in one sentence:

1) L2 is running with CR2 = L2CR2
3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
   vcpu->arch.cr2
2) APF for L1 has completed
4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
7) L1 stores APFT as L2's CR2
8) L1 handles APF, maybe reschedules, but eventually comes back to this
   L2's thread
9) after some time, L1 enters L2 with CR2 = APFT
10) L2 is running with CR2 = APFT

The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
at it, e.g. if it was in the process of handling its own #PF.

* Re: [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-14 12:52       ` Radim Krčmář
@ 2017-06-14 13:02         ` Wanpeng Li
  2017-06-14 13:20           ` Radim Krčmář
  0 siblings, 1 reply; 17+ messages in thread
From: Wanpeng Li @ 2017-06-14 13:02 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-14 09:07+0800, Wanpeng Li:
>> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
>> > exits;  isn't this going to change the CR2 visible in L2 guest after a
>> > nested VM entry?
>>
>> Sorry, I don't fully understand the question. As you know this
>> vcpu->arch.cr2 which includes token is set before async pf injection,
>
> Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
>
>> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
>
> Right, so we do not need to have the token in CR2, because L1 is not
> going to look at it.
>
>> why it can change the CR2 visible in L2 guest after a nested VM entry?
>
> Sorry, the situation is too convoluted to be expressed in one sentence:
>
> 1) L2 is running with CR2 = L2CR2
> 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
>    vcpu->arch.cr2
> 2) APF for L1 has completed
> 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
> 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
> 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
> 7) L1 stores APFT as L2's CR2
> 8) L1 handles APF, maybe reschedules, but eventually comes back to this
>    L2's thread
> 9) after some time, L1 enters L2 with CR2 = APFT
> 10) L2 is running with CR2 = APTF
>
> The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
> at it, e.g. it was in a process of handling its #PF.

Good point. What's your proposal? :)

Regards,
Wanpeng Li

* Re: [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-14 13:02         ` Wanpeng Li
@ 2017-06-14 13:20           ` Radim Krčmář
  2017-06-14 13:27             ` Wanpeng Li
                               ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Radim Krčmář @ 2017-06-14 13:20 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-14 21:02+0800, Wanpeng Li:
> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> > 2017-06-14 09:07+0800, Wanpeng Li:
> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
> >> > nested VM entry?
> >>
> >> Sorry, I don't fully understand the question. As you know this
> >> vcpu->arch.cr2 which includes token is set before async pf injection,
> >
> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
> >
> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
> >
> > Right, so we do not need to have the token in CR2, because L1 is not
> > going to look at it.
> >
> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
> >
> > Sorry, the situation is too convoluted to be expressed in one sentence:
> >
> > 1) L2 is running with CR2 = L2CR2
> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
> >    vcpu->arch.cr2
> > 2) APF for L1 has completed
> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
> > 7) L1 stores APFT as L2's CR2
> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
> >    L2's thread
> > 9) after some time, L1 enters L2 with CR2 = APFT
> > 10) L2 is running with CR2 = APTF
> >
> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
> > at it, e.g. it was in a process of handling its #PF.
> 
> Good point. What's your proposal? :)

Get rid of async_pf. :) Optimal solutions aside, I think it would be
best to add a new injection function for APF.  One that injects a normal
#PF for non-nested guests and directly triggers a #PF VM exit otherwise,
and call it from kvm_arch_async_page_*present().
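
Very roughly, such a helper could look like the following (names are made up
for illustration; deliver_apf_vmexit would be a new op that does not exist yet):

  /* illustration only -- kvm_x86_ops->deliver_apf_vmexit is hypothetical */
  static void kvm_inject_apf(struct kvm_vcpu *vcpu, unsigned long token,
  			   u32 error_code)
  {
  	if (!is_guest_mode(vcpu)) {
  		/* non-nested: a normal #PF with the token in CR2 */
  		vcpu->arch.cr2 = token;
  		kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
  	} else {
  		/* nested: trigger a #PF VM exit to L1 directly; the token
  		 * goes into the exit qualification, L2's CR2 stays intact */
  		kvm_x86_ops->deliver_apf_vmexit(vcpu, token, error_code);
  	}
  }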

Do you think that just moving the nested VM exit from
nested_vmx_check_exception() would work?

Thanks.

* Re: [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-14 13:20           ` Radim Krčmář
@ 2017-06-14 13:27             ` Wanpeng Li
  2017-06-14 13:32             ` Wanpeng Li
  2017-06-14 14:32             ` Wanpeng Li
  2 siblings, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2017-06-14 13:27 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-14 21:20 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-14 21:02+0800, Wanpeng Li:
>> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > 2017-06-14 09:07+0800, Wanpeng Li:
>> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
>> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
>> >> > nested VM entry?
>> >>
>> >> Sorry, I don't fully understand the question. As you know this
>> >> vcpu->arch.cr2 which includes token is set before async pf injection,
>> >
>> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
>> >
>> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
>> >
>> > Right, so we do not need to have the token in CR2, because L1 is not
>> > going to look at it.
>> >
>> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
>> >
>> > Sorry, the situation is too convoluted to be expressed in one sentence:
>> >
>> > 1) L2 is running with CR2 = L2CR2
>> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
>> >    vcpu->arch.cr2
>> > 2) APF for L1 has completed
>> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
>> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
>> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
>> > 7) L1 stores APFT as L2's CR2
>> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
>> >    L2's thread
>> > 9) after some time, L1 enters L2 with CR2 = APFT
>> > 10) L2 is running with CR2 = APTF
>> >
>> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
>> > at it, e.g. it was in a process of handling its #PF.
>>
>> Good point. What's your proposal? :)
>
> Get rid of async_pf. :) Optimal solutions aside, I think it would be
> best to add a new injection function for APF.  One that injects a normal
> #PF for non-nested guests and directly triggers a #PF VM exit otherwise,
> and call it from kvm_arch_async_page_*present().
>
> Do you think that just moving the nested VM exit from
> nested_vmx_check_exception() would work?

That's the same as what commit  9bc1f09f6fa76

* Re: [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-14 13:20           ` Radim Krčmář
  2017-06-14 13:27             ` Wanpeng Li
@ 2017-06-14 13:32             ` Wanpeng Li
  2017-06-14 14:32             ` Wanpeng Li
  2 siblings, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2017-06-14 13:32 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-14 21:20 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-14 21:02+0800, Wanpeng Li:
>> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > 2017-06-14 09:07+0800, Wanpeng Li:
>> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
>> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
>> >> > nested VM entry?
>> >>
>> >> Sorry, I don't fully understand the question. As you know this
>> >> vcpu->arch.cr2 which includes token is set before async pf injection,
>> >
>> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
>> >
>> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
>> >
>> > Right, so we do not need to have the token in CR2, because L1 is not
>> > going to look at it.
>> >
>> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
>> >
>> > Sorry, the situation is too convoluted to be expressed in one sentence:
>> >
>> > 1) L2 is running with CR2 = L2CR2
>> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
>> >    vcpu->arch.cr2
>> > 2) APF for L1 has completed
>> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
>> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
>> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
>> > 7) L1 stores APFT as L2's CR2
>> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
>> >    L2's thread
>> > 9) after some time, L1 enters L2 with CR2 = APFT
>> > 10) L2 is running with CR2 = APTF
>> >
>> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
>> > at it, e.g. it was in a process of handling its #PF.
>>
>> Good point. What's your proposal? :)
>
> Get rid of async_pf. :) Optimal solutions aside, I think it would be
> best to add a new injection function for APF.  One that injects a normal
> #PF for non-nested guests and directly triggers a #PF VM exit otherwise,
> and call it from kvm_arch_async_page_*present().
>
> Do you think that just moving the nested VM exit from
> nested_vmx_check_exception() would work?

That's the same as what commit 9bc1f09f6fa76 (KVM: async_pf: avoid
async pf injection when in guest mode) does, I think. How about
introducing a field like nested_apf_token to vcpu->arch to keep the
token in kvm_inject_page_fault() if
(vcpu->arch.exception.async_page_fault && is_guest_mode(vcpu)) is
true?
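
Something like this minimal sketch (nested_apf_token is just a placeholder
field name to illustrate the idea, not existing code):

  void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
  {
  	++vcpu->stat.pf_guest;
  	vcpu->arch.exception.async_page_fault = fault->async_page_fault;
  	if (fault->async_page_fault && is_guest_mode(vcpu))
  		/* don't clobber L2's CR2; the forced nested vmexit would
  		 * put this token into the exit qualification instead */
  		vcpu->arch.nested_apf_token = fault->address;
  	else
  		vcpu->arch.cr2 = fault->address;
  	kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code);
  }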

Regards,
Wanpeng Li

* Re: [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-14 13:20           ` Radim Krčmář
  2017-06-14 13:27             ` Wanpeng Li
  2017-06-14 13:32             ` Wanpeng Li
@ 2017-06-14 14:32             ` Wanpeng Li
  2017-06-14 16:18               ` Radim Krčmář
  2 siblings, 1 reply; 17+ messages in thread
From: Wanpeng Li @ 2017-06-14 14:32 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-14 21:20 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-14 21:02+0800, Wanpeng Li:
>> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > 2017-06-14 09:07+0800, Wanpeng Li:
>> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
>> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
>> >> > nested VM entry?
>> >>
>> >> Sorry, I don't fully understand the question. As you know this
>> >> vcpu->arch.cr2 which includes token is set before async pf injection,
>> >
>> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
>> >
>> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
>> >
>> > Right, so we do not need to have the token in CR2, because L1 is not
>> > going to look at it.
>> >
>> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
>> >
>> > Sorry, the situation is too convoluted to be expressed in one sentence:
>> >
>> > 1) L2 is running with CR2 = L2CR2
>> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
>> >    vcpu->arch.cr2
>> > 2) APF for L1 has completed
>> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
>> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
>> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
>> > 7) L1 stores APFT as L2's CR2
>> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
>> >    L2's thread
>> > 9) after some time, L1 enters L2 with CR2 = APFT
>> > 10) L2 is running with CR2 = APTF
>> >
>> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
>> > at it, e.g. it was in a process of handling its #PF.
>>
>> Good point. What's your proposal? :)
>
> Get rid of async_pf. :) Optimal solutions aside, I think it would be
> best to add a new injection function for APF.  One that injects a normal
> #PF for non-nested guests and directly triggers a #PF VM exit otherwise,
> and call it from kvm_arch_async_page_*present().

In addition, doing the nested vmexit directly in kvm_arch_async_page_*present()
instead of through inject_pending_event() before vmentry, i.e. a nested
vmexit right after a vmexit on L0, looks strange. So how about the
nested_apf_token proposal? Radim, Paolo?

Regards,
Wanpeng Li

* Re: [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-14 14:32             ` Wanpeng Li
@ 2017-06-14 16:18               ` Radim Krčmář
  2017-06-15  2:43                 ` Wanpeng Li
  0 siblings, 1 reply; 17+ messages in thread
From: Radim Krčmář @ 2017-06-14 16:18 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-14 22:32+0800, Wanpeng Li:
> 2017-06-14 21:20 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> > 2017-06-14 21:02+0800, Wanpeng Li:
> >> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> >> > 2017-06-14 09:07+0800, Wanpeng Li:
> >> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> >> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
> >> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
> >> >> > nested VM entry?
> >> >>
> >> >> Sorry, I don't fully understand the question. As you know this
> >> >> vcpu->arch.cr2 which includes token is set before async pf injection,
> >> >
> >> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
> >> >
> >> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
> >> >
> >> > Right, so we do not need to have the token in CR2, because L1 is not
> >> > going to look at it.
> >> >
> >> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
> >> >
> >> > Sorry, the situation is too convoluted to be expressed in one sentence:
> >> >
> >> > 1) L2 is running with CR2 = L2CR2
> >> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
> >> >    vcpu->arch.cr2
> >> > 2) APF for L1 has completed
> >> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
> >> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
> >> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
> >> > 7) L1 stores APFT as L2's CR2
> >> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
> >> >    L2's thread
> >> > 9) after some time, L1 enters L2 with CR2 = APFT
> >> > 10) L2 is running with CR2 = APTF
> >> >
> >> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
> >> > at it, e.g. it was in a process of handling its #PF.
> >>
> >> Good point. What's your proposal? :)
> >
> > Get rid of async_pf. :) Optimal solutions aside, I think it would be
> > best to add a new injection function for APF.  One that injects a normal
> > #PF for non-nested guests and directly triggers a #PF VM exit otherwise,
> > and call it from kvm_arch_async_page_*present().
> 
> In addition, nested vmexit in kvm_arch_async_page_*present() directly
> instead of through inject_pending_event() before vmentry, or nested
> vmexit after vmexit on L0 looks strange.

Right, it might be tricky if another exception can get queued in
between.  (Which shouldn't happen, though, because async_pf exceptions
must not cause double faults for no good reason.)

>                                          So how about the proposal of
> the nested_apf_token stuff? Radim, Paolo?

I think it is worth exploring.  We need to make sure that interfacing
with userspace through kvm_vcpu_ioctl_x86_{set,get}_vcpu_events() is
right, but it should be possible without any extension as migration is
already covered by unconditional async_pf wakeup on the destination.

At this point, using a structure other than arch.exception would make
sense too -- async_pf uses the exception injection path mostly for
convenience, but the paravirt exception does not want to mix with real
exceptions.
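
For instance, vcpu->arch could carry something like the following instead of
piggy-backing on the arch.exception fields (purely illustrative, not existing
code):

  	/* illustration only: a separate slot for the paravirt async_pf
  	 * event so it does not mix with real queued exceptions */
  	struct kvm_queued_apf {
  		bool pending;
  		bool page_ready;	/* PAGE_READY vs. PAGE_NOT_PRESENT */
  		u32 error_code;
  		u64 token;
  	} apf_event;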

* Re: [PATCH 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
  2017-06-14 16:18               ` Radim Krčmář
@ 2017-06-15  2:43                 ` Wanpeng Li
  0 siblings, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2017-06-15  2:43 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: linux-kernel, kvm, Paolo Bonzini, Wanpeng Li

2017-06-15 0:18 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-06-14 22:32+0800, Wanpeng Li:
>> 2017-06-14 21:20 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> > 2017-06-14 21:02+0800, Wanpeng Li:
>> >> 2017-06-14 20:52 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> >> > 2017-06-14 09:07+0800, Wanpeng Li:
>> >> >> 2017-06-14 2:55 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> >> >> > Using vcpu->arch.cr2 is suspicious as VMX doesn't update CR2 on VM
>> >> >> > exits;  isn't this going to change the CR2 visible in L2 guest after a
>> >> >> > nested VM entry?
>> >> >>
>> >> >> Sorry, I don't fully understand the question. As you know this
>> >> >> vcpu->arch.cr2 which includes token is set before async pf injection,
>> >> >
>> >> > Yes, I'm thinking that setting vcpu->arch.cr2 is a mistake in this case.
>> >> >
>> >> >> and L1 will intercept it from EXIT_QUALIFICATION during nested vmexit,
>> >> >
>> >> > Right, so we do not need to have the token in CR2, because L1 is not
>> >> > going to look at it.
>> >> >
>> >> >> why it can change the CR2 visible in L2 guest after a nested VM entry?
>> >> >
>> >> > Sorry, the situation is too convoluted to be expressed in one sentence:
>> >> >
>> >> > 1) L2 is running with CR2 = L2CR2
>> >> > 3) VMX exits (say, unrelated EXTERNAL_INTERRUPT) and L0 stores L2CR2 in
>> >> >    vcpu->arch.cr2
>> >> > 2) APF for L1 has completed
>> >> > 4) L0 KVM wants to inject APF and sets vcpu->arch.cr2 = APFT
>> >> > 5) L0 KVM does a nested VM exit to L1, EXIT_QUALIFICATION = APFT
>> >> > 6) L0 KVM enters L1 with CR2 = vcpu->arch.cr2 = APFT
>> >> > 7) L1 stores APFT as L2's CR2
>> >> > 8) L1 handles APF, maybe reschedules, but eventually comes back to this
>> >> >    L2's thread
>> >> > 9) after some time, L1 enters L2 with CR2 = APFT
>> >> > 10) L2 is running with CR2 = APTF
>> >> >
>> >> > The original L2CR2 is lost and we'd introduce a bug if L2 wanted to look
>> >> > at it, e.g. it was in a process of handling its #PF.
>> >>
>> >> Good point. What's your proposal? :)
>> >
>> > Get rid of async_pf. :) Optimal solutions aside, I think it would be
>> > best to add a new injection function for APF.  One that injects a normal
>> > #PF for non-nested guests and directly triggers a #PF VM exit otherwise,
>> > and call it from kvm_arch_async_page_*present().
>>
>> In addition, nested vmexit in kvm_arch_async_page_*present() directly
>> instead of through inject_pending_event() before vmentry, or nested
>> vmexit after vmexit on L0 looks strange.
>
> Right, it might be tricky if another exception can get queued in
> between.  (Which shouldn't happen, though, because async_pf exceptions
> must not cause double faults for no good reason.)
>
>>                                          So how about the proposal of
>> the nested_apf_token stuff? Radim, Paolo?
>
> I think it is worth exploring.  We need to make sure that interfacing
> with userspace through kvm_vcpu_ioctl_x86_{set,get}_vcpu_events() is
> right, but it should be possible without any extension as migration is
> already covered by unconditional async_pf wakeup on the destination.
>
> At this point, using a structure other than arch.exception would make
> sense too -- async_pf uses the exception injection path mostly for
> convenience, but the paravirt exception does not want to mix with real
> exceptions.

Yeah, but that may need more restructuring; I just sent out v2 to fix it
simply and avoid being too aggressive. :)

Regards,
Wanpeng Li
