* [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications
@ 2020-05-25 14:41 Vitaly Kuznetsov
  2020-05-25 14:41 ` [PATCH v2 01/10] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" Vitaly Kuznetsov
                   ` (10 more replies)
  0 siblings, 11 replies; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-25 14:41 UTC (permalink / raw)
  To: kvm, x86
  Cc: Paolo Bonzini, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

Concerns were expressed around (ab)using #PF for KVM's async_pf mechanism;
it seems that re-using the #PF exception for a PV mechanism wasn't a great
idea after all. The Grand Plan is to switch to using e.g. #VE for 'page
not present' events and normal APIC interrupts for 'page ready' events.
This series does the latter.

Changes since v1:
- struct kvm_vcpu_pv_apf_data's fields renamed to 'flags' and 'token',
  comments added [Vivek Goyal]
- 'type1/2' names for APF events dropped from everywhere [Vivek Goyal]
- kvm_arch_can_inject_async_page_present() renamed to
  kvm_arch_can_dequeue_async_page_present() [Vivek Goyal]
- 'KVM: x86: deprecate KVM_ASYNC_PF_SEND_ALWAYS' patch added.

v1: https://lore.kernel.org/kvm/20200511164752.2158645-1-vkuznets@redhat.com/
QEMU patches for testing: https://github.com/vittyvk/qemu.git (async_pf2_v2 branch)

Vitaly Kuznetsov (10):
  Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and
    "Page Ready" exceptions simultaneously"
  KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  KVM: rename kvm_arch_can_inject_async_page_present() to
    kvm_arch_can_dequeue_async_page_present()
  KVM: introduce kvm_read_guest_offset_cached()
  KVM: x86: interrupt based APF 'page ready' event delivery
  KVM: x86: acknowledgment mechanism for async pf page ready
    notifications
  KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
  KVM: x86: Switch KVM guest to using interrupts for page ready APF
    delivery
  KVM: x86: drop KVM_PV_REASON_PAGE_READY case from
    kvm_handle_page_fault()
  KVM: x86: deprecate KVM_ASYNC_PF_SEND_ALWAYS

 Documentation/virt/kvm/cpuid.rst     |   6 ++
 Documentation/virt/kvm/msr.rst       | 120 +++++++++++++++------
 arch/s390/include/asm/kvm_host.h     |   4 +-
 arch/s390/kvm/kvm-s390.c             |   2 +-
 arch/x86/entry/entry_32.S            |   5 +
 arch/x86/entry/entry_64.S            |   5 +
 arch/x86/include/asm/hardirq.h       |   3 +
 arch/x86/include/asm/irq_vectors.h   |   6 +-
 arch/x86/include/asm/kvm_host.h      |  12 ++-
 arch/x86/include/asm/kvm_para.h      |  10 +-
 arch/x86/include/uapi/asm/kvm_para.h |  19 +++-
 arch/x86/kernel/irq.c                |   9 ++
 arch/x86/kernel/kvm.c                |  62 +++++++----
 arch/x86/kvm/cpuid.c                 |   3 +-
 arch/x86/kvm/mmu/mmu.c               |  19 ++--
 arch/x86/kvm/svm/nested.c            |   2 +-
 arch/x86/kvm/svm/svm.c               |   3 +-
 arch/x86/kvm/vmx/nested.c            |   2 +-
 arch/x86/kvm/vmx/vmx.c               |   5 +-
 arch/x86/kvm/x86.c                   | 149 ++++++++++++++++++---------
 include/linux/kvm_host.h             |   3 +
 include/uapi/linux/kvm.h             |   1 +
 virt/kvm/async_pf.c                  |  12 ++-
 virt/kvm/kvm_main.c                  |  19 +++-
 24 files changed, 344 insertions(+), 137 deletions(-)

-- 
2.25.4



* [PATCH v2 01/10] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
  2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
@ 2020-05-25 14:41 ` Vitaly Kuznetsov
  2020-05-25 14:41 ` [PATCH v2 02/10] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Vitaly Kuznetsov
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-25 14:41 UTC (permalink / raw)
  To: kvm, x86
  Cc: Paolo Bonzini, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

Commit 9a6e7c39810e ("KVM: async_pf: Fix #DF due to inject "Page not
Present" and "Page Ready" exceptions simultaneously") added protection
against a 'page ready' notification arriving before the corresponding
'page not present' one is delivered. This situation seems to be impossible
since commit 2a266f23550b ("KVM MMU: check pending exception before
injecting APF") which added the 'vcpu->arch.exception.pending' check to
kvm_can_do_async_pf().

On x86, kvm_arch_async_page_present() has only one call site, the
kvm_check_async_pf_completion() loop, and we only enter the loop when
kvm_arch_can_inject_async_page_present(vcpu) returns true which, when the
async pf MSR is enabled, translates into kvm_can_do_async_pf().
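
For reference, the dequeue path looks roughly like this (condensed from
kvm_check_async_pf_completion() in virt/kvm/async_pf.c; locking and list
manipulation elided):

	while (!list_empty_careful(&vcpu->async_pf.done) &&
	       kvm_arch_can_inject_async_page_present(vcpu)) {
		/* ... pick the first entry off vcpu->async_pf.done ... */
		kvm_arch_async_page_present(vcpu, work);
		/* ... */
	}

With the async pf MSR enabled, kvm_arch_can_inject_async_page_present()
boils down to kvm_can_do_async_pf(), which refuses to deliver while
'vcpu->arch.exception.pending' is set, so a 'page ready' event cannot
overtake a still-pending 'page not present' #PF.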

There is also one problem with the cancellation mechanism: we don't check
that the 'page not present' notification we're canceling matches the 'page
ready' notification, so in theory we may erroneously drop two valid events.

Revert the commit.

Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/x86.c | 23 +----------------------
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c17e6eb9ad43..8e0bbe4f0d5b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10364,13 +10364,6 @@ static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
 				      sizeof(val));
 }
 
-static int apf_get_user(struct kvm_vcpu *vcpu, u32 *val)
-{
-
-	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, val,
-				      sizeof(u32));
-}
-
 static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 {
 	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
@@ -10435,7 +10428,6 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
-	u32 val;
 
 	if (work->wakeup_all)
 		work->arch.token = ~0; /* broadcast wakeup */
@@ -10444,19 +10436,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
 	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
-	    !apf_get_user(vcpu, &val)) {
-		if (val == KVM_PV_REASON_PAGE_NOT_PRESENT &&
-		    vcpu->arch.exception.pending &&
-		    vcpu->arch.exception.nr == PF_VECTOR &&
-		    !apf_put_user(vcpu, 0)) {
-			vcpu->arch.exception.injected = false;
-			vcpu->arch.exception.pending = false;
-			vcpu->arch.exception.nr = 0;
-			vcpu->arch.exception.has_error_code = false;
-			vcpu->arch.exception.error_code = 0;
-			vcpu->arch.exception.has_payload = false;
-			vcpu->arch.exception.payload = 0;
-		} else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;
@@ -10464,7 +10444,6 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 			fault.address = work->arch.token;
 			fault.async_page_fault = true;
 			kvm_inject_page_fault(vcpu, &fault);
-		}
 	}
 	vcpu->arch.apf.halted = false;
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
-- 
2.25.4



* [PATCH v2 02/10] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
  2020-05-25 14:41 ` [PATCH v2 01/10] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" Vitaly Kuznetsov
@ 2020-05-25 14:41 ` Vitaly Kuznetsov
  2020-05-26 18:27   ` Vivek Goyal
  2020-05-25 14:41 ` [PATCH v2 03/10] KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() Vitaly Kuznetsov
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-25 14:41 UTC (permalink / raw)
  To: kvm, x86
  Cc: Paolo Bonzini, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

Currently, the APF mechanism relies on the #PF abuse where the token is
passed through CR2. If we switch to using interrupts to deliver 'page
ready' notifications we need a different way to pass the data. Extend the
existing 'struct kvm_vcpu_pv_apf_data' with token information for 'page
ready' notifications.

While at it, rename 'reason' to 'flags'. This doesn't change the semantics
as we only have reasons '1' and '2' and these can be treated as bit flags,
but KVM_PV_REASON_PAGE_READY is going away with interrupt based delivery,
making the 'reason' name misleading.

The newly introduced apf_put_user_ready() temporarily puts both flags and
token information; this will be changed to put the token only when we
switch to interrupt based notifications.
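
For illustration, a tiny self-contained demo of the transitional packing
(assumes a little-endian guest; the reason constant is from this patch):

	#include <stdint.h>
	#include <stdio.h>

	#define KVM_PV_REASON_PAGE_READY 2

	int main(void)
	{
		uint32_t token = 0xdead;
		/* what apf_put_user_ready() writes: 'flags' lands in bytes
		 * 0-3 and 'token' in bytes 4-7 of the shared slot */
		uint64_t val = (uint64_t)token << 32 | KVM_PV_REASON_PAGE_READY;

		printf("flags = %u, token = 0x%x\n",
		       (uint32_t)val, (uint32_t)(val >> 32));
		return 0;
	}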

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h      |  2 +-
 arch/x86/include/asm/kvm_para.h      |  4 ++--
 arch/x86/include/uapi/asm/kvm_para.h |  5 +++--
 arch/x86/kernel/kvm.c                | 16 ++++++++--------
 arch/x86/kvm/mmu/mmu.c               |  6 +++---
 arch/x86/kvm/svm/nested.c            |  2 +-
 arch/x86/kvm/svm/svm.c               |  3 ++-
 arch/x86/kvm/vmx/nested.c            |  2 +-
 arch/x86/kvm/vmx/vmx.c               |  5 +++--
 arch/x86/kvm/x86.c                   | 17 +++++++++++++----
 10 files changed, 37 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0a6b35353fc7..c195f63c1086 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -767,7 +767,7 @@ struct kvm_vcpu_arch {
 		u64 msr_val;
 		u32 id;
 		bool send_user_only;
-		u32 host_apf_reason;
+		u32 host_apf_flags;
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
 	} apf;
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9b4df6eaa11a..2a3102fee189 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -90,7 +90,7 @@ unsigned int kvm_arch_para_features(void);
 unsigned int kvm_arch_para_hints(void);
 void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
 void kvm_async_pf_task_wake(u32 token);
-u32 kvm_read_and_reset_pf_reason(void);
+u32 kvm_read_and_reset_apf_flags(void);
 extern void kvm_disable_steal_time(void);
 void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
 
@@ -121,7 +121,7 @@ static inline unsigned int kvm_arch_para_hints(void)
 	return 0;
 }
 
-static inline u32 kvm_read_and_reset_pf_reason(void)
+static inline u32 kvm_read_and_reset_apf_flags(void)
 {
 	return 0;
 }
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..d1cd5c0f431a 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -112,8 +112,9 @@ struct kvm_mmu_op_release_pt {
 #define KVM_PV_REASON_PAGE_READY 2
 
 struct kvm_vcpu_pv_apf_data {
-	__u32 reason;
-	__u8 pad[60];
+	__u32 flags;
+	__u32 token; /* Used for page ready notification only */
+	__u8 pad[56];
 	__u32 enabled;
 };
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6efe0410fb72..340df5dab30d 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -228,24 +228,24 @@ void kvm_async_pf_task_wake(u32 token)
 }
 EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);
 
-u32 kvm_read_and_reset_pf_reason(void)
+u32 kvm_read_and_reset_apf_flags(void)
 {
-	u32 reason = 0;
+	u32 flags = 0;
 
 	if (__this_cpu_read(apf_reason.enabled)) {
-		reason = __this_cpu_read(apf_reason.reason);
-		__this_cpu_write(apf_reason.reason, 0);
+		flags = __this_cpu_read(apf_reason.flags);
+		__this_cpu_write(apf_reason.flags, 0);
 	}
 
-	return reason;
+	return flags;
 }
-EXPORT_SYMBOL_GPL(kvm_read_and_reset_pf_reason);
-NOKPROBE_SYMBOL(kvm_read_and_reset_pf_reason);
+EXPORT_SYMBOL_GPL(kvm_read_and_reset_apf_flags);
+NOKPROBE_SYMBOL(kvm_read_and_reset_apf_flags);
 
 dotraplinkage void
 do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 {
-	switch (kvm_read_and_reset_pf_reason()) {
+	switch (kvm_read_and_reset_apf_flags()) {
 	default:
 		do_page_fault(regs, error_code, address);
 		break;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8071952e9cf2..7fa5253237b2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4186,7 +4186,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 #endif
 
 	vcpu->arch.l1tf_flush_l1d = true;
-	switch (vcpu->arch.apf.host_apf_reason) {
+	switch (vcpu->arch.apf.host_apf_flags) {
 	default:
 		trace_kvm_page_fault(fault_address, error_code);
 
@@ -4196,13 +4196,13 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 				insn_len);
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
-		vcpu->arch.apf.host_apf_reason = 0;
+		vcpu->arch.apf.host_apf_flags = 0;
 		local_irq_disable();
 		kvm_async_pf_task_wait(fault_address, 0);
 		local_irq_enable();
 		break;
 	case KVM_PV_REASON_PAGE_READY:
-		vcpu->arch.apf.host_apf_reason = 0;
+		vcpu->arch.apf.host_apf_flags = 0;
 		local_irq_disable();
 		kvm_async_pf_task_wake(fault_address);
 		local_irq_enable();
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index 9a2a62e5afeb..c04adbbdbd20 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -835,7 +835,7 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
 		break;
 	case SVM_EXIT_EXCP_BASE + PF_VECTOR:
 		/* When we're shadowing, trap PFs, but not async PF */
-		if (!npt_enabled && svm->vcpu.arch.apf.host_apf_reason == 0)
+		if (!npt_enabled && svm->vcpu.arch.apf.host_apf_flags == 0)
 			return NESTED_EXIT_HOST;
 		break;
 	default:
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a862c768fd54..70d136e9e075 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3399,7 +3399,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	/* if exit due to PF check for async PF */
 	if (svm->vmcb->control.exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR)
-		svm->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
+		svm->vcpu.arch.apf.host_apf_flags =
+			kvm_read_and_reset_apf_flags();
 
 	if (npt_enabled) {
 		vcpu->arch.regs_avail &= ~(1 << VCPU_EXREG_PDPTR);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index e44f33c82332..c82d59070a9c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5582,7 +5582,7 @@ bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason)
 		if (is_nmi(intr_info))
 			return false;
 		else if (is_page_fault(intr_info))
-			return !vmx->vcpu.arch.apf.host_apf_reason && enable_ept;
+			return !vmx->vcpu.arch.apf.host_apf_flags && enable_ept;
 		else if (is_debug(intr_info) &&
 			 vcpu->guest_debug &
 			 (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 89c766fad889..4a82319353f2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4662,7 +4662,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 	if (is_page_fault(intr_info)) {
 		cr2 = vmcs_readl(EXIT_QUALIFICATION);
 		/* EPT won't cause page fault directly */
-		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_reason && enable_ept);
+		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_flags && enable_ept);
 		return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
 	}
 
@@ -6234,7 +6234,8 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 
 	/* if exit due to PF check for async PF */
 	if (is_page_fault(vmx->exit_intr_info)) {
-		vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
+		vmx->vcpu.arch.apf.host_apf_flags =
+			kvm_read_and_reset_apf_flags();
 	/* Handle machine checks before interrupts are enabled */
 	} else if (is_machine_check(vmx->exit_intr_info)) {
 		kvm_machine_check();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8e0bbe4f0d5b..a3102406102c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2654,7 +2654,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 	}
 
 	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
-					sizeof(u32)))
+					sizeof(u64)))
 		return 1;
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
@@ -10357,8 +10357,17 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 	}
 }
 
-static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
+static inline int apf_put_user_notpresent(struct kvm_vcpu *vcpu)
 {
+	u32 reason = KVM_PV_REASON_PAGE_NOT_PRESENT;
+
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &reason,
+				      sizeof(reason));
+}
+
+static inline int apf_put_user_ready(struct kvm_vcpu *vcpu, u32 token)
+{
+	u64 val = (u64)token << 32 | KVM_PV_REASON_PAGE_READY;
 
 	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
 				      sizeof(val));
@@ -10403,7 +10412,7 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
 
 	if (kvm_can_deliver_async_pf(vcpu) &&
-	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
+	    !apf_put_user_notpresent(vcpu)) {
 		fault.vector = PF_VECTOR;
 		fault.error_code_valid = true;
 		fault.error_code = 0;
@@ -10436,7 +10445,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
 	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
-	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+	    !apf_put_user_ready(vcpu, work->arch.token)) {
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;
-- 
2.25.4



* [PATCH v2 03/10] KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
  2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
  2020-05-25 14:41 ` [PATCH v2 01/10] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" Vitaly Kuznetsov
  2020-05-25 14:41 ` [PATCH v2 02/10] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Vitaly Kuznetsov
@ 2020-05-25 14:41 ` Vitaly Kuznetsov
  2020-05-25 14:41 ` [PATCH v2 04/10] KVM: introduce kvm_read_guest_offset_cached() Vitaly Kuznetsov
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-25 14:41 UTC (permalink / raw)
  To: kvm, x86
  Cc: Paolo Bonzini, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

An innocent reader of the following x86 KVM code:

bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
{
        if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
                return true;
...

may get very confused: if the APF mechanism is not enabled, why do we report
that we 'can inject async page present'? In reality, upon injection
kvm_arch_async_page_present() will check the same condition again and,
in case APF is disabled, will just drop the item. This is fine, as a
guest which deliberately disabled APF doesn't expect to get any APF
notifications.

Rename kvm_arch_can_inject_async_page_present() to
kvm_arch_can_dequeue_async_page_present() to make it clear what we are
checking: if the item can be dequeued (meaning either injected or just
dropped).

On s390, kvm_arch_can_inject_async_page_present() always returns 'true', so
the rename doesn't matter much there.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/s390/include/asm/kvm_host.h | 2 +-
 arch/s390/kvm/kvm-s390.c         | 2 +-
 arch/x86/include/asm/kvm_host.h  | 2 +-
 arch/x86/kvm/x86.c               | 2 +-
 virt/kvm/async_pf.c              | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index d6bcd34f3ec3..5ba9968c3436 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -971,7 +971,7 @@ struct kvm_arch_async_pf {
 	unsigned long pfault_token;
 };
 
-bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
 
 void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 			       struct kvm_async_pf *work);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index d05bb040fd42..64128a336c24 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -3944,7 +3944,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 	/* s390 will always inject the page directly */
 }
 
-bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
 {
 	/*
 	 * s390 will always inject the page directly,
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c195f63c1086..459dc824b10b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1641,7 +1641,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work);
 void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 			       struct kvm_async_pf *work);
-bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
 extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a3102406102c..bfa7e07fd6fd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10458,7 +10458,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 }
 
-bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
 {
 	if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
 		return true;
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 15e5b037f92d..8a8770c7c889 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -135,7 +135,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 	struct kvm_async_pf *work;
 
 	while (!list_empty_careful(&vcpu->async_pf.done) &&
-	      kvm_arch_can_inject_async_page_present(vcpu)) {
+	      kvm_arch_can_dequeue_async_page_present(vcpu)) {
 		spin_lock(&vcpu->async_pf.lock);
 		work = list_first_entry(&vcpu->async_pf.done, typeof(*work),
 					      link);
-- 
2.25.4



* [PATCH v2 04/10] KVM: introduce kvm_read_guest_offset_cached()
  2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2020-05-25 14:41 ` [PATCH v2 03/10] KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() Vitaly Kuznetsov
@ 2020-05-25 14:41 ` Vitaly Kuznetsov
  2020-05-25 14:41 ` [PATCH v2 05/10] KVM: x86: interrupt based APF 'page ready' event delivery Vitaly Kuznetsov
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-25 14:41 UTC (permalink / raw)
  To: kvm, x86
  Cc: Paolo Bonzini, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

We already have kvm_write_guest_offset_cached(); introduce the read analogue.
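
A typical use, mirroring how a later patch in this series reads the APF
'token' field from the cached slot (sketch only):

	u32 val;
	unsigned int offset = offsetof(struct kvm_vcpu_pv_apf_data, token);

	/* Returns non-zero on failure, e.g. stale generation + no memslot. */
	if (kvm_read_guest_offset_cached(vcpu->kvm, &vcpu->arch.apf.data,
					 &val, offset, sizeof(val)))
		return false;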

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 include/linux/kvm_host.h |  3 +++
 virt/kvm/kvm_main.c      | 19 ++++++++++++++-----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 131cc1527d68..8020f7ab7409 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -733,6 +733,9 @@ int kvm_read_guest_page(struct kvm *kvm, gfn_t gfn, void *data, int offset,
 int kvm_read_guest(struct kvm *kvm, gpa_t gpa, void *data, unsigned long len);
 int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 			   void *data, unsigned long len);
+int kvm_read_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+				 void *data, unsigned int offset,
+				 unsigned long len);
 int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
 			 int offset, int len);
 int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 731c1e517716..7686819687da 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2511,13 +2511,15 @@ int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 }
 EXPORT_SYMBOL_GPL(kvm_write_guest_cached);
 
-int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
-			   void *data, unsigned long len)
+int kvm_read_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+				 void *data, unsigned int offset,
+				 unsigned long len)
 {
 	struct kvm_memslots *slots = kvm_memslots(kvm);
 	int r;
+	gpa_t gpa = ghc->gpa + offset;
 
-	BUG_ON(len > ghc->len);
+	BUG_ON(len + offset > ghc->len);
 
 	if (slots->generation != ghc->generation) {
 		if (__kvm_gfn_to_hva_cache_init(slots, ghc, ghc->gpa, ghc->len))
@@ -2528,14 +2530,21 @@ int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 		return -EFAULT;
 
 	if (unlikely(!ghc->memslot))
-		return kvm_read_guest(kvm, ghc->gpa, data, len);
+		return kvm_read_guest(kvm, gpa, data, len);
 
-	r = __copy_from_user(data, (void __user *)ghc->hva, len);
+	r = __copy_from_user(data, (void __user *)ghc->hva + offset, len);
 	if (r)
 		return -EFAULT;
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(kvm_read_guest_offset_cached);
+
+int kvm_read_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+			  void *data, unsigned long len)
+{
+	return kvm_read_guest_offset_cached(kvm, ghc, data, 0, len);
+}
 EXPORT_SYMBOL_GPL(kvm_read_guest_cached);
 
 int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len)
-- 
2.25.4



* [PATCH v2 05/10] KVM: x86: interrupt based APF 'page ready' event delivery
  2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2020-05-25 14:41 ` [PATCH v2 04/10] KVM: introduce kvm_read_guest_offset_cached() Vitaly Kuznetsov
@ 2020-05-25 14:41 ` Vitaly Kuznetsov
  2020-06-09 19:10   ` Vivek Goyal
  2020-05-25 14:41 ` [PATCH v2 06/10] KVM: x86: acknowledgment mechanism for async pf page ready notifications Vitaly Kuznetsov
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-25 14:41 UTC (permalink / raw)
  To: kvm, x86
  Cc: Paolo Bonzini, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

Concerns were expressed around APF delivery via synthetic #PF exception as
in some cases such delivery may collide with a real page fault. For 'page
ready' notifications we can easily switch to using an interrupt instead.
Introduce the new MSR_KVM_ASYNC_PF_INT mechanism and deprecate the legacy one.

One notable difference between the two mechanisms is that an interrupt may
not get handled immediately, so whenever we would like to deliver the next
event (regardless of its type) we must be sure the guest has read and
cleared the previous event in the slot.

While at it, get rid of the 'type 1/type 2' names for APF events in the
documentation as they are causing confusion. Use 'page not present'
and 'page ready' everywhere instead.
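
For reference, the guest-side enablement sequence the updated documentation
mandates would look roughly like this (sketch only; 'apf_data' stands for
the per-CPU 64-byte shared area and ILLUSTRATIVE_APF_VECTOR is a made-up
name, the real guest conversion comes later in the series):

	/* The vector must be set up before the mechanism is enabled. */
	wrmsrl(MSR_KVM_ASYNC_PF_INT, ILLUSTRATIVE_APF_VECTOR);
	wrmsrl(MSR_KVM_ASYNC_PF_EN, slow_virt_to_phys(&apf_data) |
	       KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT);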

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 Documentation/virt/kvm/msr.rst       | 102 +++++++++++++++++++--------
 arch/x86/include/asm/kvm_host.h      |   4 +-
 arch/x86/include/uapi/asm/kvm_para.h |  12 +++-
 arch/x86/kvm/x86.c                   |  93 +++++++++++++++++-------
 4 files changed, 152 insertions(+), 59 deletions(-)

diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 33892036672d..be08df12f31a 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -190,41 +190,68 @@ MSR_KVM_ASYNC_PF_EN:
 	0x4b564d02
 
 data:
-	Bits 63-6 hold 64-byte aligned physical address of a
-	64 byte memory area which must be in guest RAM and must be
-	zeroed. Bits 5-3 are reserved and should be zero. Bit 0 is 1
-	when asynchronous page faults are enabled on the vcpu 0 when
-	disabled. Bit 1 is 1 if asynchronous page faults can be injected
-	when vcpu is in cpl == 0. Bit 2 is 1 if asynchronous page faults
-	are delivered to L1 as #PF vmexits.  Bit 2 can be set only if
-	KVM_FEATURE_ASYNC_PF_VMEXIT is present in CPUID.
-
-	First 4 byte of 64 byte memory location will be written to by
-	the hypervisor at the time of asynchronous page fault (APF)
-	injection to indicate type of asynchronous page fault. Value
-	of 1 means that the page referred to by the page fault is not
-	present. Value 2 means that the page is now available. Disabling
-	interrupt inhibits APFs. Guest must not enable interrupt
-	before the reason is read, or it may be overwritten by another
-	APF. Since APF uses the same exception vector as regular page
-	fault guest must reset the reason to 0 before it does
-	something that can generate normal page fault.  If during page
-	fault APF reason is 0 it means that this is regular page
-	fault.
-
-	During delivery of type 1 APF cr2 contains a token that will
-	be used to notify a guest when missing page becomes
-	available. When page becomes available type 2 APF is sent with
-	cr2 set to the token associated with the page. There is special
-	kind of token 0xffffffff which tells vcpu that it should wake
-	up all processes waiting for APFs and no individual type 2 APFs
-	will be sent.
+	Asynchronous page fault (APF) control MSR.
+
+	Bits 63-6 hold 64-byte aligned physical address of a 64 byte memory area
+	which must be in guest RAM and must be zeroed. This memory is expected
+	to hold a copy of the following structure::
+
+	  struct kvm_vcpu_pv_apf_data {
+		/* Used for 'page not present' events delivered via #PF */
+		__u32 flags;
+
+		/* Used for 'page ready' events delivered via interrupt notification */
+		__u32 token;
+
+		__u8 pad[56];
+		__u32 enabled;
+	  };
+
+	Bits 5-4 of the MSR are reserved and should be zero. Bit 0 is set to 1
+	when asynchronous page faults are enabled on the vcpu, 0 when disabled.
+	Bit 1 is 1 if asynchronous page faults can be injected when vcpu is in
+	cpl == 0. Bit 2 is 1 if asynchronous page faults are delivered to L1 as
+	#PF vmexits.  Bit 2 can be set only if KVM_FEATURE_ASYNC_PF_VMEXIT is
+	present in CPUID. Bit 3 enables interrupt based delivery of 'page ready'
+	events.
+
+	'Page not present' events are currently always delivered as synthetic
+	#PF exception. During delivery of these events APF CR2 register contains
+	a token that will be used to notify the guest when missing page becomes
+	available. Also, to make it possible to distinguish between real #PF and
+	APF, first 4 bytes of 64 byte memory location ('flags') will be written
+	to by the hypervisor at the time of injection. Only the first bit of
+	'flags' is currently supported; when set, it indicates that the guest is
+	dealing with an asynchronous 'page not present' event. If during a page
+	fault APF 'flags' is '0' it means that this is a regular page fault.
+	Guest is supposed to clear 'flags' when it is done handling the #PF
+	exception so the next event can be delivered.
+
+	Note, since APF 'page not present' events use the same exception vector
+	as regular page fault, guest must reset 'flags' to '0' before it does
+	something that can generate normal page fault.
+
+	Bytes 4-7 of 64 byte memory location ('token') will be written to by the
+	hypervisor at the time of APF 'page ready' event injection. The content
+	of these bytes is a token which was previously delivered as 'page not
+	present' event. The event indicates the page is now available. Guest is
+	supposed to write '0' to 'token' when it is done handling 'page ready'
+	event so the next one can be delivered.
+
+	Note, MSR_KVM_ASYNC_PF_INT MSR specifying the interrupt vector for 'page
+	ready' APF delivery needs to be written to before enabling APF mechanism
+	in MSR_KVM_ASYNC_PF_EN or interrupt #0 can get injected.
+
+	Note, previously, 'page ready' events were delivered via the same #PF
+	exception as 'page not present' events but this is now deprecated. If
+	bit 3 (interrupt based delivery) is not set APF events are not delivered.
 
 	If APF is disabled while there are outstanding APFs, they will
 	not be delivered.
 
-	Currently type 2 APF will be always delivered on the same vcpu as
-	type 1 was, but guest should not rely on that.
+	Currently 'page ready' APF events will be always delivered on the
+	same vcpu as 'page not present' event was, but guest should not rely on
+	that.
 
 MSR_KVM_STEAL_TIME:
 	0x4b564d03
@@ -319,3 +346,16 @@ data:
 
 	KVM guests can request the host not to poll on HLT, for example if
 	they are performing polling themselves.
+
+MSR_KVM_ASYNC_PF_INT:
+	0x4b564d06
+
+data:
+	Second asynchronous page fault (APF) control MSR.
+
+	Bits 0-7: APIC vector for delivery of 'page ready' APF events.
+	Bits 8-63: Reserved
+
+	Interrupt vector for asynchronous 'page ready' notifications delivery.
+	The vector has to be set up before asynchronous page fault mechanism
+	is enabled in MSR_KVM_ASYNC_PF_EN.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 459dc824b10b..c2a70e25a1f3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -764,7 +764,9 @@ struct kvm_vcpu_arch {
 		bool halted;
 		gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
 		struct gfn_to_hva_cache data;
-		u64 msr_val;
+		u64 msr_en_val; /* MSR_KVM_ASYNC_PF_EN */
+		u64 msr_int_val; /* MSR_KVM_ASYNC_PF_INT */
+		u16 vec;
 		u32 id;
 		bool send_user_only;
 		u32 host_apf_flags;
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index d1cd5c0f431a..1d37d616b1fc 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -50,6 +50,7 @@
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
+#define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 
 struct kvm_steal_time {
 	__u64 steal;
@@ -81,6 +82,11 @@ struct kvm_clock_pairing {
 #define KVM_ASYNC_PF_ENABLED			(1 << 0)
 #define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
 #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
+#define KVM_ASYNC_PF_DELIVERY_AS_INT		(1 << 3)
+
+/* MSR_KVM_ASYNC_PF_INT */
+#define KVM_ASYNC_PF_VEC_MASK			GENMASK(7, 0)
+
 
 /* Operations for KVM_HC_MMU_OP */
 #define KVM_MMU_OP_WRITE_PTE            1
@@ -112,8 +118,12 @@ struct kvm_mmu_op_release_pt {
 #define KVM_PV_REASON_PAGE_READY 2
 
 struct kvm_vcpu_pv_apf_data {
+	/* Used for 'page not present' events delivered via #PF */
 	__u32 flags;
-	__u32 token; /* Used for page ready notification only */
+
+	/* Used for 'page ready' events delivered via interrupt notification */
+	__u32 token;
+
 	__u8 pad[56];
 	__u32 enabled;
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bfa7e07fd6fd..99fd347849b2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1235,7 +1235,7 @@ static const u32 emulated_msrs_all[] = {
 	HV_X64_MSR_TSC_EMULATION_STATUS,
 
 	MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
-	MSR_KVM_PV_EOI_EN,
+	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT,
 
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
@@ -2637,17 +2637,24 @@ static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
 	return r;
 }
 
+static inline bool kvm_pv_async_pf_enabled(struct kvm_vcpu *vcpu)
+{
+	u64 mask = KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;
+
+	return (vcpu->arch.apf.msr_en_val & mask) == mask;
+}
+
 static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
 	gpa_t gpa = data & ~0x3f;
 
-	/* Bits 3:5 are reserved, Should be zero */
-	if (data & 0x38)
+	/* Bits 4:5 are reserved, Should be zero */
+	if (data & 0x30)
 		return 1;
 
-	vcpu->arch.apf.msr_val = data;
+	vcpu->arch.apf.msr_en_val = data;
 
-	if (!(data & KVM_ASYNC_PF_ENABLED)) {
+	if (!kvm_pv_async_pf_enabled(vcpu)) {
 		kvm_clear_async_pf_completion_queue(vcpu);
 		kvm_async_pf_hash_reset(vcpu);
 		return 0;
@@ -2659,7 +2666,25 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
 	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
+
 	kvm_async_pf_wakeup_all(vcpu);
+
+	return 0;
+}
+
+static int kvm_pv_enable_async_pf_int(struct kvm_vcpu *vcpu, u64 data)
+{
+	/* Bits 8-63 are reserved */
+	if (data >> 8)
+		return 1;
+
+	if (!lapic_in_kernel(vcpu))
+		return 1;
+
+	vcpu->arch.apf.msr_int_val = data;
+
+	vcpu->arch.apf.vec = data & KVM_ASYNC_PF_VEC_MASK;
+
 	return 0;
 }
 
@@ -2875,6 +2900,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (kvm_pv_enable_async_pf(vcpu, data))
 			return 1;
 		break;
+	case MSR_KVM_ASYNC_PF_INT:
+		if (kvm_pv_enable_async_pf_int(vcpu, data))
+			return 1;
+		break;
 	case MSR_KVM_STEAL_TIME:
 
 		if (unlikely(!sched_info_on()))
@@ -3149,7 +3178,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		msr_info->data = vcpu->arch.time;
 		break;
 	case MSR_KVM_ASYNC_PF_EN:
-		msr_info->data = vcpu->arch.apf.msr_val;
+		msr_info->data = vcpu->arch.apf.msr_en_val;
+		break;
+	case MSR_KVM_ASYNC_PF_INT:
+		msr_info->data = vcpu->arch.apf.msr_int_val;
 		break;
 	case MSR_KVM_STEAL_TIME:
 		msr_info->data = vcpu->arch.st.msr_val;
@@ -9502,7 +9534,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->arch.cr2 = 0;
 
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
-	vcpu->arch.apf.msr_val = 0;
+	vcpu->arch.apf.msr_en_val = 0;
+	vcpu->arch.apf.msr_int_val = 0;
 	vcpu->arch.st.msr_val = 0;
 
 	kvmclock_reset(vcpu);
@@ -10367,10 +10400,22 @@ static inline int apf_put_user_notpresent(struct kvm_vcpu *vcpu)
 
 static inline int apf_put_user_ready(struct kvm_vcpu *vcpu, u32 token)
 {
-	u64 val = (u64)token << 32 | KVM_PV_REASON_PAGE_READY;
+	unsigned int offset = offsetof(struct kvm_vcpu_pv_apf_data, token);
 
-	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
-				      sizeof(val));
+	return kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.apf.data,
+					     &token, offset, sizeof(token));
+}
+
+static inline bool apf_pageready_slot_free(struct kvm_vcpu *vcpu)
+{
+	unsigned int offset = offsetof(struct kvm_vcpu_pv_apf_data, token);
+	u32 val;
+
+	if (kvm_read_guest_offset_cached(vcpu->kvm, &vcpu->arch.apf.data,
+					 &val, offset, sizeof(val)))
+		return false;
+
+	return !val;
 }
 
 static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
@@ -10378,9 +10423,8 @@ static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
 		return false;
 
-	if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) ||
-	    (vcpu->arch.apf.send_user_only &&
-	     kvm_x86_ops.get_cpl(vcpu) == 0))
+	if (!kvm_pv_async_pf_enabled(vcpu) ||
+	    (vcpu->arch.apf.send_user_only && kvm_x86_ops.get_cpl(vcpu) == 0))
 		return false;
 
 	return true;
@@ -10436,7 +10480,10 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work)
 {
-	struct x86_exception fault;
+	struct kvm_lapic_irq irq = {
+		.delivery_mode = APIC_DM_FIXED,
+		.vector = vcpu->arch.apf.vec
+	};
 
 	if (work->wakeup_all)
 		work->arch.token = ~0; /* broadcast wakeup */
@@ -10444,26 +10491,20 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
-	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
-	    !apf_put_user_ready(vcpu, work->arch.token)) {
-			fault.vector = PF_VECTOR;
-			fault.error_code_valid = true;
-			fault.error_code = 0;
-			fault.nested_page_fault = false;
-			fault.address = work->arch.token;
-			fault.async_page_fault = true;
-			kvm_inject_page_fault(vcpu, &fault);
-	}
+	if (kvm_pv_async_pf_enabled(vcpu) &&
+	    !apf_put_user_ready(vcpu, work->arch.token))
+		kvm_apic_set_irq(vcpu, &irq, NULL);
+
 	vcpu->arch.apf.halted = false;
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 }
 
 bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
 {
-	if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
+	if (!kvm_pv_async_pf_enabled(vcpu))
 		return true;
 	else
-		return kvm_can_do_async_pf(vcpu);
+		return apf_pageready_slot_free(vcpu);
 }
 
 void kvm_arch_start_assignment(struct kvm *kvm)
-- 
2.25.4



* [PATCH v2 06/10] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2020-05-25 14:41 ` [PATCH v2 05/10] KVM: x86: interrupt based APF 'page ready' event delivery Vitaly Kuznetsov
@ 2020-05-25 14:41 ` Vitaly Kuznetsov
  2020-05-28 11:29   ` Paolo Bonzini
  2020-05-25 14:41 ` [PATCH v2 07/10] KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-25 14:41 UTC (permalink / raw)
  To: kvm, x86
  Cc: Paolo Bonzini, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

If two 'page ready' notifications happen back to back the second one is not
delivered and the only mechanism we currently have is the
kvm_check_async_pf_completion() check in the vcpu_run() loop. The check will
only be performed on the next vmexit, whenever it happens, and in some cases
that may take a while. With interrupt based 'page ready' notification
delivery the situation is even worse: unlike exceptions, interrupts are not
handled immediately, so we must check if the slot is empty. This is slow and
unnecessary. Introduce a dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate
the fact that the slot is free and the host should check its notification
queue. Mandate using it for interrupt based 'page ready' APF event
delivery.

As kvm_check_async_pf_completion() is going away from the vcpu_run() loop we
need a way to communicate the fact that the vcpu->async_pf.done queue has
transitioned from empty to non-empty state. Introduce
kvm_arch_async_page_present_queued() and KVM_REQ_APF_READY to do the job.
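
The resulting per-event guest protocol is just two steps (sketch only; the
actual guest conversion happens in a later patch of this series):

	/* done handling the 'page ready' token from the shared slot */
	__this_cpu_write(apf_reason.token, 0);	/* 1: free the slot */
	wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);	/* 2: make KVM re-scan its queue */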

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 Documentation/virt/kvm/msr.rst       | 16 +++++++++++++++-
 arch/s390/include/asm/kvm_host.h     |  2 ++
 arch/x86/include/asm/kvm_host.h      |  3 +++
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/x86.c                   | 26 ++++++++++++++++++++++----
 virt/kvm/async_pf.c                  | 10 ++++++++++
 6 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index be08df12f31a..8ea3fbcc67fd 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -236,7 +236,10 @@ data:
 	of these bytes is a token which was previously delivered as 'page not
 	present' event. The event indicates the page is now available. Guest is
 	supposed to write '0' to 'token' when it is done handling 'page ready'
-	event so the next one can be delivered.
+	event so the next one can be delivered.  It is also supposed to write
+	'1' to MSR_KVM_ASYNC_PF_ACK every time after clearing the location;
+	this forces KVM to re-scan its queue and deliver the next pending
+	notification.
 
 	Note, MSR_KVM_ASYNC_PF_INT MSR specifying the interrupt vector for 'page
 	ready' APF delivery needs to be written to before enabling APF mechanism
@@ -359,3 +362,14 @@ data:
 	Interrupt vector for asynchronous 'page ready' notifications delivery.
 	The vector has to be set up before asynchronous page fault mechanism
 	is enabled in MSR_KVM_ASYNC_PF_EN.
+
+MSR_KVM_ASYNC_PF_ACK:
+	0x4b564d07
+
+data:
+	Asynchronous page fault (APF) acknowledgment.
+
+	When the guest is done processing 'page ready' APF event and 'token'
+	field in 'struct kvm_vcpu_pv_apf_data' is cleared it is supposed to
+	write '1' to bit 0 of the MSR; this causes the host to re-scan its queue
+	and check if there are more notifications pending.
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 5ba9968c3436..bb1ede017b7e 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -982,6 +982,8 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work);
 
+static inline void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu) {}
+
 void kvm_arch_crypto_clear_masks(struct kvm *kvm);
 void kvm_arch_crypto_set_masks(struct kvm *kvm, unsigned long *apm,
 			       unsigned long *aqm, unsigned long *adm);
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c2a70e25a1f3..356c02bfa587 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -83,6 +83,7 @@
 #define KVM_REQ_GET_VMCS12_PAGES	KVM_ARCH_REQ(24)
 #define KVM_REQ_APICV_UPDATE \
 	KVM_ARCH_REQ_FLAGS(25, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_APF_READY		KVM_ARCH_REQ(26)
 
 #define CR0_RESERVED_BITS                                               \
 	(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -772,6 +773,7 @@ struct kvm_vcpu_arch {
 		u32 host_apf_flags;
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
+		bool pageready_pending;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
@@ -1643,6 +1645,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work);
 void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 			       struct kvm_async_pf *work);
+void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
 bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
 extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 1d37d616b1fc..7ac20df80ba8 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -51,6 +51,7 @@
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
+#define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 99fd347849b2..ffe1497b7beb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1235,7 +1235,7 @@ static const u32 emulated_msrs_all[] = {
 	HV_X64_MSR_TSC_EMULATION_STATUS,
 
 	MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
-	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT,
+	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF_INT, MSR_KVM_ASYNC_PF_ACK,
 
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
@@ -2904,6 +2904,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (kvm_pv_enable_async_pf_int(vcpu, data))
 			return 1;
 		break;
+	case MSR_KVM_ASYNC_PF_ACK:
+		if (data & 0x1) {
+			vcpu->arch.apf.pageready_pending = false;
+			kvm_check_async_pf_completion(vcpu);
+		}
+		break;
 	case MSR_KVM_STEAL_TIME:
 
 		if (unlikely(!sched_info_on()))
@@ -3183,6 +3189,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_KVM_ASYNC_PF_INT:
 		msr_info->data = vcpu->arch.apf.msr_int_val;
 		break;
+	case MSR_KVM_ASYNC_PF_ACK:
+		msr_info->data = 0;
+		break;
 	case MSR_KVM_STEAL_TIME:
 		msr_info->data = vcpu->arch.st.msr_val;
 		break;
@@ -8340,6 +8349,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_hv_process_stimers(vcpu);
 		if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
 			kvm_vcpu_update_apicv(vcpu);
+		if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
+			kvm_check_async_pf_completion(vcpu);
 	}
 
 	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
@@ -8613,8 +8624,6 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 			break;
 		}
 
-		kvm_check_async_pf_completion(vcpu);
-
 		if (signal_pending(current)) {
 			r = -EINTR;
 			vcpu->run->exit_reason = KVM_EXIT_INTR;
@@ -10492,13 +10501,22 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
 	if (kvm_pv_async_pf_enabled(vcpu) &&
-	    !apf_put_user_ready(vcpu, work->arch.token))
+	    !apf_put_user_ready(vcpu, work->arch.token)) {
+		vcpu->arch.apf.pageready_pending = true;
 		kvm_apic_set_irq(vcpu, &irq, NULL);
+	}
 
 	vcpu->arch.apf.halted = false;
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 }
 
+void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu)
+{
+	kvm_make_request(KVM_REQ_APF_READY, vcpu);
+	if (!vcpu->arch.apf.pageready_pending)
+		kvm_vcpu_kick(vcpu);
+}
+
 bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
 {
 	if (!kvm_pv_async_pf_enabled(vcpu))
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 8a8770c7c889..5660bbc831bf 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -51,6 +51,7 @@ static void async_pf_execute(struct work_struct *work)
 	unsigned long addr = apf->addr;
 	gpa_t cr2_or_gpa = apf->cr2_or_gpa;
 	int locked = 1;
+	bool first;
 
 	might_sleep();
 
@@ -69,10 +70,14 @@ static void async_pf_execute(struct work_struct *work)
 		kvm_arch_async_page_present(vcpu, apf);
 
 	spin_lock(&vcpu->async_pf.lock);
+	first = list_empty(&vcpu->async_pf.done);
 	list_add_tail(&apf->link, &vcpu->async_pf.done);
 	apf->vcpu = NULL;
 	spin_unlock(&vcpu->async_pf.lock);
 
+	if (!IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC) && first)
+		kvm_arch_async_page_present_queued(vcpu);
+
 	/*
 	 * apf may be freed by kvm_check_async_pf_completion() after
 	 * this point
@@ -202,6 +207,7 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
 {
 	struct kvm_async_pf *work;
+	bool first;
 
 	if (!list_empty_careful(&vcpu->async_pf.done))
 		return 0;
@@ -214,9 +220,13 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
 	INIT_LIST_HEAD(&work->queue); /* for list_del to work */
 
 	spin_lock(&vcpu->async_pf.lock);
+	first = list_empty(&vcpu->async_pf.done);
 	list_add_tail(&work->link, &vcpu->async_pf.done);
 	spin_unlock(&vcpu->async_pf.lock);
 
+	if (!IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC) && first)
+		kvm_arch_async_page_present_queued(vcpu);
+
 	vcpu->async_pf.queued++;
 	return 0;
 }
-- 
2.25.4



* [PATCH v2 07/10] KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
  2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
                   ` (5 preceding siblings ...)
  2020-05-25 14:41 ` [PATCH v2 06/10] KVM: x86: acknowledgment mechanism for async pf page ready notifications Vitaly Kuznetsov
@ 2020-05-25 14:41 ` Vitaly Kuznetsov
  2020-05-25 14:41 ` [PATCH v2 08/10] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-25 14:41 UTC (permalink / raw)
  To: kvm, x86
  Cc: Paolo Bonzini, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

Introduce a new capability to indicate that KVM supports interrupt based
delivery of 'page ready' APF events. This includes support for both
MSR_KVM_ASYNC_PF_INT and MSR_KVM_ASYNC_PF_ACK.
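
Guests are expected to gate the new MSRs on the feature bit, e.g. (sketch
using the existing kvm_para_has_feature() helper):

	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT)) {
		/* MSR_KVM_ASYNC_PF_INT and MSR_KVM_ASYNC_PF_ACK may be used */
	}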

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     |  6 ++++++
 Documentation/virt/kvm/msr.rst       | 12 ++++++++----
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/x86.c                   |  1 +
 include/uapi/linux/kvm.h             |  1 +
 6 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..98cbd257ac33 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -86,6 +86,12 @@ KVM_FEATURE_PV_SCHED_YIELD        13          guest checks this feature bit
                                               before using paravirtualized
                                               sched yield.
 
+KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
+                                              before using the second async
+                                              pf control msr 0x4b564d06 and
+                                              async pf acknowledgment msr
+                                              0x4b564d07.
+
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24         host will warn if no guest-side
                                               per-cpu warps are expected in
                                               kvmclock
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 8ea3fbcc67fd..d409cc26193f 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -213,7 +213,8 @@ data:
 	cpl == 0. Bit 2 is 1 if asynchronous page faults are delivered to L1 as
 	#PF vmexits.  Bit 2 can be set only if KVM_FEATURE_ASYNC_PF_VMEXIT is
 	present in CPUID. Bit 3 enables interrupt based delivery of 'page ready'
-	events.
+	events. Bit 3 can only be set if KVM_FEATURE_ASYNC_PF_INT is present in
+	CPUID.
 
 	'Page not present' events are currently always delivered as synthetic
 	#PF exception. During delivery of these events APF CR2 register contains
@@ -243,7 +244,8 @@ data:
 
 	Note, MSR_KVM_ASYNC_PF_INT MSR specifying the interrupt vector for 'page
 	ready' APF delivery needs to be written to before enabling APF mechanism
-	in MSR_KVM_ASYNC_PF_EN or interrupt #0 can get injected.
+	in MSR_KVM_ASYNC_PF_EN or interrupt #0 can get injected. The MSR is
+	available if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
 
 	Note, previously, 'page ready' events were delivered via the same #PF
 	exception as 'page not present' events but this is now deprecated. If
@@ -361,7 +363,8 @@ data:
 
 	Interrupt vector for asynchronous 'page ready' notifications delivery.
 	The vector has to be set up before asynchronous page fault mechanism
-	is enabled in MSR_KVM_ASYNC_PF_EN.
+	is enabled in MSR_KVM_ASYNC_PF_EN.  The MSR is only available if
+	KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
 
 MSR_KVM_ASYNC_PF_ACK:
 	0x4b564d07
@@ -372,4 +375,5 @@ data:
 	When the guest is done processing 'page ready' APF event and 'token'
 	field in 'struct kvm_vcpu_pv_apf_data' is cleared it is supposed to
 	write '1' to bit 0 of the MSR; this causes the host to re-scan its queue
-	and check if there are more notifications pending.
+	and check if there are more notifications pending. The MSR is available
+	if KVM_FEATURE_ASYNC_PF_INT is present in CPUID.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 7ac20df80ba8..812e9b4c1114 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -31,6 +31,7 @@
 #define KVM_FEATURE_PV_SEND_IPI	11
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
+#define KVM_FEATURE_ASYNC_PF_INT	14
 
 #define KVM_HINTS_REALTIME      0
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 901cd1fdecd9..790fe4988001 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
 			     (1 << KVM_FEATURE_PV_SEND_IPI) |
 			     (1 << KVM_FEATURE_POLL_CONTROL) |
-			     (1 << KVM_FEATURE_PV_SCHED_YIELD);
+			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
+			     (1 << KVM_FEATURE_ASYNC_PF_INT);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ffe1497b7beb..cc1bf6cfc5e0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3404,6 +3404,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_X86_ROBUST_SINGLESTEP:
 	case KVM_CAP_XSAVE:
 	case KVM_CAP_ASYNC_PF:
+	case KVM_CAP_ASYNC_PF_INT:
 	case KVM_CAP_GET_TSC_KHZ:
 	case KVM_CAP_KVMCLOCK_CTRL:
 	case KVM_CAP_READONLY_MEM:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 428c7dde6b4b..15012f78a691 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1017,6 +1017,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_VCPU_RESETS 179
 #define KVM_CAP_S390_PROTECTED 180
 #define KVM_CAP_PPC_SECURE_GUEST 181
+#define KVM_CAP_ASYNC_PF_INT 182
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.25.4
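
For reference, here is a minimal guest-side sketch of the protocol the
msr.rst text above documents. It mirrors what patch 8 of this series
does; the identifiers come from kvm_para.h and the patches themselves,
and error handling plus the entry/IDT plumbing are omitted:

	/* Enable interrupt based 'page ready' delivery on this CPU. */
	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT)) {
		u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));

		pa |= KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;

		/* The vector MSR has to be written first ... */
		wrmsrl(MSR_KVM_ASYNC_PF_INT, KVM_ASYNC_PF_VECTOR);
		/* ... and only then can the mechanism be enabled. */
		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
	}

	/* In the 'page ready' interrupt handler: consume the token, then
	 * ack so the host re-scans its notification queue. */
	token = __this_cpu_read(apf_reason.token);
	kvm_async_pf_task_wake(token);
	__this_cpu_write(apf_reason.token, 0);
	wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);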


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 08/10] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery
  2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
                   ` (6 preceding siblings ...)
  2020-05-25 14:41 ` [PATCH v2 07/10] KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT Vitaly Kuznetsov
@ 2020-05-25 14:41 ` Vitaly Kuznetsov
  2020-06-10 20:51   ` Vivek Goyal
  2020-05-25 14:41 ` [PATCH v2 09/10] KVM: x86: drop KVM_PV_REASON_PAGE_READY case from kvm_handle_page_fault() Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-25 14:41 UTC (permalink / raw)
  To: kvm, x86
  Cc: Paolo Bonzini, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

KVM now supports using an interrupt for 'page ready' APF event delivery
and the legacy mechanism has been deprecated. Switch KVM guests to the
new one.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/entry/entry_32.S          |  5 ++++
 arch/x86/entry/entry_64.S          |  5 ++++
 arch/x86/include/asm/hardirq.h     |  3 ++
 arch/x86/include/asm/irq_vectors.h |  6 +++-
 arch/x86/include/asm/kvm_para.h    |  6 ++++
 arch/x86/kernel/irq.c              |  9 ++++++
 arch/x86/kernel/kvm.c              | 45 ++++++++++++++++++++++--------
 7 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index b67bae7091d7..d574dadcb2a1 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1475,6 +1475,11 @@ BUILD_INTERRUPT3(hv_stimer0_callback_vector, HYPERV_STIMER0_VECTOR,
 
 #endif /* CONFIG_HYPERV */
 
+#ifdef CONFIG_KVM_GUEST
+BUILD_INTERRUPT3(kvm_async_pf_vector, KVM_ASYNC_PF_VECTOR,
+		 kvm_async_pf_intr)
+#endif
+
 SYM_CODE_START(page_fault)
 	ASM_CLAC
 	pushl	$do_page_fault
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3063aa9090f9..138f5c5aca2e 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1190,6 +1190,11 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
 	acrn_hv_callback_vector acrn_hv_vector_handler
 #endif
 
+#ifdef CONFIG_KVM_GUEST
+apicinterrupt3 KVM_ASYNC_PF_VECTOR \
+	kvm_async_pf_vector kvm_async_pf_intr
+#endif
+
 idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
 idtentry int3			do_int3			has_error_code=0	create_gap=1
 idtentry stack_segment		do_stack_segment	has_error_code=1
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 07533795b8d2..be0fbb15ad7f 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -44,6 +44,9 @@ typedef struct {
 	unsigned int irq_hv_reenlightenment_count;
 	unsigned int hyperv_stimer0_count;
 #endif
+#ifdef CONFIG_KVM_GUEST
+	unsigned int kvm_async_pf_pageready_count;
+#endif
 } ____cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 889f8b1b5b7f..8879a9ecd908 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -104,7 +104,11 @@
 #define HYPERV_STIMER0_VECTOR		0xed
 #endif
 
-#define LOCAL_TIMER_VECTOR		0xec
+#ifdef CONFIG_KVM_GUEST
+#define KVM_ASYNC_PF_VECTOR		0xec
+#endif
+
+#define LOCAL_TIMER_VECTOR		0xeb
 
 #define NR_VECTORS			 256
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 2a3102fee189..a075cd8fa5c7 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -4,6 +4,7 @@
 
 #include <asm/processor.h>
 #include <asm/alternative.h>
+#include <linux/interrupt.h>
 #include <uapi/asm/kvm_para.h>
 
 extern void kvmclock_init(void);
@@ -93,6 +94,11 @@ void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_apf_flags(void);
 extern void kvm_disable_steal_time(void);
 void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
+extern __visible void kvm_async_pf_vector(void);
+#ifdef CONFIG_TRACING
+#define trace_kvm_async_pf_vector kvm_async_pf_vector
+#endif
+__visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs);
 
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 void __init kvm_spinlock_init(void);
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index c7965ff429c5..a4c2f25ad74d 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -159,6 +159,15 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 				   irq_stats(j)->hyperv_stimer0_count);
 		seq_puts(p, "  Hyper-V stimer0 interrupts\n");
 	}
+#endif
+#ifdef CONFIG_KVM_GUEST
+	if (test_bit(KVM_ASYNC_PF_VECTOR, system_vectors)) {
+		seq_printf(p, "%*s: ", prec, "APF");
+		for_each_online_cpu(j)
+			seq_printf(p, "%10u ",
+				   irq_stats(j)->kvm_async_pf_pageready_count);
+		seq_puts(p, "  KVM async PF page ready interrupts\n");
+	}
 #endif
 	seq_printf(p, "%*s: %10u\n", prec, "ERR", atomic_read(&irq_err_count));
 #if defined(CONFIG_X86_IO_APIC)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 340df5dab30d..79730eaef1e1 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -245,23 +245,39 @@ NOKPROBE_SYMBOL(kvm_read_and_reset_apf_flags);
 dotraplinkage void
 do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 {
-	switch (kvm_read_and_reset_apf_flags()) {
-	default:
+	u32 flags = kvm_read_and_reset_apf_flags();
+
+	if (!flags) {
+		/* This is a normal page fault */
 		do_page_fault(regs, error_code, address);
-		break;
-	case KVM_PV_REASON_PAGE_NOT_PRESENT:
+		return;
+	} else if (flags & KVM_PV_REASON_PAGE_NOT_PRESENT) {
 		/* page is swapped out by the host. */
 		kvm_async_pf_task_wait((u32)address, !user_mode(regs));
-		break;
-	case KVM_PV_REASON_PAGE_READY:
-		rcu_irq_enter();
-		kvm_async_pf_task_wake((u32)address);
-		rcu_irq_exit();
-		break;
+	} else {
+		WARN_ONCE(1, "Unexpected async PF flags: %x\n", flags);
 	}
 }
 NOKPROBE_SYMBOL(do_async_page_fault);
 
+__visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs)
+{
+	u32 token;
+
+	entering_ack_irq();
+
+	inc_irq_stat(kvm_async_pf_pageready_count);
+
+	if (__this_cpu_read(apf_reason.enabled)) {
+		token = __this_cpu_read(apf_reason.token);
+		kvm_async_pf_task_wake(token);
+		__this_cpu_write(apf_reason.token, 0);
+		wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
+	}
+
+	exiting_irq();
+}
+
 static void __init paravirt_ops_setup(void)
 {
 	pv_info.name = "KVM";
@@ -305,17 +321,19 @@ static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
 
 static void kvm_guest_cpu_init(void)
 {
-	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
+	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT) && kvmapf) {
 		u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
 
 #ifdef CONFIG_PREEMPTION
 		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
 #endif
-		pa |= KVM_ASYNC_PF_ENABLED;
+		pa |= KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;
 
 		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
 			pa |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
 
+		wrmsrl(MSR_KVM_ASYNC_PF_INT, KVM_ASYNC_PF_VECTOR);
+
 		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
 		__this_cpu_write(apf_reason.enabled, 1);
 		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
@@ -649,6 +667,9 @@ static void __init kvm_guest_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
+	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT))
+		alloc_intr_gate(KVM_ASYNC_PF_VECTOR, kvm_async_pf_vector);
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 09/10] KVM: x86: drop KVM_PV_REASON_PAGE_READY case from kvm_handle_page_fault()
  2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
                   ` (7 preceding siblings ...)
  2020-05-25 14:41 ` [PATCH v2 08/10] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery Vitaly Kuznetsov
@ 2020-05-25 14:41 ` Vitaly Kuznetsov
  2020-05-25 14:41 ` [PATCH v2 10/10] KVM: x86: deprecate KVM_ASYNC_PF_SEND_ALWAYS Vitaly Kuznetsov
  2020-05-28 11:04 ` [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Paolo Bonzini
  10 siblings, 0 replies; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-25 14:41 UTC (permalink / raw)
  To: kvm, x86
  Cc: Paolo Bonzini, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

KVM guest code in Linux enables APF only when KVM_FEATURE_ASYNC_PF_INT
is supported; this means we will never see KVM_PV_REASON_PAGE_READY
when handling a page fault vmexit in KVM.

While at it, make sure we only follow the genuine page fault path when
the APF reason is zero. If we happen to see something else, it means
that the underlying hypervisor is misbehaving. Add a WARN_ONCE() to
catch that.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/mmu/mmu.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7fa5253237b2..1bb7a8c6c28e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4178,6 +4178,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 				u64 fault_address, char *insn, int insn_len)
 {
 	int r = 1;
+	u32 flags = vcpu->arch.apf.host_apf_flags;
 
 #ifndef CONFIG_X86_64
 	/* A 64-bit CR2 should be impossible on 32-bit KVM. */
@@ -4186,28 +4187,22 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 #endif
 
 	vcpu->arch.l1tf_flush_l1d = true;
-	switch (vcpu->arch.apf.host_apf_flags) {
-	default:
+	if (!flags) {
 		trace_kvm_page_fault(fault_address, error_code);
 
 		if (kvm_event_needs_reinjection(vcpu))
 			kvm_mmu_unprotect_page_virt(vcpu, fault_address);
 		r = kvm_mmu_page_fault(vcpu, fault_address, error_code, insn,
 				insn_len);
-		break;
-	case KVM_PV_REASON_PAGE_NOT_PRESENT:
+	} else if (flags & KVM_PV_REASON_PAGE_NOT_PRESENT) {
 		vcpu->arch.apf.host_apf_flags = 0;
 		local_irq_disable();
 		kvm_async_pf_task_wait(fault_address, 0);
 		local_irq_enable();
-		break;
-	case KVM_PV_REASON_PAGE_READY:
-		vcpu->arch.apf.host_apf_flags = 0;
-		local_irq_disable();
-		kvm_async_pf_task_wake(fault_address);
-		local_irq_enable();
-		break;
+	} else {
+		WARN_ONCE(1, "Unexpected host async PF flags: %x\n", flags);
 	}
+
 	return r;
 }
 EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 10/10] KVM: x86: deprecate KVM_ASYNC_PF_SEND_ALWAYS
  2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
                   ` (8 preceding siblings ...)
  2020-05-25 14:41 ` [PATCH v2 09/10] KVM: x86: drop KVM_PV_REASON_PAGE_READY case from kvm_handle_page_fault() Vitaly Kuznetsov
@ 2020-05-25 14:41 ` Vitaly Kuznetsov
  2020-05-28 11:03   ` Paolo Bonzini
  2020-05-28 11:04 ` [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Paolo Bonzini
  10 siblings, 1 reply; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-25 14:41 UTC (permalink / raw)
  To: kvm, x86
  Cc: Paolo Bonzini, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

Concerns were expressed around APF event delivery when the CPU is not
in user mode (KVM_ASYNC_PF_SEND_ALWAYS), e.g.
https://lore.kernel.org/kvm/ed71d0967113a35f670a9625a058b8e6e0b2f104.1583547991.git.luto@kernel.org/

'Page ready' events are already free from '#PF abuse' but 'page not present'
notifications still go through #PF (to be changed in the future). Lower the
probability of running into issues when APF collides with a regular #PF
by deprecating KVM_ASYNC_PF_SEND_ALWAYS. The feature doesn't seem to be
important enough for any particular workload to notice the difference.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/kvm_host.h      |  1 -
 arch/x86/include/uapi/asm/kvm_para.h |  2 +-
 arch/x86/kernel/kvm.c                |  3 ---
 arch/x86/kvm/x86.c                   | 13 +++++++++----
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 356c02bfa587..f491214b7667 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -769,7 +769,6 @@ struct kvm_vcpu_arch {
 		u64 msr_int_val; /* MSR_KVM_ASYNC_PF_INT */
 		u16 vec;
 		u32 id;
-		bool send_user_only;
 		u32 host_apf_flags;
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 812e9b4c1114..3cae0faac2b8 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -82,7 +82,7 @@ struct kvm_clock_pairing {
 #define KVM_MAX_MMU_OP_BATCH           32
 
 #define KVM_ASYNC_PF_ENABLED			(1 << 0)
-#define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
+#define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1) /* deprecated */
 #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
 #define KVM_ASYNC_PF_DELIVERY_AS_INT		(1 << 3)
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 79730eaef1e1..add123302122 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -324,9 +324,6 @@ static void kvm_guest_cpu_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT) && kvmapf) {
 		u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
 
-#ifdef CONFIG_PREEMPTION
-		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
-#endif
 		pa |= KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;
 
 		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cc1bf6cfc5e0..8222133bf5be 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2648,7 +2648,10 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
 	gpa_t gpa = data & ~0x3f;
 
-	/* Bits 4:5 are reserved, Should be zero */
+	/*
+	 * Bits 4:5 are reserved and should be zero. Bit 1
+	 * (KVM_ASYNC_PF_SEND_ALWAYS) was deprecated and is being ignored.
+	 */
 	if (data & 0x30)
 		return 1;
 
@@ -2664,7 +2667,6 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 					sizeof(u64)))
 		return 1;
 
-	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
 	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
 
 	kvm_async_pf_wakeup_all(vcpu);
@@ -10433,8 +10435,11 @@ static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
 		return false;
 
-	if (!kvm_pv_async_pf_enabled(vcpu) ||
-	    (vcpu->arch.apf.send_user_only && kvm_x86_ops.get_cpl(vcpu) == 0))
+	/*
+	 * 'Page not present' APF events are only delivered when CPU is in
+	 * user mode and APF mechanism is enabled.
+	 */
+	if (!kvm_pv_async_pf_enabled(vcpu) || kvm_x86_ops.get_cpl(vcpu) == 0)
 		return false;
 
 	return true;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 02/10] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  2020-05-25 14:41 ` [PATCH v2 02/10] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Vitaly Kuznetsov
@ 2020-05-26 18:27   ` Vivek Goyal
  2020-05-28  8:42     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 30+ messages in thread
From: Vivek Goyal @ 2020-05-26 18:27 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, x86, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, Wanpeng Li, Sean Christopherson,
	Jim Mattson, Gavin Shan, Peter Zijlstra, linux-kernel

On Mon, May 25, 2020 at 04:41:17PM +0200, Vitaly Kuznetsov wrote:
> 

[..]
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0a6b35353fc7..c195f63c1086 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -767,7 +767,7 @@ struct kvm_vcpu_arch {
>  		u64 msr_val;
>  		u32 id;
>  		bool send_user_only;
> -		u32 host_apf_reason;
> +		u32 host_apf_flags;

Hi Vitaly,

What is host_apf_reason used for? It looks like it is somehow used in
the context of nested guests. I hope by now you have been able to
figure it out.

Is it the case that the L2 guest takes a page fault exit and then L0
injects this event into L1 using an exception? I have been trying to
read this code but can't wrap my head around it.

I am still concerned about the case of nested kvm. We have discussed
the apf mechanism but never touched the nested part of it. Given we
are touching code in the nested kvm part, I want to make sure it is
not broken in the new design.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 02/10] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  2020-05-26 18:27   ` Vivek Goyal
@ 2020-05-28  8:42     ` Vitaly Kuznetsov
  2020-05-28 10:59       ` Paolo Bonzini
  2020-06-03 19:35       ` Vivek Goyal
  0 siblings, 2 replies; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-28  8:42 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: kvm, x86, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, Wanpeng Li, Sean Christopherson,
	Jim Mattson, Gavin Shan, Peter Zijlstra, linux-kernel

Vivek Goyal <vgoyal@redhat.com> writes:

> On Mon, May 25, 2020 at 04:41:17PM +0200, Vitaly Kuznetsov wrote:
>> 
>
> [..]
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 0a6b35353fc7..c195f63c1086 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -767,7 +767,7 @@ struct kvm_vcpu_arch {
>>  		u64 msr_val;
>>  		u32 id;
>>  		bool send_user_only;
>> -		u32 host_apf_reason;
>> +		u32 host_apf_flags;
>
> Hi Vitaly,
>
> What is host_apf_reason used for? It looks like it is somehow used in
> the context of nested guests. I hope by now you have been able to
> figure it out.
>
> Is it the case that the L2 guest takes a page fault exit and then L0
> injects this event into L1 using an exception? I have been trying to
> read this code but can't wrap my head around it.
>
> I am still concerned about the case of nested kvm. We have discussed
> the apf mechanism but never touched the nested part of it. Given we
> are touching code in the nested kvm part, I want to make sure it is
> not broken in the new design.
>

Sorry I missed this.

I think we've touched the nested topic a bit already:
https://lore.kernel.org/kvm/87lfluwfi0.fsf@vitty.brq.redhat.com/

But let me try to explain the whole thing and maybe someone will point
out what I'm missing.

The problem being solved: an L2 guest is running and hits a page which
is not present *in L0*; instead of pausing the *L1* vCPU completely we
want to let L1 know about the problem so it can run something else
(e.g. another guest or just another application).

What's different between this and the 'normal' APF case? When the L2
guest is running, the (physical) CPU is in 'guest' mode so we can't
inject #PF there. Actually, we can, but L2 may get confused: we're not
even sure the fault is L2's, that L2 supports APF, and so on. We want
to make L1 deal with the issue.

How does it work then? We inject #PF and L1 sees it as a #PF VMEXIT.
It needs to know about APF (thus KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT)
but the handling is exactly the same as in do_page_fault(): L1's
kvm_handle_page_fault() checks the APF area (shared between L0 and L1)
and either pauses a task or resumes a previously paused one. This can
be an L2 guest or something else.

What is 'host_apf_reason'? It is a copy of the 'reason' field from
'struct kvm_vcpu_pv_apf_data' which we read upon #PF VMEXIT. A non-zero
value indicates that the #PF VMEXIT is synthetic.
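
Roughly, in code (a simplified sketch of L1's kvm_handle_page_fault()
as reworked by patch 9 of this series):

	u32 flags = vcpu->arch.apf.host_apf_flags;

	if (!flags) {
		/* Genuine #PF from L2, handle it as a normal page fault. */
		r = kvm_mmu_page_fault(vcpu, fault_address, error_code,
				       insn, insn_len);
	} else if (flags & KVM_PV_REASON_PAGE_NOT_PRESENT) {
		/* Synthetic #PF VMEXIT, pause the faulting task in L1. */
		vcpu->arch.apf.host_apf_flags = 0;
		kvm_async_pf_task_wait(fault_address, 0);
	}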

How does it work with the patchset: the 'page not present' case remains
the same. The 'page ready' case now goes through interrupts so it may
not get handled immediately. External interrupts will be handled by L0
in host mode (when L2 is not running). For the 'page ready' case the L1
hypervisor doesn't need any special handling; the kvm_async_pf_intr()
irq handler will work correctly.

I've smoke tested this with VMX and nothing immediately blew up.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 02/10] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  2020-05-28  8:42     ` Vitaly Kuznetsov
@ 2020-05-28 10:59       ` Paolo Bonzini
  2020-06-03 19:35       ` Vivek Goyal
  1 sibling, 0 replies; 30+ messages in thread
From: Paolo Bonzini @ 2020-05-28 10:59 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Vivek Goyal
  Cc: kvm, x86, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Gavin Shan, Peter Zijlstra, linux-kernel

On 28/05/20 10:42, Vitaly Kuznetsov wrote:
> How does it work with the patchset: the 'page not present' case remains
> the same. The 'page ready' case now goes through interrupts so it may
> not get handled immediately. External interrupts will be handled by L0
> in host mode (when L2 is not running). For the 'page ready' case the L1
> hypervisor doesn't need any special handling; the kvm_async_pf_intr()
> irq handler will work correctly.
> 
> I've smoke tested this with VMX and nothing immediately blew up.

Indeed.

It would be an issue in the remote (read: nonexistent) case of a
hypervisor that handles async page faults and does not VMEXIT on
interrupts.  In that case it would not be able to enable 'page ready'
as an interrupt, and it would have to get rid of async page fault
support.

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/10] KVM: x86: deprecate KVM_ASYNC_PF_SEND_ALWAYS
  2020-05-25 14:41 ` [PATCH v2 10/10] KVM: x86: deprecate KVM_ASYNC_PF_SEND_ALWAYS Vitaly Kuznetsov
@ 2020-05-28 11:03   ` Paolo Bonzini
  2020-05-28 11:14     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2020-05-28 11:03 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, x86
  Cc: Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

On 25/05/20 16:41, Vitaly Kuznetsov wrote:
> Concerns were expressed around APF event delivery when the CPU is not
> in user mode (KVM_ASYNC_PF_SEND_ALWAYS), e.g.
> https://lore.kernel.org/kvm/ed71d0967113a35f670a9625a058b8e6e0b2f104.1583547991.git.luto@kernel.org/
> 
> 'Page ready' events are already free from '#PF abuse' but 'page not present'
> notifications still go through #PF (to be changed in the future). Lower the
> probability of running into issues when APF collides with a regular #PF
> by deprecating KVM_ASYNC_PF_SEND_ALWAYS. The feature doesn't seem to be
> important enough for any particular workload to notice the difference.

This has been disabled already in guest code, but I don't see a
particular reason to deprecate it in the host too.  Supporting it on the
host is just one line of code; if it's a problem *for the guest*, you
just don't use KVM_ASYNC_PF_SEND_ALWAYS.

Also, note that #VE will always be delivered to the guest even at CPL0;
the decision on whether to do sync or async page fault at CPL0 will move
to the guest, but enabling #VE will probably _require_ the
KVM_ASYNC_PF_SEND_ALWAYS bit to be set (and KVM_ASYNC_PF_DELIVERY_AS_INT
as well).

Thanks,

Paolo

> Suggested-by: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h      |  1 -
>  arch/x86/include/uapi/asm/kvm_para.h |  2 +-
>  arch/x86/kernel/kvm.c                |  3 ---
>  arch/x86/kvm/x86.c                   | 13 +++++++++----
>  4 files changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 356c02bfa587..f491214b7667 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -769,7 +769,6 @@ struct kvm_vcpu_arch {
>  		u64 msr_int_val; /* MSR_KVM_ASYNC_PF_INT */
>  		u16 vec;
>  		u32 id;
> -		bool send_user_only;
>  		u32 host_apf_flags;
>  		unsigned long nested_apf_token;
>  		bool delivery_as_pf_vmexit;
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 812e9b4c1114..3cae0faac2b8 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -82,7 +82,7 @@ struct kvm_clock_pairing {
>  #define KVM_MAX_MMU_OP_BATCH           32
>  
>  #define KVM_ASYNC_PF_ENABLED			(1 << 0)
> -#define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
> +#define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1) /* deprecated */
>  #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
>  #define KVM_ASYNC_PF_DELIVERY_AS_INT		(1 << 3)
>  
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 79730eaef1e1..add123302122 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -324,9 +324,6 @@ static void kvm_guest_cpu_init(void)
>  	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT) && kvmapf) {
>  		u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
>  
> -#ifdef CONFIG_PREEMPTION
> -		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
> -#endif
>  		pa |= KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;
>  
>  		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cc1bf6cfc5e0..8222133bf5be 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2648,7 +2648,10 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>  {
>  	gpa_t gpa = data & ~0x3f;
>  
> -	/* Bits 4:5 are reserved, Should be zero */
> +	/*
> +	 * Bits 4:5 are reserved and should be zero. Bit 1
> +	 * (KVM_ASYNC_PF_SEND_ALWAYS) was deprecated and is being ignored.
> +	 */
>  	if (data & 0x30)
>  		return 1;
>  
> @@ -2664,7 +2667,6 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>  					sizeof(u64)))
>  		return 1;
>  
> -	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
>  	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
>  
>  	kvm_async_pf_wakeup_all(vcpu);
> @@ -10433,8 +10435,11 @@ static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
>  	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
>  		return false;
>  
> -	if (!kvm_pv_async_pf_enabled(vcpu) ||
> -	    (vcpu->arch.apf.send_user_only && kvm_x86_ops.get_cpl(vcpu) == 0))
> +	/*
> +	 * 'Page not present' APF events are only delivered when CPU is in
> +	 * user mode and APF mechanism is enabled.
> +	 */
> +	if (!kvm_pv_async_pf_enabled(vcpu) || kvm_x86_ops.get_cpl(vcpu) == 0)
>  		return false;
>  
>  	return true;
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications
  2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
                   ` (9 preceding siblings ...)
  2020-05-25 14:41 ` [PATCH v2 10/10] KVM: x86: deprecate KVM_ASYNC_PF_SEND_ALWAYS Vitaly Kuznetsov
@ 2020-05-28 11:04 ` Paolo Bonzini
  2020-06-04 17:45   ` Vivek Goyal
  10 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2020-05-28 11:04 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, x86
  Cc: Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

On 25/05/20 16:41, Vitaly Kuznetsov wrote:
> Concerns were expressed around (ab)using #PF for KVM's async_pf mechanism,
> it seems that re-using #PF exception for a PV mechanism wasn't a great
> idea after all. The Grand Plan is to switch to using e.g. #VE for 'page
> not present' events and normal APIC interrupts for 'page ready' events.
> This series does the latter.
> 
> Changes since v1:
> - struct kvm_vcpu_pv_apf_data's fields renamed to 'flags' and 'token',
>   comments added [Vivek Goyal]
> - 'type1/2' names for APF events dropped from everywhere [Vivek Goyal]
> - kvm_arch_can_inject_async_page_present() renamed to 
>   kvm_arch_can_dequeue_async_page_present [Vivek Goyal]
> - 'KVM: x86: deprecate KVM_ASYNC_PF_SEND_ALWAYS' patch added.
> 
> v1: https://lore.kernel.org/kvm/20200511164752.2158645-1-vkuznets@redhat.com/
> QEMU patches for testing: https://github.com/vittyvk/qemu.git (async_pf2_v2 branch)

I'll do another round of review and queue patches 1-7; 8-9 will be
queued later and separately due to the conflicts with the interrupt
entry rework, but it's my job and you don't need to do anything else.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 10/10] KVM: x86: deprecate KVM_ASYNC_PF_SEND_ALWAYS
  2020-05-28 11:03   ` Paolo Bonzini
@ 2020-05-28 11:14     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-28 11:14 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, x86
  Cc: Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 25/05/20 16:41, Vitaly Kuznetsov wrote:
>> Concerns were expressed around APF event delivery when the CPU is not
>> in user mode (KVM_ASYNC_PF_SEND_ALWAYS), e.g.
>> https://lore.kernel.org/kvm/ed71d0967113a35f670a9625a058b8e6e0b2f104.1583547991.git.luto@kernel.org/
>> 
>> 'Page ready' events are already free from '#PF abuse' but 'page not present'
>> notifications still go through #PF (to be changed in the future). Lower the
>> probability of running into issues when APF collides with a regular #PF
>> by deprecating KVM_ASYNC_PF_SEND_ALWAYS. The feature doesn't seem to be
>> important enough for any particular workload to notice the difference.
>
> This has been disabled already in guest code, but I don't see a
> particular reason to deprecate it in the host too.  Supporting it on the
> host is just one line of code; if it's a problem *for the guest*, you
> just don't use KVM_ASYNC_PF_SEND_ALWAYS.
>
> Also, note that #VE will always be delivered to the guest even at CPL0;
> the decision on whether to do sync or async page fault at CPL0 will move
> to the guest, but enabling #VE will probably _require_ the
> KVM_ASYNC_PF_SEND_ALWAYS bit to be set (and KVM_ASYNC_PF_DELIVERY_AS_INT
> as well).

I actually missed the fact that the guest side is already disabled (I
can see it now in the queue), so feel free to ignore this patch then.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 06/10] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-05-25 14:41 ` [PATCH v2 06/10] KVM: x86: acknowledgment mechanism for async pf page ready notifications Vitaly Kuznetsov
@ 2020-05-28 11:29   ` Paolo Bonzini
  2020-05-28 11:39     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2020-05-28 11:29 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, x86
  Cc: Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

On 25/05/20 16:41, Vitaly Kuznetsov wrote:
> +	case MSR_KVM_ASYNC_PF_ACK:
> +		if (data & 0x1) {
> +			vcpu->arch.apf.pageready_pending = false;
> +			kvm_check_async_pf_completion(vcpu);
> +		}
> +		break;
>  	case MSR_KVM_STEAL_TIME:
>  
>  		if (unlikely(!sched_info_on()))
> @@ -3183,6 +3189,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_KVM_ASYNC_PF_INT:
>  		msr_info->data = vcpu->arch.apf.msr_int_val;
>  		break;
> +	case MSR_KVM_ASYNC_PF_ACK:
> +		msr_info->data = 0;
> +		break;

How is the pageready_pending flag migrated?  Should we revert the
direction of the MSR (i.e. read the flag, and write 0 to clear it)?

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 06/10] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-05-28 11:29   ` Paolo Bonzini
@ 2020-05-28 11:39     ` Vitaly Kuznetsov
  2020-05-28 11:44       ` Paolo Bonzini
  0 siblings, 1 reply; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-28 11:39 UTC (permalink / raw)
  To: Paolo Bonzini, kvm, x86
  Cc: Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 25/05/20 16:41, Vitaly Kuznetsov wrote:
>> +	case MSR_KVM_ASYNC_PF_ACK:
>> +		if (data & 0x1) {
>> +			vcpu->arch.apf.pageready_pending = false;
>> +			kvm_check_async_pf_completion(vcpu);
>> +		}
>> +		break;
>>  	case MSR_KVM_STEAL_TIME:
>>  
>>  		if (unlikely(!sched_info_on()))
>> @@ -3183,6 +3189,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>  	case MSR_KVM_ASYNC_PF_INT:
>>  		msr_info->data = vcpu->arch.apf.msr_int_val;
>>  		break;
>> +	case MSR_KVM_ASYNC_PF_ACK:
>> +		msr_info->data = 0;
>> +		break;
>
> How is the pageready_pending flag migrated?  Should we revert the
> direction of the MSR (i.e. read the flag, and write 0 to clear it)?

The flag is not migrated so it will be 'false'. This can just cause an
extra kick in kvm_arch_async_page_present_queued() but this shouldn't be
a big deal. Also, after migration we will just send a 'wakeup all' event,
the async pf queue will be empty. MSR_KVM_ASYNC_PF_ACK itself is not
migrated, we don't even store it, so I'm not sure how inverting it would
change things.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 06/10] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-05-28 11:39     ` Vitaly Kuznetsov
@ 2020-05-28 11:44       ` Paolo Bonzini
  0 siblings, 0 replies; 30+ messages in thread
From: Paolo Bonzini @ 2020-05-28 11:44 UTC (permalink / raw)
  To: Vitaly Kuznetsov, kvm, x86
  Cc: Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Vivek Goyal, Gavin Shan, Peter Zijlstra, linux-kernel

On 28/05/20 13:39, Vitaly Kuznetsov wrote:
>> How is the pageready_pending flag migrated?  Should we revert the
>> direction of the MSR (i.e. read the flag, and write 0 to clear it)?
> The flag is not migrated so it will be 'false'. This can just cause an
> extra kick in kvm_arch_async_page_present_queued() but this shouldn't be
> a big deal. Also, after migration we will just send a 'wakeup all' event,
> the async pf queue will be empty.

Ah, that's kvm_pv_enable_async_pf, where the destination writes to
MSR_KVM_ASYNC_PF.  Cool.

> MSR_KVM_ASYNC_PF_ACK itself is not
> migrated, we don't even store it, so I'm not sure how inverting it would
> change things.

Yes, it would only be useful to invert it if it needs to be stored and
migrated.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 02/10] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  2020-05-28  8:42     ` Vitaly Kuznetsov
  2020-05-28 10:59       ` Paolo Bonzini
@ 2020-06-03 19:35       ` Vivek Goyal
  1 sibling, 0 replies; 30+ messages in thread
From: Vivek Goyal @ 2020-06-03 19:35 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, x86, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, Wanpeng Li, Sean Christopherson,
	Jim Mattson, Gavin Shan, Peter Zijlstra, linux-kernel

On Thu, May 28, 2020 at 10:42:38AM +0200, Vitaly Kuznetsov wrote:
> Vivek Goyal <vgoyal@redhat.com> writes:
> 
> > On Mon, May 25, 2020 at 04:41:17PM +0200, Vitaly Kuznetsov wrote:
> >> 
> >
> > [..]
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index 0a6b35353fc7..c195f63c1086 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -767,7 +767,7 @@ struct kvm_vcpu_arch {
> >>  		u64 msr_val;
> >>  		u32 id;
> >>  		bool send_user_only;
> >> -		u32 host_apf_reason;
> >> +		u32 host_apf_flags;
> >
> > Hi Vitaly,
> >
> > What is host_apf_reason used for? It looks like it is somehow used in
> > the context of nested guests. I hope by now you have been able to
> > figure it out.
> >
> > Is it the case that the L2 guest takes a page fault exit and then L0
> > injects this event into L1 using an exception? I have been trying to
> > read this code but can't wrap my head around it.
> >
> > I am still concerned about the case of nested kvm. We have discussed
> > the apf mechanism but never touched the nested part of it. Given we
> > are touching code in the nested kvm part, I want to make sure it is
> > not broken in the new design.
> >
> 
> Sorry I missed this.
> 
> I think we've touched the nested topic a bit already:
> https://lore.kernel.org/kvm/87lfluwfi0.fsf@vitty.brq.redhat.com/
> 
> But let me try to explain the whole thing and maybe someone will point
> out what I'm missing.

Hi Vitaly,

Sorry, I got busy with some other things and got back to it just now.
Thanks for the explanation. I think I understand it to some extent now.

Vivek

> 
> The problem being solved: an L2 guest is running and hits a page which
> is not present *in L0*; instead of pausing the *L1* vCPU completely we
> want to let L1 know about the problem so it can run something else
> (e.g. another guest or just another application).
> 
> What's different between this and the 'normal' APF case? When the L2
> guest is running, the (physical) CPU is in 'guest' mode so we can't
> inject #PF there. Actually, we can, but L2 may get confused: we're not
> even sure the fault is L2's, that L2 supports APF, and so on. We want
> to make L1 deal with the issue.
> 
> How does it work then? We inject #PF and L1 sees it as a #PF VMEXIT.
> It needs to know about APF (thus KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT)
> but the handling is exactly the same as in do_page_fault(): L1's
> kvm_handle_page_fault() checks the APF area (shared between L0 and L1)
> and either pauses a task or resumes a previously paused one. This can
> be an L2 guest or something else.
> 
> What is 'host_apf_reason'? It is a copy of the 'reason' field from
> 'struct kvm_vcpu_pv_apf_data' which we read upon #PF VMEXIT. A non-zero
> value indicates that the #PF VMEXIT is synthetic.
> 
> How does it work with the patchset: the 'page not present' case remains
> the same. The 'page ready' case now goes through interrupts so it may
> not get handled immediately. External interrupts will be handled by L0
> in host mode (when L2 is not running). For the 'page ready' case the L1
> hypervisor doesn't need any special handling; the kvm_async_pf_intr()
> irq handler will work correctly.
> 
> I've smoke tested this with VMX and nothing immediately blew up.
> 
> -- 
> Vitaly
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications
  2020-05-28 11:04 ` [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Paolo Bonzini
@ 2020-06-04 17:45   ` Vivek Goyal
  2020-06-04 17:56     ` Paolo Bonzini
  0 siblings, 1 reply; 30+ messages in thread
From: Vivek Goyal @ 2020-06-04 17:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, kvm, x86, Andy Lutomirski, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, Wanpeng Li, Sean Christopherson,
	Jim Mattson, Gavin Shan, Peter Zijlstra, linux-kernel

On Thu, May 28, 2020 at 01:04:55PM +0200, Paolo Bonzini wrote:
> On 25/05/20 16:41, Vitaly Kuznetsov wrote:
> > Concerns were expressed around (ab)using #PF for KVM's async_pf mechanism,
> > it seems that re-using #PF exception for a PV mechanism wasn't a great
> > idea after all. The Grand Plan is to switch to using e.g. #VE for 'page
> > not present' events and normal APIC interrupts for 'page ready' events.
> > This series does the latter.
> > 
> > Changes since v1:
> > - struct kvm_vcpu_pv_apf_data's fields renamed to 'flags' and 'token',
> >   comments added [Vivek Goyal]
> > - 'type1/2' names for APF events dropped from everywhere [Vivek Goyal]
> > - kvm_arch_can_inject_async_page_present() renamed to 
> >   kvm_arch_can_dequeue_async_page_present [Vivek Goyal]
> > - 'KVM: x86: deprecate KVM_ASYNC_PF_SEND_ALWAYS' patch added.
> > 
> > v1: https://lore.kernel.org/kvm/20200511164752.2158645-1-vkuznets@redhat.com/
> > QEMU patches for testing: https://github.com/vittyvk/qemu.git (async_pf2_v2 branch)
> 
> I'll do another round of review and queue patches 1-7; 8-9 will be
> queued later and separately due to the conflicts with the interrupt
> entry rework, but it's my job and you don't need to do anything else.

Hi Paolo,

I see 1-7 got merged for 5.8. When you say patches 8-9 will be queued
later, do you mean later in 5.8, or will they be held till the 5.9
merge window opens?

Thanks
Vivek


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications
  2020-06-04 17:45   ` Vivek Goyal
@ 2020-06-04 17:56     ` Paolo Bonzini
  0 siblings, 0 replies; 30+ messages in thread
From: Paolo Bonzini @ 2020-06-04 17:56 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Vitaly Kuznetsov, kvm, x86, Andy Lutomirski, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, Wanpeng Li, Sean Christopherson,
	Jim Mattson, Gavin Shan, Peter Zijlstra, linux-kernel

On 04/06/20 19:45, Vivek Goyal wrote:
>> I'll do another round of review and queue patches 1-7; 8-9 will be
>> queued later and separately due to the conflicts with the interrupt
>> entry rework, but it's my job and you don't need to do anything else.
> Hi Paolo,
> 
> I see 1-7 got merged for 5.8. When you say patches 8-9 will be queued
> later, do you mean later in 5.8, or will they be held till the 5.9
> merge window opens?

I hope to get them in 5.8.  They have some pretty nasty conflicts that
are too much for Linus to resolve.  So my plan is to put 8-9 in a topic
branch and do the merge myself.  Whether this works out depends on the
timing of the tip pull request.

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/10] KVM: x86: interrupt based APF 'page ready' event delivery
  2020-05-25 14:41 ` [PATCH v2 05/10] KVM: x86: interrupt based APF 'page ready' event delivery Vitaly Kuznetsov
@ 2020-06-09 19:10   ` Vivek Goyal
  2020-06-09 20:31     ` Paolo Bonzini
  0 siblings, 1 reply; 30+ messages in thread
From: Vivek Goyal @ 2020-06-09 19:10 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, x86, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, Wanpeng Li, Sean Christopherson,
	Jim Mattson, Gavin Shan, Peter Zijlstra, linux-kernel

On Mon, May 25, 2020 at 04:41:20PM +0200, Vitaly Kuznetsov wrote:
[..]
>  void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  				 struct kvm_async_pf *work)
>  {
> -	struct x86_exception fault;
> +	struct kvm_lapic_irq irq = {
> +		.delivery_mode = APIC_DM_FIXED,
> +		.vector = vcpu->arch.apf.vec
> +	};
>  
>  	if (work->wakeup_all)
>  		work->arch.token = ~0; /* broadcast wakeup */
> @@ -10444,26 +10491,20 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
>  	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
>  
> -	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
> -	    !apf_put_user_ready(vcpu, work->arch.token)) {
> -			fault.vector = PF_VECTOR;
> -			fault.error_code_valid = true;
> -			fault.error_code = 0;
> -			fault.nested_page_fault = false;
> -			fault.address = work->arch.token;
> -			fault.async_page_fault = true;
> -			kvm_inject_page_fault(vcpu, &fault);
> -	}
> +	if (kvm_pv_async_pf_enabled(vcpu) &&
> +	    !apf_put_user_ready(vcpu, work->arch.token))
> +		kvm_apic_set_irq(vcpu, &irq, NULL);
> +

Hi Vitaly,

Have a question about page ready events. 

Now we deliver PAGE_NOT_PRESENT page faults only if guest is not in
kernel mode. So say kernel tried to access a page and we halted cpu.
When page is available, we will inject page_ready interrupt. At
that time we don't seem to check whether page_not_present was injected
or not. 

IOW, we seem to deliver page_ready irrespective of the fact whether
PAGE_NOT_PRESENT was delivered or not. And that means we will be
sending page present tokens to guest. Guest will not have a state
associated with that token and think that page_not_present has
not been delivered yet and allocate an element in hash table for
future page_not_present event. And that will lead to memory leak
and token conflict etc.
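
To illustrate, the guest's wake path today does roughly this (heavily
simplified from kvm_async_pf_task_wake() in arch/x86/kernel/kvm.c;
locking and allocation-failure handling omitted):

	n = _find_apf_task(b, token);
	if (!n) {
		/*
		 * No sleeper for this token: assume 'page not present'
		 * is still on its way and leave a dummy entry behind.
		 * If that event was in fact never delivered, the entry
		 * is never consumed and leaks.
		 */
		n = kzalloc(sizeof(*n), GFP_ATOMIC);
		n->token = token;
		n->cpu = smp_processor_id();
		hlist_add_head(&n->link, &b->list);
	} else {
		apf_task_wake_one(n);
	}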

While setting up async pf, should we keep track whether associated
page_not_present was delivered to guest or not and deliver page_ready
accordingly.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/10] KVM: x86: interrupt based APF 'page ready' event delivery
  2020-06-09 19:10   ` Vivek Goyal
@ 2020-06-09 20:31     ` Paolo Bonzini
  2020-06-10  9:01       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 30+ messages in thread
From: Paolo Bonzini @ 2020-06-09 20:31 UTC (permalink / raw)
  To: Vivek Goyal, Vitaly Kuznetsov
  Cc: kvm, x86, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Gavin Shan, Peter Zijlstra, linux-kernel

On 09/06/20 21:10, Vivek Goyal wrote:
> Hi Vitaly,
> 
> Have a question about page ready events. 
> 
> Now we deliver PAGE_NOT_PRESENT page faults only if guest is not in
> kernel mode. So say kernel tried to access a page and we halted cpu.
> When page is available, we will inject page_ready interrupt. At
> that time we don't seem to check whether page_not_present was injected
> or not. 
> 
> IOW, we seem to deliver page_ready irrespective of the fact whether
> PAGE_NOT_PRESENT was delivered or not. And that means we will be
> sending page present tokens to guest. Guest will not have a state
> associated with that token and think that page_not_present has
> not been delivered yet and allocate an element in hash table for
> future page_not_present event. And that will lead to memory leak
> and token conflict etc.

Yes, and this is https://bugzilla.kernel.org/show_bug.cgi?id=208081
which I was looking at right today.

> While setting up async pf, should we keep track whether associated
> page_not_present was delivered to guest or not and deliver page_ready
> accordingly.

Yes, I think so.

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/10] KVM: x86: interrupt based APF 'page ready' event delivery
  2020-06-09 20:31     ` Paolo Bonzini
@ 2020-06-10  9:01       ` Vitaly Kuznetsov
  2020-06-10 11:34         ` Paolo Bonzini
  2020-06-10 12:51         ` Vivek Goyal
  0 siblings, 2 replies; 30+ messages in thread
From: Vitaly Kuznetsov @ 2020-06-10  9:01 UTC (permalink / raw)
  To: Paolo Bonzini, Vivek Goyal
  Cc: kvm, x86, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Gavin Shan, Peter Zijlstra, linux-kernel

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 09/06/20 21:10, Vivek Goyal wrote:
>> Hi Vitaly,
>> 
>> Have a question about page ready events. 
>> 
>> Now we deliver PAGE_NOT_PRESENT page faults only if guest is not in
>> kernel mode. So say kernel tried to access a page and we halted cpu.
>> When page is available, we will inject page_ready interrupt. At
>> that time we don't seem to check whether page_not_present was injected
>> or not. 
>> 
>> IOW, we seem to deliver page_ready irrespective of the fact whether
>> PAGE_NOT_PRESENT was delivered or not. And that means we will be
>> sending page present tokens to guest. Guest will not have a state
>> associated with that token and think that page_not_present has
>> not been delivered yet and allocate an element in hash table for
>> future page_not_present event. And that will lead to memory leak
>> and token conflict etc.
>
> Yes, and this is https://bugzilla.kernel.org/show_bug.cgi?id=208081
> which I was looking at right today.
>

The issue isn't related to the interrupt based APF mechanism, right?
'Page ready' events are always injected (sooner or later). I'll take a
look.

>> While setting up async pf, should we keep track whether associated
>> page_not_present was delivered to guest or not and deliver page_ready
>> accordingly.
>
> Yes, I think so.
>

Something like this? (not even compile tested yet):

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8e8fea13b6c7..68178d29d35c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1661,7 +1661,7 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
 				       unsigned long *vcpu_bitmap);
 
-void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 				     struct kvm_async_pf *work);
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c26dd1363151..e1e840df6b69 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10515,7 +10515,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
 	return kvm_arch_interrupt_allowed(vcpu);
 }
 
-void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 				     struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
@@ -10532,17 +10532,19 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 		fault.address = work->arch.token;
 		fault.async_page_fault = true;
 		kvm_inject_page_fault(vcpu, &fault);
-	} else {
-		/*
-		 * It is not possible to deliver a paravirtualized asynchronous
-		 * page fault, but putting the guest in an artificial halt state
-		 * can be beneficial nevertheless: if an interrupt arrives, we
-		 * can deliver it timely and perhaps the guest will schedule
-		 * another process.  When the instruction that triggered a page
-		 * fault is retried, hopefully the page will be ready in the host.
-		 */
-		kvm_make_request(KVM_REQ_APF_HALT, vcpu);
+		return true;
 	}
+
+	/*
+	 * It is not possible to deliver a paravirtualized asynchronous
+	 * page fault, but putting the guest in an artificial halt state
+	 * can be beneficial nevertheless: if an interrupt arrives, we
+	 * can deliver it timely and perhaps the guest will schedule
+	 * another process.  When the instruction that triggered a page
+	 * fault is retried, hopefully the page will be ready in the host.
+	 */
+	kvm_make_request(KVM_REQ_APF_HALT, vcpu);
+	return false;
 }
 
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
@@ -10559,7 +10561,8 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
-	if (kvm_pv_async_pf_enabled(vcpu) &&
+	if (work->notpresent_injected &&
+	    kvm_pv_async_pf_enabled(vcpu) &&
 	    !apf_put_user_ready(vcpu, work->arch.token)) {
 		vcpu->arch.apf.pageready_pending = true;
 		kvm_apic_set_irq(vcpu, &irq, NULL);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 802b9e2306f0..2456dc5338f8 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -206,6 +206,7 @@ struct kvm_async_pf {
 	unsigned long addr;
 	struct kvm_arch_async_pf arch;
 	bool   wakeup_all;
+	bool notpresent_injected;
 };
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index f1e07fae84e9..de28413abefd 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -189,12 +189,14 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 		goto retry_sync;
 
 	INIT_WORK(&work->work, async_pf_execute);
-	if (!schedule_work(&work->work))
-		goto retry_sync;
 
 	list_add_tail(&work->queue, &vcpu->async_pf.queue);
 	vcpu->async_pf.queued++;
-	kvm_arch_async_page_not_present(vcpu, work);
+	work->notpresent_injected = kvm_arch_async_page_not_present(vcpu, work);
+
+	/* schedule_work() only fails for already queued works */
+	schedule_work(&work->work);
+
 	return 1;
 retry_sync:
 	kvm_put_kvm(work->vcpu->kvm);
@@ -216,6 +218,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
 		return -ENOMEM;
 
 	work->wakeup_all = true;
+	work->notpresent_injected = true;
 	INIT_LIST_HEAD(&work->queue); /* for list_del to work */
 
 	spin_lock(&vcpu->async_pf.lock);

-- 
Vitaly


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/10] KVM: x86: interrupt based APF 'page ready' event delivery
  2020-06-10  9:01       ` Vitaly Kuznetsov
@ 2020-06-10 11:34         ` Paolo Bonzini
  2020-06-10 12:51         ` Vivek Goyal
  1 sibling, 0 replies; 30+ messages in thread
From: Paolo Bonzini @ 2020-06-10 11:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Vivek Goyal
  Cc: kvm, x86, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Gavin Shan, Peter Zijlstra, linux-kernel

On 10/06/20 11:01, Vitaly Kuznetsov wrote:
> The issue isn't related to the interrupt based APF mechanism, right?
> 'Page ready' events are always injected (sooner or later). I'll take a
> look.

No, it isn't.

>>> While setting up async pf, should we keep track whether associated
>>> page_not_present was delivered to guest or not and deliver page_ready
>>> accordingly.
>>
>> Yes, I think so.
> 
> Something like this? (not even compile tested yet):

Pretty much, though I would avoid the reindentation if possible.
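
I.e. keep the if/else shape and just add the return values, something
like (sketch only, condition elided):

	if (... /* can deliver the synthetic #PF */) {
		...
		kvm_inject_page_fault(vcpu, &fault);
		return true;
	} else {
		/* the existing 'artificial halt state' comment stays put */
		kvm_make_request(KVM_REQ_APF_HALT, vcpu);
		return false;
	}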

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 05/10] KVM: x86: interrupt based APF 'page ready' event delivery
  2020-06-10  9:01       ` Vitaly Kuznetsov
  2020-06-10 11:34         ` Paolo Bonzini
@ 2020-06-10 12:51         ` Vivek Goyal
  1 sibling, 0 replies; 30+ messages in thread
From: Vivek Goyal @ 2020-06-10 12:51 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, kvm, x86, Andy Lutomirski, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, Wanpeng Li, Sean Christopherson,
	Jim Mattson, Gavin Shan, Peter Zijlstra, linux-kernel

On Wed, Jun 10, 2020 at 11:01:39AM +0200, Vitaly Kuznetsov wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 09/06/20 21:10, Vivek Goyal wrote:
> >> Hi Vitaly,
> >> 
> >> Have a question about page ready events. 
> >> 
> >> Now we deliver PAGE_NOT_PRESENT page faults only if the guest is not
> >> in kernel mode. So say the kernel tried to access a page and we halted
> >> the vCPU. When the page is available, we will inject a page_ready
> >> interrupt. At that time we don't seem to check whether
> >> page_not_present was injected or not.
> >> 
> >> IOW, we seem to deliver page_ready irrespective of whether
> >> PAGE_NOT_PRESENT was delivered. That means we will be sending 'page
> >> present' tokens to the guest. The guest will not have any state
> >> associated with such a token, will think that page_not_present has not
> >> been delivered yet, and will allocate an element in its hash table for
> >> a future page_not_present event. That will lead to a memory leak,
> >> token conflicts, etc.
> >
> > Yes, and this is https://bugzilla.kernel.org/show_bug.cgi?id=208081
> > which I was looking at right today.
> >
> 
> The issue isn't related to the interrupt based APF mechanism, right?
> 'Page ready' events are always injected (sooner or later). I'll take a
> look.
> 
> >> While setting up async pf, should we keep track whether associated
> >> page_not_present was delivered to guest or not and deliver page_ready
> >> accordingly.
> >
> > Yes, I think so.
> >
> 
> Something like this? (not even compile tested yet):
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 8e8fea13b6c7..68178d29d35c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1661,7 +1661,7 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm);
>  void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
>  				       unsigned long *vcpu_bitmap);
>  
> -void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
> +bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
>  				     struct kvm_async_pf *work);
>  void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  				 struct kvm_async_pf *work);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c26dd1363151..e1e840df6b69 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10515,7 +10515,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
>  	return kvm_arch_interrupt_allowed(vcpu);
>  }
>  
> -void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
> +bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
>  				     struct kvm_async_pf *work)
>  {
>  	struct x86_exception fault;
> @@ -10532,17 +10532,19 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
>  		fault.address = work->arch.token;
>  		fault.async_page_fault = true;
>  		kvm_inject_page_fault(vcpu, &fault);
> -	} else {
> -		/*
> -		 * It is not possible to deliver a paravirtualized asynchronous
> -		 * page fault, but putting the guest in an artificial halt state
> -		 * can be beneficial nevertheless: if an interrupt arrives, we
> -		 * can deliver it timely and perhaps the guest will schedule
> -		 * another process.  When the instruction that triggered a page
> -		 * fault is retried, hopefully the page will be ready in the host.
> -		 */
> -		kvm_make_request(KVM_REQ_APF_HALT, vcpu);
> +		return true;
>  	}
> +
> +	/*
> +	 * It is not possible to deliver a paravirtualized asynchronous
> +	 * page fault, but putting the guest in an artificial halt state
> +	 * can be beneficial nevertheless: if an interrupt arrives, we
> +	 * can deliver it timely and perhaps the guest will schedule
> +	 * another process.  When the instruction that triggered a page
> +	 * fault is retried, hopefully the page will be ready in the host.
> +	 */
> +	kvm_make_request(KVM_REQ_APF_HALT, vcpu);
> +	return false;
>  }
>  
>  void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
> @@ -10559,7 +10561,8 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
>  	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
>  
> -	if (kvm_pv_async_pf_enabled(vcpu) &&
> +	if (work->notpresent_injected &&
> +	    kvm_pv_async_pf_enabled(vcpu) &&
>  	    !apf_put_user_ready(vcpu, work->arch.token)) {
>  		vcpu->arch.apf.pageready_pending = true;
>  		kvm_apic_set_irq(vcpu, &irq, NULL);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 802b9e2306f0..2456dc5338f8 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -206,6 +206,7 @@ struct kvm_async_pf {
>  	unsigned long addr;
>  	struct kvm_arch_async_pf arch;
>  	bool   wakeup_all;
> +	bool notpresent_injected;
>  };
>  
>  void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index f1e07fae84e9..de28413abefd 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -189,12 +189,14 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  		goto retry_sync;
>  
>  	INIT_WORK(&work->work, async_pf_execute);
> -	if (!schedule_work(&work->work))
> -		goto retry_sync;
>  
>  	list_add_tail(&work->queue, &vcpu->async_pf.queue);
>  	vcpu->async_pf.queued++;
> -	kvm_arch_async_page_not_present(vcpu, work);
> +	work->notpresent_injected = kvm_arch_async_page_not_present(vcpu, work);
> +
> +	/* schedule_work() only fails for already queued works */
> +	schedule_work(&work->work);
> +
>  	return 1;
>  retry_sync:

This label and the associated logic can go away, as we are no longer
expecting an error from schedule_work().

>  	kvm_put_kvm(work->vcpu->kvm);
> @@ -216,6 +218,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
>  		return -ENOMEM;
>  
>  	work->wakeup_all = true;
> +	work->notpresent_injected = true;

This probably is not needed. We already have work->wakeup_all set, and
that event always has to be injected; there is no associated
page_not_present event. IMHO, it is cleaner not to set the flag for
wakeup-all events and to check work->wakeup_all instead, always injecting
in that case. Something like the sketch below.
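
A minimal illustration (untested; names as in the patch above):

	/* in kvm_arch_async_page_present(): wakeup_all events carry no
	 * matching 'page not present', so treat them as always
	 * deliverable instead of setting notpresent_injected for them */
	if ((work->wakeup_all || work->notpresent_injected) &&
	    kvm_pv_async_pf_enabled(vcpu) &&
	    !apf_put_user_ready(vcpu, work->arch.token)) {
		vcpu->arch.apf.pageready_pending = true;
		kvm_apic_set_irq(vcpu, &irq, NULL);
	}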

Once you have the final patch, I will test it. I have now configured an
nvdimm device with a file backing it; I drop the page cache on the host,
which means any page accessed in the guest triggers an async PF on the
host. So I can test it easily.

Thanks
Vivek


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 08/10] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery
  2020-05-25 14:41 ` [PATCH v2 08/10] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery Vitaly Kuznetsov
@ 2020-06-10 20:51   ` Vivek Goyal
  2020-06-10 21:06     ` Paolo Bonzini
  0 siblings, 1 reply; 30+ messages in thread
From: Vivek Goyal @ 2020-06-10 20:51 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm, x86, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, Wanpeng Li, Sean Christopherson,
	Jim Mattson, Gavin Shan, Peter Zijlstra, linux-kernel

On Mon, May 25, 2020 at 04:41:23PM +0200, Vitaly Kuznetsov wrote:
> KVM now supports using an interrupt for 'page ready' APF event delivery,
> and the legacy mechanism has been deprecated. Switch KVM guests to the
> new one.

Hi Vitaly,

I see we have all this code in the guest which tries to take care of
cases where PAGE_READY can be delivered before PAGE_NOT_PRESENT. In
this new scheme of things, can it still happen? We are using an
exception to deliver PAGE_NOT_PRESENT while using an interrupt to
deliver PAGE_READY.

If re-ordering is not possible, then it will be good to get rid of
that extra complexity in the guest.

Thanks
Vivek

> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/entry/entry_32.S          |  5 ++++
>  arch/x86/entry/entry_64.S          |  5 ++++
>  arch/x86/include/asm/hardirq.h     |  3 ++
>  arch/x86/include/asm/irq_vectors.h |  6 +++-
>  arch/x86/include/asm/kvm_para.h    |  6 ++++
>  arch/x86/kernel/irq.c              |  9 ++++++
>  arch/x86/kernel/kvm.c              | 45 ++++++++++++++++++++++--------
>  7 files changed, 66 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index b67bae7091d7..d574dadcb2a1 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1475,6 +1475,11 @@ BUILD_INTERRUPT3(hv_stimer0_callback_vector, HYPERV_STIMER0_VECTOR,
>  
>  #endif /* CONFIG_HYPERV */
>  
> +#ifdef CONFIG_KVM_GUEST
> +BUILD_INTERRUPT3(kvm_async_pf_vector, KVM_ASYNC_PF_VECTOR,
> +		 kvm_async_pf_intr)
> +#endif
> +
>  SYM_CODE_START(page_fault)
>  	ASM_CLAC
>  	pushl	$do_page_fault
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 3063aa9090f9..138f5c5aca2e 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1190,6 +1190,11 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
>  	acrn_hv_callback_vector acrn_hv_vector_handler
>  #endif
>  
> +#ifdef CONFIG_KVM_GUEST
> +apicinterrupt3 KVM_ASYNC_PF_VECTOR \
> +	kvm_async_pf_vector kvm_async_pf_intr
> +#endif
> +
>  idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
>  idtentry int3			do_int3			has_error_code=0	create_gap=1
>  idtentry stack_segment		do_stack_segment	has_error_code=1
> diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
> index 07533795b8d2..be0fbb15ad7f 100644
> --- a/arch/x86/include/asm/hardirq.h
> +++ b/arch/x86/include/asm/hardirq.h
> @@ -44,6 +44,9 @@ typedef struct {
>  	unsigned int irq_hv_reenlightenment_count;
>  	unsigned int hyperv_stimer0_count;
>  #endif
> +#ifdef CONFIG_KVM_GUEST
> +	unsigned int kvm_async_pf_pageready_count;
> +#endif
>  } ____cacheline_aligned irq_cpustat_t;
>  
>  DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
> index 889f8b1b5b7f..8879a9ecd908 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -104,7 +104,11 @@
>  #define HYPERV_STIMER0_VECTOR		0xed
>  #endif
>  
> -#define LOCAL_TIMER_VECTOR		0xec
> +#ifdef CONFIG_KVM_GUEST
> +#define KVM_ASYNC_PF_VECTOR		0xec
> +#endif
> +
> +#define LOCAL_TIMER_VECTOR		0xeb
>  
>  #define NR_VECTORS			 256
>  
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 2a3102fee189..a075cd8fa5c7 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -4,6 +4,7 @@
>  
>  #include <asm/processor.h>
>  #include <asm/alternative.h>
> +#include <linux/interrupt.h>
>  #include <uapi/asm/kvm_para.h>
>  
>  extern void kvmclock_init(void);
> @@ -93,6 +94,11 @@ void kvm_async_pf_task_wake(u32 token);
>  u32 kvm_read_and_reset_apf_flags(void);
>  extern void kvm_disable_steal_time(void);
>  void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
> +extern __visible void kvm_async_pf_vector(void);
> +#ifdef CONFIG_TRACING
> +#define trace_kvm_async_pf_vector kvm_async_pf_vector
> +#endif
> +__visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs);
>  
>  #ifdef CONFIG_PARAVIRT_SPINLOCKS
>  void __init kvm_spinlock_init(void);
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index c7965ff429c5..a4c2f25ad74d 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -159,6 +159,15 @@ int arch_show_interrupts(struct seq_file *p, int prec)
>  				   irq_stats(j)->hyperv_stimer0_count);
>  		seq_puts(p, "  Hyper-V stimer0 interrupts\n");
>  	}
> +#endif
> +#ifdef CONFIG_KVM_GUEST
> +	if (test_bit(KVM_ASYNC_PF_VECTOR, system_vectors)) {
> +		seq_printf(p, "%*s: ", prec, "APF");
> +		for_each_online_cpu(j)
> +			seq_printf(p, "%10u ",
> +				   irq_stats(j)->kvm_async_pf_pageready_count);
> +		seq_puts(p, "  KVM async PF page ready interrupts\n");
> +	}
>  #endif
>  	seq_printf(p, "%*s: %10u\n", prec, "ERR", atomic_read(&irq_err_count));
>  #if defined(CONFIG_X86_IO_APIC)
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 340df5dab30d..79730eaef1e1 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -245,23 +245,39 @@ NOKPROBE_SYMBOL(kvm_read_and_reset_apf_flags);
>  dotraplinkage void
>  do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address)
>  {
> -	switch (kvm_read_and_reset_apf_flags()) {
> -	default:
> +	u32 flags = kvm_read_and_reset_apf_flags();
> +
> +	if (!flags) {
> +		/* This is a normal page fault */
>  		do_page_fault(regs, error_code, address);
> -		break;
> -	case KVM_PV_REASON_PAGE_NOT_PRESENT:
> +		return;
> +	} else if (flags & KVM_PV_REASON_PAGE_NOT_PRESENT) {
>  		/* page is swapped out by the host. */
>  		kvm_async_pf_task_wait((u32)address, !user_mode(regs));
> -		break;
> -	case KVM_PV_REASON_PAGE_READY:
> -		rcu_irq_enter();
> -		kvm_async_pf_task_wake((u32)address);
> -		rcu_irq_exit();
> -		break;
> +	} else {
> +		WARN_ONCE(1, "Unexpected async PF flags: %x\n", flags);
>  	}
>  }
>  NOKPROBE_SYMBOL(do_async_page_fault);
>  
> +__visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs)
> +{
> +	u32 token;
> +
> +	entering_ack_irq();
> +
> +	inc_irq_stat(kvm_async_pf_pageready_count);
> +
> +	if (__this_cpu_read(apf_reason.enabled)) {
> +		token = __this_cpu_read(apf_reason.token);
> +		kvm_async_pf_task_wake(token);
> +		__this_cpu_write(apf_reason.token, 0);
> +		wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
> +	}
> +
> +	exiting_irq();
> +}
> +
>  static void __init paravirt_ops_setup(void)
>  {
>  	pv_info.name = "KVM";
> @@ -305,17 +321,19 @@ static notrace void kvm_guest_apic_eoi_write(u32 reg, u32 val)
>  
>  static void kvm_guest_cpu_init(void)
>  {
> -	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
> +	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT) && kvmapf) {
>  		u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
>  
>  #ifdef CONFIG_PREEMPTION
>  		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
>  #endif
> -		pa |= KVM_ASYNC_PF_ENABLED;
> +		pa |= KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;
>  
>  		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
>  			pa |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
>  
> +		wrmsrl(MSR_KVM_ASYNC_PF_INT, KVM_ASYNC_PF_VECTOR);
> +
>  		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
>  		__this_cpu_write(apf_reason.enabled, 1);
>  		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
> @@ -649,6 +667,9 @@ static void __init kvm_guest_init(void)
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>  		apic_set_eoi_write(kvm_guest_apic_eoi_write);
>  
> +	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT))
> +		alloc_intr_gate(KVM_ASYNC_PF_VECTOR, kvm_async_pf_vector);
> +
>  #ifdef CONFIG_SMP
>  	smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
>  	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
> -- 
> 2.25.4
> 
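
For quick reference, the per-CPU enable handshake above boils down to the
following (condensed from the kvm_guest_cpu_init() hunk, not a literal
excerpt):

	u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));

	pa |= KVM_ASYNC_PF_ENABLED | KVM_ASYNC_PF_DELIVERY_AS_INT;
	/* tell the host which vector to raise for 'page ready' events */
	wrmsrl(MSR_KVM_ASYNC_PF_INT, KVM_ASYNC_PF_VECTOR);
	/* enable APF and point the host at this CPU's apf_reason area */
	wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);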


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 08/10] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery
  2020-06-10 20:51   ` Vivek Goyal
@ 2020-06-10 21:06     ` Paolo Bonzini
  0 siblings, 0 replies; 30+ messages in thread
From: Paolo Bonzini @ 2020-06-10 21:06 UTC (permalink / raw)
  To: Vivek Goyal, Vitaly Kuznetsov
  Cc: kvm, x86, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	H. Peter Anvin, Wanpeng Li, Sean Christopherson, Jim Mattson,
	Gavin Shan, Peter Zijlstra, linux-kernel

On 10/06/20 22:51, Vivek Goyal wrote:
>> KVM now supports using an interrupt for 'page ready' APF event delivery,
>> and the legacy mechanism has been deprecated. Switch KVM guests to the
>> new one.
> Hi Vitaly,
> 
> I see we have all this code in the guest which tries to take care of
> cases where PAGE_READY can be delivered before PAGE_NOT_PRESENT. In
> this new scheme of things, can it still happen? We are using an
> exception to deliver PAGE_NOT_PRESENT while using an interrupt to
> deliver PAGE_READY.
> 
> If re-ordering is not possible, then it will be good to get rid of
> that extra complexity in the guest.

It is perhaps less likely but still possible, because the interrupt
might be delivered to another CPU and race against the exception.
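
One possible interleaving, roughly (illustrative only):

  1. 'page not present' with token T is injected into the faulting vCPU;
  2. before that vCPU's #PF handler has recorded T in the guest's sleeper
     hash table, the 'page ready' interrupt for T is handled (possibly on
     another CPU);
  3. the wakeup finds no sleeper for T and inserts a dummy entry, which
     the #PF handler later finds and consumes.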

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2020-06-10 21:07 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-25 14:41 [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
2020-05-25 14:41 ` [PATCH v2 01/10] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" Vitaly Kuznetsov
2020-05-25 14:41 ` [PATCH v2 02/10] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Vitaly Kuznetsov
2020-05-26 18:27   ` Vivek Goyal
2020-05-28  8:42     ` Vitaly Kuznetsov
2020-05-28 10:59       ` Paolo Bonzini
2020-06-03 19:35       ` Vivek Goyal
2020-05-25 14:41 ` [PATCH v2 03/10] KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() Vitaly Kuznetsov
2020-05-25 14:41 ` [PATCH v2 04/10] KVM: introduce kvm_read_guest_offset_cached() Vitaly Kuznetsov
2020-05-25 14:41 ` [PATCH v2 05/10] KVM: x86: interrupt based APF 'page ready' event delivery Vitaly Kuznetsov
2020-06-09 19:10   ` Vivek Goyal
2020-06-09 20:31     ` Paolo Bonzini
2020-06-10  9:01       ` Vitaly Kuznetsov
2020-06-10 11:34         ` Paolo Bonzini
2020-06-10 12:51         ` Vivek Goyal
2020-05-25 14:41 ` [PATCH v2 06/10] KVM: x86: acknowledgment mechanism for async pf page ready notifications Vitaly Kuznetsov
2020-05-28 11:29   ` Paolo Bonzini
2020-05-28 11:39     ` Vitaly Kuznetsov
2020-05-28 11:44       ` Paolo Bonzini
2020-05-25 14:41 ` [PATCH v2 07/10] KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT Vitaly Kuznetsov
2020-05-25 14:41 ` [PATCH v2 08/10] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery Vitaly Kuznetsov
2020-06-10 20:51   ` Vivek Goyal
2020-06-10 21:06     ` Paolo Bonzini
2020-05-25 14:41 ` [PATCH v2 09/10] KVM: x86: drop KVM_PV_REASON_PAGE_READY case from kvm_handle_page_fault() Vitaly Kuznetsov
2020-05-25 14:41 ` [PATCH v2 10/10] KVM: x86: deprecate KVM_ASYNC_PF_SEND_ALWAYS Vitaly Kuznetsov
2020-05-28 11:03   ` Paolo Bonzini
2020-05-28 11:14     ` Vitaly Kuznetsov
2020-05-28 11:04 ` [PATCH v2 00/10] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Paolo Bonzini
2020-06-04 17:45   ` Vivek Goyal
2020-06-04 17:56     ` Paolo Bonzini
