* [PATCH RFC 0/6] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications
@ 2020-04-29  9:36 Vitaly Kuznetsov
  2020-04-29  9:36 ` [PATCH RFC 1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" Vitaly Kuznetsov
                   ` (5 more replies)
  0 siblings, 6 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-29  9:36 UTC (permalink / raw)
  To: x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Concerns were expressed around (ab)using #PF for KVM's async_pf mechanism;
it seems that re-using the #PF exception for a PV mechanism wasn't a great
idea after all. The Grand Plan is to switch to using e.g. #VE for 'page
not present' events and normal APIC interrupts for 'page present' events.
This RFC does the latter.

Please let me know what you think about the idea in general and the
selected approach in particular.

Vitaly Kuznetsov (6):
  Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and
    "Page Ready" exceptions simultaneously"
  KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  KVM: x86: interrupt based APF page-ready event delivery
  KVM: x86: acknowledgment mechanism for async pf page ready
    notifications
  KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
  KVM: x86: Switch KVM guest to using interrupts for page ready APF
    delivery

 Documentation/virt/kvm/cpuid.rst     |   6 ++
 Documentation/virt/kvm/msr.rst       |  52 +++++++++++--
 arch/x86/entry/entry_32.S            |   5 ++
 arch/x86/entry/entry_64.S            |   5 ++
 arch/x86/include/asm/hardirq.h       |   3 +
 arch/x86/include/asm/irq_vectors.h   |   6 +-
 arch/x86/include/asm/kvm_host.h      |   5 +-
 arch/x86/include/asm/kvm_para.h      |   6 ++
 arch/x86/include/uapi/asm/kvm_para.h |  11 ++-
 arch/x86/kernel/irq.c                |   9 +++
 arch/x86/kernel/kvm.c                |  42 +++++++++++
 arch/x86/kvm/cpuid.c                 |   3 +-
 arch/x86/kvm/x86.c                   | 107 +++++++++++++++++++++------
 include/uapi/linux/kvm.h             |   1 +
 14 files changed, 228 insertions(+), 33 deletions(-)

-- 
2.25.3


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH RFC 1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
  2020-04-29  9:36 [PATCH RFC 0/6] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
@ 2020-04-29  9:36 ` Vitaly Kuznetsov
  2020-05-05  1:22   ` Gavin Shan
  2020-05-05 14:16   ` Vivek Goyal
  2020-04-29  9:36 ` [PATCH RFC 2/6] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Vitaly Kuznetsov
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-29  9:36 UTC (permalink / raw)
  To: x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Commit 9a6e7c39810e ("KVM: async_pf: Fix #DF due to inject "Page not
Present" and "Page Ready" exceptions simultaneously") added protection
against a 'page ready' notification coming before the 'page not present'
one is delivered. This situation seems to be impossible since commit
2a266f23550b ("KVM MMU: check pending exception before injecting APF"),
which added a 'vcpu->arch.exception.pending' check to
kvm_can_do_async_pf().

On x86, kvm_arch_async_page_present() has only one call site: the
kvm_check_async_pf_completion() loop, and we only enter the loop when
kvm_arch_can_inject_async_page_present(vcpu) is true, which, when the
async pf MSR is enabled, translates into kvm_can_do_async_pf().

There is also one problem with the cancellation mechanism: we don't seem
to check that the 'page not present' notification we're cancelling matches
the 'page ready' notification being delivered, so in theory we may
erroneously drop two valid events.

Revert the commit. apf_get_user() stays as we will need it for the new
'page ready notifications via interrupt' mechanism.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/kvm/x86.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c5835f9cb9ad..b93133ee07ba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10430,7 +10430,6 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
-	u32 val;
 
 	if (work->wakeup_all)
 		work->arch.token = ~0; /* broadcast wakeup */
@@ -10439,19 +10438,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
 	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
-	    !apf_get_user(vcpu, &val)) {
-		if (val == KVM_PV_REASON_PAGE_NOT_PRESENT &&
-		    vcpu->arch.exception.pending &&
-		    vcpu->arch.exception.nr == PF_VECTOR &&
-		    !apf_put_user(vcpu, 0)) {
-			vcpu->arch.exception.injected = false;
-			vcpu->arch.exception.pending = false;
-			vcpu->arch.exception.nr = 0;
-			vcpu->arch.exception.has_error_code = false;
-			vcpu->arch.exception.error_code = 0;
-			vcpu->arch.exception.has_payload = false;
-			vcpu->arch.exception.payload = 0;
-		} else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;
@@ -10459,7 +10446,6 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 			fault.address = work->arch.token;
 			fault.async_page_fault = true;
 			kvm_inject_page_fault(vcpu, &fault);
-		}
 	}
 	vcpu->arch.apf.halted = false;
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
-- 
2.25.3



* [PATCH RFC 2/6] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  2020-04-29  9:36 [PATCH RFC 0/6] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
  2020-04-29  9:36 ` [PATCH RFC 1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" Vitaly Kuznetsov
@ 2020-04-29  9:36 ` Vitaly Kuznetsov
  2020-05-04 23:52   ` Gavin Shan
  2020-04-29  9:36 ` [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery Vitaly Kuznetsov
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-29  9:36 UTC (permalink / raw)
  To: x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Currently, the APF mechanism relies on the #PF abuse where the token is
passed through CR2. If we switch to using interrupts to deliver 'page
ready' notifications we need a different way to pass the data. Extend the
existing 'struct kvm_vcpu_pv_apf_data' with token information.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/uapi/asm/kvm_para.h |  3 ++-
 arch/x86/kvm/x86.c                   | 10 ++++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..df2ba34037a2 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -113,7 +113,8 @@ struct kvm_mmu_op_release_pt {
 
 struct kvm_vcpu_pv_apf_data {
 	__u32 reason;
-	__u8 pad[60];
+	__u32 token;
+	__u8 pad[56];
 	__u32 enabled;
 };
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b93133ee07ba..7c21c0cf0a33 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2662,7 +2662,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 	}
 
 	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
-					sizeof(u32)))
+					sizeof(u64)))
 		return 1;
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
@@ -10352,8 +10352,9 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 	}
 }
 
-static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
+static int apf_put_user(struct kvm_vcpu *vcpu, u32 reason, u32 token)
 {
+	u64 val = (u64)token << 32 | reason;
 
 	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
 				      sizeof(val));
@@ -10405,7 +10406,8 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
 
 	if (kvm_can_deliver_async_pf(vcpu) &&
-	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
+	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT,
+			  work->arch.token)) {
 		fault.vector = PF_VECTOR;
 		fault.error_code_valid = true;
 		fault.error_code = 0;
@@ -10438,7 +10440,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
 	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
-	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY, work->arch.token)) {
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;
-- 
2.25.3



* [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery
  2020-04-29  9:36 [PATCH RFC 0/6] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
  2020-04-29  9:36 ` [PATCH RFC 1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" Vitaly Kuznetsov
  2020-04-29  9:36 ` [PATCH RFC 2/6] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Vitaly Kuznetsov
@ 2020-04-29  9:36 ` Vitaly Kuznetsov
  2020-04-29 10:54   ` Paolo Bonzini
                     ` (2 more replies)
  2020-04-29  9:36 ` [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications Vitaly Kuznetsov
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-29  9:36 UTC (permalink / raw)
  To: x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Concerns were expressed around APF delivery via a synthetic #PF exception,
as in some cases such delivery may collide with a real page fault. For
type 2 (page ready) notifications we can easily switch to using an
interrupt instead. Introduce the new MSR_KVM_ASYNC_PF2 mechanism.

One notable difference between the two mechanisms is that an interrupt may
not get handled immediately, so whenever we would like to deliver the next
event (regardless of its type) we must be sure the guest has read and
cleared the previous event in the slot.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 Documentation/virt/kvm/msr.rst       | 38 +++++++++++---
 arch/x86/include/asm/kvm_host.h      |  5 +-
 arch/x86/include/uapi/asm/kvm_para.h |  6 +++
 arch/x86/kvm/x86.c                   | 77 ++++++++++++++++++++++++++--
 4 files changed, 113 insertions(+), 13 deletions(-)

diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 33892036672d..7433e55f7184 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -203,14 +203,21 @@ data:
 	the hypervisor at the time of asynchronous page fault (APF)
 	injection to indicate type of asynchronous page fault. Value
 	of 1 means that the page referred to by the page fault is not
-	present. Value 2 means that the page is now available. Disabling
-	interrupt inhibits APFs. Guest must not enable interrupt
-	before the reason is read, or it may be overwritten by another
-	APF. Since APF uses the same exception vector as regular page
-	fault guest must reset the reason to 0 before it does
-	something that can generate normal page fault.  If during page
-	fault APF reason is 0 it means that this is regular page
-	fault.
+	present. Value 2 means that the page is now available.
+
+	Type 1 page (page missing) events are currently always delivered as
+	synthetic #PF exception. Type 2 (page ready) are either delivered
+	by #PF exception (when bit 3 of MSR_KVM_ASYNC_PF_EN is clear) or
+	via an APIC interrupt (when bit 3 set). APIC interrupt delivery is
+	controlled by MSR_KVM_ASYNC_PF2.
+
+	For #PF delivery, disabling interrupt inhibits APFs. Guest must
+	not enable interrupt before the reason is read, or it may be
+	overwritten by another APF. Since APF uses the same exception
+	vector as regular page fault guest must reset the reason to 0
+	before it does something that can generate normal page fault.
+	If during a page fault the APF reason is 0, it means that this is a
+	regular page fault.
 
 	During delivery of type 1 APF cr2 contains a token that will
 	be used to notify a guest when missing page becomes
@@ -319,3 +326,18 @@ data:
 
 	KVM guests can request the host not to poll on HLT, for example if
 	they are performing polling themselves.
+
+MSR_KVM_ASYNC_PF2:
+	0x4b564d06
+
+data:
+	Second asynchronous page fault control MSR.
+
+	Bits 0-7: APIC vector for interrupt based delivery of type 2 APF
+	events (page ready notification).
+	Bit 8: Interrupt based delivery of type 2 APF events is enabled
+	Bits 9-63: Reserved
+
+	To switch to interrupt based delivery of type 2 APF events guests
+	are supposed to enable asynchronous page faults and set bit 3 in
+	MSR_KVM_ASYNC_PF_EN first.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 42a2d0d3984a..6215f61450cb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -763,12 +763,15 @@ struct kvm_vcpu_arch {
 		bool halted;
 		gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
 		struct gfn_to_hva_cache data;
-		u64 msr_val;
+		u64 msr_val; /* MSR_KVM_ASYNC_PF_EN */
+		u64 msr2_val; /* MSR_KVM_ASYNC_PF2 */
+		u16 vec;
 		u32 id;
 		bool send_user_only;
 		u32 host_apf_reason;
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
+		bool delivery_as_int;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index df2ba34037a2..1bbb0b7e062f 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -50,6 +50,7 @@
 #define MSR_KVM_STEAL_TIME  0x4b564d03
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
+#define MSR_KVM_ASYNC_PF2	0x4b564d06
 
 struct kvm_steal_time {
 	__u64 steal;
@@ -81,6 +82,11 @@ struct kvm_clock_pairing {
 #define KVM_ASYNC_PF_ENABLED			(1 << 0)
 #define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
 #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
+#define KVM_ASYNC_PF_DELIVERY_AS_INT		(1 << 3)
+
+#define KVM_ASYNC_PF2_VEC_MASK			GENMASK(7, 0)
+#define KVM_ASYNC_PF2_ENABLED			BIT(8)
+
 
 /* Operations for KVM_HC_MMU_OP */
 #define KVM_MMU_OP_WRITE_PTE            1
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7c21c0cf0a33..861dce1e7cf5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1243,7 +1243,7 @@ static const u32 emulated_msrs_all[] = {
 	HV_X64_MSR_TSC_EMULATION_STATUS,
 
 	MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
-	MSR_KVM_PV_EOI_EN,
+	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF2,
 
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
@@ -2649,8 +2649,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
 	gpa_t gpa = data & ~0x3f;
 
-	/* Bits 3:5 are reserved, Should be zero */
-	if (data & 0x38)
+	/* Bits 4:5 are reserved, Should be zero */
+	if (data & 0x30)
 		return 1;
 
 	vcpu->arch.apf.msr_val = data;
@@ -2667,7 +2667,35 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
 	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
-	kvm_async_pf_wakeup_all(vcpu);
+	vcpu->arch.apf.delivery_as_int = data & KVM_ASYNC_PF_DELIVERY_AS_INT;
+
+	/*
+	 * If delivery via interrupt is configured make sure MSR_KVM_ASYNC_PF2
+	 * was written to before sending 'wakeup all'.
+	 */
+	if (!vcpu->arch.apf.delivery_as_int ||
+	    vcpu->arch.apf.msr2_val & KVM_ASYNC_PF2_ENABLED)
+		kvm_async_pf_wakeup_all(vcpu);
+
+	return 0;
+}
+
+static int kvm_pv_enable_async_pf2(struct kvm_vcpu *vcpu, u64 data)
+{
+	/* Bits 9-63 are reserved */
+	if (data & ~0x1ff)
+		return 1;
+
+	if (!lapic_in_kernel(vcpu))
+		return 1;
+
+	vcpu->arch.apf.msr2_val = data;
+
+	vcpu->arch.apf.vec = data & KVM_ASYNC_PF2_VEC_MASK;
+
+	if (data & KVM_ASYNC_PF2_ENABLED)
+		kvm_async_pf_wakeup_all(vcpu);
+
 	return 0;
 }
 
@@ -2883,6 +2911,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (kvm_pv_enable_async_pf(vcpu, data))
 			return 1;
 		break;
+	case MSR_KVM_ASYNC_PF2:
+		if (kvm_pv_enable_async_pf2(vcpu, data))
+			return 1;
+		break;
 	case MSR_KVM_STEAL_TIME:
 
 		if (unlikely(!sched_info_on()))
@@ -3159,6 +3191,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_KVM_ASYNC_PF_EN:
 		msr_info->data = vcpu->arch.apf.msr_val;
 		break;
+	case MSR_KVM_ASYNC_PF2:
+		msr_info->data = vcpu->arch.apf.msr2_val;
+		break;
 	case MSR_KVM_STEAL_TIME:
 		msr_info->data = vcpu->arch.st.msr_val;
 		break;
@@ -10367,6 +10402,16 @@ static int apf_get_user(struct kvm_vcpu *vcpu, u32 *val)
 				      sizeof(u32));
 }
 
+static bool apf_slot_free(struct kvm_vcpu *vcpu)
+{
+	u32 val;
+
+	if (apf_get_user(vcpu, &val))
+		return false;
+
+	return !val;
+}
+
 static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 {
 	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
@@ -10382,11 +10427,23 @@ static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
 
 bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * TODO: when we are injecting a 'page present' event with an interrupt
+	 * we may ignore pending exceptions.
+	 */
 	if (unlikely(!lapic_in_kernel(vcpu) ||
 		     kvm_event_needs_reinjection(vcpu) ||
 		     vcpu->arch.exception.pending))
 		return false;
 
+	/*
+	 * Regardless of the type of event we're trying to deliver, we need to
+	 * check that the previous event was already consumed; this may not be
+	 * the case with interrupt based delivery.
+	 */
+	if (vcpu->arch.apf.delivery_as_int && !apf_slot_free(vcpu))
+		return false;
+
 	if (kvm_hlt_in_guest(vcpu->kvm) && !kvm_can_deliver_async_pf(vcpu))
 		return false;
 
@@ -10441,6 +10498,8 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 
 	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
 	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY, work->arch.token)) {
+		if (!vcpu->arch.apf.delivery_as_int) {
+			/* Page ready delivery via #PF */
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;
@@ -10448,6 +10507,16 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 			fault.address = work->arch.token;
 			fault.async_page_fault = true;
 			kvm_inject_page_fault(vcpu, &fault);
+		} else if (vcpu->arch.apf.msr2_val & KVM_ASYNC_PF2_ENABLED) {
+			/* Page ready delivery via interrupt */
+			struct kvm_lapic_irq irq = {
+				.delivery_mode = APIC_DM_FIXED,
+				.vector = vcpu->arch.apf.vec
+			};
+
+			/* Assuming LAPIC is enabled */
+			kvm_apic_set_irq(vcpu, &irq, NULL);
+		}
 	}
 	vcpu->arch.apf.halted = false;
 	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
-- 
2.25.3



* [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-04-29  9:36 [PATCH RFC 0/6] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
                   ` (2 preceding siblings ...)
  2020-04-29  9:36 ` [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery Vitaly Kuznetsov
@ 2020-04-29  9:36 ` Vitaly Kuznetsov
  2020-04-29 17:28   ` Andy Lutomirski
                     ` (2 more replies)
  2020-04-29  9:36 ` [PATCH RFC 5/6] KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT Vitaly Kuznetsov
  2020-04-29  9:36 ` [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery Vitaly Kuznetsov
  5 siblings, 3 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-29  9:36 UTC (permalink / raw)
  To: x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

If two 'page ready' notifications happen back to back the second one is
not delivered and the only mechanism we currently have is the
kvm_check_async_pf_completion() check in the vcpu_run() loop. The check
will only be performed on the next vmexit, whenever it happens, and in
some cases that may take a while. With interrupt based 'page ready'
notification delivery the situation is even worse: unlike exceptions,
interrupts are not handled immediately, so we must check if the slot is
empty before delivering the next event. This is slow and unnecessary.
Introduce a dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate the fact
that the slot is free and the host should check its notification queue.
Mandate its use for interrupt based type 2 APF event delivery.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 Documentation/virt/kvm/msr.rst       | 16 +++++++++++++++-
 arch/x86/include/uapi/asm/kvm_para.h |  1 +
 arch/x86/kvm/x86.c                   |  9 ++++++++-
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 7433e55f7184..18db3448db06 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -219,6 +219,11 @@ data:
 	If during pagefault APF reason is 0 it means that this is regular
 	page fault.
 
+	For interrupt based delivery, the guest has to write '1' to
+	MSR_KVM_ASYNC_PF_ACK every time it clears 'reason' in the shared
+	'struct kvm_vcpu_pv_apf_data'; this forces KVM to re-scan its
+	queue and deliver the next pending notification.
+
 	During delivery of type 1 APF cr2 contains a token that will
 	be used to notify a guest when missing page becomes
 	available. When page becomes available type 2 APF is sent with
@@ -340,4 +345,13 @@ data:
 
 	To switch to interrupt based delivery of type 2 APF events guests
 	are supposed to enable asynchronous page faults and set bit 3 in
 	MSR_KVM_ASYNC_PF_EN first.
+
+MSR_KVM_ASYNC_PF_ACK:
+	0x4b564d07
+
+data:
	Asynchronous page fault acknowledgment. When the guest is done
	processing a type 2 APF event and the 'reason' field in 'struct
	kvm_vcpu_pv_apf_data' is cleared, it is supposed to write '1' to
	bit 0 of the MSR; this causes the host to re-scan its queue and
	check if there are more notifications pending.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 1bbb0b7e062f..5c7449980619 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -51,6 +51,7 @@
 #define MSR_KVM_PV_EOI_EN      0x4b564d04
 #define MSR_KVM_POLL_CONTROL	0x4b564d05
 #define MSR_KVM_ASYNC_PF2	0x4b564d06
+#define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 861dce1e7cf5..e3b91ac33bfd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1243,7 +1243,7 @@ static const u32 emulated_msrs_all[] = {
 	HV_X64_MSR_TSC_EMULATION_STATUS,
 
 	MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
-	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF2,
+	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF2, MSR_KVM_ASYNC_PF_ACK,
 
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
@@ -2915,6 +2915,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (kvm_pv_enable_async_pf2(vcpu, data))
 			return 1;
 		break;
+	case MSR_KVM_ASYNC_PF_ACK:
+		if (data & 0x1)
+			kvm_check_async_pf_completion(vcpu);
+		break;
 	case MSR_KVM_STEAL_TIME:
 
 		if (unlikely(!sched_info_on()))
@@ -3194,6 +3198,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_KVM_ASYNC_PF2:
 		msr_info->data = vcpu->arch.apf.msr2_val;
 		break;
+	case MSR_KVM_ASYNC_PF_ACK:
+		msr_info->data = 0;
+		break;
 	case MSR_KVM_STEAL_TIME:
 		msr_info->data = vcpu->arch.st.msr_val;
 		break;
-- 
2.25.3



* [PATCH RFC 5/6] KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
  2020-04-29  9:36 [PATCH RFC 0/6] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
                   ` (3 preceding siblings ...)
  2020-04-29  9:36 ` [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications Vitaly Kuznetsov
@ 2020-04-29  9:36 ` Vitaly Kuznetsov
  2020-04-29 10:40   ` Peter Zijlstra
  2020-04-29  9:36 ` [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery Vitaly Kuznetsov
  5 siblings, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-29  9:36 UTC (permalink / raw)
  To: x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Introduce a new capability to indicate that KVM supports interrupt based
delivery of type 2 APF events (page ready notifications). This includes
support for both MSR_KVM_ASYNC_PF2 and MSR_KVM_ASYNC_PF_ACK.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     | 6 ++++++
 arch/x86/include/uapi/asm/kvm_para.h | 1 +
 arch/x86/kvm/cpuid.c                 | 3 ++-
 arch/x86/kvm/x86.c                   | 1 +
 include/uapi/linux/kvm.h             | 1 +
 5 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..5383d68e3217 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -86,6 +86,12 @@ KVM_FEATURE_PV_SCHED_YIELD        13          guest checks this feature bit
                                               before using paravirtualized
                                               sched yield.
 
+KVM_FEATURE_ASYNC_PF_INT          14          guest checks this feature bit
+                                              before using the second async
+                                              pf control msr 0x4b564d06 and
+                                              async pf acknowledgment msr
+                                              0x4b564d07.
+
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24         host will warn if no guest-side
                                               per-cpu warps are expected in
                                               kvmclock
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 5c7449980619..b4560f60fb05 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -31,6 +31,7 @@
 #define KVM_FEATURE_PV_SEND_IPI	11
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
+#define KVM_FEATURE_ASYNC_PF_INT	14
 
 #define KVM_HINTS_REALTIME      0
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 901cd1fdecd9..790fe4988001 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -712,7 +712,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
 			     (1 << KVM_FEATURE_PV_SEND_IPI) |
 			     (1 << KVM_FEATURE_POLL_CONTROL) |
-			     (1 << KVM_FEATURE_PV_SCHED_YIELD);
+			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
+			     (1 << KVM_FEATURE_ASYNC_PF_INT);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e3b91ac33bfd..b1ee01fdf671 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3413,6 +3413,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_X86_ROBUST_SINGLESTEP:
 	case KVM_CAP_XSAVE:
 	case KVM_CAP_ASYNC_PF:
+	case KVM_CAP_ASYNC_PF_INT:
 	case KVM_CAP_GET_TSC_KHZ:
 	case KVM_CAP_KVMCLOCK_CTRL:
 	case KVM_CAP_READONLY_MEM:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 428c7dde6b4b..15012f78a691 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1017,6 +1017,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_VCPU_RESETS 179
 #define KVM_CAP_S390_PROTECTED 180
 #define KVM_CAP_PPC_SECURE_GUEST 181
+#define KVM_CAP_ASYNC_PF_INT 182
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.25.3



* [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery
  2020-04-29  9:36 [PATCH RFC 0/6] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
                   ` (4 preceding siblings ...)
  2020-04-29  9:36 ` [PATCH RFC 5/6] KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT Vitaly Kuznetsov
@ 2020-04-29  9:36 ` Vitaly Kuznetsov
  2020-04-29 10:53   ` Paolo Bonzini
  2020-05-05  0:42   ` Gavin Shan
  5 siblings, 2 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-29  9:36 UTC (permalink / raw)
  To: x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

KVM now supports using an interrupt for type 2 APF event delivery (page
ready notifications). Switch KVM guests to using it when the feature is
present.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/entry/entry_32.S          |  5 ++++
 arch/x86/entry/entry_64.S          |  5 ++++
 arch/x86/include/asm/hardirq.h     |  3 +++
 arch/x86/include/asm/irq_vectors.h |  6 ++++-
 arch/x86/include/asm/kvm_para.h    |  6 +++++
 arch/x86/kernel/irq.c              |  9 +++++++
 arch/x86/kernel/kvm.c              | 42 ++++++++++++++++++++++++++++++
 7 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index b67bae7091d7..d574dadcb2a1 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1475,6 +1475,11 @@ BUILD_INTERRUPT3(hv_stimer0_callback_vector, HYPERV_STIMER0_VECTOR,
 
 #endif /* CONFIG_HYPERV */
 
+#ifdef CONFIG_KVM_GUEST
+BUILD_INTERRUPT3(kvm_async_pf_vector, KVM_ASYNC_PF_VECTOR,
+		 kvm_async_pf_intr)
+#endif
+
 SYM_CODE_START(page_fault)
 	ASM_CLAC
 	pushl	$do_page_fault
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 0e9504fabe52..6f127c1a6547 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1190,6 +1190,11 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
 	acrn_hv_callback_vector acrn_hv_vector_handler
 #endif
 
+#ifdef CONFIG_KVM_GUEST
+apicinterrupt3 KVM_ASYNC_PF_VECTOR \
+	kvm_async_pf_vector kvm_async_pf_intr
+#endif
+
 idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
 idtentry int3			do_int3			has_error_code=0	create_gap=1
 idtentry stack_segment		do_stack_segment	has_error_code=1
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 07533795b8d2..be0fbb15ad7f 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -44,6 +44,9 @@ typedef struct {
 	unsigned int irq_hv_reenlightenment_count;
 	unsigned int hyperv_stimer0_count;
 #endif
+#ifdef CONFIG_KVM_GUEST
+	unsigned int kvm_async_pf_pageready_count;
+#endif
 } ____cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 889f8b1b5b7f..8879a9ecd908 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -104,7 +104,11 @@
 #define HYPERV_STIMER0_VECTOR		0xed
 #endif
 
-#define LOCAL_TIMER_VECTOR		0xec
+#ifdef CONFIG_KVM_GUEST
+#define KVM_ASYNC_PF_VECTOR		0xec
+#endif
+
+#define LOCAL_TIMER_VECTOR		0xeb
 
 #define NR_VECTORS			 256
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9b4df6eaa11a..fde4f21607f9 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -4,6 +4,7 @@
 
 #include <asm/processor.h>
 #include <asm/alternative.h>
+#include <linux/interrupt.h>
 #include <uapi/asm/kvm_para.h>
 
 extern void kvmclock_init(void);
@@ -93,6 +94,11 @@ void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
 void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
+extern void kvm_async_pf_vector(void);
+#ifdef CONFIG_TRACING
+#define trace_kvm_async_pf_vector kvm_async_pf_vector
+#endif
+__visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs);
 
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 void __init kvm_spinlock_init(void);
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index c7965ff429c5..a4c2f25ad74d 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -159,6 +159,15 @@ int arch_show_interrupts(struct seq_file *p, int prec)
 				   irq_stats(j)->hyperv_stimer0_count);
 		seq_puts(p, "  Hyper-V stimer0 interrupts\n");
 	}
+#endif
+#ifdef CONFIG_KVM_GUEST
+	if (test_bit(KVM_ASYNC_PF_VECTOR, system_vectors)) {
+		seq_printf(p, "%*s: ", prec, "APF");
+		for_each_online_cpu(j)
+			seq_printf(p, "%10u ",
+				   irq_stats(j)->kvm_async_pf_pageready_count);
+		seq_puts(p, "  KVM async PF page ready interrupts\n");
+	}
 #endif
 	seq_printf(p, "%*s: %10u\n", prec, "ERR", atomic_read(&irq_err_count));
 #if defined(CONFIG_X86_IO_APIC)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6efe0410fb72..1c00c7ba01ff 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -259,9 +259,39 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned lon
 		rcu_irq_exit();
 		break;
 	}
+
+	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT))
+		wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
 }
 NOKPROBE_SYMBOL(do_async_page_fault);
 
+__visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs)
+{
+	u32 token, reason;
+
+	entering_ack_irq();
+
+	inc_irq_stat(kvm_async_pf_pageready_count);
+
+	if (__this_cpu_read(apf_reason.enabled)) {
+		reason = __this_cpu_read(apf_reason.reason);
+		if (reason == KVM_PV_REASON_PAGE_READY) {
+			token = __this_cpu_read(apf_reason.token);
+			/*
+			 * Make sure we read 'token' before we reset
+			 * 'reason' or it can get lost.
+			 */
+			mb();
+			__this_cpu_write(apf_reason.reason, 0);
+			kvm_async_pf_task_wake(token);
+		}
+	}
+
+	wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
+
+	exiting_irq();
+}
+
 static void __init paravirt_ops_setup(void)
 {
 	pv_info.name = "KVM";
@@ -316,10 +346,17 @@ static void kvm_guest_cpu_init(void)
 		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
 			pa |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
 
+		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT))
+			pa |= KVM_ASYNC_PF_DELIVERY_AS_INT;
+
 		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
 		__this_cpu_write(apf_reason.enabled, 1);
 		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
 		       smp_processor_id());
+
+		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT))
+			wrmsrl(MSR_KVM_ASYNC_PF2, KVM_ASYNC_PF2_ENABLED |
+			       KVM_ASYNC_PF_VECTOR);
 	}
 
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
@@ -649,6 +686,11 @@ static void __init kvm_guest_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
+	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT)) {
+		pr_info("KVM using interrupt for async PF page-ready\n");
+		alloc_intr_gate(KVM_ASYNC_PF_VECTOR, kvm_async_pf_vector);
+	}
+
 #ifdef CONFIG_SMP
 	smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
-- 
2.25.3


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 5/6] KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
  2020-04-29  9:36 ` [PATCH RFC 5/6] KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT Vitaly Kuznetsov
@ 2020-04-29 10:40   ` Peter Zijlstra
  0 siblings, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2020-04-29 10:40 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, kvm, linux-kernel, Paolo Bonzini, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

On Wed, Apr 29, 2020 at 11:36:33AM +0200, Vitaly Kuznetsov wrote:
> Introduce new capability to indicate that KVM supports interrupt based
> delivery of type 2 APF events (page ready notifications). This includes
> support for both MSR_KVM_ASYNC_PF2 and MSR_KVM_ASYNC_PF_ACK.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  Documentation/virt/kvm/cpuid.rst     | 6 ++++++
>  arch/x86/include/uapi/asm/kvm_para.h | 1 +
>  arch/x86/kvm/cpuid.c                 | 3 ++-
>  arch/x86/kvm/x86.c                   | 1 +
>  include/uapi/linux/kvm.h             | 1 +
>  5 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
> index 01b081f6e7ea..5383d68e3217 100644
> --- a/Documentation/virt/kvm/cpuid.rst
> +++ b/Documentation/virt/kvm/cpuid.rst
> @@ -86,6 +86,12 @@ KVM_FEATURE_PV_SCHED_YIELD        13          guest checks this feature bit
>                                                before using paravirtualized
>                                                sched yield.
>  
> +KVM_FEATURE_PV_SCHED_YIELD        14          guest checks this feature bit

Copy/paste fail

> +                                              before using the second async
> +                                              pf control msr 0x4b564d06 and
> +                                              async pf acknowledgment msr
> +                                              0x4b564d07.
> +

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery
  2020-04-29  9:36 ` [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery Vitaly Kuznetsov
@ 2020-04-29 10:53   ` Paolo Bonzini
  2020-04-29 12:44     ` Vitaly Kuznetsov
  2020-05-05 18:59     ` Vivek Goyal
  2020-05-05  0:42   ` Gavin Shan
  1 sibling, 2 replies; 41+ messages in thread
From: Paolo Bonzini @ 2020-04-29 10:53 UTC (permalink / raw)
  To: Vitaly Kuznetsov, x86, kvm
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson

On 29/04/20 11:36, Vitaly Kuznetsov wrote:
> +
> +	if (__this_cpu_read(apf_reason.enabled)) {
> +		reason = __this_cpu_read(apf_reason.reason);
> +		if (reason == KVM_PV_REASON_PAGE_READY) {
> +			token = __this_cpu_read(apf_reason.token);
> +			/*
> +			 * Make sure we read 'token' before we reset
> +			 * 'reason' or it can get lost.
> +			 */
> +			mb();
> +			__this_cpu_write(apf_reason.reason, 0);
> +			kvm_async_pf_task_wake(token);
> +		}

If tokens cannot be zero, could we avoid using reason for the page ready
interrupt (and ultimately retire "reason" completely)?

Paolo


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery
  2020-04-29  9:36 ` [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery Vitaly Kuznetsov
@ 2020-04-29 10:54   ` Paolo Bonzini
  2020-04-29 12:40     ` Vitaly Kuznetsov
  2020-04-29 21:27   ` Peter Xu
  2020-05-05 15:22   ` Vivek Goyal
  2 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2020-04-29 10:54 UTC (permalink / raw)
  To: Vitaly Kuznetsov, x86, kvm
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson

On 29/04/20 11:36, Vitaly Kuznetsov wrote:
> +
> +	Type 1 page (page missing) events are currently always delivered as
> +	synthetic #PF exception. Type 2 (page ready) are either delivered
> +	by #PF exception (when bit 3 of MSR_KVM_ASYNC_PF_EN is clear) or
> +	via an APIC interrupt (when bit 3 set). APIC interrupt delivery is
> +	controlled by MSR_KVM_ASYNC_PF2.

I think we should (in the non-RFC version) block async page faults
completely and only keep APF_HALT unless the guest is using page ready
interrupt delivery.

Paolo


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery
  2020-04-29 10:54   ` Paolo Bonzini
@ 2020-04-29 12:40     ` Vitaly Kuznetsov
  2020-04-29 13:21       ` Paolo Bonzini
  0 siblings, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-29 12:40 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson, x86, kvm

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 29/04/20 11:36, Vitaly Kuznetsov wrote:
>> +
>> +	Type 1 page (page missing) events are currently always delivered as
>> +	synthetic #PF exception. Type 2 (page ready) are either delivered
>> +	by #PF exception (when bit 3 of MSR_KVM_ASYNC_PF_EN is clear) or
>> +	via an APIC interrupt (when bit 3 set). APIC interrupt delivery is
>> +	controlled by MSR_KVM_ASYNC_PF2.
>
> I think we should (in the non-RFC version) block async page faults
> completely and only keep APF_HALT unless the guest is using page ready
> interrupt delivery.

Sure, we can do that. This is, however, a significant behavioral change:
APF_HALT frees the host, not the guest, so even if the combined
performance of all guests on the same pCPU remain the same guests with
e.g. a lot of simultaneously running processes may suffer more.

In theory, we can keep two mechanisms side by side for as long as we
want but if the end goal is to have '#PF abuse eliminated' then we'll
have to get rid of the legacy one some day. The day when the new
mechanism lands is also a good choice :-)

-- 
Vitaly


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery
  2020-04-29 10:53   ` Paolo Bonzini
@ 2020-04-29 12:44     ` Vitaly Kuznetsov
  2020-04-29 13:19       ` Paolo Bonzini
  2020-05-05 18:59     ` Vivek Goyal
  1 sibling, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-29 12:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson, x86, kvm

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 29/04/20 11:36, Vitaly Kuznetsov wrote:
>> +
>> +	if (__this_cpu_read(apf_reason.enabled)) {
>> +		reason = __this_cpu_read(apf_reason.reason);
>> +		if (reason == KVM_PV_REASON_PAGE_READY) {
>> +			token = __this_cpu_read(apf_reason.token);
>> +			/*
>> +			 * Make sure we read 'token' before we reset
>> +			 * 'reason' or it can get lost.
>> +			 */
>> +			mb();
>> +			__this_cpu_write(apf_reason.reason, 0);
>> +			kvm_async_pf_task_wake(token);
>> +		}
>
> If tokens cannot be zero, could we avoid using reason for the page ready
> interrupt (and ultimately retire "reason" completely)?

Yes, we can switch to using 'token' exclusively but personally I'm not
sure it is worth it. We'll still have to have a hole and reason + token
is only u64. Keeping 'reason' in place allows us to easily come up with
any other type of notification through this mechanism (if the reason is
... then 'token' means ...).

-- 
Vitaly


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery
  2020-04-29 12:44     ` Vitaly Kuznetsov
@ 2020-04-29 13:19       ` Paolo Bonzini
  2020-04-29 14:34         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2020-04-29 13:19 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson, x86, kvm

On 29/04/20 14:44, Vitaly Kuznetsov wrote:
>>> +			token = __this_cpu_read(apf_reason.token);
>>> +			/*
>>> +			 * Make sure we read 'token' before we reset
>>> +			 * 'reason' or it can get lost.
>>> +			 */
>>> +			mb();
>>> +			__this_cpu_write(apf_reason.reason, 0);
>>> +			kvm_async_pf_task_wake(token);
>>> +		}
>> If tokens cannot be zero, could we avoid using reason for the page ready
>> interrupt (and ultimately retire "reason" completely)?
> Yes, we can switch to using 'token' exclusively but personally I'm not
> sure it is worth it. We'll still have to have a hole and reason + token
> is only u64. Keeping 'reason' in place allows us to easily come up with
> any other type of notification through this mechanism (if the reason is
> ... then 'token' means ...).

If we need a "reason" field I'd rather make it separate from the page
not ready reason, because as we differentiate the delivery mechanism it
is cleaner to keep them separate.

For example, if the reason is present but separate, the memory barrier
is not necessary anymore, because apf_reason.token cannot be written
before the ack MSR is written.  And with #VE there will be already a
hardware-provided mechanism to avoid reentrancy.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery
  2020-04-29 12:40     ` Vitaly Kuznetsov
@ 2020-04-29 13:21       ` Paolo Bonzini
  0 siblings, 0 replies; 41+ messages in thread
From: Paolo Bonzini @ 2020-04-29 13:21 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson, x86, kvm

On 29/04/20 14:40, Vitaly Kuznetsov wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 29/04/20 11:36, Vitaly Kuznetsov wrote:
>>> +
>>> +	Type 1 page (page missing) events are currently always delivered as
>>> +	synthetic #PF exception. Type 2 (page ready) are either delivered
>>> +	by #PF exception (when bit 3 of MSR_KVM_ASYNC_PF_EN is clear) or
>>> +	via an APIC interrupt (when bit 3 set). APIC interrupt delivery is
>>> +	controlled by MSR_KVM_ASYNC_PF2.
>>
>> I think we should (in the non-RFC version) block async page faults
>> completely and only keep APF_HALT unless the guest is using page ready
>> interrupt delivery.
> 
> Sure, we can do that. This is, however, a significant behavioral change:
> APF_HALT frees the host, not the guest, so even if the combined
> performance of all guests on the same pCPU remain the same guests with
> e.g. a lot of simultaneously running processes may suffer more.

Yes, it is a significant change.  However the resulting clean up in the
spec is significant, because we don't have type 2 notifications at all
anymore.

(APF_HALT does free the guest a little bit by allowing interrupt
delivery during a host page fault; in particular it lets the scheduler
tick run, which does improve responsiveness somewhat significantly).

Likewise, I think we should clean up the guest side without prejudice.
Patch 6 should disable async page fault unless page-ready interrupts are
available, and drop the page ready case from the #PF handler.

Thanks,

Paolo

> In theory, we can keep two mechanisms side by side for as long as we
> want but if the end goal is to have '#PF abuse eliminated' then we'll
> have to get rid of the legacy one some day. The day when the new
> mechanism lands is also a good choice :-)



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery
  2020-04-29 13:19       ` Paolo Bonzini
@ 2020-04-29 14:34         ` Vitaly Kuznetsov
  0 siblings, 0 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-29 14:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson, x86, kvm

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 29/04/20 14:44, Vitaly Kuznetsov wrote:
>>>> +			token = __this_cpu_read(apf_reason.token);
>>>> +			/*
>>>> +			 * Make sure we read 'token' before we reset
>>>> +			 * 'reason' or it can get lost.
>>>> +			 */
>>>> +			mb();
>>>> +			__this_cpu_write(apf_reason.reason, 0);
>>>> +			kvm_async_pf_task_wake(token);
>>>> +		}
>>> If tokens cannot be zero, could we avoid using reason for the page ready
>>> interrupt (and ultimately retire "reason" completely)?
>> Yes, we can switch to using 'token' exclusively but personally I'm not
>> sure it is worth it. We'll still have to have a hole and reason + token
>> is only u64. Keeping 'reason' in place allows us to easily come up with
>> any other type of notification through this mecanism (if the reson is
>> ... then 'token' means ...).
>
> If we need a "reason" field I'd rather make it separate from the page
> not ready reason, because as we differentiate the delivery mechanism it
> is cleaner to keep them separate.
>
> For example, if the reason is present but separate, the memory barrier
> is not necessary anymore, because apf_reason.token cannot be written
> before the ack MSR is written.  And with #VE there will be already a
> hardware-provided mechanism to avoid reentrancy.

Ok, makes sense. I'll probably use your original idea and use 'token'
for 'page ready' notification exclusively for now. In case of need we
can always extend 'struct kvm_vcpu_pv_apf_data' with the information we
need so we can avoid adding 'reason2' for now.

-- 
Vitaly


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-04-29  9:36 ` [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications Vitaly Kuznetsov
@ 2020-04-29 17:28   ` Andy Lutomirski
  2020-04-29 17:40     ` Paolo Bonzini
  2020-04-30  6:49   ` Paolo Bonzini
  2020-05-05  0:36   ` Gavin Shan
  2 siblings, 1 reply; 41+ messages in thread
From: Andy Lutomirski @ 2020-04-29 17:28 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: X86 ML, kvm list, LKML, Paolo Bonzini, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

On Wed, Apr 29, 2020 at 2:36 AM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> If two page ready notifications happen back to back the second one is not
> delivered and the only mechanism we currently have is
> kvm_check_async_pf_completion() check in vcpu_run() loop. The check will
> only be performed with the next vmexit when it happens and in some cases
> it may take a while. With interrupt based page ready notification delivery
> the situation is even worse: unlike exceptions, interrupts are not handled
> immediately so we must check if the slot is empty. This is slow and
> unnecessary. Introduce dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate
> the fact that the slot is free and host should check its notification
> queue. Mandate using it for interrupt based type 2 APF event delivery.

This seems functional, but I'm wondering if it could be a bit simpler and
more efficient if the data structure was a normal descriptor ring with
the same number slots as whatever the maximum number of waiting pages
is.  Then there would never need to be any notification from the guest
back to the host, since there would always be room for a notification.

It might be even better if a single unified data structure was used
for both notifications.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-04-29 17:28   ` Andy Lutomirski
@ 2020-04-29 17:40     ` Paolo Bonzini
  2020-04-30  0:45       ` Andy Lutomirski
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2020-04-29 17:40 UTC (permalink / raw)
  To: Andy Lutomirski, Vitaly Kuznetsov
  Cc: X86 ML, kvm list, LKML, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson

On 29/04/20 19:28, Andy Lutomirski wrote:
> This seems functional, but I'm wondering if it could be a bit simpler and
> more efficient if the data structure was a normal descriptor ring with
> the same number slots as whatever the maximum number of waiting pages
> is.  Then there would never need to be any notification from the guest
> back to the host, since there would always be room for a notification.

No, it would be much more complicated code for a slow path which is
already order of magnitudes slower than a vmexit.  It would also use
much more memory.

> It might be even better if a single unified data structure was used
> for both notifications.

That's a very bad idea since one is synchronous and one is asynchronous.
 Part of the proposal we agreed upon was to keep "page not ready"
synchronous while making "page ready" an interrupt.  The data structure
for "page not ready" will be #VE.

Paolo


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery
  2020-04-29  9:36 ` [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery Vitaly Kuznetsov
  2020-04-29 10:54   ` Paolo Bonzini
@ 2020-04-29 21:27   ` Peter Xu
  2020-04-30  8:31     ` Vitaly Kuznetsov
  2020-05-05 15:22   ` Vivek Goyal
  2 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2020-04-29 21:27 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, kvm, linux-kernel, Paolo Bonzini, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Hi, Vitaly,

On Wed, Apr 29, 2020 at 11:36:31AM +0200, Vitaly Kuznetsov wrote:
> +	Type 1 page (page missing) events are currently always delivered as
> +	synthetic #PF exception. Type 2 (page ready) are either delivered
> +	by #PF exception (when bit 3 of MSR_KVM_ASYNC_PF_EN is clear) or
> +	via an APIC interrupt (when bit 3 set). APIC interrupt delivery is
> +	controlled by MSR_KVM_ASYNC_PF2.

How about MSR_KVM_ASYNC_PF_INT instead of MSR_KVM_ASYNC_PF2 (to match
MSR_KVM_ASYNC_EN and MSR_KVM_ASYNC_ACK where they're all MSR_KVM_ASYNC_* with a
meaningful ending word)?

> +
> +	For #PF delivery, disabling interrupt inhibits APFs. Guest must
> +	not enable interrupt before the reason is read, or it may be
> +	overwritten by another APF. Since APF uses the same exception
> +	vector as regular page fault guest must reset the reason to 0
> +	before it does something that can generate normal page fault.
> +	If during pagefault APF reason is 0 it means that this is regular
> +	page fault.
>  
>  	During delivery of type 1 APF cr2 contains a token that will
>  	be used to notify a guest when missing page becomes
> @@ -319,3 +326,18 @@ data:
>  
>  	KVM guests can request the host not to poll on HLT, for example if
>  	they are performing polling themselves.
> +
> +MSR_KVM_ASYNC_PF2:
> +	0x4b564d06
> +
> +data:
> +	Second asynchronous page fault control MSR.
> +
> +	Bits 0-7: APIC vector for interrupt based delivery of type 2 APF
> +	events (page ready notification).
> +        Bit 8: Interrupt based delivery of type 2 APF events is enabled
> +        Bits 9-63: Reserved

(may need to fix up the indents)

> +
> +	To switch to interrupt based delivery of type 2 APF events guests
> +	are supposed to enable asynchronous page faults and set bit 3 in
> +	MSR_KVM_ASYNC_PF_EN first.
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 42a2d0d3984a..6215f61450cb 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -763,12 +763,15 @@ struct kvm_vcpu_arch {
>  		bool halted;
>  		gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
>  		struct gfn_to_hva_cache data;
> -		u64 msr_val;
> +		u64 msr_val; /* MSR_KVM_ASYNC_PF_EN */
> +		u64 msr2_val; /* MSR_KVM_ASYNC_PF2 */
> +		u16 vec;
>  		u32 id;
>  		bool send_user_only;
>  		u32 host_apf_reason;
>  		unsigned long nested_apf_token;
>  		bool delivery_as_pf_vmexit;
> +		bool delivery_as_int;
>  	} apf;
>  
>  	/* OSVW MSRs (AMD only) */
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index df2ba34037a2..1bbb0b7e062f 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -50,6 +50,7 @@
>  #define MSR_KVM_STEAL_TIME  0x4b564d03
>  #define MSR_KVM_PV_EOI_EN      0x4b564d04
>  #define MSR_KVM_POLL_CONTROL	0x4b564d05
> +#define MSR_KVM_ASYNC_PF2	0x4b564d06
>  
>  struct kvm_steal_time {
>  	__u64 steal;
> @@ -81,6 +82,11 @@ struct kvm_clock_pairing {
>  #define KVM_ASYNC_PF_ENABLED			(1 << 0)
>  #define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
>  #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
> +#define KVM_ASYNC_PF_DELIVERY_AS_INT		(1 << 3)
> +
> +#define KVM_ASYNC_PF2_VEC_MASK			GENMASK(7, 0)
> +#define KVM_ASYNC_PF2_ENABLED			BIT(8)

There are two enable bits, this one in ASYNC_PF_EN and the other old one in
ASYNC_PF2.  Could it work with only one knob (e.g., set bit 0 of ASYNC_PF_EN
always to enable apf)?  After all we have had bit 3 of ASYNC_PF_EN to show
whether interrupt is enabled, which seems to be the real switch for whether to
enable interrupt for apf.

If we can keep the only knob in ASYNC_PF_EN (bit 0), iiuc we can also keep the
below kvm_async_pf_wakeup_all() untouched (so we only set bit 0 of ASYNC_PF_EN
after configure everything).

Thanks,

> +
>  
>  /* Operations for KVM_HC_MMU_OP */
>  #define KVM_MMU_OP_WRITE_PTE            1
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7c21c0cf0a33..861dce1e7cf5 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1243,7 +1243,7 @@ static const u32 emulated_msrs_all[] = {
>  	HV_X64_MSR_TSC_EMULATION_STATUS,
>  
>  	MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
> -	MSR_KVM_PV_EOI_EN,
> +	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF2,
>  
>  	MSR_IA32_TSC_ADJUST,
>  	MSR_IA32_TSCDEADLINE,
> @@ -2649,8 +2649,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>  {
>  	gpa_t gpa = data & ~0x3f;
>  
> -	/* Bits 3:5 are reserved, Should be zero */
> -	if (data & 0x38)
> +	/* Bits 4:5 are reserved, Should be zero */
> +	if (data & 0x30)
>  		return 1;
>  
>  	vcpu->arch.apf.msr_val = data;
> @@ -2667,7 +2667,35 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>  
>  	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
>  	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
> -	kvm_async_pf_wakeup_all(vcpu);
> +	vcpu->arch.apf.delivery_as_int = data & KVM_ASYNC_PF_DELIVERY_AS_INT;
> +
> +	/*
> +	 * If delivery via interrupt is configured make sure MSR_KVM_ASYNC_PF2
> +	 * was written to before sending 'wakeup all'.
> +	 */
> +	if (!vcpu->arch.apf.delivery_as_int ||
> +	    vcpu->arch.apf.msr2_val & KVM_ASYNC_PF2_ENABLED)
> +		kvm_async_pf_wakeup_all(vcpu);
> +
> +	return 0;
> +}
> +
> +static int kvm_pv_enable_async_pf2(struct kvm_vcpu *vcpu, u64 data)
> +{
> +	/* Bits 9-63 are reserved */
> +	if (data & ~0x1ff)
> +		return 1;
> +
> +	if (!lapic_in_kernel(vcpu))
> +		return 1;
> +
> +	vcpu->arch.apf.msr2_val = data;
> +
> +	vcpu->arch.apf.vec = data & KVM_ASYNC_PF2_VEC_MASK;
> +
> +	if (data & KVM_ASYNC_PF2_ENABLED)
> +		kvm_async_pf_wakeup_all(vcpu);
> +
>  	return 0;
>  }
>  
> @@ -2883,6 +2911,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		if (kvm_pv_enable_async_pf(vcpu, data))
>  			return 1;
>  		break;
> +	case MSR_KVM_ASYNC_PF2:
> +		if (kvm_pv_enable_async_pf2(vcpu, data))
> +			return 1;
> +		break;
>  	case MSR_KVM_STEAL_TIME:
>  
>  		if (unlikely(!sched_info_on()))
> @@ -3159,6 +3191,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_KVM_ASYNC_PF_EN:
>  		msr_info->data = vcpu->arch.apf.msr_val;
>  		break;
> +	case MSR_KVM_ASYNC_PF2:
> +		msr_info->data = vcpu->arch.apf.msr2_val;
> +		break;
>  	case MSR_KVM_STEAL_TIME:
>  		msr_info->data = vcpu->arch.st.msr_val;
>  		break;
> @@ -10367,6 +10402,16 @@ static int apf_get_user(struct kvm_vcpu *vcpu, u32 *val)
>  				      sizeof(u32));
>  }
>  
> +static bool apf_slot_free(struct kvm_vcpu *vcpu)
> +{
> +	u32 val;
> +
> +	if (apf_get_user(vcpu, &val))
> +		return false;
> +
> +	return !val;
> +}
> +
>  static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
>  {
>  	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
> @@ -10382,11 +10427,23 @@ static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
>  
>  bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
>  {
> +	/*
> +	 * TODO: when we are injecting a 'page present' event with an interrupt
> +	 * we may ignore pending exceptions.
> +	 */
>  	if (unlikely(!lapic_in_kernel(vcpu) ||
>  		     kvm_event_needs_reinjection(vcpu) ||
>  		     vcpu->arch.exception.pending))
>  		return false;
>  
> +	/*
> +	 * Regardless of the type of event we're trying to deliver, we need to
> +	 * check that the previous event was already consumed; this may not be
> +	 * the case with interrupt based delivery.
> +	 */
> +	if (vcpu->arch.apf.delivery_as_int && !apf_slot_free(vcpu))
> +		return false;
> +
>  	if (kvm_hlt_in_guest(vcpu->kvm) && !kvm_can_deliver_async_pf(vcpu))
>  		return false;

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-04-29 17:40     ` Paolo Bonzini
@ 2020-04-30  0:45       ` Andy Lutomirski
  2020-04-30  6:46         ` Paolo Bonzini
  0 siblings, 1 reply; 41+ messages in thread
From: Andy Lutomirski @ 2020-04-30  0:45 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andy Lutomirski, Vitaly Kuznetsov, X86 ML, kvm list, LKML,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

On Wed, Apr 29, 2020 at 10:40 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 29/04/20 19:28, Andy Lutomirski wrote:
> > This seems functional, but I'm wondering if it could be a bit simpler and
> > more efficient if the data structure was a normal descriptor ring with
> > the same number slots as whatever the maximum number of waiting pages
> > is.  Then there would never need to be any notification from the guest
> > back to the host, since there would always be room for a notification.
>
> No, it would be much more complicated code for a slow path which is
> already order of magnitudes slower than a vmexit.  It would also use
> much more memory.

Fair enough.

>
> > It might be even better if a single unified data structure was used
> > for both notifications.
>
> That's a very bad idea since one is synchronous and one is asynchronous.
>  Part of the proposal we agreed upon was to keep "page not ready"
> synchronous while making "page ready" an interrupt.  The data structure
> for "page not ready" will be #VE.
>

#VE on SVM will be interesting, to say the least, and I think that a
solution that is VMX specific doesn't make much sense.  #VE also has
unpleasant issues involving the contexts in which it can occur.  You
will have quite a hard time convincing me to ack the addition of a #VE
entry handler for this.  I think a brand new vector is the right
solution.


* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-04-30  0:45       ` Andy Lutomirski
@ 2020-04-30  6:46         ` Paolo Bonzini
  0 siblings, 0 replies; 41+ messages in thread
From: Paolo Bonzini @ 2020-04-30  6:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Vitaly Kuznetsov, X86 ML, kvm list, LKML, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

On 30/04/20 02:45, Andy Lutomirski wrote:
>> That's a very bad idea since one is synchronous and one is asynchronous.
>>  Part of the proposal we agreed upon was to keep "page not ready"
>> synchronous while making "page ready" an interrupt.  The data structure
>> for "page not ready" will be #VE.
>
> #VE on SVM will be interesting, to say the least, and I think that a
> solution that is VMX specific doesn't make much sense.

You can always inject it manually.  The same is true of Haswell and
earlier processors.

> #VE also has
> unpleasant issues involving the contexts in which it can occur.  You
> will have quite a hard time convincing me to ack the addition of a #VE
> entry handler for this.  I think a brand new vector is the right
> solution.

I need --verbose. :)  For #VE I liked the idea of re-enabling it from an
IPI, at least in the case where we cannot move out of the IST stack.
And any other vector that behaves like an exception would have the same
issue, wouldn't it (especially re-entrancy)?

Paolo



* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-04-29  9:36 ` [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications Vitaly Kuznetsov
  2020-04-29 17:28   ` Andy Lutomirski
@ 2020-04-30  6:49   ` Paolo Bonzini
  2020-04-30  8:40     ` Vitaly Kuznetsov
  2020-05-05  0:36   ` Gavin Shan
  2 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2020-04-30  6:49 UTC (permalink / raw)
  To: Vitaly Kuznetsov, x86, kvm
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson

On 29/04/20 11:36, Vitaly Kuznetsov wrote:
> +	case MSR_KVM_ASYNC_PF_ACK:
> +		if (data & 0x1)
> +			kvm_check_async_pf_completion(vcpu);
> +		break;

Does this work if interrupts are turned off?  I think in that case
kvm_check_async_pf_completion will refuse to make progress.  You need to
make this bit stateful (e.g. 1 = async PF in progress, 0 = not in
progress), and check that for page ready notifications instead of
EFLAGS.IF.  This probably means that:

- it might be simpler to move it to the vector MSR

- it's definitely much simpler to remove the #PF-based mechanism for
injecting page ready notifications.

Thanks,

Paolo



* Re: [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery
  2020-04-29 21:27   ` Peter Xu
@ 2020-04-30  8:31     ` Vitaly Kuznetsov
  2020-04-30 13:28       ` Peter Xu
  0 siblings, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-30  8:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: x86, kvm, linux-kernel, Paolo Bonzini, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Peter Xu <peterx@redhat.com> writes:

> Hi, Vitaly,
>
> On Wed, Apr 29, 2020 at 11:36:31AM +0200, Vitaly Kuznetsov wrote:
>> +	Type 1 page (page missing) events are currently always delivered as
>> +	synthetic #PF exception. Type 2 (page ready) are either delivered
>> +	by #PF exception (when bit 3 of MSR_KVM_ASYNC_PF_EN is clear) or
>> +	via an APIC interrupt (when bit 3 set). APIC interrupt delivery is
>> +	controlled by MSR_KVM_ASYNC_PF2.
>
> How about MSR_KVM_ASYNC_PF_INT instead of MSR_KVM_ASYNC_PF2 (to match
> MSR_KVM_ASYNC_EN and MSR_KVM_ASYNC_ACK where they're all MSR_KVM_ASYNC_* with a
> meaningful ending word)?
>

Sure.

>> +
>> +	For #PF delivery, disabling interrupt inhibits APFs. Guest must
>> +	not enable interrupt before the reason is read, or it may be
>> +	overwritten by another APF. Since APF uses the same exception
>> +	vector as regular page fault guest must reset the reason to 0
>> +	before it does something that can generate normal page fault.
>> +	If during pagefault APF reason is 0 it means that this is regular
>> +	page fault.
>>  
>>  	During delivery of type 1 APF cr2 contains a token that will
>>  	be used to notify a guest when missing page becomes
>> @@ -319,3 +326,18 @@ data:
>>  
>>  	KVM guests can request the host not to poll on HLT, for example if
>>  	they are performing polling themselves.
>> +
>> +MSR_KVM_ASYNC_PF2:
>> +	0x4b564d06
>> +
>> +data:
>> +	Second asynchronous page fault control MSR.
>> +
>> +	Bits 0-7: APIC vector for interrupt based delivery of type 2 APF
>> +	events (page ready notification).
>> +        Bit 8: Interrupt based delivery of type 2 APF events is enabled
>> +        Bits 9-63: Reserved
>
> (may need to fix up the indents)
>
>> +
>> +	To switch to interrupt based delivery of type 2 APF events guests
>> +	are supposed to enable asynchronous page faults and set bit 3 in
>> +	MSR_KVM_ASYNC_PF_EN first.
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 42a2d0d3984a..6215f61450cb 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -763,12 +763,15 @@ struct kvm_vcpu_arch {
>>  		bool halted;
>>  		gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
>>  		struct gfn_to_hva_cache data;
>> -		u64 msr_val;
>> +		u64 msr_val; /* MSR_KVM_ASYNC_PF_EN */
>> +		u64 msr2_val; /* MSR_KVM_ASYNC_PF2 */
>> +		u16 vec;
>>  		u32 id;
>>  		bool send_user_only;
>>  		u32 host_apf_reason;
>>  		unsigned long nested_apf_token;
>>  		bool delivery_as_pf_vmexit;
>> +		bool delivery_as_int;
>>  	} apf;
>>  
>>  	/* OSVW MSRs (AMD only) */
>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> index df2ba34037a2..1bbb0b7e062f 100644
>> --- a/arch/x86/include/uapi/asm/kvm_para.h
>> +++ b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -50,6 +50,7 @@
>>  #define MSR_KVM_STEAL_TIME  0x4b564d03
>>  #define MSR_KVM_PV_EOI_EN      0x4b564d04
>>  #define MSR_KVM_POLL_CONTROL	0x4b564d05
>> +#define MSR_KVM_ASYNC_PF2	0x4b564d06
>>  
>>  struct kvm_steal_time {
>>  	__u64 steal;
>> @@ -81,6 +82,11 @@ struct kvm_clock_pairing {
>>  #define KVM_ASYNC_PF_ENABLED			(1 << 0)
>>  #define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
>>  #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
>> +#define KVM_ASYNC_PF_DELIVERY_AS_INT		(1 << 3)
>> +
>> +#define KVM_ASYNC_PF2_VEC_MASK			GENMASK(7, 0)
>> +#define KVM_ASYNC_PF2_ENABLED			BIT(8)
>
> There are two enable bits, this one in ASYNC_PF_EN and the other old one in
> ASYNC_PF2.  Could it work with only one knob (e.g., set bit 0 of ASYNC_PF_EN
> always to enable apf)?  After all we have had bit 3 of ASYNC_PF_EN to show
> whether interrupt is enabled, which seems to be the real switch for whether to
> enable interrupt for apf.
>
> If we can keep the only knob in ASYNC_PF_EN (bit 0), iiuc we can also keep the
> below kvm_async_pf_wakeup_all() untouched (so we only set bit 0 of ASYNC_PF_EN
> after configure everything).

Yes,

as we need to write to two MSRs to configure the new mechanism, ordering
becomes important. If the guest writes to ASYNC_PF_EN first to establish
the shared memory structure, the interrupt in ASYNC_PF2 is not yet set
(and AFAIR '0' is a valid interrupt!), so if an async pf happens
immediately after that we'll be forced to inject INT0 in the guest and
it'll get confused and likely miss the event.

We can probably mandate the reverse sequence: guest has to set up
interrupt in ASYNC_PF2 first and then write to ASYNC_PF_EN (with both
bit 0 and bit 3). In that case the additional 'enable' bit in ASYNC_PF2
seems redundant. This protocol doesn't look too complex for guests to
follow.

>
> Thanks,
>
>> +
>>  
>>  /* Operations for KVM_HC_MMU_OP */
>>  #define KVM_MMU_OP_WRITE_PTE            1
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 7c21c0cf0a33..861dce1e7cf5 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -1243,7 +1243,7 @@ static const u32 emulated_msrs_all[] = {
>>  	HV_X64_MSR_TSC_EMULATION_STATUS,
>>  
>>  	MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
>> -	MSR_KVM_PV_EOI_EN,
>> +	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF2,
>>  
>>  	MSR_IA32_TSC_ADJUST,
>>  	MSR_IA32_TSCDEADLINE,
>> @@ -2649,8 +2649,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>>  {
>>  	gpa_t gpa = data & ~0x3f;
>>  
>> -	/* Bits 3:5 are reserved, Should be zero */
>> -	if (data & 0x38)
>> +	/* Bits 4:5 are reserved, Should be zero */
>> +	if (data & 0x30)
>>  		return 1;
>>  
>>  	vcpu->arch.apf.msr_val = data;
>> @@ -2667,7 +2667,35 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>>  
>>  	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
>>  	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
>> -	kvm_async_pf_wakeup_all(vcpu);
>> +	vcpu->arch.apf.delivery_as_int = data & KVM_ASYNC_PF_DELIVERY_AS_INT;
>> +
>> +	/*
>> +	 * If delivery via interrupt is configured make sure MSR_KVM_ASYNC_PF2
>> +	 * was written to before sending 'wakeup all'.
>> +	 */
>> +	if (!vcpu->arch.apf.delivery_as_int ||
>> +	    vcpu->arch.apf.msr2_val & KVM_ASYNC_PF2_ENABLED)
>> +		kvm_async_pf_wakeup_all(vcpu);
>> +
>> +	return 0;
>> +}
>> +
>> +static int kvm_pv_enable_async_pf2(struct kvm_vcpu *vcpu, u64 data)
>> +{
>> +	/* Bits 9-63 are reserved */
>> +	if (data & ~0x1ff)
>> +		return 1;
>> +
>> +	if (!lapic_in_kernel(vcpu))
>> +		return 1;
>> +
>> +	vcpu->arch.apf.msr2_val = data;
>> +
>> +	vcpu->arch.apf.vec = data & KVM_ASYNC_PF2_VEC_MASK;
>> +
>> +	if (data & KVM_ASYNC_PF2_ENABLED)
>> +		kvm_async_pf_wakeup_all(vcpu);
>> +
>>  	return 0;
>>  }
>>  
>> @@ -2883,6 +2911,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>  		if (kvm_pv_enable_async_pf(vcpu, data))
>>  			return 1;
>>  		break;
>> +	case MSR_KVM_ASYNC_PF2:
>> +		if (kvm_pv_enable_async_pf2(vcpu, data))
>> +			return 1;
>> +		break;
>>  	case MSR_KVM_STEAL_TIME:
>>  
>>  		if (unlikely(!sched_info_on()))
>> @@ -3159,6 +3191,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>  	case MSR_KVM_ASYNC_PF_EN:
>>  		msr_info->data = vcpu->arch.apf.msr_val;
>>  		break;
>> +	case MSR_KVM_ASYNC_PF2:
>> +		msr_info->data = vcpu->arch.apf.msr2_val;
>> +		break;
>>  	case MSR_KVM_STEAL_TIME:
>>  		msr_info->data = vcpu->arch.st.msr_val;
>>  		break;
>> @@ -10367,6 +10402,16 @@ static int apf_get_user(struct kvm_vcpu *vcpu, u32 *val)
>>  				      sizeof(u32));
>>  }
>>  
>> +static bool apf_slot_free(struct kvm_vcpu *vcpu)
>> +{
>> +	u32 val;
>> +
>> +	if (apf_get_user(vcpu, &val))
>> +		return false;
>> +
>> +	return !val;
>> +}
>> +
>>  static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
>>  {
>>  	if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
>> @@ -10382,11 +10427,23 @@ static bool kvm_can_deliver_async_pf(struct kvm_vcpu *vcpu)
>>  
>>  bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
>>  {
>> +	/*
>> +	 * TODO: when we are injecting a 'page present' event with an interrupt
>> +	 * we may ignore pending exceptions.
>> +	 */
>>  	if (unlikely(!lapic_in_kernel(vcpu) ||
>>  		     kvm_event_needs_reinjection(vcpu) ||
>>  		     vcpu->arch.exception.pending))
>>  		return false;
>>  
>> +	/*'
>> +	 * Regardless of the type of event we're trying to deliver, we need to
>> +	 * check that the previous even was already consumed, this may not be
>> +	 * the case with interrupt based delivery.
>> +	 */
>> +	if (vcpu->arch.apf.delivery_as_int && !apf_slot_free(vcpu))
>> +		return false;
>> +
>>  	if (kvm_hlt_in_guest(vcpu->kvm) && !kvm_can_deliver_async_pf(vcpu))
>>  		return false;

-- 
Vitaly



* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-04-30  6:49   ` Paolo Bonzini
@ 2020-04-30  8:40     ` Vitaly Kuznetsov
  2020-04-30  9:27       ` Paolo Bonzini
  0 siblings, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-30  8:40 UTC (permalink / raw)
  To: Paolo Bonzini, x86, kvm
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 29/04/20 11:36, Vitaly Kuznetsov wrote:
>> +	case MSR_KVM_ASYNC_PF_ACK:
>> +		if (data & 0x1)
>> +			kvm_check_async_pf_completion(vcpu);
>> +		break;
>
> Does this work if interrupts are turned off?
>  I think in that case
> kvm_check_async_pf_completion will refuse to make progress.
> You need to make this bit stateful (e.g. 1 = async PF in progress, 0 =
> not in progress), and check that for page ready notifications instead of
> EFLAGS.IF.  
> This probably means that:
>
> - it might be simpler to move it to the vector MSR

I didn't want to merge 'ACK' with the vector MSR as it forces the guest
to remember the setting. It doesn't matter at all for Linux as we
hardcode the interrupt number but I can imagine an OS assigning IRQ
numbers dynamically; it'll need to keep a record to avoid doing a rdmsr.

> - it's definitely much simpler to remove the #PF-based mechanism for
> injecting page ready notifications.

Yea, the logic in kvm_can_do_async_pf()/kvm_can_deliver_async_pf()
becomes cumbersome. If we are to drop the #PF-based mechanism I'd split it
completely from the remaining synchronous #PF for page-not-present:
basically, we only need to check that the slot (which we agreed becomes
completely separate) is empty; the interrupt/pending exception/... state
becomes irrelevant.

-- 
Vitaly



* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-04-30  8:40     ` Vitaly Kuznetsov
@ 2020-04-30  9:27       ` Paolo Bonzini
  2020-04-30 11:33         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2020-04-30  9:27 UTC (permalink / raw)
  To: Vitaly Kuznetsov, x86, kvm
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson

On 30/04/20 10:40, Vitaly Kuznetsov wrote:
>>  I think in that case
>> kvm_check_async_pf_completion will refuse to make progress.
>> You need to make this bit stateful (e.g. 1 = async PF in progress, 0 =
>> not in progress), and check that for page ready notifications instead of
>> EFLAGS.IF.  
>> This probably means that:
>>
>> - it might be simpler to move it to the vector MSR
> I didn't want to merge 'ACK' with the vector MSR as it forces the guest
> to remember the setting. It doesn't matter at all for Linux as we
> hardcode the interrupt number but I can imagine an OS assigning IRQ
> numbers dynamically, it'll need to keep record to avoid doing rdmsr.

I would expect that it needs to keep it in a global variable anyway, but
yes this is a good point.  You can also keep the ACK MSR and store the
pending bit in the other MSR, kind of like you have separate ISR and EOI
registers in the LAPIC.

>> - it's definitely much simpler to remove the #PF-based mechanism for
>> injecting page ready notifications.
> Yea, the logic in kvm_can_do_async_pf()/kvm_can_deliver_async_pf()
> becomes cumbersome. If we are to drop #PF-based mechanism I'd split it
> completely from the remaining synchronous #PF for page-not-present:
> basically, we only need to check that the slot (which we agreed becomes
> completely separate) is empty, interrupt/pending exception/... state
> becomes irrelevant.

Yes, that's a good point.

Paolo



* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-04-30  9:27       ` Paolo Bonzini
@ 2020-04-30 11:33         ` Vitaly Kuznetsov
  2020-04-30 11:43           ` Paolo Bonzini
  0 siblings, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-30 11:33 UTC (permalink / raw)
  To: Paolo Bonzini, x86, kvm
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 30/04/20 10:40, Vitaly Kuznetsov wrote:
>>>  I think in that case
>>> kvm_check_async_pf_completion will refuse to make progress.
>>> You need to make this bit stateful (e.g. 1 = async PF in progress, 0 =
>>> not in progress), and check that for page ready notifications instead of
>>> EFLAGS.IF.  
>>> This probably means that:
>>>
>>> - it might be simpler to move it to the vector MSR
>> I didn't want to merge 'ACK' with the vector MSR as it forces the guest
>> to remember the setting. It doesn't matter at all for Linux as we
>> hardcode the interrupt number but I can imagine an OS assigning IRQ
>> numbers dynamically, it'll need to keep record to avoid doing rdmsr.
>
> I would expect that it needs to keep it in a global variable anyway, but
> yes this is a good point.  You can also keep the ACK MSR and store the
> pending bit in the other MSR, kind of like you have separate ISR and EOI
> registers in the LAPIC.
>

Honestly, I was inspired by Hyper-V's HV_X64_MSR_EOM MSR, as the protocol
we're trying to come up with here is very similar to HV messaging. :)

I'm not exactly sure why we need the pending bit after we drop #PF. When
we call kvm_check_async_pf_completion() from MSR_KVM_ASYNC_PF_ACK write
it will (in case there are page ready events in the queue) check if the
slot is empty, put one there and raise IRQ regardless of guest's current
state. It may or may not get injected immediately but we don't care.
The second invocation of kvm_check_async_pf_completion() from vcpu_run()
will just go away.

I'm probably just missing something; I will think about it again while
working on v1. It seems nobody is against the idea in general. Thanks!

-- 
Vitaly



* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-04-30 11:33         ` Vitaly Kuznetsov
@ 2020-04-30 11:43           ` Paolo Bonzini
  0 siblings, 0 replies; 41+ messages in thread
From: Paolo Bonzini @ 2020-04-30 11:43 UTC (permalink / raw)
  To: Vitaly Kuznetsov, x86, kvm
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson

On 30/04/20 13:33, Vitaly Kuznetsov wrote:
>> I would expect that it needs to keep it in a global variable anyway, but
>> yes this is a good point.  You can also keep the ACK MSR and store the
>> pending bit in the other MSR, kind of like you have separate ISR and EOI
>> registers in the LAPIC.
>>
> Honestly, I was inspired by Hyper-V's HV_X64_MSR_EOM MSR, as the protocol
> we're trying to come up with here is very similar to HV messaging. :)

Oh, that's true actually.

> I'm not exactly sure why we need the pending bit after we drop #PF. When
> we call kvm_check_async_pf_completion() from MSR_KVM_ASYNC_PF_ACK write
> it will (in case there are page ready events in the queue) check if the
> slot is empty, put one there and raise IRQ regardless of guest's current
> state. It may or may not get injected immediately but we don't care.
> The second invocation of kvm_check_async_pf_completion() from vcpu_run()
> will just go away.

You're right, you can just use the value in the guest to see if the
guest is ready.  This is also similar to how #VE handles re-entrancy,
however because this is an interrupt we have IF to delay the IRQ until
after the interrupt handler has finished.

By dropping the #PF page ready case, we can also drop the ugly case
where WRMSR injects a page ready page fault even if IF=0.  That one is
safe on Linux, but Andy didn't like it.

Paolo



* Re: [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery
  2020-04-30  8:31     ` Vitaly Kuznetsov
@ 2020-04-30 13:28       ` Peter Xu
  2020-04-30 13:49         ` Vitaly Kuznetsov
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Xu @ 2020-04-30 13:28 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, kvm, linux-kernel, Paolo Bonzini, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

On Thu, Apr 30, 2020 at 10:31:32AM +0200, Vitaly Kuznetsov wrote:
> as we need to write to two MSRs to configure the new mechanism ordering
> becomes important. If the guest writes to ASYNC_PF_EN first to establish
> the shared memory structure the interrupt in ASYNC_PF2 is not yet set
> (and AFAIR '0' is a valid interrupt!) so if an async pf happens
> immediately after that we'll be forced to inject INT0 in the guest and
> it'll get confused and likely miss the event.
> 
> We can probably mandate the reverse sequence: guest has to set up
> interrupt in ASYNC_PF2 first and then write to ASYNC_PF_EN (with both
> bit 0 and bit 3). In that case the additional 'enable' bit in ASYNC_PF2
> seems redundant. This protocol doesn't look too complex for guests to
> follow.

Yep, looks good.  We should also update the documentation to reflect that.

Thanks,

-- 
Peter Xu



* Re: [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery
  2020-04-30 13:28       ` Peter Xu
@ 2020-04-30 13:49         ` Vitaly Kuznetsov
  0 siblings, 0 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-04-30 13:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: x86, kvm, linux-kernel, Paolo Bonzini, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Peter Xu <peterx@redhat.com> writes:

> On Thu, Apr 30, 2020 at 10:31:32AM +0200, Vitaly Kuznetsov wrote:
>> as we need to write to two MSRs to configure the new mechanism ordering
>> becomes important. If the guest writes to ASYNC_PF_EN first to establish
>> the shared memory structure the interrupt in ASYNC_PF2 is not yet set
>> (and AFAIR '0' is a valid interrupt!) so if an async pf happens
>> immediately after that we'll be forced to inject INT0 in the guest and
>> it'll get confused and likely miss the event.
>> 
>> We can probably mandate the reverse sequence: guest has to set up
>> interrupt in ASYNC_PF2 first and then write to ASYNC_PF_EN (with both
>> bit 0 and bit 3). In that case the additional 'enable' bit in ASYNC_PF2
>> seems redundant. This protocol doesn't look too complex for guests to
>> follow.
>
> Yep looks good.  We should also update the document too about the fact.
>

Of course. It will also mention the fact that the #PF-based mechanism is
now deprecated and that, unless the guest opts into interrupt-based
delivery, async_pf is not functional (except for APF_HALT).

-- 
Vitaly



* Re: [PATCH RFC 2/6] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  2020-04-29  9:36 ` [PATCH RFC 2/6] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Vitaly Kuznetsov
@ 2020-05-04 23:52   ` Gavin Shan
  2020-05-05  8:08     ` Vitaly Kuznetsov
  0 siblings, 1 reply; 41+ messages in thread
From: Gavin Shan @ 2020-05-04 23:52 UTC (permalink / raw)
  To: Vitaly Kuznetsov, x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Hi Vitaly,

On 4/29/20 7:36 PM, Vitaly Kuznetsov wrote:
> Currently, APF mechanism relies on the #PF abuse where the token is being
> passed through CR2. If we switch to using interrupts to deliver page-ready
> notifications we need a different way to pass the data. Extend the existing
> 'struct kvm_vcpu_pv_apf_data' with token information.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>   arch/x86/include/uapi/asm/kvm_para.h |  3 ++-
>   arch/x86/kvm/x86.c                   | 10 ++++++----
>   2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 2a8e0b6b9805..df2ba34037a2 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -113,7 +113,8 @@ struct kvm_mmu_op_release_pt {
>   
>   struct kvm_vcpu_pv_apf_data {
>   	__u32 reason;
> -	__u8 pad[60];
> +	__u32 token;
> +	__u8 pad[56];
>   	__u32 enabled;
>   };
>   
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b93133ee07ba..7c21c0cf0a33 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2662,7 +2662,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>   	}
>   
>   	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
> -					sizeof(u32)))
> +					sizeof(u64)))
>   		return 1;
>   
>   	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
> @@ -10352,8 +10352,9 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
>   	}
>   }
>   
> -static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
> +static int apf_put_user(struct kvm_vcpu *vcpu, u32 reason, u32 token)
>   {
> +	u64 val = (u64)token << 32 | reason;
>   
>   	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
>   				      sizeof(val));
> @@ -10405,7 +10406,8 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
>   	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
>   
>   	if (kvm_can_deliver_async_pf(vcpu) &&
> -	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
> +	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT,
> +			  work->arch.token)) {
>   		fault.vector = PF_VECTOR;
>   		fault.error_code_valid = true;
>   		fault.error_code = 0;
> @@ -10438,7 +10440,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>   	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
>   
>   	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
> -	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
> +	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY, work->arch.token)) {
>   			fault.vector = PF_VECTOR;
>   			fault.error_code_valid = true;
>   			fault.error_code = 0;
> 

It would be as below, based on two facts: (1) the token is more important
than the reason; (2) the token will be put into the high word of @val. I
think apf_{get,put}_user() might be worth making inline. However, it's not
a big deal.

    static inline int apf_put_user(struct kvm_vcpu *vcpu, u32 token, u32 reason)

Thanks,
Gavin
    



* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-04-29  9:36 ` [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications Vitaly Kuznetsov
  2020-04-29 17:28   ` Andy Lutomirski
  2020-04-30  6:49   ` Paolo Bonzini
@ 2020-05-05  0:36   ` Gavin Shan
  2020-05-05  8:16     ` Vitaly Kuznetsov
  2 siblings, 1 reply; 41+ messages in thread
From: Gavin Shan @ 2020-05-05  0:36 UTC (permalink / raw)
  To: Vitaly Kuznetsov, x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Hi Vitaly,

On 4/29/20 7:36 PM, Vitaly Kuznetsov wrote:
> If two page ready notifications happen back to back the second one is not
> delivered and the only mechanism we currently have is
> kvm_check_async_pf_completion() check in vcpu_run() loop. The check will
> only be performed with the next vmexit when it happens and in some cases
> it may take a while. With interrupt based page ready notification delivery
> the situation is even worse: unlike exceptions, interrupts are not handled
> immediately so we must check if the slot is empty. This is slow and
> unnecessary. Introduce dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate
> the fact that the slot is free and host should check its notification
> queue. Mandate using it for interrupt based type 2 APF event delivery.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>   Documentation/virt/kvm/msr.rst       | 16 +++++++++++++++-
>   arch/x86/include/uapi/asm/kvm_para.h |  1 +
>   arch/x86/kvm/x86.c                   |  9 ++++++++-
>   3 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index 7433e55f7184..18db3448db06 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -219,6 +219,11 @@ data:
>   	If during pagefault APF reason is 0 it means that this is regular
>   	page fault.
>   
> +	For interrupt based delivery, guest has to write '1' to
> +	MSR_KVM_ASYNC_PF_ACK every time it clears reason in the shared
> +	'struct kvm_vcpu_pv_apf_data', this forces KVM to re-scan its
> +	queue and deliver next pending notification.
> +
>   	During delivery of type 1 APF cr2 contains a token that will
>   	be used to notify a guest when missing page becomes
>   	available. When page becomes available type 2 APF is sent with
> @@ -340,4 +345,13 @@ data:
>   
>   	To switch to interrupt based delivery of type 2 APF events guests
>   	are supposed to enable asynchronous page faults and set bit 3 in
> -	MSR_KVM_ASYNC_PF_EN first.
> +
> +MSR_KVM_ASYNC_PF_ACK:
> +	0x4b564d07
> +
> +data:
> +	Asynchronous page fault acknowledgment. When the guest is done
> +	processing type 2 APF event and 'reason' field in 'struct
> +	kvm_vcpu_pv_apf_data' is cleared it is supposed to write '1' to
> +	Bit 0 of the MSR, this caused the host to re-scan its queue and
> +	check if there are more notifications pending.

I'm not sure I understood the usage of MSR_KVM_ASYNC_PF_ACK
completely. It seems it's used to trap to the host, giving it a chance
to check/deliver pending page ready events. If there are no pending
events, no work will be done in the trap. If that's true, might it be
a good idea to trap conditionally, i.e. only write to ASYNC_PF_ACK
when there really are pending events?

Thanks,
Gavin

> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 1bbb0b7e062f..5c7449980619 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -51,6 +51,7 @@
>   #define MSR_KVM_PV_EOI_EN      0x4b564d04
>   #define MSR_KVM_POLL_CONTROL	0x4b564d05
>   #define MSR_KVM_ASYNC_PF2	0x4b564d06
> +#define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
>   
>   struct kvm_steal_time {
>   	__u64 steal;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 861dce1e7cf5..e3b91ac33bfd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1243,7 +1243,7 @@ static const u32 emulated_msrs_all[] = {
>   	HV_X64_MSR_TSC_EMULATION_STATUS,
>   
>   	MSR_KVM_ASYNC_PF_EN, MSR_KVM_STEAL_TIME,
> -	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF2,
> +	MSR_KVM_PV_EOI_EN, MSR_KVM_ASYNC_PF2, MSR_KVM_ASYNC_PF_ACK,
>   
>   	MSR_IA32_TSC_ADJUST,
>   	MSR_IA32_TSCDEADLINE,
> @@ -2915,6 +2915,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   		if (kvm_pv_enable_async_pf2(vcpu, data))
>   			return 1;
>   		break;
> +	case MSR_KVM_ASYNC_PF_ACK:
> +		if (data & 0x1)
> +			kvm_check_async_pf_completion(vcpu);
> +		break;
>   	case MSR_KVM_STEAL_TIME:
>   
>   		if (unlikely(!sched_info_on()))
> @@ -3194,6 +3198,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	case MSR_KVM_ASYNC_PF2:
>   		msr_info->data = vcpu->arch.apf.msr2_val;
>   		break;
> +	case MSR_KVM_ASYNC_PF_ACK:
> +		msr_info->data = 0;
> +		break;
>   	case MSR_KVM_STEAL_TIME:
>   		msr_info->data = vcpu->arch.st.msr_val;
>   		break;
> 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery
  2020-04-29  9:36 ` [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery Vitaly Kuznetsov
  2020-04-29 10:53   ` Paolo Bonzini
@ 2020-05-05  0:42   ` Gavin Shan
  1 sibling, 0 replies; 41+ messages in thread
From: Gavin Shan @ 2020-05-05  0:42 UTC (permalink / raw)
  To: Vitaly Kuznetsov, x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Hi Vitaly,

On 4/29/20 7:36 PM, Vitaly Kuznetsov wrote:
> KVM now supports using interrupt for type 2 APF event delivery (page ready
> notifications). Switch KVM guests to using it when the feature is present.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>   arch/x86/entry/entry_32.S          |  5 ++++
>   arch/x86/entry/entry_64.S          |  5 ++++
>   arch/x86/include/asm/hardirq.h     |  3 +++
>   arch/x86/include/asm/irq_vectors.h |  6 ++++-
>   arch/x86/include/asm/kvm_para.h    |  6 +++++
>   arch/x86/kernel/irq.c              |  9 +++++++
>   arch/x86/kernel/kvm.c              | 42 ++++++++++++++++++++++++++++++
>   7 files changed, 75 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index b67bae7091d7..d574dadcb2a1 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1475,6 +1475,11 @@ BUILD_INTERRUPT3(hv_stimer0_callback_vector, HYPERV_STIMER0_VECTOR,
>   
>   #endif /* CONFIG_HYPERV */
>   
> +#ifdef CONFIG_KVM_GUEST
> +BUILD_INTERRUPT3(kvm_async_pf_vector, KVM_ASYNC_PF_VECTOR,
> +		 kvm_async_pf_intr)
> +#endif
> +
>   SYM_CODE_START(page_fault)
>   	ASM_CLAC
>   	pushl	$do_page_fault
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 0e9504fabe52..6f127c1a6547 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1190,6 +1190,11 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
>   	acrn_hv_callback_vector acrn_hv_vector_handler
>   #endif
>   
> +#ifdef CONFIG_KVM_GUEST
> +apicinterrupt3 KVM_ASYNC_PF_VECTOR \
> +	kvm_async_pf_vector kvm_async_pf_intr
> +#endif
> +
>   idtentry debug			do_debug		has_error_code=0	paranoid=1 shift_ist=IST_INDEX_DB ist_offset=DB_STACK_OFFSET
>   idtentry int3			do_int3			has_error_code=0	create_gap=1
>   idtentry stack_segment		do_stack_segment	has_error_code=1
> diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
> index 07533795b8d2..be0fbb15ad7f 100644
> --- a/arch/x86/include/asm/hardirq.h
> +++ b/arch/x86/include/asm/hardirq.h
> @@ -44,6 +44,9 @@ typedef struct {
>   	unsigned int irq_hv_reenlightenment_count;
>   	unsigned int hyperv_stimer0_count;
>   #endif
> +#ifdef CONFIG_KVM_GUEST
> +	unsigned int kvm_async_pf_pageready_count;
> +#endif
>   } ____cacheline_aligned irq_cpustat_t;
>   
>   DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
> diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
> index 889f8b1b5b7f..8879a9ecd908 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -104,7 +104,11 @@
>   #define HYPERV_STIMER0_VECTOR		0xed
>   #endif
>   
> -#define LOCAL_TIMER_VECTOR		0xec
> +#ifdef CONFIG_KVM_GUEST
> +#define KVM_ASYNC_PF_VECTOR		0xec
> +#endif
> +
> +#define LOCAL_TIMER_VECTOR		0xeb
>   
>   #define NR_VECTORS			 256
>   
> diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
> index 9b4df6eaa11a..fde4f21607f9 100644
> --- a/arch/x86/include/asm/kvm_para.h
> +++ b/arch/x86/include/asm/kvm_para.h
> @@ -4,6 +4,7 @@
>   
>   #include <asm/processor.h>
>   #include <asm/alternative.h>
> +#include <linux/interrupt.h>
>   #include <uapi/asm/kvm_para.h>
>   
>   extern void kvmclock_init(void);
> @@ -93,6 +94,11 @@ void kvm_async_pf_task_wake(u32 token);
>   u32 kvm_read_and_reset_pf_reason(void);
>   extern void kvm_disable_steal_time(void);
>   void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
> +extern void kvm_async_pf_vector(void);
> +#ifdef CONFIG_TRACING
> +#define trace_kvm_async_pf_vector kvm_async_pf_vector
> +#endif
> +__visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs);
>   
>   #ifdef CONFIG_PARAVIRT_SPINLOCKS
>   void __init kvm_spinlock_init(void);
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
> index c7965ff429c5..a4c2f25ad74d 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -159,6 +159,15 @@ int arch_show_interrupts(struct seq_file *p, int prec)
>   				   irq_stats(j)->hyperv_stimer0_count);
>   		seq_puts(p, "  Hyper-V stimer0 interrupts\n");
>   	}
> +#endif
> +#ifdef CONFIG_KVM_GUEST
> +	if (test_bit(KVM_ASYNC_PF_VECTOR, system_vectors)) {
> +		seq_printf(p, "%*s: ", prec, "APF");
> +		for_each_online_cpu(j)
> +			seq_printf(p, "%10u ",
> +				   irq_stats(j)->kvm_async_pf_pageready_count);
> +		seq_puts(p, "  KVM async PF page ready interrupts\n");
> +	}
>   #endif
>   	seq_printf(p, "%*s: %10u\n", prec, "ERR", atomic_read(&irq_err_count));
>   #if defined(CONFIG_X86_IO_APIC)
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 6efe0410fb72..1c00c7ba01ff 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -259,9 +259,39 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned lon
>   		rcu_irq_exit();
>   		break;
>   	}
> +
> +	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT))
> +		wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
>   }
>   NOKPROBE_SYMBOL(do_async_page_fault);
>   
> +__visible void __irq_entry kvm_async_pf_intr(struct pt_regs *regs)
> +{
> +	u32 token, reason;
> +
> +	entering_ack_irq();
> +
> +	inc_irq_stat(kvm_async_pf_pageready_count);
> +
> +	if (__this_cpu_read(apf_reason.enabled)) {
> +		reason = __this_cpu_read(apf_reason.reason);
> +		if (reason == KVM_PV_REASON_PAGE_READY) {
> +			token = __this_cpu_read(apf_reason.token);
> +			/*
> +			 * Make sure we read 'token' before we reset
> +			 * 'reason' or it can get lost.
> +			 */
> +			mb();
> +			__this_cpu_write(apf_reason.reason, 0);
> +			kvm_async_pf_task_wake(token);
> +		}
> +	}
> +
> +	wrmsrl(MSR_KVM_ASYNC_PF_ACK, 1);
> +
> +	exiting_irq();
> +}
In theory, it's possible for the interrupt to arrive in the context of
the suspended process itself. In that case kvm_async_pf_task_wake()
has the suspended process try to wake itself up, which doesn't seem to
work because of aacedf26fb760 ("sched/core: Optimize try_to_wake_up()
for local wakeups").

This is one of the issues I observed when enabling async page fault
for arm64; I'm not sure whether it applies to x86.

Thanks,
Gavin

>   static void __init paravirt_ops_setup(void)
>   {
>   	pv_info.name = "KVM";
> @@ -316,10 +346,17 @@ static void kvm_guest_cpu_init(void)
>   		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
>   			pa |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
>   
> +		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT))
> +			pa |= KVM_ASYNC_PF_DELIVERY_AS_INT;
> +
>   		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
>   		__this_cpu_write(apf_reason.enabled, 1);
>   		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
>   		       smp_processor_id());
> +
> +		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT))
> +			wrmsrl(MSR_KVM_ASYNC_PF2, KVM_ASYNC_PF2_ENABLED |
> +			       KVM_ASYNC_PF_VECTOR);
>   	}
>   
>   	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
> @@ -649,6 +686,11 @@ static void __init kvm_guest_init(void)
>   	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>   		apic_set_eoi_write(kvm_guest_apic_eoi_write);
>   
> +	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_INT)) {
> +		pr_info("KVM using interrupt for async PF page-ready\n");
> +		alloc_intr_gate(KVM_ASYNC_PF_VECTOR, kvm_async_pf_vector);
> +	}
> +
>   #ifdef CONFIG_SMP
>   	smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
>   	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
> 



* Re: [PATCH RFC 1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
  2020-04-29  9:36 ` [PATCH RFC 1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" Vitaly Kuznetsov
@ 2020-05-05  1:22   ` Gavin Shan
  2020-05-05 14:16   ` Vivek Goyal
  1 sibling, 0 replies; 41+ messages in thread
From: Gavin Shan @ 2020-05-05  1:22 UTC (permalink / raw)
  To: Vitaly Kuznetsov, x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

On 4/29/20 7:36 PM, Vitaly Kuznetsov wrote:
> Commit 9a6e7c39810e (""KVM: async_pf: Fix #DF due to inject "Page not
> Present" and "Page Ready" exceptions simultaneously") added a protection
> against 'page ready' notification coming before 'page not ready' is
> delivered. This situation seems to be impossible since commit 2a266f23550b
> ("KVM MMU: check pending exception before injecting APF) which added
> 'vcpu->arch.exception.pending' check to kvm_can_do_async_pf.
> 
> On x86, kvm_arch_async_page_present() has only one call site:
> kvm_check_async_pf_completion() loop and we only enter the loop when
> kvm_arch_can_inject_async_page_present(vcpu) which when async pf msr
> is enabled, translates into kvm_can_do_async_pf().
> 
> There is also one problem with the cancellation mechanism. We don't seem
> to check that the 'page not ready' notification we're cancelling matches
> the 'page ready' notification so in theory, we may erroneously drop two
> valid events.
> 
> Revert the commit. apf_get_user() stays as we will need it for the new
> 'page ready notifications via interrupt' mechanism.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>   arch/x86/kvm/x86.c | 16 +---------------
>   1 file changed, 1 insertion(+), 15 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>



* Re: [PATCH RFC 2/6] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
  2020-05-04 23:52   ` Gavin Shan
@ 2020-05-05  8:08     ` Vitaly Kuznetsov
  0 siblings, 0 replies; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-05  8:08 UTC (permalink / raw)
  To: Gavin Shan, x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Gavin Shan <gshan@redhat.com> writes:

> Hi Vitaly,
>
> On 4/29/20 7:36 PM, Vitaly Kuznetsov wrote:
>> Currently, APF mechanism relies on the #PF abuse where the token is being
>> passed through CR2. If we switch to using interrupts to deliver page-ready
>> notifications we need a different way to pass the data. Extent the existing
>> 'struct kvm_vcpu_pv_apf_data' with token information.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>   arch/x86/include/uapi/asm/kvm_para.h |  3 ++-
>>   arch/x86/kvm/x86.c                   | 10 ++++++----
>>   2 files changed, 8 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> index 2a8e0b6b9805..df2ba34037a2 100644
>> --- a/arch/x86/include/uapi/asm/kvm_para.h
>> +++ b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -113,7 +113,8 @@ struct kvm_mmu_op_release_pt {
>>   
>>   struct kvm_vcpu_pv_apf_data {
>>   	__u32 reason;
>> -	__u8 pad[60];
>> +	__u32 token;
>> +	__u8 pad[56];
>>   	__u32 enabled;
>>   };
>>   
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b93133ee07ba..7c21c0cf0a33 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -2662,7 +2662,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>>   	}
>>   
>>   	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
>> -					sizeof(u32)))
>> +					sizeof(u64)))
>>   		return 1;
>>   
>>   	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
>> @@ -10352,8 +10352,9 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
>>   	}
>>   }
>>   
>> -static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
>> +static int apf_put_user(struct kvm_vcpu *vcpu, u32 reason, u32 token)
>>   {
>> +	u64 val = (u64)token << 32 | reason;
>>   
>>   	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
>>   				      sizeof(val));
>> @@ -10405,7 +10406,8 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
>>   	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
>>   
>>   	if (kvm_can_deliver_async_pf(vcpu) &&
>> -	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
>> +	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT,
>> +			  work->arch.token)) {
>>   		fault.vector = PF_VECTOR;
>>   		fault.error_code_valid = true;
>>   		fault.error_code = 0;
>> @@ -10438,7 +10440,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>>   	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
>>   
>>   	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
>> -	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
>> +	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY, work->arch.token)) {
>>   			fault.vector = PF_VECTOR;
>>   			fault.error_code_valid = true;
>>   			fault.error_code = 0;
>> 
>
> It would be as below based on two facts: (1) token is more important than reason;
> (2) token will be put into high word of @val. I think apf_{get,put}_user() might
> be worthy to be inline. However, it's not a big deal.

This is to be changed in v1 as we agreed to drop page-ready delivery via
#PF completely.

>     static inline int apf_put_user(struct kvm_vcpu *vcpu, u32 token, u32 reason)
>

Yes, it makes sense to inline these. Thanks!

-- 
Vitaly



* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-05-05  0:36   ` Gavin Shan
@ 2020-05-05  8:16     ` Vitaly Kuznetsov
  2020-05-05  8:51       ` Gavin Shan
  0 siblings, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-05  8:16 UTC (permalink / raw)
  To: Gavin Shan, x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Gavin Shan <gshan@redhat.com> writes:

> Hi Vitaly,
>
> On 4/29/20 7:36 PM, Vitaly Kuznetsov wrote:
>> If two page ready notifications happen back to back the second one is not
>> delivered and the only mechanism we currently have is
>> kvm_check_async_pf_completion() check in vcpu_run() loop. The check will
>> only be performed with the next vmexit when it happens and in some cases
>> it may take a while. With interrupt based page ready notification delivery
>> the situation is even worse: unlike exceptions, interrupts are not handled
>> immediately so we must check if the slot is empty. This is slow and
>> unnecessary. Introduce dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate
>> the fact that the slot is free and host should check its notification
>> queue. Mandate using it for interrupt based type 2 APF event delivery.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>   Documentation/virt/kvm/msr.rst       | 16 +++++++++++++++-
>>   arch/x86/include/uapi/asm/kvm_para.h |  1 +
>>   arch/x86/kvm/x86.c                   |  9 ++++++++-
>>   3 files changed, 24 insertions(+), 2 deletions(-)
>> 
>> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
>> index 7433e55f7184..18db3448db06 100644
>> --- a/Documentation/virt/kvm/msr.rst
>> +++ b/Documentation/virt/kvm/msr.rst
>> @@ -219,6 +219,11 @@ data:
>>   	If during pagefault APF reason is 0 it means that this is regular
>>   	page fault.
>>   
>> +	For interrupt based delivery, guest has to write '1' to
>> +	MSR_KVM_ASYNC_PF_ACK every time it clears reason in the shared
>> +	'struct kvm_vcpu_pv_apf_data', this forces KVM to re-scan its
>> +	queue and deliver next pending notification.
>> +
>>   	During delivery of type 1 APF cr2 contains a token that will
>>   	be used to notify a guest when missing page becomes
>>   	available. When page becomes available type 2 APF is sent with
>> @@ -340,4 +345,13 @@ data:
>>   
>>   	To switch to interrupt based delivery of type 2 APF events guests
>>   	are supposed to enable asynchronous page faults and set bit 3 in
>> -	MSR_KVM_ASYNC_PF_EN first.
>> +
>> +MSR_KVM_ASYNC_PF_ACK:
>> +	0x4b564d07
>> +
>> +data:
>> +	Asynchronous page fault acknowledgment. When the guest is done
>> +	processing type 2 APF event and 'reason' field in 'struct
>> +	kvm_vcpu_pv_apf_data' is cleared it is supposed to write '1' to
>> +	Bit 0 of the MSR, this caused the host to re-scan its queue and
>> +	check if there are more notifications pending.
>
> I'm not sure if I understood the usage of MSR_KVM_ASYNC_PF_ACK
> completely. It seems it's used to trapped to host, to have chance
> to check/deliver pending page ready events. If there is no pending
> events, no work will be finished in the trap. If it's true, it might
> be good idea to trap conditionally, meaning writing to ASYNC_PF_ACK
> if there are really pending events?

How does the guest know whether the host has any pending events?

The problem we're trying to address with the ACK MSR is the following:
imagine the host has two 'page ready' notifications back to back. It puts
the token for the first one in the slot and raises an IRQ, but how do we
know when the slot becomes free so we can inject the second one? Currently,
we have the kvm_check_async_pf_completion() check in the vcpu_run() loop,
but this doesn't guarantee timely delivery of the event; we just hope that
there's going to be a vmexit 'some time soon' and we'll piggy-back onto
that. Normally this works, but in some special cases it may take really
long before a vmexit happens. Also, we're penalizing each vmexit with an
unneeded check. The ACK MSR is intended to solve these issues.

-- 
Vitaly



* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-05-05  8:16     ` Vitaly Kuznetsov
@ 2020-05-05  8:51       ` Gavin Shan
  2020-05-05  9:54         ` Paolo Bonzini
  0 siblings, 1 reply; 41+ messages in thread
From: Gavin Shan @ 2020-05-05  8:51 UTC (permalink / raw)
  To: Vitaly Kuznetsov, x86, kvm
  Cc: linux-kernel, Paolo Bonzini, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Hi Vitaly,

On 5/5/20 6:16 PM, Vitaly Kuznetsov wrote:
>> On 4/29/20 7:36 PM, Vitaly Kuznetsov wrote:
>>> If two page ready notifications happen back to back the second one is not
>>> delivered and the only mechanism we currently have is
>>> kvm_check_async_pf_completion() check in vcpu_run() loop. The check will
>>> only be performed with the next vmexit when it happens and in some cases
>>> it may take a while. With interrupt based page ready notification delivery
>>> the situation is even worse: unlike exceptions, interrupts are not handled
>>> immediately so we must check if the slot is empty. This is slow and
>>> unnecessary. Introduce dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate
>>> the fact that the slot is free and host should check its notification
>>> queue. Mandate using it for interrupt based type 2 APF event delivery.
>>>
>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>>> ---
>>>    Documentation/virt/kvm/msr.rst       | 16 +++++++++++++++-
>>>    arch/x86/include/uapi/asm/kvm_para.h |  1 +
>>>    arch/x86/kvm/x86.c                   |  9 ++++++++-
>>>    3 files changed, 24 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
>>> index 7433e55f7184..18db3448db06 100644
>>> --- a/Documentation/virt/kvm/msr.rst
>>> +++ b/Documentation/virt/kvm/msr.rst
>>> @@ -219,6 +219,11 @@ data:
>>>    	If during pagefault APF reason is 0 it means that this is regular
>>>    	page fault.
>>>    
>>> +	For interrupt based delivery, guest has to write '1' to
>>> +	MSR_KVM_ASYNC_PF_ACK every time it clears reason in the shared
>>> +	'struct kvm_vcpu_pv_apf_data', this forces KVM to re-scan its
>>> +	queue and deliver next pending notification.
>>> +
>>>    	During delivery of type 1 APF cr2 contains a token that will
>>>    	be used to notify a guest when missing page becomes
>>>    	available. When page becomes available type 2 APF is sent with
>>> @@ -340,4 +345,13 @@ data:
>>>    
>>>    	To switch to interrupt based delivery of type 2 APF events guests
>>>    	are supposed to enable asynchronous page faults and set bit 3 in
>>> -	MSR_KVM_ASYNC_PF_EN first.
>>> +
>>> +MSR_KVM_ASYNC_PF_ACK:
>>> +	0x4b564d07
>>> +
>>> +data:
>>> +	Asynchronous page fault acknowledgment. When the guest is done
>>> +	processing type 2 APF event and 'reason' field in 'struct
>>> +	kvm_vcpu_pv_apf_data' is cleared it is supposed to write '1' to
>>> +	Bit 0 of the MSR, this caused the host to re-scan its queue and
>>> +	check if there are more notifications pending.
>>
>> I'm not sure if I understood the usage of MSR_KVM_ASYNC_PF_ACK
>> completely. It seems it's used to trapped to host, to have chance
>> to check/deliver pending page ready events. If there is no pending
>> events, no work will be finished in the trap. If it's true, it might
>> be good idea to trap conditionally, meaning writing to ASYNC_PF_ACK
>> if there are really pending events?
> 
> How does the guest know if host has any pending events or not?
> 

One way would be through struct kvm_vcpu_pv_apf_data, which is visible
to both host and guest. On the host, the workers that have completed
their work (GUP) are queued into vcpu->async_pf.done. So we would need
a bit in struct kvm_vcpu_pv_apf_data, written by the host and read by
the guest, to indicate whether there is pending work. I don't even
think the writer/reader would need to be synchronized.

> The problem we're trying to address with ACK msr is the following:
> imagine host has two 'page ready' notifications back to back. It puts
> token for the first on in the slot and raises an IRQ but how do we know
> when the slot becomes free so we can inject the second one? Currently,
> we have kvm_check_async_pf_completion() check in vcpu_run() loop but
> this doesn't guarantee timely delivery of the event, we just hope that
> there's going to be a vmexit 'some time soon' and we'll piggy back onto
> that. Normally this works but in some special cases it may take really
> long before a vmexit happens. Also, we're penalizing each vmexit with an
> unneeded check. ACK msr is intended to solve these issues.
> 

Thanks for the explanation. I think vmexit frequency is somewhat proportional
to the workload, meaning more vmexits would be observed if the VM has a
heavier workload, and async page faults are part of that workload. I agree
it makes sense to proactively poke pending ('page ready') events, but it
would be reasonable to do so conditionally, to avoid the overhead. However,
I'm not sure how much overhead it would have on x86 :)

Thanks,
Gavin



* Re: [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications
  2020-05-05  8:51       ` Gavin Shan
@ 2020-05-05  9:54         ` Paolo Bonzini
  0 siblings, 0 replies; 41+ messages in thread
From: Paolo Bonzini @ 2020-05-05  9:54 UTC (permalink / raw)
  To: Gavin Shan, Vitaly Kuznetsov, x86, kvm
  Cc: linux-kernel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Sean Christopherson, Wanpeng Li,
	Jim Mattson

On 05/05/20 10:51, Gavin Shan wrote:
>> How does the guest know if host has any pending events or not?
> 
> One way would be through struct kvm_vcpu_pv_apf_data, which is visible
> to host and guest. In the host, the workers that have completed their
> work (GUP) are queued into vcpu->async_pf.done. So we need a bit in
> struct kvm_vcpu_pv_apf_data, written by host while read by guest to
> see if there pending work. I even don't think the writer/reader need
> to be synchronized.

No, this cannot work.  The problem is that you need a way to tell the
host "you can give me another ready page", and a memory location is not
enough for that.

Paolo



* Re: [PATCH RFC 1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
  2020-04-29  9:36 ` [PATCH RFC 1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" Vitaly Kuznetsov
  2020-05-05  1:22   ` Gavin Shan
@ 2020-05-05 14:16   ` Vivek Goyal
  2020-05-06 15:17     ` Vitaly Kuznetsov
  1 sibling, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2020-05-05 14:16 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, kvm, linux-kernel, Paolo Bonzini, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

On Wed, Apr 29, 2020 at 11:36:29AM +0200, Vitaly Kuznetsov wrote:
> Commit 9a6e7c39810e (""KVM: async_pf: Fix #DF due to inject "Page not
> Present" and "Page Ready" exceptions simultaneously") added a protection
> against 'page ready' notification coming before 'page not ready' is
> delivered.

Hi Vitaly,

The description of the commit seems to suggest that it is solving a
double fault issue: both "page not present" and "page ready" exceptions
got queued, and on the next vcpu entry that would result in a double fault.

It does not seem to solve the issue of "page not ready" being delivered
before "page ready". The guest can already handle that, and it's not an
issue.

> This situation seems to be impossible since commit 2a266f23550b
> ("KVM MMU: check pending exception before injecting APF) which added
> 'vcpu->arch.exception.pending' check to kvm_can_do_async_pf.

This original commit description is confusing too. It says.

"For example, when two APF's for page ready happen after one exit and
 the first one becomes pending, the second one will result in #DF.
 Instead, just handle the second page fault synchronously."

So it seems to be trying to protect against two "page ready" exceptions
being queued simultaneously. But you can't fall back to the synchronous
mechanism once you have started the async pf protocol: once you have
started it by sending "page not ready", you have to send "page ready".
So I am not sure how the above commit solved the issue of two "page
ready" events being queued at the same time.

I am wondering what problem this commit solved. It looks like it can
avoid queueing two "page not ready" events. But can that even happen?
> 
> On x86, kvm_arch_async_page_present() has only one call site:
> kvm_check_async_pf_completion() loop and we only enter the loop when
> kvm_arch_can_inject_async_page_present(vcpu) which when async pf msr
> is enabled, translates into kvm_can_do_async_pf().

kvm_check_async_pf_completion() skips injecting "page ready" if the
fault can't be injected now. Does that mean we leave it queued and it
will be injected after the next exit?

If yes, then the previous commit kind of makes sense: it will not queue
two exceptions at the same time, but will wait to queue the exception
until after the next exit. But the commit description still seems wrong
in the sense that it is not falling back to a synchronous page fault
for "page ready" events.

try_async_pf() also calls kvm_can_do_async_pf(). And IIUC, it will
fall back to a synchronous fault if injecting an async_pf is not
possible at that point in time. That means that, despite async pf
being enabled, not all page faults will take that route and some will
fall back to synchronous faults. I am concerned about how this will
work for reporting errors back to the guest (for the virtiofs use
case). If we are relying on the async pf mechanism to also report
errors back, then we can't afford to do synchronous page faults,
because we have no way to report errors back to the guest and it will
hang (if the page can't be faulted in).

So either we need a way to report errors back while doing synchronous
page faults, or we can't fall back to synchronous page faults while
async page faults are enabled.

While we are reworking the async page fault mechanism, I want to make
sure that the error reporting part is taken care of as part of the
design. I don't want to be dealing with it after the fact.

Thanks
Vivek


> 
> There is also one problem with the cancellation mechanism. We don't seem
> to check that the 'page not ready' notification we're cancelling matches
> the 'page ready' notification so in theory, we may erroneously drop two
> valid events.
> 
> Revert the commit. apf_get_user() stays as we will need it for the new
> 'page ready notifications via interrupt' mechanism.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/kvm/x86.c | 16 +---------------
>  1 file changed, 1 insertion(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c5835f9cb9ad..b93133ee07ba 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10430,7 +10430,6 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  				 struct kvm_async_pf *work)
>  {
>  	struct x86_exception fault;
> -	u32 val;
>  
>  	if (work->wakeup_all)
>  		work->arch.token = ~0; /* broadcast wakeup */
> @@ -10439,19 +10438,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
>  
>  	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
> -	    !apf_get_user(vcpu, &val)) {
> -		if (val == KVM_PV_REASON_PAGE_NOT_PRESENT &&
> -		    vcpu->arch.exception.pending &&
> -		    vcpu->arch.exception.nr == PF_VECTOR &&
> -		    !apf_put_user(vcpu, 0)) {
> -			vcpu->arch.exception.injected = false;
> -			vcpu->arch.exception.pending = false;
> -			vcpu->arch.exception.nr = 0;
> -			vcpu->arch.exception.has_error_code = false;
> -			vcpu->arch.exception.error_code = 0;
> -			vcpu->arch.exception.has_payload = false;
> -			vcpu->arch.exception.payload = 0;
> -		} else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
> +	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
>  			fault.vector = PF_VECTOR;
>  			fault.error_code_valid = true;
>  			fault.error_code = 0;
> @@ -10459,7 +10446,6 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  			fault.address = work->arch.token;
>  			fault.async_page_fault = true;
>  			kvm_inject_page_fault(vcpu, &fault);
> -		}
>  	}
>  	vcpu->arch.apf.halted = false;
>  	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> -- 
> 2.25.3
> 



* Re: [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery
  2020-04-29  9:36 ` [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery Vitaly Kuznetsov
  2020-04-29 10:54   ` Paolo Bonzini
  2020-04-29 21:27   ` Peter Xu
@ 2020-05-05 15:22   ` Vivek Goyal
  2 siblings, 0 replies; 41+ messages in thread
From: Vivek Goyal @ 2020-05-05 15:22 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, kvm, linux-kernel, Paolo Bonzini, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

On Wed, Apr 29, 2020 at 11:36:31AM +0200, Vitaly Kuznetsov wrote:
> Concerns were expressed around APF delivery via a synthetic #PF exception,
> as in some cases such delivery may collide with a real page fault. For type 2
> (page ready) notifications we can easily switch to using an interrupt
> instead. Introduce a new MSR_KVM_ASYNC_PF2 mechanism.
> 
> One notable difference between the two mechanisms is that an interrupt may
> not be handled immediately, so before delivering the next event (regardless
> of its type) we must be sure the guest has read and cleared the previous
> event in the slot.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  Documentation/virt/kvm/msr.rst       | 38 +++++++++++---
>  arch/x86/include/asm/kvm_host.h      |  5 +-
>  arch/x86/include/uapi/asm/kvm_para.h |  6 +++
>  arch/x86/kvm/x86.c                   | 77 ++++++++++++++++++++++++++--
>  4 files changed, 113 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
> index 33892036672d..7433e55f7184 100644
> --- a/Documentation/virt/kvm/msr.rst
> +++ b/Documentation/virt/kvm/msr.rst
> @@ -203,14 +203,21 @@ data:
>  	the hypervisor at the time of asynchronous page fault (APF)
>  	injection to indicate type of asynchronous page fault. Value
>  	of 1 means that the page referred to by the page fault is not
> -	present. Value 2 means that the page is now available. Disabling
> -	interrupt inhibits APFs. Guest must not enable interrupt
> -	before the reason is read, or it may be overwritten by another
> -	APF. Since APF uses the same exception vector as regular page
> -	fault guest must reset the reason to 0 before it does
> -	something that can generate normal page fault.  If during page
> -	fault APF reason is 0 it means that this is regular page
> -	fault.
> +	present. Value 2 means that the page is now available.
> +
> +	Type 1 page (page missing) events are currently always delivered as
> +	synthetic #PF exception. Type 2 (page ready) are either delivered
> +	by #PF exception (when bit 3 of MSR_KVM_ASYNC_PF_EN is clear) or
> +	via an APIC interrupt (when bit 3 set). APIC interrupt delivery is
> +	controlled by MSR_KVM_ASYNC_PF2.
> +
> +	For #PF delivery, disabling interrupt inhibits APFs. Guest must
> +	not enable interrupt before the reason is read, or it may be
> +	overwritten by another APF. Since APF uses the same exception
> +	vector as regular page fault guest must reset the reason to 0
> +	before it does something that can generate normal page fault.
> +	If during pagefault APF reason is 0 it means that this is regular
> +	page fault.

Hi Vitaly,

Again, thinking about how errors will be delivered: will these use the
same interrupt path?

As you mentioned, if interrupts are disabled, APFs are blocked. Does that
mean the host will fall back to a synchronous fault? If yes, we will need
a mechanism to report errors in the synchronous path too.

Thanks
Vivek



* Re: [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery
  2020-04-29 10:53   ` Paolo Bonzini
  2020-04-29 12:44     ` Vitaly Kuznetsov
@ 2020-05-05 18:59     ` Vivek Goyal
  1 sibling, 0 replies; 41+ messages in thread
From: Vivek Goyal @ 2020-05-05 18:59 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, x86, kvm, linux-kernel, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

On Wed, Apr 29, 2020 at 12:53:33PM +0200, Paolo Bonzini wrote:
> On 29/04/20 11:36, Vitaly Kuznetsov wrote:
> > +
> > +	if (__this_cpu_read(apf_reason.enabled)) {
> > +		reason = __this_cpu_read(apf_reason.reason);
> > +		if (reason == KVM_PV_REASON_PAGE_READY) {
> > +			token = __this_cpu_read(apf_reason.token);
> > +			/*
> > +			 * Make sure we read 'token' before we reset
> > +			 * 'reason' or it can get lost.
> > +			 */
> > +			mb();
> > +			__this_cpu_write(apf_reason.reason, 0);
> > +			kvm_async_pf_task_wake(token);
> > +		}
> 
> If tokens cannot be zero, could we avoid using reason for the page ready
> interrupt (and ultimately retire "reason" completely)?

If we are planning to report errors using this interface, then retaining
KVM_PV_REASON_PAGE_READY makes sense because we can then introduce another
state, say, KVM_PV_REASON_PAGE_ERROR.

Thanks
Vivek



* Re: [PATCH RFC 1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
  2020-05-05 14:16   ` Vivek Goyal
@ 2020-05-06 15:17     ` Vitaly Kuznetsov
  2020-05-11 19:17       ` Vivek Goyal
  0 siblings, 1 reply; 41+ messages in thread
From: Vitaly Kuznetsov @ 2020-05-06 15:17 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: x86, kvm, linux-kernel, Paolo Bonzini, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

Vivek Goyal <vgoyal@redhat.com> writes:

> On Wed, Apr 29, 2020 at 11:36:29AM +0200, Vitaly Kuznetsov wrote:
>> Commit 9a6e7c39810e ("KVM: async_pf: Fix #DF due to inject "Page not
>> Present" and "Page Ready" exceptions simultaneously") added a protection
>> against a 'page ready' notification coming before 'page not ready' is
>> delivered.
>
> Hi Vitaly,
>
> The description of the commit seems to suggest that it is solving the
> double fault issue, i.e. both "page not present" and "page ready"
> exceptions got queued and, on the next vcpu entry, that results in a
> double fault.
>
> It does not seem to solve the issue of "page not ready" being delivered
> before "page ready". The guest can already handle that, and it's not an
> issue.
>
>> This situation seems to be impossible since commit 2a266f23550b
>> ("KVM MMU: check pending exception before injecting APF") which added
>> 'vcpu->arch.exception.pending' check to kvm_can_do_async_pf.
>
> This original commit description is confusing too. It says.
>
> "For example, when two APF's for page ready happen after one exit and
>  the first one becomes pending, the second one will result in #DF.
>  Instead, just handle the second page fault synchronously."
>
> So it seems to be trying to ensure that two "page ready" exceptions
> don't get queued simultaneously. But you can't fall back to the
> synchronous mechanism once you have started the async pf protocol.
> Once you have started the async page fault protocol by sending "page not
> ready", you have to send "page ready". So I am not sure how the above
> commit solved the issue of two "page ready" events not being queued at
> the same time.
>
> I am wondering what problem this commit solved. It looks like it
> can avoid queueing two "page not ready" events. But can that even happen?
>
>> 
>> On x86, kvm_arch_async_page_present() has only one call site:
>> kvm_check_async_pf_completion() loop and we only enter the loop when
>> kvm_arch_can_inject_async_page_present(vcpu) which when async pf msr
>> is enabled, translates into kvm_can_do_async_pf().
>
> kvm_check_async_pf_completion() skips injecting "page ready" if the fault
> can't be injected now. Does that mean we leave it queued and it will
> be injected after the next exit?
>
> If yes, then the previous commit kind of makes sense: it will not
> queue up two exceptions at the same time but will wait to queue the
> exception until after the next exit. But the commit description still
> seems to be wrong in the sense that it is not falling back to a
> synchronous page fault for "page ready" events.

As I'll be dropping 'page ready' event delivery via #PF this doesn't
really matter much, but after staring at the code for a while I managed
to convince myself that this particular code path is just unreachable.
I don't think we can get a #DF now with two 'page not ready' exceptions;
kvm_can_do_async_pf() should not allow that.

>
> try_async_pf() also calls kvm_can_do_async_pf(). And IIUC, it will
> fall back to a synchronous fault if injecting async_pf is not possible
> at this point in time. So that means despite the fact that async pf
> is enabled, not all page faults might take that route and some
> will fall back to synchronous faults.

This sounds correct.

> I am concerned about how this
> will work for reporting errors back to the guest (for the virtiofs use
> case). If we are relying on the async pf mechanism to also be able to
> report errors back, then we can't afford to do synchronous page faults
> because we don't have a way to report errors back to the guest and it
> will hang (if the page can't be faulted in).

Currently, if a 'page not ready' event was delivered then we can't skip
the 'page ready' event; we'll have to deliver it. 'Page not ready' is
still going to be delivered as an exception and, unless we really need to
solve complex problems like delivering nested exceptions, this should work.

>
> So either we need a way to report errors back while doing synchronous
> page faults or we can't fall back to synchronous page faults while
> async page faults are enabled.
>
> While we are reworking the async page fault mechanism, I want to make sure
> that the error reporting part has been taken care of as part of the design.
> I don't want to be dealing with it after the fact.

The main issue I'm seeing here is that we'll need to deliver these
errors 'right now' and not some time later. Generally, exceptions
(e.g. #VE) should work but there are some corner cases, I remember Paolo
and Andy discussing these (just hoping they'll jump in with their
conclusions :-). If we somehow manage to exclude interrupts-disabled
context from our scope we should be good, I don't see reasons to skip
delivering #VE there.

For the part this series touches, "page ready" notifications, we don't
skip them but at the same time there is no timely delivery guarantee, we
just queue an interrupt. I'm not sure you'll need these for virtio-fs
though.

Thanks for the feedback!

-- 
Vitaly



* Re: [PATCH RFC 1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
  2020-05-06 15:17     ` Vitaly Kuznetsov
@ 2020-05-11 19:17       ` Vivek Goyal
  0 siblings, 0 replies; 41+ messages in thread
From: Vivek Goyal @ 2020-05-11 19:17 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: x86, kvm, linux-kernel, Paolo Bonzini, Andy Lutomirski,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Sean Christopherson, Wanpeng Li, Jim Mattson

On Wed, May 06, 2020 at 05:17:57PM +0200, Vitaly Kuznetsov wrote:

[..]
> >
> > So either we need a way to report errors back while doing synchronous
> > page faults or we can't fall back to synchronous page faults while
> > async page faults are enabled.
> >
> > While we are reworking the async page fault mechanism, I want to make
> > sure that the error reporting part has been taken care of as part of the
> > design. I don't want to be dealing with it after the fact.
> 
> The main issue I'm seeing here is that we'll need to deliver these
> errors 'right now' and not some time later. Generally, exceptions
> (e.g. #VE) should work but there are some corner cases, I remember Paolo
> and Andy discussing these (just hoping they'll jump in with their
> conclusions :-). If we somehow manage to exclude interrupts-disabled
> context from our scope we should be good, I don't see reasons to skip
> delivering #VE there.

Hi Vitaly,

If we can't find a good solution for interrupt-disabled regions, then
I guess I will live with error reporting only with interrupts enabled
for now. It should solve a class of problems. Once users show up who
need error handling with interrupts disabled, we will need to
solve it.

> 
> For the part this series touches, "page ready" notifications, we don't
> skip them but at the same time there is no timely delivery guarantee, we
> just queue an interrupt. I'm not sure you'll need these for virtio-fs
> though.


I think virtiofs will need both (synchronous as well as asynchronous
error reporting).

- If we can deliver an async pf to the guest, then we will send "page not
  present" to the guest and try to fault in the page. If we figure out that
  the page can't be faulted in, then we can send a "page fault error"
  notification using an interrupt (as you are doing for "page ready").

- If the async page fault can't be injected into the guest and we fall back
  to a synchronous fault, and we figure out that the fault can't be
  completed, we need to inject the error using #VE (or some exception).

Thanks
Vivek

> 
> Thanks for the feedback!
> 
> -- 
> Vitaly
> 



end of thread, other threads:[~2020-05-11 19:17 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-29  9:36 [PATCH RFC 0/6] KVM: x86: Interrupt-based mechanism for async_pf 'page present' notifications Vitaly Kuznetsov
2020-04-29  9:36 ` [PATCH RFC 1/6] Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" Vitaly Kuznetsov
2020-05-05  1:22   ` Gavin Shan
2020-05-05 14:16   ` Vivek Goyal
2020-05-06 15:17     ` Vitaly Kuznetsov
2020-05-11 19:17       ` Vivek Goyal
2020-04-29  9:36 ` [PATCH RFC 2/6] KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Vitaly Kuznetsov
2020-05-04 23:52   ` Gavin Shan
2020-05-05  8:08     ` Vitaly Kuznetsov
2020-04-29  9:36 ` [PATCH RFC 3/6] KVM: x86: interrupt based APF page-ready event delivery Vitaly Kuznetsov
2020-04-29 10:54   ` Paolo Bonzini
2020-04-29 12:40     ` Vitaly Kuznetsov
2020-04-29 13:21       ` Paolo Bonzini
2020-04-29 21:27   ` Peter Xu
2020-04-30  8:31     ` Vitaly Kuznetsov
2020-04-30 13:28       ` Peter Xu
2020-04-30 13:49         ` Vitaly Kuznetsov
2020-05-05 15:22   ` Vivek Goyal
2020-04-29  9:36 ` [PATCH RFC 4/6] KVM: x86: acknowledgment mechanism for async pf page ready notifications Vitaly Kuznetsov
2020-04-29 17:28   ` Andy Lutomirski
2020-04-29 17:40     ` Paolo Bonzini
2020-04-30  0:45       ` Andy Lutomirski
2020-04-30  6:46         ` Paolo Bonzini
2020-04-30  6:49   ` Paolo Bonzini
2020-04-30  8:40     ` Vitaly Kuznetsov
2020-04-30  9:27       ` Paolo Bonzini
2020-04-30 11:33         ` Vitaly Kuznetsov
2020-04-30 11:43           ` Paolo Bonzini
2020-05-05  0:36   ` Gavin Shan
2020-05-05  8:16     ` Vitaly Kuznetsov
2020-05-05  8:51       ` Gavin Shan
2020-05-05  9:54         ` Paolo Bonzini
2020-04-29  9:36 ` [PATCH RFC 5/6] KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT Vitaly Kuznetsov
2020-04-29 10:40   ` Peter Zijlstra
2020-04-29  9:36 ` [PATCH RFC 6/6] KVM: x86: Switch KVM guest to using interrupts for page ready APF delivery Vitaly Kuznetsov
2020-04-29 10:53   ` Paolo Bonzini
2020-04-29 12:44     ` Vitaly Kuznetsov
2020-04-29 13:19       ` Paolo Bonzini
2020-04-29 14:34         ` Vitaly Kuznetsov
2020-05-05 18:59     ` Vivek Goyal
2020-05-05  0:42   ` Gavin Shan
