* [RFC PATCH 0/4] kvm,x86,async_pf: Add capability to return page fault error
@ 2020-03-31 19:40 ` Vivek Goyal
From: Vivek Goyal @ 2020-03-31 19:40 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: virtio-fs, miklos, stefanha, dgilbert, vgoyal, aarcange, dhildenb

The current page fault logic in kvm seems to assume that the host will
always be able to resolve a page fault sooner or later. There does not
seem to be any mechanism for the hypervisor to return an error, say
-EFAULT, to the guest.

We are writing DAX support for the virtiofs filesystem. This allows
mapping host page cache pages directly into a guest user space process.

This mechanism needs additional support from kvm so that a page fault
error can be propagated back into the guest. For example, say a guest
process mmaps a file (which in turn mmaps a portion of the file on the
host into the qemu address space). Now the file gets truncated and the
guest process accesses the mapped region. This generates a page fault
on the host, which tries to fault in the file page. But the page is
gone, so the host gets back -EFAULT. There is no mechanism to send this
information back to the guest: the host currently sends PAGE_READY, the
guest retries, the fault happens again, the host tries to resolve the
page fault again, and this becomes an infinite loop.
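
A minimal guest-side reproducer sketch of this scenario (the mount
point and file path are hypothetical):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	int fd = open("/mnt/virtiofs/testfile", O_RDWR);
	char *p;

	if (fd < 0)
		return 1;
	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	/* ... the file is truncated on the host at this point ... */
	p[0] = 1;	/* faults; host gup() returns -EFAULT, guest retries forever */
	printf("store completed\n");
	return 0;
}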

This is an RFC patch series which extends the async page fault
mechanism so that it can also communicate back that an error occurred
while resolving the page fault. The guest can then send SIGBUS to the
guest process accessing the truncated portion of the file. Or, if the
access happened in the guest kernel, it can try to fix up the exception
and jump to the error handling code, if there is any.
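
For the in-kernel case this can reuse the existing exception table
machinery. A sketch of what a fixup-capable access looks like in the
guest kernel, assuming patch 4 routes the error through
fixup_exception():

#include <linux/uaccess.h>

/*
 * uaccess helpers such as get_user() already carry exception table
 * entries, so when the async page fault resolves to an error the
 * fault can be fixed up to return -EFAULT instead of oopsing.
 */
static int read_one_byte(const u8 __user *uaddr, u8 *val)
{
	if (get_user(*val, uaddr))
		return -EFAULT;	/* exception table fixup lands here */
	return 0;
}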

This patch series solves the problem only for the x86 architecture and
only for Intel VMX. It also does not handle nested virtualization.

Is extending the async page fault mechanism to report errors back to
the guest the right thing to do, or does there need to be another way?

Any feedback or comments are welcome. 

Thanks
Vivek

Vivek Goyal (4):
  kvm: Add capability to be able to report async pf error to guest
  kvm: async_pf: Send faulting gva address in case of error
  kvm: Always get async page notifications
  kvm,x86,async_pf: Search exception tables in case of error

 Documentation/virt/kvm/cpuid.rst     |  4 ++
 Documentation/virt/kvm/msr.rst       | 11 +++--
 arch/x86/include/asm/kvm_host.h      | 17 ++++++-
 arch/x86/include/asm/kvm_para.h      | 13 +++---
 arch/x86/include/asm/vmx.h           |  2 +
 arch/x86/include/uapi/asm/kvm_para.h | 12 ++++-
 arch/x86/kernel/kvm.c                | 69 ++++++++++++++++++++++------
 arch/x86/kvm/cpuid.c                 |  3 +-
 arch/x86/kvm/mmu/mmu.c               | 12 +++--
 arch/x86/kvm/vmx/nested.c            |  2 +-
 arch/x86/kvm/vmx/vmx.c               | 11 ++++-
 arch/x86/kvm/x86.c                   | 37 +++++++++++----
 include/linux/kvm_host.h             |  1 +
 virt/kvm/async_pf.c                  |  6 ++-
 14 files changed, 156 insertions(+), 44 deletions(-)

-- 
2.25.1


* [PATCH 1/4] kvm: Add capability to be able to report async pf error to guest
@ 2020-03-31 19:40   ` Vivek Goyal
From: Vivek Goyal @ 2020-03-31 19:40 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: virtio-fs, miklos, stefanha, dgilbert, vgoyal, aarcange, dhildenb

As of now the asynchronous page fault mechanism assumes the host will
always succeed in resolving a page fault, so there are only two states:
page not present and page ready.

If a page is backed by a file and that file has been truncated (as
can be the case with virtio-fs), then the page fault handler on the
host returns -EFAULT.

The async page fault logic does not look at the error code (-EFAULT)
returned by get_user_pages_remote() and returns PAGE_READY to the
guest. The guest tries to access the page, the page fault happens
again, and this gets kvm into an infinite loop. (Killing the host
process gets kvm out of this loop, though.)

This patch adds another state to the async page fault logic which
allows the host to return an error to the guest. Once the guest knows
that the async page fault can't be resolved, it can send SIGBUS to the
guest process (if user space was accessing the page in question).
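
With just this patch the guest task sees a plain SIGBUS without the
faulting address (that is added in the next patch). A sketch of how a
guest process could catch it:

#include <signal.h>
#include <unistd.h>

static void bus_handler(int sig)
{
	/* The host failed to fault in a page backing one of our mappings. */
	static const char msg[] = "SIGBUS: host-side page fault error\n";

	write(STDERR_FILENO, msg, sizeof(msg) - 1);	/* async-signal-safe */
	_exit(1);
}

int main(void)
{
	signal(SIGBUS, bus_handler);
	/* ... touch a truncated virtiofs DAX mapping here ... */
	return 0;
}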

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 Documentation/virt/kvm/cpuid.rst     |  4 ++++
 Documentation/virt/kvm/msr.rst       | 11 ++++++++---
 arch/x86/include/asm/kvm_host.h      |  3 +++
 arch/x86/include/asm/kvm_para.h      |  4 ++--
 arch/x86/include/uapi/asm/kvm_para.h |  3 +++
 arch/x86/kernel/kvm.c                | 29 +++++++++++++++++++++++++---
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/mmu/mmu.c               |  2 +-
 arch/x86/kvm/x86.c                   | 13 +++++++++----
 include/linux/kvm_host.h             |  1 +
 virt/kvm/async_pf.c                  |  6 ++++--
 11 files changed, 63 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/cpuid.rst b/Documentation/virt/kvm/cpuid.rst
index 01b081f6e7ea..a00bc5e964e0 100644
--- a/Documentation/virt/kvm/cpuid.rst
+++ b/Documentation/virt/kvm/cpuid.rst
@@ -86,6 +86,10 @@ KVM_FEATURE_PV_SCHED_YIELD        13          guest checks this feature bit
                                               before using paravirtualized
                                               sched yield.
 
+KVM_FEATURE_ASYNC_PF_ERROR        14          async PF error reporting can be
+                                              enabled by setting bit 3 when
+                                              writing to msr 0x4b564d02
+
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24         host will warn if no guest-side
                                               per-cpu warps are expected in
                                               kvmclock
diff --git a/Documentation/virt/kvm/msr.rst b/Documentation/virt/kvm/msr.rst
index 33892036672d..93f5e555dcdf 100644
--- a/Documentation/virt/kvm/msr.rst
+++ b/Documentation/virt/kvm/msr.rst
@@ -192,18 +192,23 @@ MSR_KVM_ASYNC_PF_EN:
 data:
 	Bits 63-6 hold 64-byte aligned physical address of a
 	64 byte memory area which must be in guest RAM and must be
-	zeroed. Bits 5-3 are reserved and should be zero. Bit 0 is 1
+	zeroed. Bits 5-4 are reserved and should be zero. Bit 0 is 1
 	when asynchronous page faults are enabled on the vcpu 0 when
 	disabled. Bit 1 is 1 if asynchronous page faults can be injected
 	when vcpu is in cpl == 0. Bit 2 is 1 if asynchronous page faults
 	are delivered to L1 as #PF vmexits.  Bit 2 can be set only if
-	KVM_FEATURE_ASYNC_PF_VMEXIT is present in CPUID.
+	KVM_FEATURE_ASYNC_PF_VMEXIT is present in CPUID. Bit 3 is 1 if
+	asynchronous page faults can return an error when the hypervisor
+	encounters errors trying to fault in the page. Bit 3 can be set
+	only if KVM_FEATURE_ASYNC_PF_ERROR is present in CPUID.
 
 	First 4 byte of 64 byte memory location will be written to by
 	the hypervisor at the time of asynchronous page fault (APF)
 	injection to indicate type of asynchronous page fault. Value
 	of 1 means that the page referred to by the page fault is not
-	present. Value 2 means that the page is now available. Disabling
+	present. Value 2 means that the page is now available. Value 3
+	means that the hypervisor hit an error while trying to fault in
+	the page and the task should probably be sent SIGBUS. Disabling
 	interrupt inhibits APFs. Guest must not enable interrupt
 	before the reason is read, or it may be overwritten by another
 	APF. Since APF uses the same exception vector as regular page
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 98959e8cd448..011a5aab9df6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -765,6 +765,7 @@ struct kvm_vcpu_arch {
 		u32 host_apf_reason;
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
+		bool send_pf_error;
 	} apf;
 
 	/* OSVW MSRs (AMD only) */
@@ -1642,6 +1643,8 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work);
 void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 			       struct kvm_async_pf *work);
+void kvm_arch_async_page_fault_error(struct kvm_vcpu *vcpu,
+				 struct kvm_async_pf *work);
 bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
 extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9b4df6eaa11a..3d6339c6cd47 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -89,7 +89,7 @@ bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
 unsigned int kvm_arch_para_hints(void);
 void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
-void kvm_async_pf_task_wake(u32 token);
+void kvm_async_pf_task_wake(u32 token, bool is_err);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
 void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
@@ -104,7 +104,7 @@ static inline void kvm_spinlock_init(void)
 
 #else /* CONFIG_KVM_GUEST */
 #define kvm_async_pf_task_wait(T, I) do {} while(0)
-#define kvm_async_pf_task_wake(T) do {} while(0)
+#define kvm_async_pf_task_wake(T, I) do {} while(0)
 
 static inline bool kvm_para_available(void)
 {
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 2a8e0b6b9805..09743b45af79 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -31,6 +31,7 @@
 #define KVM_FEATURE_PV_SEND_IPI	11
 #define KVM_FEATURE_POLL_CONTROL	12
 #define KVM_FEATURE_PV_SCHED_YIELD	13
+#define KVM_FEATURE_ASYNC_PF_ERROR	14
 
 #define KVM_HINTS_REALTIME      0
 
@@ -81,6 +82,7 @@ struct kvm_clock_pairing {
 #define KVM_ASYNC_PF_ENABLED			(1 << 0)
 #define KVM_ASYNC_PF_SEND_ALWAYS		(1 << 1)
 #define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT	(1 << 2)
+#define KVM_ASYNC_PF_SEND_ERROR			(1 << 3)
 
 /* Operations for KVM_HC_MMU_OP */
 #define KVM_MMU_OP_WRITE_PTE            1
@@ -110,6 +112,7 @@ struct kvm_mmu_op_release_pt {
 
 #define KVM_PV_REASON_PAGE_NOT_PRESENT 1
 #define KVM_PV_REASON_PAGE_READY 2
+#define KVM_PV_REASON_PAGE_FAULT_ERROR 3
 
 struct kvm_vcpu_pv_apf_data {
 	__u32 reason;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 6efe0410fb72..b5e9e3fa82df 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -74,6 +74,7 @@ struct kvm_task_sleep_node {
 	u32 token;
 	int cpu;
 	bool halted;
+	bool is_err;
 };
 
 static struct kvm_task_sleep_head {
@@ -96,6 +97,12 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
 	return NULL;
 }
 
+static void handle_async_pf_error(int user_mode)
+{
+	if (user_mode)
+		send_sig_info(SIGBUS, SEND_SIG_PRIV, current);
+}
+
 /*
  * @interrupt_kernel: Is this called from a routine which interrupts the kernel
  * 		      (other than user space)?
@@ -113,6 +120,8 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 	e = _find_apf_task(b, token);
 	if (e) {
 		/* dummy entry exist -> wake up was delivered ahead of PF */
+		if (e->is_err)
+			handle_async_pf_error(!interrupt_kernel);
 		hlist_del(&e->link);
 		kfree(e);
 		raw_spin_unlock(&b->lock);
@@ -156,6 +165,9 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 	if (!n.halted)
 		finish_swait(&n.wq, &wait);
 
+	if (n.is_err)
+		handle_async_pf_error(!interrupt_kernel);
+
 	rcu_irq_exit();
 	return;
 }
@@ -188,7 +200,7 @@ static void apf_task_wake_all(void)
 	}
 }
 
-void kvm_async_pf_task_wake(u32 token)
+void kvm_async_pf_task_wake(u32 token, bool is_err)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -219,10 +231,13 @@ void kvm_async_pf_task_wake(u32 token)
 		}
 		n->token = token;
 		n->cpu = smp_processor_id();
+		n->is_err = is_err;
 		init_swait_queue_head(&n->wq);
 		hlist_add_head(&n->link, &b->list);
-	} else
+	} else {
+		n->is_err = is_err;
 		apf_task_wake_one(n);
+	}
 	raw_spin_unlock(&b->lock);
 	return;
 }
@@ -255,7 +270,12 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned lon
 		break;
 	case KVM_PV_REASON_PAGE_READY:
 		rcu_irq_enter();
-		kvm_async_pf_task_wake((u32)address);
+		kvm_async_pf_task_wake((u32)address, false);
+		rcu_irq_exit();
+		break;
+	case KVM_PV_REASON_PAGE_FAULT_ERROR:
+		rcu_irq_enter();
+		kvm_async_pf_task_wake((u32)address, true);
 		rcu_irq_exit();
 		break;
 	}
@@ -316,6 +336,9 @@ static void kvm_guest_cpu_init(void)
 		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
 			pa |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
 
+		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_ERROR))
+			pa |= KVM_ASYNC_PF_SEND_ERROR;
+
 		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
 		__this_cpu_write(apf_reason.enabled, 1);
 		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b1c469446b07..1ce1d998cbc2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -716,7 +716,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_entry2 *entry, u32 function,
 			     (1 << KVM_FEATURE_ASYNC_PF_VMEXIT) |
 			     (1 << KVM_FEATURE_PV_SEND_IPI) |
 			     (1 << KVM_FEATURE_POLL_CONTROL) |
-			     (1 << KVM_FEATURE_PV_SCHED_YIELD);
+			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
+			     (1 << KVM_FEATURE_ASYNC_PF_ERROR);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 87e9ba27ada1..7c6e081bade1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4211,7 +4211,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 	case KVM_PV_REASON_PAGE_READY:
 		vcpu->arch.apf.host_apf_reason = 0;
 		local_irq_disable();
-		kvm_async_pf_task_wake(fault_address);
+		kvm_async_pf_task_wake(fault_address, 0);
 		local_irq_enable();
 		break;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3156e25b0774..9cd388f1891a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2614,8 +2614,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 {
 	gpa_t gpa = data & ~0x3f;
 
-	/* Bits 3:5 are reserved, Should be zero */
-	if (data & 0x38)
+	/* Bits 4:5 are reserved, Should be zero */
+	if (data & 0x30)
 		return 1;
 
 	vcpu->arch.apf.msr_val = data;
@@ -2632,6 +2632,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
 	vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
+	vcpu->arch.apf.send_pf_error = data & KVM_ASYNC_PF_SEND_ERROR;
 	kvm_async_pf_wakeup_all(vcpu);
 	return 0;
 }
@@ -10338,12 +10339,16 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
-	u32 val;
+	u32 val, async_pf_event = KVM_PV_REASON_PAGE_READY;
 
 	if (work->wakeup_all)
 		work->arch.token = ~0; /* broadcast wakeup */
 	else
 		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
+
+	if (work->error_code && vcpu->arch.apf.send_pf_error)
+		async_pf_event = KVM_PV_REASON_PAGE_FAULT_ERROR;
+
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
 	if (vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED &&
@@ -10359,7 +10364,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 			vcpu->arch.exception.error_code = 0;
 			vcpu->arch.exception.has_payload = false;
 			vcpu->arch.exception.payload = 0;
-		} else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_READY)) {
+		} else if (!apf_put_user(vcpu, async_pf_event)) {
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bcb9b2ac0791..363fda33f803 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -206,6 +206,7 @@ struct kvm_async_pf {
 	unsigned long addr;
 	struct kvm_arch_async_pf arch;
 	bool   wakeup_all;
+	unsigned int error_code;
 };
 
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 15e5b037f92d..d5268d34fc8e 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -51,6 +51,7 @@ static void async_pf_execute(struct work_struct *work)
 	unsigned long addr = apf->addr;
 	gpa_t cr2_or_gpa = apf->cr2_or_gpa;
 	int locked = 1;
+	long ret;
 
 	might_sleep();
 
@@ -60,11 +61,12 @@ static void async_pf_execute(struct work_struct *work)
 	 * access remotely.
 	 */
 	down_read(&mm->mmap_sem);
-	get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
-			&locked);
+	ret = get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
+				    &locked);
 	if (locked)
 		up_read(&mm->mmap_sem);
 
+	apf->error_code = ret < 0 ? ret : 0;
 	if (IS_ENABLED(CONFIG_KVM_ASYNC_PF_SYNC))
 		kvm_arch_async_page_present(vcpu, apf);
 
-- 
2.25.1


* [PATCH 2/4] kvm: async_pf: Send faulting gva address in case of error
@ 2020-03-31 19:40   ` Vivek Goyal
From: Vivek Goyal @ 2020-03-31 19:40 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: virtio-fs, miklos, stefanha, dgilbert, vgoyal, aarcange, dhildenb

If the async page fault returns/injects an error into the guest, also
send the guest virtual address of the access at the time of the page
fault. This is needed if the guest decides to send SIGBUS to the task
in the guest: the guest process needs this information if it is to
take some action.

TODO: Nested kvm needs to be modified to use this. Also this patch only
      takes care of Intel VMX.
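
Since the guest kernel now delivers the error via
force_sig_fault(SIGBUS, BUS_ADRERR, addr), a guest process can recover
the faulting address from si_addr. A sketch:

#include <signal.h>
#include <string.h>
#include <unistd.h>

static void bus_handler(int sig, siginfo_t *info, void *ctx)
{
	/* info->si_addr holds the GVA the host failed to fault in. */
	_exit(1);
}

int main(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = bus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigaction(SIGBUS, &sa, NULL);
	/* ... access a truncated virtiofs DAX mapping ... */
	return 0;
}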

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/x86/include/asm/kvm_host.h      | 14 ++++++++++-
 arch/x86/include/asm/kvm_para.h      |  8 +++----
 arch/x86/include/asm/vmx.h           |  2 ++
 arch/x86/include/uapi/asm/kvm_para.h |  9 ++++++-
 arch/x86/kernel/kvm.c                | 36 +++++++++++++++++-----------
 arch/x86/kvm/mmu/mmu.c               | 10 ++++----
 arch/x86/kvm/vmx/nested.c            |  2 +-
 arch/x86/kvm/vmx/vmx.c               | 11 +++++++--
 arch/x86/kvm/x86.c                   | 34 +++++++++++++++++++-------
 9 files changed, 90 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 011a5aab9df6..0f83faeb5863 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -762,7 +762,7 @@ struct kvm_vcpu_arch {
 		u64 msr_val;
 		u32 id;
 		bool send_user_only;
-		u32 host_apf_reason;
+		struct kvm_apf_reason host_apf_reason;
 		unsigned long nested_apf_token;
 		bool delivery_as_pf_vmexit;
 		bool send_pf_error;
@@ -813,6 +813,10 @@ struct kvm_vcpu_arch {
 	bool gpa_available;
 	gpa_t gpa_val;
 
+	/* GVA, if available at the time of VM exit */
+	bool gva_available;
+	gva_t gva_val;
+
 	/* be preempted when it's in kernel-mode(cpl=0) */
 	bool preempted_in_kernel;
 
@@ -1275,8 +1279,16 @@ struct kvm_arch_async_pf {
 	gfn_t gfn;
 	unsigned long cr3;
 	bool direct_map;
+	bool gva_available;
+	gva_t gva_val;
 };
 
+struct kvm_arch_async_pf_shared {
+	u32 reason;
+	u32 pad1;
+	u64 faulting_gva;
+} __packed;
+
 extern struct kvm_x86_ops *kvm_x86_ops;
 extern struct kmem_cache *x86_fpu_cache;
 
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 3d6339c6cd47..2d464e470325 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -89,8 +89,8 @@ bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
 unsigned int kvm_arch_para_hints(void);
 void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
-void kvm_async_pf_task_wake(u32 token, bool is_err);
-u32 kvm_read_and_reset_pf_reason(void);
+void kvm_async_pf_task_wake(u32 token, bool is_err, unsigned long addr);
+void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *reason);
 extern void kvm_disable_steal_time(void);
 void do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address);
 
@@ -104,7 +104,7 @@ static inline void kvm_spinlock_init(void)
 
 #else /* CONFIG_KVM_GUEST */
 #define kvm_async_pf_task_wait(T, I) do {} while(0)
-#define kvm_async_pf_task_wake(T, I) do {} while(0)
+#define kvm_async_pf_task_wake(T, I, A) do {} while(0)
 
 static inline bool kvm_para_available(void)
 {
@@ -121,7 +121,7 @@ static inline unsigned int kvm_arch_para_hints(void)
 	return 0;
 }
 
-static inline u32 kvm_read_and_reset_pf_reason(void)
+static inline void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *reason)
 {
-	return 0;
+	reason->reason = 0;
 }
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 8521af3fef27..014cccb2d25d 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -529,6 +529,7 @@ struct vmx_msr_entry {
 #define EPT_VIOLATION_READABLE_BIT	3
 #define EPT_VIOLATION_WRITABLE_BIT	4
 #define EPT_VIOLATION_EXECUTABLE_BIT	5
+#define EPT_VIOLATION_GLA_VALID_BIT	7
 #define EPT_VIOLATION_GVA_TRANSLATED_BIT 8
 #define EPT_VIOLATION_ACC_READ		(1 << EPT_VIOLATION_ACC_READ_BIT)
 #define EPT_VIOLATION_ACC_WRITE		(1 << EPT_VIOLATION_ACC_WRITE_BIT)
@@ -536,6 +537,7 @@ struct vmx_msr_entry {
 #define EPT_VIOLATION_READABLE		(1 << EPT_VIOLATION_READABLE_BIT)
 #define EPT_VIOLATION_WRITABLE		(1 << EPT_VIOLATION_WRITABLE_BIT)
 #define EPT_VIOLATION_EXECUTABLE	(1 << EPT_VIOLATION_EXECUTABLE_BIT)
+#define EPT_VIOLATION_GLA_VALID		(1 << EPT_VIOLATION_GLA_VALID_BIT)
 #define EPT_VIOLATION_GVA_TRANSLATED	(1 << EPT_VIOLATION_GVA_TRANSLATED_BIT)
 
 /*
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 09743b45af79..95dcb6dd3c8a 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -116,10 +116,17 @@ struct kvm_mmu_op_release_pt {
 
 struct kvm_vcpu_pv_apf_data {
 	__u32 reason;
-	__u8 pad[60];
+	__u8 pad1[4];
+	__u64 faulting_gva;
+	__u8 pad2[48];
 	__u32 enabled;
 };
 
+struct kvm_apf_reason {
+	__u32 reason;
+	__u64 faulting_gva;
+};
+
 #define KVM_PV_EOI_BIT 0
 #define KVM_PV_EOI_MASK (0x1 << KVM_PV_EOI_BIT)
 #define KVM_PV_EOI_ENABLED KVM_PV_EOI_MASK
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index b5e9e3fa82df..42d17e8c0135 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -75,6 +75,7 @@ struct kvm_task_sleep_node {
 	int cpu;
 	bool halted;
 	bool is_err;
+	unsigned long fault_addr;
 };
 
 static struct kvm_task_sleep_head {
@@ -97,10 +98,10 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
 	return NULL;
 }
 
-static void handle_async_pf_error(int user_mode)
+static void handle_async_pf_error(int user_mode, unsigned long fault_addr)
 {
 	if (user_mode)
-		send_sig_info(SIGBUS, SEND_SIG_PRIV, current);
+		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)fault_addr);
 }
 
 /*
@@ -121,7 +122,7 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 	if (e) {
 		/* dummy entry exist -> wake up was delivered ahead of PF */
 		if (e->is_err)
-			handle_async_pf_error(!interrupt_kernel);
+			handle_async_pf_error(!interrupt_kernel, e->fault_addr);
 		hlist_del(&e->link);
 		kfree(e);
 		raw_spin_unlock(&b->lock);
@@ -166,7 +167,7 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 		finish_swait(&n.wq, &wait);
 
 	if (n.is_err)
-		handle_async_pf_error(!interrupt_kernel);
+		handle_async_pf_error(!interrupt_kernel, n.fault_addr);
 
 	rcu_irq_exit();
 	return;
@@ -200,7 +201,7 @@ static void apf_task_wake_all(void)
 	}
 }
 
-void kvm_async_pf_task_wake(u32 token, bool is_err)
+void kvm_async_pf_task_wake(u32 token, bool is_err, unsigned long fault_addr)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -232,10 +233,12 @@ void kvm_async_pf_task_wake(u32 token, bool is_err)
 		n->token = token;
 		n->cpu = smp_processor_id();
 		n->is_err = is_err;
+		n->fault_addr = fault_addr;
 		init_swait_queue_head(&n->wq);
 		hlist_add_head(&n->link, &b->list);
 	} else {
 		n->is_err = is_err;
+		n->fault_addr = fault_addr;
 		apf_task_wake_one(n);
 	}
 	raw_spin_unlock(&b->lock);
@@ -243,16 +246,16 @@ void kvm_async_pf_task_wake(u32 token, bool is_err)
 }
 EXPORT_SYMBOL_GPL(kvm_async_pf_task_wake);
 
-u32 kvm_read_and_reset_pf_reason(void)
+void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *apf)
 {
-	u32 reason = 0;
-
 	if (__this_cpu_read(apf_reason.enabled)) {
-		reason = __this_cpu_read(apf_reason.reason);
+		apf->reason = __this_cpu_read(apf_reason.reason);
+		apf->faulting_gva = __this_cpu_read(apf_reason.faulting_gva);
 		__this_cpu_write(apf_reason.reason, 0);
+		__this_cpu_write(apf_reason.faulting_gva, 0);
+	} else {
+		apf->reason = 0;
 	}
-
-	return reason;
 }
 EXPORT_SYMBOL_GPL(kvm_read_and_reset_pf_reason);
 NOKPROBE_SYMBOL(kvm_read_and_reset_pf_reason);
@@ -260,7 +263,11 @@ NOKPROBE_SYMBOL(kvm_read_and_reset_pf_reason);
 dotraplinkage void
 do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address)
 {
-	switch (kvm_read_and_reset_pf_reason()) {
+	struct kvm_apf_reason apf_data;
+
+	kvm_read_and_reset_pf_reason(&apf_data);
+
+	switch (apf_data.reason) {
 	default:
 		do_page_fault(regs, error_code, address);
 		break;
@@ -270,12 +277,13 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned lon
 		break;
 	case KVM_PV_REASON_PAGE_READY:
 		rcu_irq_enter();
-		kvm_async_pf_task_wake((u32)address, false);
+		kvm_async_pf_task_wake((u32)address, false, 0);
 		rcu_irq_exit();
 		break;
 	case KVM_PV_REASON_PAGE_FAULT_ERROR:
 		rcu_irq_enter();
-		kvm_async_pf_task_wake((u32)address, true);
+		kvm_async_pf_task_wake((u32)address, true,
+				       apf_data.faulting_gva);
 		rcu_irq_exit();
 		break;
 	}
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 7c6e081bade1..e3337c5f73e0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4082,6 +4082,8 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 	arch.direct_map = vcpu->arch.mmu->direct_map;
 	arch.cr3 = vcpu->arch.mmu->get_cr3(vcpu);
 
+	arch.gva_available = vcpu->arch.gva_available;
+	arch.gva_val = vcpu->arch.gva_val;
 	return kvm_setup_async_pf(vcpu, cr2_or_gpa,
 				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
 }
@@ -4193,7 +4195,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 #endif
 
 	vcpu->arch.l1tf_flush_l1d = true;
-	switch (vcpu->arch.apf.host_apf_reason) {
+	switch (vcpu->arch.apf.host_apf_reason.reason) {
 	default:
 		trace_kvm_page_fault(fault_address, error_code);
 
@@ -4203,15 +4205,15 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 				insn_len);
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
-		vcpu->arch.apf.host_apf_reason = 0;
+		vcpu->arch.apf.host_apf_reason.reason = 0;
 		local_irq_disable();
 		kvm_async_pf_task_wait(fault_address, 0);
 		local_irq_enable();
 		break;
 	case KVM_PV_REASON_PAGE_READY:
-		vcpu->arch.apf.host_apf_reason = 0;
+		vcpu->arch.apf.host_apf_reason.reason = 0;
 		local_irq_disable();
-		kvm_async_pf_task_wake(fault_address, 0);
+		kvm_async_pf_task_wake(fault_address, 0, 0);
 		local_irq_enable();
 		break;
 	}
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9750e590c89d..e8b026ec4acc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5560,7 +5560,7 @@ bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason)
 		if (is_nmi(intr_info))
 			return false;
 		else if (is_page_fault(intr_info))
-			return !vmx->vcpu.arch.apf.host_apf_reason && enable_ept;
+			return !vmx->vcpu.arch.apf.host_apf_reason.reason && enable_ept;
 		else if (is_debug(intr_info) &&
 			 vcpu->guest_debug &
 			 (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 26f8f31563e9..80dffc7375b6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4676,7 +4676,8 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 	if (is_page_fault(intr_info)) {
 		cr2 = vmcs_readl(EXIT_QUALIFICATION);
 		/* EPT won't cause page fault directly */
-		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_reason && enable_ept);
+		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_reason.reason &&
+			     enable_ept);
 		return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
 	}
 
@@ -5159,6 +5160,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	unsigned long exit_qualification;
 	gpa_t gpa;
 	u64 error_code;
+	gva_t gva;
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
@@ -5195,6 +5197,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	       PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
 
 	vcpu->arch.exit_qualification = exit_qualification;
+	if (exit_qualification & EPT_VIOLATION_GLA_VALID) {
+		gva = vmcs_readl(GUEST_LINEAR_ADDRESS);
+		vcpu->arch.gva_available = true;
+		vcpu->arch.gva_val = gva;
+	}
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
@@ -6236,7 +6243,7 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 
 	/* if exit due to PF check for async PF */
 	if (is_page_fault(vmx->exit_intr_info))
-		vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
+		kvm_read_and_reset_pf_reason(&vmx->vcpu.arch.apf.host_apf_reason);
 
 	/* Handle machine checks before interrupts are enabled */
 	if (is_machine_check(vmx->exit_intr_info))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9cd388f1891a..f3c79baf4998 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2627,7 +2627,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 	}
 
 	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
-					sizeof(u32)))
+				      sizeof(struct kvm_arch_async_pf_shared)))
 		return 1;
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
@@ -10261,12 +10261,18 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 	}
 }
 
-static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
+static int apf_put_user_u32(struct kvm_vcpu *vcpu, u32 val)
 {
-
 	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
 				      sizeof(val));
 }
+static int apf_put_user(struct kvm_vcpu *vcpu,
+			struct kvm_arch_async_pf_shared *val)
+{
+
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, val,
+				      sizeof(*val));
+}
 
 static int apf_get_user(struct kvm_vcpu *vcpu, u32 *val)
 {
@@ -10309,12 +10315,16 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 				     struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
+	struct kvm_arch_async_pf_shared apf_shared;
 
 	trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa);
 	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
 
+	memset(&apf_shared, 0, sizeof(apf_shared));
+	apf_shared.reason = KVM_PV_REASON_PAGE_NOT_PRESENT;
+
 	if (kvm_can_deliver_async_pf(vcpu) &&
-	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
+	    !apf_put_user(vcpu, &apf_shared)) {
 		fault.vector = PF_VECTOR;
 		fault.error_code_valid = true;
 		fault.error_code = 0;
@@ -10339,15 +10349,21 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
-	u32 val, async_pf_event = KVM_PV_REASON_PAGE_READY;
+	u32 val;
+	struct kvm_arch_async_pf_shared asyncpf_shared;
 
 	if (work->wakeup_all)
 		work->arch.token = ~0; /* broadcast wakeup */
 	else
 		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
 
-	if (work->error_code && vcpu->arch.apf.send_pf_error)
-		async_pf_event = KVM_PV_REASON_PAGE_FAULT_ERROR;
+	memset(&asyncpf_shared, 0, sizeof(asyncpf_shared));
+	asyncpf_shared.reason = KVM_PV_REASON_PAGE_READY;
+	if (work->error_code && vcpu->arch.apf.send_pf_error) {
+		asyncpf_shared.reason = KVM_PV_REASON_PAGE_FAULT_ERROR;
+		if (work->arch.gva_available)
+			asyncpf_shared.faulting_gva = work->arch.gva_val;
+	}
 
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
@@ -10356,7 +10372,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 		if (val == KVM_PV_REASON_PAGE_NOT_PRESENT &&
 		    vcpu->arch.exception.pending &&
 		    vcpu->arch.exception.nr == PF_VECTOR &&
-		    !apf_put_user(vcpu, 0)) {
+		    !apf_put_user_u32(vcpu, 0)) {
 			vcpu->arch.exception.injected = false;
 			vcpu->arch.exception.pending = false;
 			vcpu->arch.exception.nr = 0;
@@ -10364,7 +10380,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 			vcpu->arch.exception.error_code = 0;
 			vcpu->arch.exception.has_payload = false;
 			vcpu->arch.exception.payload = 0;
-		} else if (!apf_put_user(vcpu, async_pf_event)) {
+		} else if (!apf_put_user(vcpu, &asyncpf_shared)) {
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;
-- 
2.25.1


+		vcpu->arch.apf.host_apf_reason.reason = 0;
 		local_irq_disable();
-		kvm_async_pf_task_wake(fault_address, 0);
+		kvm_async_pf_task_wake(fault_address, 0, 0);
 		local_irq_enable();
 		break;
 	}
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9750e590c89d..e8b026ec4acc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5560,7 +5560,7 @@ bool nested_vmx_exit_reflected(struct kvm_vcpu *vcpu, u32 exit_reason)
 		if (is_nmi(intr_info))
 			return false;
 		else if (is_page_fault(intr_info))
-			return !vmx->vcpu.arch.apf.host_apf_reason && enable_ept;
+			return !vmx->vcpu.arch.apf.host_apf_reason.reason && enable_ept;
 		else if (is_debug(intr_info) &&
 			 vcpu->guest_debug &
 			 (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 26f8f31563e9..80dffc7375b6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4676,7 +4676,8 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
 	if (is_page_fault(intr_info)) {
 		cr2 = vmcs_readl(EXIT_QUALIFICATION);
 		/* EPT won't cause page fault directly */
-		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_reason && enable_ept);
+		WARN_ON_ONCE(!vcpu->arch.apf.host_apf_reason.reason &&
+			     enable_ept);
 		return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0);
 	}
 
@@ -5159,6 +5160,7 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	unsigned long exit_qualification;
 	gpa_t gpa;
 	u64 error_code;
+	gva_t gva;
 
 	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
 
@@ -5195,6 +5197,11 @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
 	       PFERR_GUEST_FINAL_MASK : PFERR_GUEST_PAGE_MASK;
 
 	vcpu->arch.exit_qualification = exit_qualification;
+	if (exit_qualification & EPT_VIOLATION_GLA_VALID) {
+		gva = vmcs_readl(GUEST_LINEAR_ADDRESS);
+		vcpu->arch.gva_available = true;
+		vcpu->arch.gva_val = gva;
+	}
 	return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
 }
 
@@ -6236,7 +6243,7 @@ static void handle_exception_nmi_irqoff(struct vcpu_vmx *vmx)
 
 	/* if exit due to PF check for async PF */
 	if (is_page_fault(vmx->exit_intr_info))
-		vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
+		kvm_read_and_reset_pf_reason(&vmx->vcpu.arch.apf.host_apf_reason);
 
 	/* Handle machine checks before interrupts are enabled */
 	if (is_machine_check(vmx->exit_intr_info))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9cd388f1891a..f3c79baf4998 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2627,7 +2627,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 	}
 
 	if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
-					sizeof(u32)))
+				      sizeof(struct kvm_arch_async_pf_shared)))
 		return 1;
 
 	vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
@@ -10261,12 +10261,18 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
 	}
 }
 
-static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
+static int apf_put_user_u32(struct kvm_vcpu *vcpu, u32 val)
 {
-
 	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
 				      sizeof(val));
 }
+static int apf_put_user(struct kvm_vcpu *vcpu,
+			struct kvm_arch_async_pf_shared *val)
+{
+
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, val,
+				      sizeof(*val));
+}
 
 static int apf_get_user(struct kvm_vcpu *vcpu, u32 *val)
 {
@@ -10309,12 +10315,16 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 				     struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
+	struct kvm_arch_async_pf_shared apf_shared;
 
 	trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa);
 	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
 
+	memset(&apf_shared, 0, sizeof(apf_shared));
+	apf_shared.reason = KVM_PV_REASON_PAGE_NOT_PRESENT;
+
 	if (kvm_can_deliver_async_pf(vcpu) &&
-	    !apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
+	    !apf_put_user(vcpu, &apf_shared)) {
 		fault.vector = PF_VECTOR;
 		fault.error_code_valid = true;
 		fault.error_code = 0;
@@ -10339,15 +10349,21 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work)
 {
 	struct x86_exception fault;
-	u32 val, async_pf_event = KVM_PV_REASON_PAGE_READY;
+	u32 val;
+	struct kvm_arch_async_pf_shared asyncpf_shared;
 
 	if (work->wakeup_all)
 		work->arch.token = ~0; /* broadcast wakeup */
 	else
 		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
 
-	if (work->error_code && vcpu->arch.apf.send_pf_error)
-		async_pf_event = KVM_PV_REASON_PAGE_FAULT_ERROR;
+	memset(&asyncpf_shared, 0, sizeof(asyncpf_shared));
+	asyncpf_shared.reason = KVM_PV_REASON_PAGE_READY;
+	if (work->error_code && vcpu->arch.apf.send_pf_error) {
+		asyncpf_shared.reason = KVM_PV_REASON_PAGE_FAULT_ERROR;
+		if (work->arch.gva_available)
+			asyncpf_shared.faulting_gva = work->arch.gva_val;
+	}
 
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
@@ -10356,7 +10372,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 		if (val == KVM_PV_REASON_PAGE_NOT_PRESENT &&
 		    vcpu->arch.exception.pending &&
 		    vcpu->arch.exception.nr == PF_VECTOR &&
-		    !apf_put_user(vcpu, 0)) {
+		    !apf_put_user_u32(vcpu, 0)) {
 			vcpu->arch.exception.injected = false;
 			vcpu->arch.exception.pending = false;
 			vcpu->arch.exception.nr = 0;
@@ -10364,7 +10380,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 			vcpu->arch.exception.error_code = 0;
 			vcpu->arch.exception.has_payload = false;
 			vcpu->arch.exception.payload = 0;
-		} else if (!apf_put_user(vcpu, async_pf_event)) {
+		} else if (!apf_put_user(vcpu, &asyncpf_shared)) {
 			fault.vector = PF_VECTOR;
 			fault.error_code_valid = true;
 			fault.error_code = 0;
-- 
2.25.1
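
For reference, the guest-visible layout after the uapi hunk above works
out as below; the new fields fit inside the old pad, so "enabled" stays
at offset 64 and the existing MSR handshake is unchanged (offsets added
here for illustration only):

	struct kvm_vcpu_pv_apf_data {
		__u32 reason;		/* offset  0: NOT_PRESENT/READY/FAULT_ERROR */
		__u8  pad1[4];		/* offset  4: keeps faulting_gva 8-byte aligned */
		__u64 faulting_gva;	/* offset  8: guest linear address on error */
		__u8  pad2[48];		/* offset 16: reserved */
		__u32 enabled;		/* offset 64: async pf enabled flag */
	};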


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/4] kvm: Always get async page notifications
  2020-03-31 19:40 ` [Virtio-fs] [RFC PATCH 0/4] kvm, x86, async_pf: " Vivek Goyal
@ 2020-03-31 19:40   ` Vivek Goyal
  -1 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-03-31 19:40 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: virtio-fs, miklos, stefanha, dgilbert, vgoyal, aarcange, dhildenb

Right now we seem to get async pf notifications only if the guest is
in user mode, or if it is in kernel mode and CONFIG_PREEMPTION is
enabled. The idea seems to be that CONFIG_PREEMPTION gives the guest
a chance to schedule something else while the page is not ready.

If KVM_ASYNC_PF_SEND_ALWAYS is not set, the host does not send
PAGE_NOT_PRESENT/PAGE_READY notifications for kernel mode faults.
Instead, the guest simply resumes once the page has been installed.

Now we are adding the capability to report errors as part of the
async pf protocol. That means we need the async pf notifications so
that we can make a task wait and, when an error is reported, either
send SIGBUS to the user process or search the exception tables for a
possible error handler.

Hence, enable async pf notifications always. I am not sure whether
this has a noticeable performance implication, though.
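
With the #ifdef gone, the enable path in kvm_guest_cpu_init() reduces
to roughly the following (a sketch of the resulting code, not part of
the diff below):

	u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));

	/* Ask for notifications even when faulting in kernel mode. */
	pa |= KVM_ASYNC_PF_SEND_ALWAYS | KVM_ASYNC_PF_ENABLED;
	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
		pa |= KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
	wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);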

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/x86/kernel/kvm.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 42d17e8c0135..97753a648133 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -336,9 +336,7 @@ static void kvm_guest_cpu_init(void)
 	if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) && kvmapf) {
 		u64 pa = slow_virt_to_phys(this_cpu_ptr(&apf_reason));
 
-#ifdef CONFIG_PREEMPTION
 		pa |= KVM_ASYNC_PF_SEND_ALWAYS;
-#endif
 		pa |= KVM_ASYNC_PF_ENABLED;
 
 		if (kvm_para_has_feature(KVM_FEATURE_ASYNC_PF_VMEXIT))
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/4] kvm,x86,async_pf: Search exception tables in case of error
  2020-03-31 19:40 ` [Virtio-fs] [RFC PATCH 0/4] kvm, x86, async_pf: " Vivek Goyal
@ 2020-03-31 19:40   ` Vivek Goyal
  -1 siblings, 0 replies; 12+ messages in thread
From: Vivek Goyal @ 2020-03-31 19:40 UTC (permalink / raw)
  To: kvm, linux-kernel
  Cc: virtio-fs, miklos, stefanha, dgilbert, vgoyal, aarcange, dhildenb

If an error happens while resolving a page fault and kernel code was
executing at the time of the fault, search the exception tables and
jump to the corresponding fixup handler, if there is one.

This is useful when virtiofs DAX code is doing a memcpy and the page
fault returns an error because the corresponding page has been
truncated on the host. In that case we want to return that error to
guest user space instead of retrying indefinitely.

This does not take care of nested KVM. An exit into L1 has no notion
of passing "struct pt_regs" to the handler. That needs to be fixed
first.
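
As an illustration, a guest kernel access that goes through a helper
with exception table entries already has somewhere for
fixup_exception() to land. A minimal sketch (hypothetical caller, not
from this series):

	/* Copy from a DAX mapping, tolerating a host-side fault error. */
	static long dax_copy(void *dst, const void *src, size_t len)
	{
		/*
		 * probe_kernel_read() is backed by exception table
		 * entries, so a PAGE_FAULT_ERROR fixed up by this patch
		 * surfaces as -EFAULT instead of an endless retry.
		 */
		return probe_kernel_read(dst, src, len);
	}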

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/x86/include/asm/kvm_para.h |  5 +++--
 arch/x86/kernel/kvm.c           | 24 ++++++++++++++++++------
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 2d464e470325..2c9e7c852b40 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -88,7 +88,8 @@ static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
 bool kvm_para_available(void);
 unsigned int kvm_arch_para_features(void);
 unsigned int kvm_arch_para_hints(void);
-void kvm_async_pf_task_wait(u32 token, int interrupt_kernel);
+void kvm_async_pf_task_wait(u32 token, int interrupt_kernel,
+			    struct pt_regs *regs, unsigned long error_code);
 void kvm_async_pf_task_wake(u32 token, bool is_err, unsigned long addr);
 void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *reason);
 extern void kvm_disable_steal_time(void);
@@ -103,7 +104,7 @@ static inline void kvm_spinlock_init(void)
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
 #else /* CONFIG_KVM_GUEST */
-#define kvm_async_pf_task_wait(T, I) do {} while(0)
+#define kvm_async_pf_task_wait(T, I, R, E) do {} while(0)
 #define kvm_async_pf_task_wake(T, I, A) do {} while(0)
 
 static inline bool kvm_para_available(void)
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 97753a648133..387ef0aa323b 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -98,17 +98,23 @@ static struct kvm_task_sleep_node *_find_apf_task(struct kvm_task_sleep_head *b,
 	return NULL;
 }
 
-static void handle_async_pf_error(int user_mode, unsigned long fault_addr)
+static inline void handle_async_pf_error(int user_mode,
+					 unsigned long fault_addr,
+					 struct pt_regs *regs,
+					 unsigned long error_code)
 {
 	if (user_mode)
 		force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)fault_addr);
+	else
+		fixup_exception(regs, X86_TRAP_PF, error_code, fault_addr);
 }
 
 /*
  * @interrupt_kernel: Is this called from a routine which interrupts the kernel
  * 		      (other than user space)?
  */
-void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
+void kvm_async_pf_task_wait(u32 token, int interrupt_kernel,
+			    struct pt_regs *regs, unsigned long error_code)
 {
 	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
 	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
@@ -120,13 +126,17 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 	raw_spin_lock(&b->lock);
 	e = _find_apf_task(b, token);
 	if (e) {
+		bool is_err = e->is_err;
+		unsigned long fault_addr = e->fault_addr;
+
 		/* dummy entry exist -> wake up was delivered ahead of PF */
-		if (e->is_err)
-			handle_async_pf_error(!interrupt_kernel, e->fault_addr);
 		hlist_del(&e->link);
 		kfree(e);
 		raw_spin_unlock(&b->lock);
 
+		if (is_err)
+			handle_async_pf_error(!interrupt_kernel, fault_addr,
+					      regs, error_code);
 		rcu_irq_exit();
 		return;
 	}
@@ -167,7 +177,8 @@ void kvm_async_pf_task_wait(u32 token, int interrupt_kernel)
 		finish_swait(&n.wq, &wait);
 
 	if (n.is_err)
-		handle_async_pf_error(!interrupt_kernel, n.fault_addr);
+		handle_async_pf_error(!interrupt_kernel, n.fault_addr, regs,
+				      error_code);
 
 	rcu_irq_exit();
 	return;
@@ -273,7 +284,8 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code, unsigned lon
 		break;
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		/* page is swapped out by the host. */
-		kvm_async_pf_task_wait((u32)address, !user_mode(regs));
+		kvm_async_pf_task_wait((u32)address, !user_mode(regs), regs,
+				       error_code);
 		break;
 	case KVM_PV_REASON_PAGE_READY:
 		rcu_irq_enter();
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e3337c5f73e0..a9b707fb5861 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4207,7 +4207,7 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 	case KVM_PV_REASON_PAGE_NOT_PRESENT:
 		vcpu->arch.apf.host_apf_reason.reason = 0;
 		local_irq_disable();
-		kvm_async_pf_task_wait(fault_address, 0);
+		kvm_async_pf_task_wait(fault_address, 0, NULL, 0);
 		local_irq_enable();
 		break;
 	case KVM_PV_REASON_PAGE_READY:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/4] kvm: async_pf: Send faulting gva address in case of error
  2020-03-31 19:40   ` [Virtio-fs] " Vivek Goyal
  (?)
@ 2020-03-31 23:54   ` kbuild test robot
  -1 siblings, 0 replies; 12+ messages in thread
From: kbuild test robot @ 2020-03-31 23:54 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2385 bytes --]

Hi Vivek,

I love your patch! Perhaps something to improve:

[auto build test WARNING on vhost/linux-next]
[also build test WARNING on linus/master v5.6]
[cannot apply to kvm/linux-next linux/master next-20200331]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Vivek-Goyal/kvm-x86-async_pf-Add-capability-to-return-page-fault-error/20200401-042437
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: x86_64-defconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-6) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/uapi/linux/kvm_para.h:36:0,
                    from include/linux/kvm_para.h:5,
                    from arch/x86/kernel/cpu/mtrr/mtrr.c:39:
   arch/x86/include/asm/kvm_para.h: In function 'kvm_read_and_reset_pf_reason':
>> arch/x86/include/asm/kvm_para.h:126:9: warning: 'return' with a value, in function returning void
     return 0;
            ^
   arch/x86/include/asm/kvm_para.h:124:20: note: declared here
    static inline void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *reason)
                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

vim +/return +126 arch/x86/include/asm/kvm_para.h

a4429e53c9b308 Wanpeng Li    2018-02-13  123  
ace296186ca93c Vivek Goyal   2020-03-31  124  static inline void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *reason)
631bc487822093 Gleb Natapov  2010-10-14  125  {
631bc487822093 Gleb Natapov  2010-10-14 @126  	return 0;
631bc487822093 Gleb Natapov  2010-10-14  127  }
d910f5c1064d7f Glauber Costa 2011-07-11  128  

:::::: The code at line 126 was first introduced by commit
:::::: 631bc4878220932fe67fc46fc7cf7cccdb1ec597 KVM: Handle async PF in a guest.

:::::: TO: Gleb Natapov <gleb@redhat.com>
:::::: CC: Avi Kivity <avi@redhat.com>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 29015 bytes --]
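
The warning is against the !CONFIG_KVM_GUEST stub, which kept its old
"return 0;" body after the prototype changed to return void. The stub
presumably wants to clear the reason instead, e.g.:

	static inline void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *reason)
	{
		reason->reason = 0;
	}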

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/4] kvm: async_pf: Send faulting gva address in case of error
  2020-03-31 19:40   ` [Virtio-fs] " Vivek Goyal
  (?)
  (?)
@ 2020-04-01  0:16   ` kbuild test robot
  -1 siblings, 0 replies; 12+ messages in thread
From: kbuild test robot @ 2020-04-01  0:16 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 4365 bytes --]

Hi Vivek,

I love your patch! Yet something to improve:

[auto build test ERROR on vhost/linux-next]
[also build test ERROR on linus/master v5.6]
[cannot apply to kvm/linux-next linux/master next-20200331]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Vivek-Goyal/kvm-x86-async_pf-Add-capability-to-return-page-fault-error/20200401-042437
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-6) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/x86/kvm/svm.c: In function 'nested_svm_exit_special':
>> arch/x86/kvm/svm.c:3237:58: error: invalid operands to binary == (have 'struct kvm_apf_reason' and 'int')
      if (!npt_enabled && svm->vcpu.arch.apf.host_apf_reason == 0)
                          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~
   arch/x86/kvm/svm.c: In function 'svm_vcpu_run':
>> arch/x86/kvm/svm.c:5930:40: error: too few arguments to function 'kvm_read_and_reset_pf_reason'
      svm->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from include/uapi/linux/kvm_para.h:36:0,
                    from include/linux/kvm_para.h:5,
                    from include/linux/kvm_host.h:32,
                    from arch/x86/kvm/svm.c:17:
   arch/x86/include/asm/kvm_para.h:93:6: note: declared here
    void kvm_read_and_reset_pf_reason(struct kvm_apf_reason *reason);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~

vim +3237 arch/x86/kvm/svm.c

ab2f4d73ebc334 Ladi Prosek    2017-06-21  3220  
410e4d573d9b7f Joerg Roedel   2009-08-07  3221  static int nested_svm_exit_special(struct vcpu_svm *svm)
4c2161aed55c29 Joerg Roedel   2009-08-07  3222  {
cf74a78b229d07 Alexander Graf 2008-11-25  3223  	u32 exit_code = svm->vmcb->control.exit_code;
4c2161aed55c29 Joerg Roedel   2009-08-07  3224  
cf74a78b229d07 Alexander Graf 2008-11-25  3225  	switch (exit_code) {
cf74a78b229d07 Alexander Graf 2008-11-25  3226  	case SVM_EXIT_INTR:
cf74a78b229d07 Alexander Graf 2008-11-25  3227  	case SVM_EXIT_NMI:
ff47a49b235e59 Joerg Roedel   2010-04-22  3228  	case SVM_EXIT_EXCP_BASE + MC_VECTOR:
410e4d573d9b7f Joerg Roedel   2009-08-07  3229  		return NESTED_EXIT_HOST;
cf74a78b229d07 Alexander Graf 2008-11-25  3230  	case SVM_EXIT_NPF:
e02317153e7715 Joerg Roedel   2010-02-24  3231  		/* For now we are always handling NPFs when using them */
cf74a78b229d07 Alexander Graf 2008-11-25  3232  		if (npt_enabled)
410e4d573d9b7f Joerg Roedel   2009-08-07  3233  			return NESTED_EXIT_HOST;
cf74a78b229d07 Alexander Graf 2008-11-25  3234  		break;
cf74a78b229d07 Alexander Graf 2008-11-25  3235  	case SVM_EXIT_EXCP_BASE + PF_VECTOR:
631bc487822093 Gleb Natapov   2010-10-14  3236  		/* When we're shadowing, trap PFs, but not async PF */
1261bfa326f5e9 Wanpeng Li     2017-07-13 @3237  		if (!npt_enabled && svm->vcpu.arch.apf.host_apf_reason == 0)
410e4d573d9b7f Joerg Roedel   2009-08-07  3238  			return NESTED_EXIT_HOST;
cf74a78b229d07 Alexander Graf 2008-11-25  3239  		break;
cf74a78b229d07 Alexander Graf 2008-11-25  3240  	default:
cf74a78b229d07 Alexander Graf 2008-11-25  3241  		break;
cf74a78b229d07 Alexander Graf 2008-11-25  3242  	}
410e4d573d9b7f Joerg Roedel   2009-08-07  3243  
410e4d573d9b7f Joerg Roedel   2009-08-07  3244  	return NESTED_EXIT_CONTINUE;
cf74a78b229d07 Alexander Graf 2008-11-25  3245  }
cf74a78b229d07 Alexander Graf 2008-11-25  3246  

:::::: The code at line 3237 was first introduced by commit
:::::: 1261bfa326f5e903166498628a1894edce0caabc KVM: async_pf: Add L1 guest async_pf #PF vmexit handler

:::::: TO: Wanpeng Li <wanpeng.li@hotmail.com>
:::::: CC: Radim Krčmář <rkrcmar@redhat.com>

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 71688 bytes --]
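
Both errors point at svm.c not having been converted along with
vmx.c. Presumably it needs the same treatment (untested sketch,
mirroring the vmx.c hunks in patch 2):

	/* nested_svm_exit_special() */
	if (!npt_enabled && svm->vcpu.arch.apf.host_apf_reason.reason == 0)
		return NESTED_EXIT_HOST;

	/* svm_vcpu_run() */
	kvm_read_and_reset_pf_reason(&svm->vcpu.arch.apf.host_apf_reason);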

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-04-01  0:16 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-31 19:40 [RFC PATCH 0/4] kvm,x86,async_pf: Add capability to return page fault error Vivek Goyal
2020-03-31 19:40 ` [Virtio-fs] [RFC PATCH 0/4] kvm, x86, async_pf: " Vivek Goyal
2020-03-31 19:40 ` [PATCH 1/4] kvm: Add capability to be able to report async pf error to guest Vivek Goyal
2020-03-31 19:40   ` [Virtio-fs] " Vivek Goyal
2020-03-31 19:40 ` [PATCH 2/4] kvm: async_pf: Send faulting gva address in case of error Vivek Goyal
2020-03-31 19:40   ` [Virtio-fs] " Vivek Goyal
2020-03-31 23:54   ` kbuild test robot
2020-04-01  0:16   ` kbuild test robot
2020-03-31 19:40 ` [PATCH 3/4] kvm: Always get async page notifications Vivek Goyal
2020-03-31 19:40   ` [Virtio-fs] " Vivek Goyal
2020-03-31 19:40 ` [PATCH 4/4] kvm,x86,async_pf: Search exception tables in case of error Vivek Goyal
2020-03-31 19:40   ` [Virtio-fs] [PATCH 4/4] kvm, x86, async_pf: " Vivek Goyal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.