* [RFC PATCH v0 0/5] PPC: KVM: pseries: Asynchronous page fault
From: Bharata B Rao @ 2021-08-05  7:24 UTC
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao, Bharata B Rao

Hi,

This series adds asynchronous page fault support for pseries guests
and enables that support in powerpc KVM. This is an early RFC with
details and multiple TODOs listed in the patch descriptions.

This series also needs supporting enablement in QEMU, which will be
posted separately.

Bharata B Rao (5):
  powerpc: Define Expropriation interrupt bit to VPA byte offset 0xB9
  KVM: PPC: Add support for KVM_REQ_ESN_EXIT
  KVM: PPC: Book3S: Enable setting SRR1 flags for DSI
  KVM: PPC: BOOK3S HV: Async PF support
  pseries: Asynchronous page fault support

 Documentation/virt/kvm/api.rst            |  15 ++
 arch/powerpc/include/asm/async-pf.h       |  12 ++
 arch/powerpc/include/asm/hvcall.h         |   1 +
 arch/powerpc/include/asm/kvm_book3s_esn.h |  24 +++
 arch/powerpc/include/asm/kvm_host.h       |  22 +++
 arch/powerpc/include/asm/kvm_ppc.h        |   4 +-
 arch/powerpc/include/asm/lppaca.h         |  20 +-
 arch/powerpc/include/uapi/asm/kvm.h       |   6 +
 arch/powerpc/kvm/Kconfig                  |   2 +
 arch/powerpc/kvm/Makefile                 |   5 +-
 arch/powerpc/kvm/book3s.c                 |   6 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c    |   9 +-
 arch/powerpc/kvm/book3s_hv.c              |  37 +++-
 arch/powerpc/kvm/book3s_hv_esn.c          | 189 +++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_nested.c       |   4 +-
 arch/powerpc/kvm/book3s_pr.c              |   4 +-
 arch/powerpc/mm/fault.c                   |   7 +-
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 arch/powerpc/platforms/pseries/async-pf.c | 219 ++++++++++++++++++++++
 drivers/cpuidle/cpuidle-pseries.c         |   4 +-
 include/uapi/linux/kvm.h                  |   2 +
 tools/include/uapi/linux/kvm.h            |   1 +
 22 files changed, 574 insertions(+), 21 deletions(-)
 create mode 100644 arch/powerpc/include/asm/async-pf.h
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_esn.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_esn.c
 create mode 100644 arch/powerpc/platforms/pseries/async-pf.c

-- 
2.31.1


* [RFC PATCH v0 1/5] powerpc: Define Expropriation interrupt bit to VPA byte offset 0xB9
From: Bharata B Rao @ 2021-08-05  7:24 UTC
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao, Bharata B Rao

VPA byte offset 0xB9 was named donate_dedicated_cpu as that was
the only bit in use. The Expropriation/Subvention support defines
an additional bit in byte offset 0xB9. Define this bit and rename
the field in the VPA to a generic name.
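
As a minimal illustrative sketch (not part of this patch), a pseries
guest that wants expropriation interrupts could opt in by setting the
new flag; the function name here is hypothetical:

	#include <asm/lppaca.h>
	#include <asm/paca.h>

	/* Advertise expropriation interrupt support in VPA byte 0xB9 */
	static void pseries_enable_exp_int(void)
	{
		get_lppaca()->byte_b9 |= LPPACA_EXP_INT_ENABLED;
	}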

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/lppaca.h | 8 +++++++-
 drivers/cpuidle/cpuidle-pseries.c | 4 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index c390ec377bae..57e432766f3e 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -80,7 +80,7 @@ struct lppaca {
 	u8	ebb_regs_in_use;
 	u8	reserved7[6];
 	u8	dtl_enable_mask;	/* Dispatch Trace Log mask */
-	u8	donate_dedicated_cpu;	/* Donate dedicated CPU cycles */
+	u8	byte_b9; /* Donate dedicated CPU cycles & Expropriation int */
 	u8	fpregs_in_use;
 	u8	pmcregs_in_use;
 	u8	reserved8[28];
@@ -116,6 +116,12 @@ struct lppaca {
 
 #define lppaca_of(cpu)	(*paca_ptrs[cpu]->lppaca_ptr)
 
+/*
+ * Flags for Byte offset 0xB9
+ */
+#define LPPACA_DONATE_DED_CPU_CYCLES   0x1
+#define LPPACA_EXP_INT_ENABLED         0x2
+
 /*
  * We are using a non architected field to determine if a partition is
  * shared or dedicated. This currently works on both KVM and PHYP, but
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index a2b5c6f60cf0..b9d0f41c3f19 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -221,7 +221,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
 	u8 old_latency_hint;
 
 	pseries_idle_prolog();
-	get_lppaca()->donate_dedicated_cpu = 1;
+	get_lppaca()->byte_b9 |= LPPACA_DONATE_DED_CPU_CYCLES;
 	old_latency_hint = get_lppaca()->cede_latency_hint;
 	get_lppaca()->cede_latency_hint = cede_latency_hint[index];
 
@@ -229,7 +229,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
 	check_and_cede_processor();
 
 	local_irq_disable();
-	get_lppaca()->donate_dedicated_cpu = 0;
+	get_lppaca()->byte_b9 &= ~LPPACA_DONATE_DED_CPU_CYCLES;
 	get_lppaca()->cede_latency_hint = old_latency_hint;
 
 	pseries_idle_epilog();
-- 
2.31.1


* [RFC PATCH v0 2/5] KVM: PPC: Add support for KVM_REQ_ESN_EXIT
From: Bharata B Rao @ 2021-08-05  7:24 UTC
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao, Bharata B Rao

Add a new KVM exit request KVM_REQ_ESN_EXIT that will be used
to exit to userspace (QEMU) whenever a subvention notification
needs to be sent to the guest.

Userspace (QEMU) then issues the subvention notification by
injecting an interrupt into the guest.
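
A minimal userspace sketch of handling the new exit reason follows;
inject_subvention_interrupt() and handle_other_exit() are hypothetical
VMM helpers, only KVM_EXIT_ESN comes from this patch:

	#include <linux/kvm.h>

	static int handle_exit(struct kvm_run *run)
	{
		switch (run->exit_reason) {
		case KVM_EXIT_ESN:
			/* Expropriated pages are now available:
			 * raise the subvention interrupt in the guest.
			 */
			inject_subvention_interrupt();
			return 0;
		default:
			return handle_other_exit(run);
		}
	}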

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h | 1 +
 arch/powerpc/kvm/book3s_hv.c        | 8 ++++++++
 include/uapi/linux/kvm.h            | 1 +
 3 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 9f52f282b1aa..204dc2d91388 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -52,6 +52,7 @@
 #define KVM_REQ_WATCHDOG	KVM_ARCH_REQ(0)
 #define KVM_REQ_EPR_EXIT	KVM_ARCH_REQ(1)
 #define KVM_REQ_PENDING_TIMER	KVM_ARCH_REQ(2)
+#define KVM_REQ_ESN_EXIT	KVM_ARCH_REQ(3)
 
 #include <linux/mmu_notifier.h>
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 085fb8ecbf68..47ccd4a2df54 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2820,6 +2820,14 @@ static void kvmppc_core_vcpu_free_hv(struct kvm_vcpu *vcpu)
 
 static int kvmppc_core_check_requests_hv(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * If a subvention interrupt needs to be injected into the
+	 * guest, exit to user space.
+	 */
+	if (kvm_check_request(KVM_REQ_ESN_EXIT, vcpu)) {
+		vcpu->run->exit_reason = KVM_EXIT_ESN;
+		return 0;
+	}
 	/* Indicate we want to get back into the guest */
 	return 1;
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9e4aabcb31a..47be532ed14b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -269,6 +269,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_AP_RESET_HOLD    32
 #define KVM_EXIT_X86_BUS_LOCK     33
 #define KVM_EXIT_XEN              34
+#define KVM_EXIT_ESN		  35
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
-- 
2.31.1


* [RFC PATCH v0 3/5] KVM: PPC: Book3S: Enable setting SRR1 flags for DSI
From: Bharata B Rao @ 2021-08-05  7:24 UTC
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao, Bharata B Rao

kvmppc_core_queue_data_storage() doesn't provide an option to
set SRR1 flags when raising a DSI. Since kvmppc_inject_interrupt()
already supports this, add an argument that passes the SRR1 flags
through.

This will be used to raise a DSI with SRR1_PROGTRAP set when an
expropriation interrupt needs to be injected into the guest.
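
With the new argument, a caller can raise a DSI with SRR1_PROGTRAP
set, as the async PF path in the next patch does:

	/* Inject a DSI with SRR1 bit 46 (SRR1_PROGTRAP) set */
	kvmppc_core_queue_data_storage(vcpu, kvmppc_get_dar(vcpu),
				       DSISR_NOHPTE, SRR1_PROGTRAP);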

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/kvm_ppc.h     | 3 ++-
 arch/powerpc/kvm/book3s.c              | 6 +++---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 +++---
 arch/powerpc/kvm/book3s_hv.c           | 4 ++--
 arch/powerpc/kvm/book3s_hv_nested.c    | 4 ++--
 arch/powerpc/kvm/book3s_pr.c           | 4 ++--
 6 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2d88944f9f34..09235bdfd4ac 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -143,7 +143,8 @@ extern void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu *vcpu, ulong dear_flags,
 					ulong esr_flags);
 extern void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu,
 					   ulong dear_flags,
-					   ulong esr_flags);
+					   ulong esr_flags,
+					   ulong srr1_flags);
 extern void kvmppc_core_queue_itlb_miss(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
 					   ulong esr_flags);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 79833f78d1da..f7f6641a788d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -284,11 +284,11 @@ void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu)
 }
 
 void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, ulong dar,
-				    ulong flags)
+				    ulong dsisr, ulong srr1)
 {
 	kvmppc_set_dar(vcpu, dar);
-	kvmppc_set_dsisr(vcpu, flags);
-	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE, 0);
+	kvmppc_set_dsisr(vcpu, dsisr);
+	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE, srr1);
 }
 EXPORT_SYMBOL_GPL(kvmppc_core_queue_data_storage);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index b5905ae4377c..618206a504b0 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -946,7 +946,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
 	if (dsisr & DSISR_BADACCESS) {
 		/* Reflect to the guest as DSI */
 		pr_err("KVM: Got radix HV page fault with DSISR=%lx\n", dsisr);
-		kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+		kvmppc_core_queue_data_storage(vcpu, ea, dsisr, 0);
 		return RESUME_GUEST;
 	}
 
@@ -971,7 +971,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
 			 * Bad address in guest page table tree, or other
 			 * unusual error - reflect it to the guest as DSI.
 			 */
-			kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+			kvmppc_core_queue_data_storage(vcpu, ea, dsisr, 0);
 			return RESUME_GUEST;
 		}
 		return kvmppc_hv_emulate_mmio(vcpu, gpa, ea, writing);
@@ -981,7 +981,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
 		if (writing) {
 			/* give the guest a DSI */
 			kvmppc_core_queue_data_storage(vcpu, ea, DSISR_ISSTORE |
-						       DSISR_PROTFAULT);
+						       DSISR_PROTFAULT, 0);
 			return RESUME_GUEST;
 		}
 		kvm_ro = true;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 47ccd4a2df54..d07e9065f7c1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1592,7 +1592,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 
 		if (!(vcpu->arch.fault_dsisr & (DSISR_NOHPTE | DSISR_PROTFAULT))) {
 			kvmppc_core_queue_data_storage(vcpu,
-				vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
+				vcpu->arch.fault_dar, vcpu->arch.fault_dsisr, 0);
 			r = RESUME_GUEST;
 			break;
 		}
@@ -1610,7 +1610,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 			r = RESUME_PAGE_FAULT;
 		} else {
 			kvmppc_core_queue_data_storage(vcpu,
-				vcpu->arch.fault_dar, err);
+				vcpu->arch.fault_dar, err, 0);
 			r = RESUME_GUEST;
 		}
 		break;
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 898f942eb198..a10ef0d5f925 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1556,7 +1556,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) {
 		if (dsisr & (DSISR_PRTABLE_FAULT | DSISR_BADACCESS)) {
 			/* unusual error -> reflect to the guest as a DSI */
-			kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+			kvmppc_core_queue_data_storage(vcpu, ea, dsisr, 0);
 			return RESUME_GUEST;
 		}
 
@@ -1567,7 +1567,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
 		if (writing) {
 			/* Give the guest a DSI */
 			kvmppc_core_queue_data_storage(vcpu, ea,
-					DSISR_ISSTORE | DSISR_PROTFAULT);
+					DSISR_ISSTORE | DSISR_PROTFAULT, 0);
 			return RESUME_GUEST;
 		}
 		kvm_ro = true;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 6bc9425acb32..f7fc8e01fd8e 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -754,7 +754,7 @@ static int kvmppc_handle_pagefault(struct kvm_vcpu *vcpu,
 			flags = DSISR_NOHPTE;
 		if (data) {
 			flags |= vcpu->arch.fault_dsisr & DSISR_ISSTORE;
-			kvmppc_core_queue_data_storage(vcpu, eaddr, flags);
+			kvmppc_core_queue_data_storage(vcpu, eaddr, flags, 0);
 		} else {
 			kvmppc_core_queue_inst_storage(vcpu, flags);
 		}
@@ -1229,7 +1229,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 			r = kvmppc_handle_pagefault(vcpu, dar, exit_nr);
 			srcu_read_unlock(&vcpu->kvm->srcu, idx);
 		} else {
-			kvmppc_core_queue_data_storage(vcpu, dar, fault_dsisr);
+			kvmppc_core_queue_data_storage(vcpu, dar, fault_dsisr, 0);
 			r = RESUME_GUEST;
 		}
 		break;
-- 
2.31.1


* [RFC PATCH v0 4/5] KVM: PPC: BOOK3S HV: Async PF support
From: Bharata B Rao @ 2021-08-05  7:24 UTC
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao, Bharata B Rao

Add asynchronous page fault support for PowerKVM by making
use of the Expropriation/Subvention Notification (ESN) option
defined by the PAPR specification.

1. When a page accessed by the guest isn't immediately available in
the host, update the vcpu's VPA with a unique expropriation
correlation number and inject a DSI into the guest with the
SRR1_PROGTRAP bit set in SRR1. This tells the guest vcpu to put the
faulting process to wait and schedule a different process.
   - Async PF is supported for data pages in this implementation,
     though PAPR allows it for code pages too.
   - Async PF is supported only for user pages here.
   - The feature is currently limited to radix guests.

2. When the page becomes available, update the Subvention Notification
Structure with the corresponding expropriation correlation number and
inform the guest via a subvention interrupt.
   - The Subvention Notification Structure (SNS) is a region of memory
     shared between host and guest via which the communication related
     to expropriated and subvened pages happens.
   - The SNS region is registered by the guest via the H_REG_SNS hcall,
     which is implemented in QEMU.
   - The H_REG_SNS implementation in QEMU needs a new ioctl,
     KVM_PPC_SET_SNS. This ioctl is used to map and pin the guest page
     containing the SNS in the host (see the sketch after this list).
   - The subvention notification interrupt is raised to the guest by
     QEMU in response to the guest exit via KVM_REQ_ESN_EXIT. This
     interrupt informs the guest about the availability of the pages.
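
A minimal sketch of the QEMU-side KVM_PPC_SET_SNS call; register_sns()
and its surrounding error handling are illustrative, while the ioctl
number and struct kvm_ppc_sns_reg come from this patch:

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int register_sns(int vm_fd, uint64_t gpa, uint64_t len)
	{
		struct kvm_ppc_sns_reg sns_reg = {
			.addr = gpa,	/* guest physical address of the SNS */
			.len  = len,	/* length of the SNS buffer in bytes */
		};

		/* Maps and pins the guest page containing the SNS */
		return ioctl(vm_fd, KVM_PPC_SET_SNS, &sns_reg);
	}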

TODO:
- H_REG_SNS is implemented in QEMU because this hcall needs to return
  the interrupt source number associated with the subvention interrupt.
  Claiming an IRQ line and raising an external interrupt seem
  straightforward from QEMU. Figure out the in-kernel equivalents for
  these two so that we can save a guest exit for each expropriated
  page and move the entire hcall implementation into the host kernel.
- The code is pretty much experimental and is barely able to boot a
  guest. I do see some requests for expropriated pages not getting
  fulfilled by the host, leading to long delays in the guest. This
  needs some debugging.
- A few other aspects recommended by PAPR around this feature (like
  setting of page state flags) need to be evaluated and incorporated
  into the implementation if found appropriate.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst            |  15 ++
 arch/powerpc/include/asm/hvcall.h         |   1 +
 arch/powerpc/include/asm/kvm_book3s_esn.h |  24 +++
 arch/powerpc/include/asm/kvm_host.h       |  21 +++
 arch/powerpc/include/asm/kvm_ppc.h        |   1 +
 arch/powerpc/include/asm/lppaca.h         |  12 +-
 arch/powerpc/include/uapi/asm/kvm.h       |   6 +
 arch/powerpc/kvm/Kconfig                  |   2 +
 arch/powerpc/kvm/Makefile                 |   5 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c    |   3 +
 arch/powerpc/kvm/book3s_hv.c              |  25 +++
 arch/powerpc/kvm/book3s_hv_esn.c          | 189 ++++++++++++++++++++++
 include/uapi/linux/kvm.h                  |   1 +
 tools/include/uapi/linux/kvm.h            |   1 +
 14 files changed, 303 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_esn.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_esn.c

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index dae68e68ca23..512f078b9d02 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5293,6 +5293,21 @@ the trailing ``'\0'``, is indicated by ``name_size`` in the header.
 The Stats Data block contains an array of 64-bit values in the same order
 as the descriptors in Descriptors block.
 
+4.134 KVM_PPC_SET_SNS
+---------------------
+
+:Capability: basic
+:Architectures: powerpc
+:Type: vm ioctl
+:Parameters: struct kvm_ppc_sns_reg (in)
+:Returns: 0 on success, -1 on error
+
+As part of the H_REG_SNS hypercall, this ioctl is used to map and pin
+the guest-provided SNS structure in the host.
+
+This is used for providing asynchronous page fault support for
+powerpc pseries KVM guests.
+
 5. The kvm_run structure
 ========================
 
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 9bcf345cb208..9e33500c1723 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -321,6 +321,7 @@
 #define H_SCM_UNBIND_ALL        0x3FC
 #define H_SCM_HEALTH            0x400
 #define H_SCM_PERFORMANCE_STATS 0x418
+#define H_REG_SNS		0x41C
 #define H_RPT_INVALIDATE	0x448
 #define H_SCM_FLUSH		0x44C
 #define MAX_HCALL_OPCODE	H_SCM_FLUSH
diff --git a/arch/powerpc/include/asm/kvm_book3s_esn.h b/arch/powerpc/include/asm/kvm_book3s_esn.h
new file mode 100644
index 000000000000..d79a441ea31d
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_esn.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KVM_BOOK3S_ESN_H__
+#define __ASM_KVM_BOOK3S_ESN_H__
+
+/* SNS buffer EQ state flags */
+#define SNS_EQ_STATE_OPERATIONAL 0x0
+#define SNS_EQ_STATE_OVERFLOW 0x1
+
+/* SNS buffer Notification control bits */
+#define SNS_EQ_CNTRL_TRIGGER 0x1
+
+struct kvmppc_sns {
+	unsigned long gpa;
+	unsigned long len;
+	void *hva;
+	uint16_t exp_corr_nr;
+	uint16_t *eq;
+	uint8_t *eq_cntrl;
+	uint8_t *eq_state;
+	unsigned long next_eq_entry;
+	unsigned long nr_eq_entries;
+};
+
+#endif /* __ASM_KVM_BOOK3S_ESN_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 204dc2d91388..8d7f73085ef5 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include <asm/cacheflush.h>
 #include <asm/hvcall.h>
 #include <asm/mce.h>
+#include <asm/kvm_book3s_esn.h>
 
 #define KVM_MAX_VCPUS		NR_CPUS
 #define KVM_MAX_VCORES		NR_CPUS
@@ -325,6 +326,7 @@ struct kvm_arch {
 #endif
 	struct kvmppc_ops *kvm_ops;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	struct kvmppc_sns sns;
 	struct mutex uvmem_lock;
 	struct list_head uvmem_pfns;
 	struct mutex mmu_setup_lock;	/* nests inside vcpu mutexes */
@@ -855,6 +857,25 @@ struct kvm_vcpu_arch {
 #define __KVM_HAVE_ARCH_WQP
 #define __KVM_HAVE_CREATE_DEVICE
 
+/* Async pf */
+#define ASYNC_PF_PER_VCPU       64
+struct kvm_arch_async_pf {
+	unsigned long exp_token;
+};
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+			       unsigned long gpa, unsigned long hva);
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+			       struct kvm_async_pf *work);
+
+bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work);
+
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+				 struct kvm_async_pf *work);
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
+static inline void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu) {}
+
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 09235bdfd4ac..c14a84041d0e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -228,6 +228,7 @@ extern long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
 
 extern int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp);
+long kvm_vm_ioctl_set_sns(struct kvm *kvm, struct kvm_ppc_sns_reg *sns_reg);
 extern int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu);
 extern void kvmppc_rtas_tokens_free(struct kvm *kvm);
 
diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index 57e432766f3e..17e89c3865e8 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -104,7 +104,17 @@ struct lppaca {
 	volatile __be32 dispersion_count; /* dispatch changed physical cpu */
 	volatile __be64 cmo_faults;	/* CMO page fault count */
 	volatile __be64 cmo_fault_time;	/* CMO page fault time */
-	u8	reserved10[104];
+
+	/*
+	 * TODO: Insert this at the correct offset:
+	 * 0x17D - Exp flags (1 byte)
+	 * 0x17E - Exp corr number (2 bytes)
+	 *
+	 * For now, only the exp corr number is used, at an
+	 * easy-to-insert offset.
+	 */
+	__be16 exp_corr_nr; /* Expropriation correlation number */
+	u8	reserved10[102];
 
 	/* cacheline 4-5 */
 
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 9f18fa090f1f..d72739126ae5 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -470,6 +470,12 @@ struct kvm_ppc_cpu_char {
 #define KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR	(1ULL << 61)
 #define KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE	(1ull << 58)
 
+/* For KVM_PPC_SET_SNS */
+struct kvm_ppc_sns_reg {
+	__u64 addr;
+	__u64 len;
+};
+
 /* Per-vcpu XICS interrupt controller state */
 #define KVM_REG_PPC_ICP_STATE	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x8c)
 
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index e45644657d49..4f552649a4b2 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -85,6 +85,8 @@ config KVM_BOOK3S_64_HV
 	depends on KVM_BOOK3S_64 && PPC_POWERNV
 	select KVM_BOOK3S_HV_POSSIBLE
 	select MMU_NOTIFIER
+	select KVM_ASYNC_PF
+	select KVM_ASYNC_PF_SYNC
 	select CMA
 	help
 	  Support running unmodified book3s_64 guest kernels in
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 583c14ef596e..603ab382d021 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -6,7 +6,7 @@
 ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm
 KVM := ../../../virt/kvm
 
-common-objs-y = $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+common-objs-y = $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o $(KVM)/async_pf.o
 common-objs-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
 common-objs-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
 
@@ -70,7 +70,8 @@ kvm-hv-y += \
 	book3s_hv_interrupts.o \
 	book3s_64_mmu_hv.o \
 	book3s_64_mmu_radix.o \
-	book3s_hv_nested.o
+	book3s_hv_nested.o \
+	book3s_hv_esn.o
 
 kvm-hv-$(CONFIG_PPC_UV) += \
 	book3s_hv_uvmem.o
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 618206a504b0..1985f84bfebe 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -837,6 +837,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 	} else {
 		unsigned long pfn;
 
+		if (kvm_arch_setup_async_pf(vcpu, gpa, hva))
+			return RESUME_GUEST;
+
 		/* Call KVM generic code to do the slow-path check */
 		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
 					   writing, upgrade_p, NULL);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d07e9065f7c1..5cc564321521 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -77,6 +77,7 @@
 #include <asm/ultravisor.h>
 #include <asm/dtl.h>
 #include <asm/plpar_wrappers.h>
+#include <asm/kvm_book3s_esn.h>
 
 #include "book3s.h"
 
@@ -4570,6 +4571,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 		return -EINTR;
 	}
 
+	if (kvm_request_pending(vcpu)) {
+		if (!kvmppc_core_check_requests(vcpu))
+			return 0;
+	}
+
 	kvm = vcpu->kvm;
 	atomic_inc(&kvm->arch.vcpus_running);
 	/* Order vcpus_running vs. mmu_ready, see kvmppc_alloc_reset_hpt */
@@ -4591,6 +4597,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 	vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
 
 	do {
+		kvm_check_async_pf_completion(vcpu);
 		if (cpu_has_feature(CPU_FTR_ARCH_300))
 			r = kvmhv_run_single_vcpu(vcpu, ~(u64)0,
 						  vcpu->arch.vcore->lpcr);
@@ -5257,6 +5264,8 @@ static void kvmppc_free_vcores(struct kvm *kvm)
 
 static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 {
+	struct kvm_ppc_sns_reg sns_reg;
+
 	debugfs_remove_recursive(kvm->arch.debugfs_dir);
 
 	if (!cpu_has_feature(CPU_FTR_ARCH_300))
@@ -5283,6 +5292,11 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 	kvmppc_free_lpid(kvm->arch.lpid);
 
 	kvmppc_free_pimap(kvm);
+
+	/* Needed for de-registering SNS buffer */
+	sns_reg.addr = -1;
+	sns_reg.len = 0;
+	kvm_vm_ioctl_set_sns(kvm, &sns_reg);
 }
 
 /* We don't need to emulate any privileged instructions or dcbz */
@@ -5561,6 +5575,17 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
 		break;
 	}
 
+	case KVM_PPC_SET_SNS: {
+		struct kvm_ppc_sns_reg sns_reg;
+
+		r = -EFAULT;
+		if (copy_from_user(&sns_reg, argp, sizeof(sns_reg)))
+			break;
+
+		r = kvm_vm_ioctl_set_sns(kvm, &sns_reg);
+		break;
+	}
+
 	default:
 		r = -ENOTTY;
 	}
diff --git a/arch/powerpc/kvm/book3s_hv_esn.c b/arch/powerpc/kvm/book3s_hv_esn.c
new file mode 100644
index 000000000000..b322a14c1f83
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_esn.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Async page fault support via PAPR Expropriation/Subvention Notification
+ * option (ESN)
+ *
+ * Copyright 2020 Bharata B Rao, IBM Corp. <bharata@linux.ibm.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s_esn.h>
+
+static DEFINE_SPINLOCK(async_exp_lock); /* for updating exp_corr_nr */
+static DEFINE_SPINLOCK(async_sns_lock); /* SNS buffer updated under this lock */
+
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+			       unsigned long gpa, unsigned long hva)
+{
+	struct kvm_arch_async_pf arch;
+	struct lppaca *vpa = vcpu->arch.vpa.pinned_addr;
+	u64 msr = kvmppc_get_msr(vcpu);
+	struct kvmppc_sns *sns = &vcpu->kvm->arch.sns;
+
+	/*
+	 * If VPA hasn't been registered yet, can't support
+	 * async pf.
+	 */
+	if (!vpa)
+		return 0;
+
+	/*
+	 * If SNS memory area hasn't been registered yet,
+	 * can't support async pf.
+	 */
+	if (!vcpu->kvm->arch.sns.eq)
+		return 0;
+
+	/*
+	 * If guest hasn't enabled expropriation interrupt,
+	 * don't try async pf.
+	 */
+	if (!(vpa->byte_b9 & LPPACA_EXP_INT_ENABLED))
+		return 0;
+
+	/*
+	 * If the fault is in the guest kernel, don't
+	 * try async pf.
+	 */
+	if (!(msr & MSR_PR) && !(msr & MSR_HV))
+		return 0;
+
+	spin_lock(&async_sns_lock);
+	/*
+	 * Check if the subvention event queue would
+	 * overflow; if so, don't try async pf.
+	 */
+	if (*(sns->eq + sns->next_eq_entry)) {
+		pr_err("%s: SNS buffer overflow\n", __func__);
+		spin_unlock(&async_sns_lock);
+		return 0;
+	}
+	spin_unlock(&async_sns_lock);
+
+	/*
+	 * TODO:
+	 *
+	 * 1. Update exp flags bit 7 to 1
+	 * ("The Subvened page data will be restored")
+	 *
+	 * 2. Check if request to this page has been
+	 * notified to guest earlier, if so send back
+	 * the same exp corr number.
+	 *
+	 * 3. exp_corr_nr could be a random but non-zero
+	 * number. Not taking care of wrapping here. Fix
+	 * it.
+	 */
+	spin_lock(&async_exp_lock);
+	vpa->exp_corr_nr = cpu_to_be16(vcpu->kvm->arch.sns.exp_corr_nr);
+	arch.exp_token = vcpu->kvm->arch.sns.exp_corr_nr++;
+	spin_unlock(&async_exp_lock);
+
+	return kvm_setup_async_pf(vcpu, gpa, hva, &arch);
+}
+
+bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work)
+{
+	/* Inject DSI to guest with srr1 bit 46 set */
+	kvmppc_core_queue_data_storage(vcpu, kvmppc_get_dar(vcpu), DSISR_NOHPTE, SRR1_PROGTRAP);
+	return true;
+}
+
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+				 struct kvm_async_pf *work)
+{
+	struct kvmppc_sns *sns = &vcpu->kvm->arch.sns;
+
+	spin_lock(&async_sns_lock);
+	if (*sns->eq_cntrl != SNS_EQ_CNTRL_TRIGGER) {
+		pr_err("%s: SNS Notification Trigger not set by guest\n", __func__);
+		spin_unlock(&async_sns_lock);
+		/* TODO: Terminate the guest? */
+		return;
+	}
+
+	if (arch_cmpxchg(sns->eq + sns->next_eq_entry, 0,
+	    work->arch.exp_token)) {
+		*sns->eq_state |= SNS_EQ_STATE_OVERFLOW;
+		pr_err("%s: SNS buffer overflow\n", __func__);
+		spin_unlock(&async_sns_lock);
+		/* TODO: Terminate the guest? */
+		return;
+	}
+
+	sns->next_eq_entry = (sns->next_eq_entry + 1) % sns->nr_eq_entries;
+	spin_unlock(&async_sns_lock);
+
+	/*
+	 * Request a guest exit so that ESN virtual interrupt can
+	 * be injected by QEMU.
+	 */
+	kvm_make_request(KVM_REQ_ESN_EXIT, vcpu);
+}
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
+{
+	/* We will inject the page directly */
+}
+
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * PowerPC will always inject the page directly,
+	 * but we still want kvm_check_async_pf_completion() to clean up
+	 */
+	return true;
+}
+
+long kvm_vm_ioctl_set_sns(struct kvm *kvm, struct kvm_ppc_sns_reg *sns_reg)
+{
+	unsigned long nb;
+
+	/* Deregister */
+	if (sns_reg->addr == -1) {
+		if (!kvm->arch.sns.hva)
+			return 0;
+
+		pr_info("%s: Deregistering SNS buffer for LPID %d\n",
+			__func__, kvm->arch.lpid);
+		kvmppc_unpin_guest_page(kvm, kvm->arch.sns.hva, kvm->arch.sns.gpa, false);
+		kvm->arch.sns.gpa = -1;
+		kvm->arch.sns.hva = 0;
+		return 0;
+	}
+
+	/*
+	 * Already registered with the same address?
+	 */
+	if (sns_reg->addr == kvm->arch.sns.gpa)
+		return 0;
+
+	/* If previous registration exists, free it */
+	if (kvm->arch.sns.hva) {
+		pr_info("%s: Deregistering Previous SNS buffer for LPID %d\n",
+			__func__, kvm->arch.lpid);
+		kvmppc_unpin_guest_page(kvm, kvm->arch.sns.hva, kvm->arch.sns.gpa, false);
+		kvm->arch.sns.gpa = -1;
+		kvm->arch.sns.hva = 0;
+	}
+
+	kvm->arch.sns.gpa = sns_reg->addr;
+	kvm->arch.sns.hva = kvmppc_pin_guest_page(kvm, kvm->arch.sns.gpa, &nb);
+	kvm->arch.sns.len = sns_reg->len;
+	kvm->arch.sns.nr_eq_entries = (kvm->arch.sns.len - 2) / sizeof(uint16_t);
+	kvm->arch.sns.next_eq_entry = 0;
+	kvm->arch.sns.eq = kvm->arch.sns.hva + 2;
+	kvm->arch.sns.eq_cntrl = kvm->arch.sns.hva;
+	kvm->arch.sns.eq_state = kvm->arch.sns.hva + 1;
+	kvm->arch.sns.exp_corr_nr = 1; /* Should be non-zero */
+
+	*(kvm->arch.sns.eq_state) = SNS_EQ_STATE_OPERATIONAL;
+
+	pr_info("%s: Registering SNS buffer for LPID %d sns_addr %llx eq %lx\n",
+		__func__, kvm->arch.lpid, sns_reg->addr,
+		(unsigned long)kvm->arch.sns.eq);
+
+	return 0;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 47be532ed14b..dbe65e8d68d8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1459,6 +1459,7 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
+#define KVM_PPC_SET_SNS		  _IOR(KVMIO, 0xb5, struct kvm_ppc_sns_reg)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index d9e4aabcb31a..e9dea164498f 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1458,6 +1458,7 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
+#define KVM_PPC_SET_SNS		  _IOR(KVMIO, 0xb5, struct kvm_ppc_sns_reg)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
-- 
2.31.1


+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s_esn.h>
+
+static DEFINE_SPINLOCK(async_exp_lock); /* for updating exp_corr_nr */
+static DEFINE_SPINLOCK(async_sns_lock); /* SNS buffer updated under this lock */
+
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+			       unsigned long gpa, unsigned long hva)
+{
+	struct kvm_arch_async_pf arch;
+	struct lppaca *vpa = vcpu->arch.vpa.pinned_addr;
+	u64 msr = kvmppc_get_msr(vcpu);
+	struct kvmppc_sns *sns = &vcpu->kvm->arch.sns;
+
+	/*
+	 * If VPA hasn't been registered yet, can't support
+	 * async pf.
+	 */
+	if (!vpa)
+		return 0;
+
+	/*
+	 * If SNS memory area hasn't been registered yet,
+	 * can't support async pf.
+	 */
+	if (!vcpu->kvm->arch.sns.eq)
+		return 0;
+
+	/*
+	 * If guest hasn't enabled expropriation interrupt,
+	 * don't try async pf.
+	 */
+	if (!(vpa->byte_b9 & LPPACA_EXP_INT_ENABLED))
+		return 0;
+
+	/*
+	 * If the fault is in the guest kernel, don't
+	 * try async pf.
+	 */
+	if (!(msr & MSR_PR) && !(msr & MSR_HV))
+		return 0;
+
+	spin_lock(&async_sns_lock);
+	/*
+	 * If the subvention event queue would overflow
+	 * (next slot still occupied), don't try async pf.
+	 */
+	if (*(sns->eq + sns->next_eq_entry)) {
+		pr_err("%s: SNS buffer overflow\n", __func__);
+		spin_unlock(&async_sns_lock);
+		return 0;
+	}
+	spin_unlock(&async_sns_lock);
+
+	/*
+	 * TODO:
+	 *
+	 * 1. Update exp flags bit 7 to 1
+	 * ("The Subvened page data will be restored")
+	 *
+	 * 2. Check if request to this page has been
+	 * notified to guest earlier, if so send back
+	 * the same exp corr number.
+	 *
+	 * 3. exp_corr_nr could be a random but non-zero
+	 * number. Not taking care of wrapping here. Fix
+	 * it.
+	 */
+	spin_lock(&async_exp_lock);
+	vpa->exp_corr_nr = cpu_to_be16(vcpu->kvm->arch.sns.exp_corr_nr);
+	arch.exp_token = vcpu->kvm->arch.sns.exp_corr_nr++;
+	spin_unlock(&async_exp_lock);
+
+	return kvm_setup_async_pf(vcpu, gpa, hva, &arch);
+}
+
+bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work)
+{
+	/* Inject DSI to guest with srr1 bit 46 set */
+	kvmppc_core_queue_data_storage(vcpu, kvmppc_get_dar(vcpu), DSISR_NOHPTE, SRR1_PROGTRAP);
+	return true;
+}
+
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+				 struct kvm_async_pf *work)
+{
+	struct kvmppc_sns *sns = &vcpu->kvm->arch.sns;
+
+	spin_lock(&async_sns_lock);
+	if (*sns->eq_cntrl != SNS_EQ_CNTRL_TRIGGER) {
+		pr_err("%s: SNS Notification Trigger not set by guest\n", __func__);
+		spin_unlock(&async_sns_lock);
+		/* TODO: Terminate the guest? */
+		return;
+	}
+
+	if (arch_cmpxchg(sns->eq + sns->next_eq_entry, 0,
+	    work->arch.exp_token)) {
+		*sns->eq_state |= SNS_EQ_STATE_OVERFLOW;
+		pr_err("%s: SNS buffer overflow\n", __func__);
+		spin_unlock(&async_sns_lock);
+		/* TODO: Terminate the guest? */
+		return;
+	}
+
+	sns->next_eq_entry = (sns->next_eq_entry + 1) % sns->nr_eq_entries;
+	spin_unlock(&async_sns_lock);
+
+	/*
+	 * Request a guest exit so that ESN virtual interrupt can
+	 * be injected by QEMU.
+	 */
+	kvm_make_request(KVM_REQ_ESN_EXIT, vcpu);
+}
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
+{
+	/* We will inject the page directly */
+}
+
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * PowerPC will always inject the page directly,
+	 * but we still want check_async_completion to clean up
+	 */
+	return true;
+}
+
+long kvm_vm_ioctl_set_sns(struct kvm *kvm, struct kvm_ppc_sns_reg *sns_reg)
+{
+	unsigned long nb;
+
+	/* Deregister */
+	if (sns_reg->addr == -1) {
+		if (!kvm->arch.sns.hva)
+			return 0;
+
+		pr_info("%s: Deregistering SNS buffer for LPID %d\n",
+			__func__, kvm->arch.lpid);
+		kvmppc_unpin_guest_page(kvm, kvm->arch.sns.hva, kvm->arch.sns.gpa, false);
+		kvm->arch.sns.gpa = -1;
+		kvm->arch.sns.hva = 0;
+		return 0;
+	}
+
+	/*
+	 * Already registered with the same address?
+	 */
+	if (sns_reg->addr == kvm->arch.sns.gpa)
+		return 0;
+
+	/* If previous registration exists, free it */
+	if (kvm->arch.sns.hva) {
+		pr_info("%s: Deregistering Previous SNS buffer for LPID %d\n",
+			__func__, kvm->arch.lpid);
+		kvmppc_unpin_guest_page(kvm, kvm->arch.sns.hva, kvm->arch.sns.gpa, false);
+		kvm->arch.sns.gpa = -1;
+		kvm->arch.sns.hva = 0;
+	}
+
+	kvm->arch.sns.gpa = sns_reg->addr;
+	kvm->arch.sns.hva = kvmppc_pin_guest_page(kvm, kvm->arch.sns.gpa, &nb);
+	kvm->arch.sns.len = sns_reg->len;
+	kvm->arch.sns.nr_eq_entries = (kvm->arch.sns.len - 2) / sizeof(uint16_t);
+	kvm->arch.sns.next_eq_entry = 0;
+	kvm->arch.sns.eq = kvm->arch.sns.hva + 2;
+	kvm->arch.sns.eq_cntrl = kvm->arch.sns.hva;
+	kvm->arch.sns.eq_state = kvm->arch.sns.hva + 1;
+	kvm->arch.sns.exp_corr_nr = 1; /* Should be non-zero */
+
+	*(kvm->arch.sns.eq_state) = SNS_EQ_STATE_OPERATIONAL;
+
+	pr_info("%s: Registering SNS buffer for LPID %d sns_addr %llx eq %lx\n",
+		__func__, kvm->arch.lpid, sns_reg->addr,
+		(unsigned long)kvm->arch.sns.eq);
+
+	return 0;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 47be532ed14b..dbe65e8d68d8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1459,6 +1459,7 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
+#define KVM_PPC_SET_SNS		  _IOW(KVMIO,  0xb5, struct kvm_ppc_sns_reg)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index d9e4aabcb31a..e9dea164498f 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1458,6 +1458,7 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
+#define KVM_PPC_SET_SNS		  _IOW(KVMIO,  0xb5, struct kvm_ppc_sns_reg)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH v0 5/5] pseries: Asynchronous page fault support
  2021-08-05  7:24 ` Bharata B Rao
  (?)
@ 2021-08-05  7:24   ` Bharata B Rao
  -1 siblings, 0 replies; 29+ messages in thread
From: Bharata B Rao @ 2021-08-05  7:24 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao, Bharata B Rao

Add asynchronous page fault support for pseries guests.

1. Set up the guest to handle async-pf:
   - Issue the H_REG_SNS hcall to register the SNS region
     (its layout is sketched below).
   - Set up the subvention interrupt.
   - Enable async-pf by updating byte_b9 of the VPA for each
     CPU.
2. Check if the page fault is an expropriation notification
   (SRR1_PROGTRAP set in SRR1) and if so, put the task on a
   wait queue keyed by the expropriation correlation number
   read from the VPA.
3. Handle the subvention interrupt to wake any waiting tasks.
   The wait and wakeup mechanism is reused from the x86
   async-pf implementation.
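
For reference, the layout of the 4K SNS buffer implied by the code
below (a sketch inferred from this patch, not quoted from PAPR; the
struct and field names are illustrative only):

	struct sns_buffer {
		u8  cntrl;	/* byte 0, bit 0: guest enables notifications */
		u8  eq_state;	/* byte 1: operational/overflow flags from host */
		u16 eq[2047];	/* bytes 2..4095: event queue; the host posts
				 * the expropriation correlation number of each
				 * subvened page here and the guest clears the
				 * entry as it wakes the waiting task */
	};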

TODO:
- Check how to keep this feature together with other CMO features.
- The async-pf check in the page fault handler path is guarded only
  by a CONFIG_PPC_PSERIES #ifdef. This isn't sufficient and hence
  needs to be replaced by an appropriate runtime check.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/async-pf.h       |  12 ++
 arch/powerpc/mm/fault.c                   |   7 +-
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 arch/powerpc/platforms/pseries/async-pf.c | 219 ++++++++++++++++++++++
 4 files changed, 238 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/asm/async-pf.h
 create mode 100644 arch/powerpc/platforms/pseries/async-pf.c

diff --git a/arch/powerpc/include/asm/async-pf.h b/arch/powerpc/include/asm/async-pf.h
new file mode 100644
index 000000000000..95d6c3da9f50
--- /dev/null
+++ b/arch/powerpc/include/asm/async-pf.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Async page fault support via PAPR Expropriation/Subvention Notification
+ * option(ESN)
+ *
+ * Copyright 2020 Bharata B Rao, IBM Corp. <bharata@linux.ibm.com>
+ */
+
+#ifndef _ASM_POWERPC_ASYNC_PF_H
+#define _ASM_POWERPC_ASYNC_PF_H
+int handle_async_page_fault(struct pt_regs *regs, unsigned long addr);
+#endif
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a8d0ce85d39a..bbdc61605885 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -44,7 +44,7 @@
 #include <asm/debug.h>
 #include <asm/kup.h>
 #include <asm/inst.h>
-
+#include <asm/async-pf.h>
 
 /*
  * do_page_fault error handling helpers
@@ -395,6 +395,11 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	vm_fault_t fault, major = 0;
 	bool kprobe_fault = kprobe_page_fault(regs, 11);
 
+#ifdef CONFIG_PPC_PSERIES
+	if (handle_async_page_fault(regs, address))
+		return 0;
+#endif
+
 	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
 		return 0;
 
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 4cda0ef87be0..e0ada605ef20 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -6,7 +6,7 @@ obj-y			:= lpar.o hvCall.o nvram.o reconfig.o \
 			   of_helpers.o \
 			   setup.o iommu.o event_sources.o ras.o \
 			   firmware.o power.o dlpar.o mobility.o rng.o \
-			   pci.o pci_dlpar.o eeh_pseries.o msi.o
+			   pci.o pci_dlpar.o eeh_pseries.o msi.o async-pf.o
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_SCANLOG)	+= scanlog.o
 obj-$(CONFIG_KEXEC_CORE)	+= kexec.o
diff --git a/arch/powerpc/platforms/pseries/async-pf.c b/arch/powerpc/platforms/pseries/async-pf.c
new file mode 100644
index 000000000000..c2f3bbc0d674
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/async-pf.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Async page fault support via PAPR Expropriation/Subvention Notification
+ * option(ESN)
+ *
+ * Copyright 2020 Bharata B Rao, IBM Corp. <bharata@linux.ibm.com>
+ */
+
+#include <linux/interrupt.h>
+#include <linux/swait.h>
+#include <linux/irqdomain.h>
+#include <asm/machdep.h>
+#include <asm/hvcall.h>
+#include <asm/paca.h>
+
+static char sns_buffer[PAGE_SIZE] __aligned(4096);
+static uint16_t *esn_q = (uint16_t *)sns_buffer + 1;
+static unsigned long next_eq_entry, nr_eq_entries;
+
+#define ASYNC_PF_SLEEP_HASHBITS 8
+#define ASYNC_PF_SLEEP_HASHSIZE (1<<ASYNC_PF_SLEEP_HASHBITS)
+
+/* Controls access to SNS buffer */
+static DEFINE_RAW_SPINLOCK(async_sns_guest_lock);
+
+/* Wait queue handling is from the x86 async-pf implementation */
+struct async_pf_sleep_node {
+	struct hlist_node link;
+	struct swait_queue_head wq;
+	u64 token;
+	int cpu;
+};
+
+static struct async_pf_sleep_head {
+	raw_spinlock_t lock;
+	struct hlist_head list;
+} async_pf_sleepers[ASYNC_PF_SLEEP_HASHSIZE];
+
+static struct async_pf_sleep_node *_find_apf_task(struct async_pf_sleep_head *b,
+						  u64 token)
+{
+	struct hlist_node *p;
+
+	hlist_for_each(p, &b->list) {
+		struct async_pf_sleep_node *n =
+			hlist_entry(p, typeof(*n), link);
+		if (n->token == token)
+			return n;
+	}
+
+	return NULL;
+}
+static int async_pf_queue_task(u64 token, struct async_pf_sleep_node *n)
+{
+	u64 key = hash_64(token, ASYNC_PF_SLEEP_HASHBITS);
+	struct async_pf_sleep_head *b = &async_pf_sleepers[key];
+	struct async_pf_sleep_node *e;
+
+	raw_spin_lock(&b->lock);
+	e = _find_apf_task(b, token);
+	if (e) {
+		/* dummy entry exists -> wakeup was delivered ahead of PF */
+		hlist_del(&e->link);
+		raw_spin_unlock(&b->lock);
+		kfree(e);
+		return false;
+	}
+
+	n->token = token;
+	n->cpu = smp_processor_id();
+	init_swait_queue_head(&n->wq);
+	hlist_add_head(&n->link, &b->list);
+	raw_spin_unlock(&b->lock);
+	return true;
+}
+
+/*
+ * Handle Expropriation notification.
+ */
+int handle_async_page_fault(struct pt_regs *regs, unsigned long addr)
+{
+	struct async_pf_sleep_node n;
+	DECLARE_SWAITQUEUE(wait);
+	unsigned long exp_corr_nr;
+
+	/* Is this an expropriation notification? */
+	if (!(mfspr(SPRN_SRR1) & SRR1_PROGTRAP))
+		return 0;
+
+	if (unlikely(!user_mode(regs)))
+		panic("Host injected async PF in kernel mode\n");
+
+	exp_corr_nr = be16_to_cpu(get_lppaca()->exp_corr_nr);
+	if (!async_pf_queue_task(exp_corr_nr, &n))
+		return 0;
+
+	for (;;) {
+		prepare_to_swait_exclusive(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
+		if (hlist_unhashed(&n.link))
+			break;
+
+		local_irq_enable();
+		schedule();
+		local_irq_disable();
+	}
+
+	finish_swait(&n.wq, &wait);
+	return 1;
+}
+
+static void apf_task_wake_one(struct async_pf_sleep_node *n)
+{
+	hlist_del_init(&n->link);
+	if (swq_has_sleeper(&n->wq))
+		swake_up_one(&n->wq);
+}
+
+static void async_pf_wake_task(u64 token)
+{
+	u64 key = hash_64(token, ASYNC_PF_SLEEP_HASHBITS);
+	struct async_pf_sleep_head *b = &async_pf_sleepers[key];
+	struct async_pf_sleep_node *n;
+
+again:
+	raw_spin_lock(&b->lock);
+	n = _find_apf_task(b, token);
+	if (!n) {
+		/*
+		 * async PF was not yet handled.
+		 * Add dummy entry for the token.
+		 */
+		n = kzalloc(sizeof(*n), GFP_ATOMIC);
+		if (!n) {
+			/*
+			 * Allocation failed! Busy wait while other cpu
+			 * handles async PF.
+			 */
+			raw_spin_unlock(&b->lock);
+			cpu_relax();
+			goto again;
+		}
+		n->token = token;
+		n->cpu = smp_processor_id();
+		init_swait_queue_head(&n->wq);
+		hlist_add_head(&n->link, &b->list);
+	} else {
+		apf_task_wake_one(n);
+	}
+	raw_spin_unlock(&b->lock);
+}
+
+/*
+ * Handle Subvention notification.
+ */
+static irqreturn_t async_pf_handler(int irq, void *dev_id)
+{
+	uint16_t exp_token, old;
+
+	raw_spin_lock(&async_sns_guest_lock);
+	do {
+		exp_token = *(esn_q + next_eq_entry);
+		if (!exp_token)
+			break;
+
+		old = arch_cmpxchg(esn_q + next_eq_entry, exp_token, 0);
+		BUG_ON(old != exp_token);
+
+		async_pf_wake_task(exp_token);
+		next_eq_entry = (next_eq_entry + 1) % nr_eq_entries;
+	} while (1);
+	raw_spin_unlock(&async_sns_guest_lock);
+	return IRQ_HANDLED;
+}
+
+static int __init pseries_async_pf_init(void)
+{
+	long rc;
+	unsigned long ret[PLPAR_HCALL_BUFSIZE];
+	unsigned int irq, cpu;
+	int i;
+
+	/* Register buffer via H_REG_SNS */
+	rc = plpar_hcall(H_REG_SNS, ret, __pa(sns_buffer), PAGE_SIZE);
+	if (rc != H_SUCCESS)
+		return -1;
+
+	nr_eq_entries = (PAGE_SIZE - 2) / sizeof(uint16_t);
+
+	/* Register irq handler */
+	irq = irq_create_mapping(NULL, ret[1]);
+	if (!irq) {
+		plpar_hcall(H_REG_SNS, ret, -1, PAGE_SIZE);
+		return -1;
+	}
+
+	rc = request_irq(irq, async_pf_handler, 0, "sns-interrupt", NULL);
+	if (rc < 0) {
+		plpar_hcall(H_REG_SNS, ret, -1, PAGE_SIZE);
+		return -1;
+	}
+
+	for (i = 0; i < ASYNC_PF_SLEEP_HASHSIZE; i++)
+		raw_spin_lock_init(&async_pf_sleepers[i].lock);
+
+	/*
+	 * Enable subvention notifications from the hypervisor
+	 * by setting bit 0, byte 0 of SNS buffer
+	 */
+	*sns_buffer |= 0x1;
+
+	/* Enable LPPACA_EXP_INT_ENABLED in VPA */
+	for_each_possible_cpu(cpu)
+		lppaca_of(cpu).byte_b9 |= LPPACA_EXP_INT_ENABLED;
+
+	pr_info("%s: Enabled Async PF\n", __func__);
+	return 0;
+}
+
+machine_arch_initcall(pseries, pseries_async_pf_init);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH v0 0/5] PPC: KVM: pseries: Asynchronous page fault
  2021-08-05  7:24 ` Bharata B Rao
  (?)
@ 2021-08-05  7:35   ` Bharata B Rao
  -1 siblings, 0 replies; 29+ messages in thread
From: Bharata B Rao @ 2021-08-05  7:35 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao

On Thu, Aug 05, 2021 at 12:54:34PM +0530, Bharata B Rao wrote:
> Hi,
> 
> This series adds asynchronous page fault support for pseries guests
> and enables the support for the same in powerpc KVM. This is an
> early RFC with details and multiple TODOs listed in patch descriptions.
> 
> This patch needs supporting enablement in QEMU too which will be
> posted separately.

QEMU part is posted here:
https://lore.kernel.org/qemu-devel/20210805073228.502292-2-bharata@linux.ibm.com/T/#u

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [RFC PATCH v0 1/5] powerpc: Define Expropriation interrupt bit to VPA byte offset 0xB9
@ 2021-08-05  7:24   ` Bharata B Rao
  0 siblings, 0 replies; 29+ messages in thread
From: Bharata B Rao @ 2021-08-05  7:36 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao, Bharata B Rao

The field at VPA byte offset 0xB9 was named donate_dedicated_cpu as
that was the only bit in use. The Expropriation/Subvention support
defines another bit in the same byte. Define this bit and rename the
VPA field to a generic name.
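
A later patch in this series enables the new bit from the guest.
Minimal usage sketches, mirroring what this series itself does:

	/* donate dedicated CPU cycles (existing user of byte 0xB9) */
	get_lppaca()->byte_b9 |= LPPACA_DONATE_DED_CPU_CYCLES;

	/* enable expropriation interrupts (new user, patch 5/5) */
	get_lppaca()->byte_b9 |= LPPACA_EXP_INT_ENABLED;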

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/lppaca.h | 8 +++++++-
 drivers/cpuidle/cpuidle-pseries.c | 4 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index c390ec377bae..57e432766f3e 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -80,7 +80,7 @@ struct lppaca {
 	u8	ebb_regs_in_use;
 	u8	reserved7[6];
 	u8	dtl_enable_mask;	/* Dispatch Trace Log mask */
-	u8	donate_dedicated_cpu;	/* Donate dedicated CPU cycles */
+	u8	byte_b9; /* Donate dedicated CPU cycles & Expropriation int */
 	u8	fpregs_in_use;
 	u8	pmcregs_in_use;
 	u8	reserved8[28];
@@ -116,6 +116,12 @@ struct lppaca {
 
 #define lppaca_of(cpu)	(*paca_ptrs[cpu]->lppaca_ptr)
 
+/*
+ * Flags for Byte offset 0xB9
+ */
+#define LPPACA_DONATE_DED_CPU_CYCLES   0x1
+#define LPPACA_EXP_INT_ENABLED         0x2
+
 /*
  * We are using a non architected field to determine if a partition is
  * shared or dedicated. This currently works on both KVM and PHYP, but
diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c
index a2b5c6f60cf0..b9d0f41c3f19 100644
--- a/drivers/cpuidle/cpuidle-pseries.c
+++ b/drivers/cpuidle/cpuidle-pseries.c
@@ -221,7 +221,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
 	u8 old_latency_hint;
 
 	pseries_idle_prolog();
-	get_lppaca()->donate_dedicated_cpu = 1;
+	get_lppaca()->byte_b9 |= LPPACA_DONATE_DED_CPU_CYCLES;
 	old_latency_hint = get_lppaca()->cede_latency_hint;
 	get_lppaca()->cede_latency_hint = cede_latency_hint[index];
 
@@ -229,7 +229,7 @@ static int dedicated_cede_loop(struct cpuidle_device *dev,
 	check_and_cede_processor();
 
 	local_irq_disable();
-	get_lppaca()->donate_dedicated_cpu = 0;
+	get_lppaca()->byte_b9 &= ~LPPACA_DONATE_DED_CPU_CYCLES;
 	get_lppaca()->cede_latency_hint = old_latency_hint;
 
 	pseries_idle_epilog();
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH v0 2/5] KVM: PPC: Add support for KVM_REQ_ESN_EXIT
@ 2021-08-05  7:24   ` Bharata B Rao
  0 siblings, 0 replies; 29+ messages in thread
From: Bharata B Rao @ 2021-08-05  7:36 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao, Bharata B Rao

Add a new KVM exit request KVM_REQ_ESN_EXIT that will be used
to exit to userspace (QEMU) whenever a subvention notification
needs to be sent to the guest.

The userspace (QEMU) issues the subvention notification by
injecting an interrupt into the guest.
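
For context, a rough sketch of the corresponding userspace handling
(hypothetical VMM code; inject_subvention_interrupt() is an assumed
helper, not an existing QEMU function):

	switch (run->exit_reason) {
	case KVM_EXIT_ESN:
		/* raise the subvention interrupt into the guest */
		inject_subvention_interrupt(vcpu);
		break;
	/* ... other exit reasons ... */
	}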

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h | 1 +
 arch/powerpc/kvm/book3s_hv.c        | 8 ++++++++
 include/uapi/linux/kvm.h            | 1 +
 3 files changed, 10 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 9f52f282b1aa..204dc2d91388 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -52,6 +52,7 @@
 #define KVM_REQ_WATCHDOG	KVM_ARCH_REQ(0)
 #define KVM_REQ_EPR_EXIT	KVM_ARCH_REQ(1)
 #define KVM_REQ_PENDING_TIMER	KVM_ARCH_REQ(2)
+#define KVM_REQ_ESN_EXIT	KVM_ARCH_REQ(3)
 
 #include <linux/mmu_notifier.h>
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 085fb8ecbf68..47ccd4a2df54 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2820,6 +2820,14 @@ static void kvmppc_core_vcpu_free_hv(struct kvm_vcpu *vcpu)
 
 static int kvmppc_core_check_requests_hv(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * If a subvention interrupt needs to be injected into the
+	 * guest, exit to user space.
+	 */
+	if (kvm_check_request(KVM_REQ_ESN_EXIT, vcpu)) {
+		vcpu->run->exit_reason = KVM_EXIT_ESN;
+		return 0;
+	}
 	/* Indicate we want to get back into the guest */
 	return 1;
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9e4aabcb31a..47be532ed14b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -269,6 +269,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_AP_RESET_HOLD    32
 #define KVM_EXIT_X86_BUS_LOCK     33
 #define KVM_EXIT_XEN              34
+#define KVM_EXIT_ESN		  35
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH v0 3/5] KVM: PPC: Book3S: Enable setting SRR1 flags for DSI
@ 2021-08-05  7:24   ` Bharata B Rao
  0 siblings, 0 replies; 29+ messages in thread
From: Bharata B Rao @ 2021-08-05  7:36 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao, Bharata B Rao

kvmppc_core_queue_data_storage() doesn't provide an option to
set SRR1 flags when raising a DSI. Since kvmppc_inject_interrupt()
already supports this, add an argument to pass the flags through.

This will be used to raise a DSI with SRR1_PROGTRAP set when an
expropriation interrupt needs to be injected into the guest.
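
For example, patch 4/5 uses the new argument to queue the
expropriation DSI as:

	/* DSI with SRR1 bit 46 (SRR1_PROGTRAP) set */
	kvmppc_core_queue_data_storage(vcpu, kvmppc_get_dar(vcpu),
				       DSISR_NOHPTE, SRR1_PROGTRAP);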

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/kvm_ppc.h     | 3 ++-
 arch/powerpc/kvm/book3s.c              | 6 +++---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 +++---
 arch/powerpc/kvm/book3s_hv.c           | 4 ++--
 arch/powerpc/kvm/book3s_hv_nested.c    | 4 ++--
 arch/powerpc/kvm/book3s_pr.c           | 4 ++--
 6 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2d88944f9f34..09235bdfd4ac 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -143,7 +143,8 @@ extern void kvmppc_core_queue_dtlb_miss(struct kvm_vcpu *vcpu, ulong dear_flags,
 					ulong esr_flags);
 extern void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu,
 					   ulong dear_flags,
-					   ulong esr_flags);
+					   ulong esr_flags,
+					   ulong srr1_flags);
 extern void kvmppc_core_queue_itlb_miss(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_inst_storage(struct kvm_vcpu *vcpu,
 					   ulong esr_flags);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 79833f78d1da..f7f6641a788d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -284,11 +284,11 @@ void kvmppc_core_dequeue_external(struct kvm_vcpu *vcpu)
 }
 
 void kvmppc_core_queue_data_storage(struct kvm_vcpu *vcpu, ulong dar,
-				    ulong flags)
+				    ulong dsisr, ulong srr1)
 {
 	kvmppc_set_dar(vcpu, dar);
-	kvmppc_set_dsisr(vcpu, flags);
-	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE, 0);
+	kvmppc_set_dsisr(vcpu, dsisr);
+	kvmppc_inject_interrupt(vcpu, BOOK3S_INTERRUPT_DATA_STORAGE, srr1);
 }
 EXPORT_SYMBOL_GPL(kvmppc_core_queue_data_storage);
 
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index b5905ae4377c..618206a504b0 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -946,7 +946,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
 	if (dsisr & DSISR_BADACCESS) {
 		/* Reflect to the guest as DSI */
 		pr_err("KVM: Got radix HV page fault with DSISR=%lx\n", dsisr);
-		kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+		kvmppc_core_queue_data_storage(vcpu, ea, dsisr, 0);
 		return RESUME_GUEST;
 	}
 
@@ -971,7 +971,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
 			 * Bad address in guest page table tree, or other
 			 * unusual error - reflect it to the guest as DSI.
 			 */
-			kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+			kvmppc_core_queue_data_storage(vcpu, ea, dsisr, 0);
 			return RESUME_GUEST;
 		}
 		return kvmppc_hv_emulate_mmio(vcpu, gpa, ea, writing);
@@ -981,7 +981,7 @@ int kvmppc_book3s_radix_page_fault(struct kvm_vcpu *vcpu,
 		if (writing) {
 			/* give the guest a DSI */
 			kvmppc_core_queue_data_storage(vcpu, ea, DSISR_ISSTORE |
-						       DSISR_PROTFAULT);
+						       DSISR_PROTFAULT, 0);
 			return RESUME_GUEST;
 		}
 		kvm_ro = true;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 47ccd4a2df54..d07e9065f7c1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1592,7 +1592,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 
 		if (!(vcpu->arch.fault_dsisr & (DSISR_NOHPTE | DSISR_PROTFAULT))) {
 			kvmppc_core_queue_data_storage(vcpu,
-				vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
+				vcpu->arch.fault_dar, vcpu->arch.fault_dsisr, 0);
 			r = RESUME_GUEST;
 			break;
 		}
@@ -1610,7 +1610,7 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
 			r = RESUME_PAGE_FAULT;
 		} else {
 			kvmppc_core_queue_data_storage(vcpu,
-				vcpu->arch.fault_dar, err);
+				vcpu->arch.fault_dar, err, 0);
 			r = RESUME_GUEST;
 		}
 		break;
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 898f942eb198..a10ef0d5f925 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1556,7 +1556,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) {
 		if (dsisr & (DSISR_PRTABLE_FAULT | DSISR_BADACCESS)) {
 			/* unusual error -> reflect to the guest as a DSI */
-			kvmppc_core_queue_data_storage(vcpu, ea, dsisr);
+			kvmppc_core_queue_data_storage(vcpu, ea, dsisr, 0);
 			return RESUME_GUEST;
 		}
 
@@ -1567,7 +1567,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_vcpu *vcpu,
 		if (writing) {
 			/* Give the guest a DSI */
 			kvmppc_core_queue_data_storage(vcpu, ea,
-					DSISR_ISSTORE | DSISR_PROTFAULT);
+					DSISR_ISSTORE | DSISR_PROTFAULT, 0);
 			return RESUME_GUEST;
 		}
 		kvm_ro = true;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 6bc9425acb32..f7fc8e01fd8e 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -754,7 +754,7 @@ static int kvmppc_handle_pagefault(struct kvm_vcpu *vcpu,
 			flags = DSISR_NOHPTE;
 		if (data) {
 			flags |= vcpu->arch.fault_dsisr & DSISR_ISSTORE;
-			kvmppc_core_queue_data_storage(vcpu, eaddr, flags);
+			kvmppc_core_queue_data_storage(vcpu, eaddr, flags, 0);
 		} else {
 			kvmppc_core_queue_inst_storage(vcpu, flags);
 		}
@@ -1229,7 +1229,7 @@ int kvmppc_handle_exit_pr(struct kvm_vcpu *vcpu, unsigned int exit_nr)
 			r = kvmppc_handle_pagefault(vcpu, dar, exit_nr);
 			srcu_read_unlock(&vcpu->kvm->srcu, idx);
 		} else {
-			kvmppc_core_queue_data_storage(vcpu, dar, fault_dsisr);
+			kvmppc_core_queue_data_storage(vcpu, dar, fault_dsisr, 0);
 			r = RESUME_GUEST;
 		}
 		break;
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH v0 4/5] KVM: PPC: BOOK3S HV: Async PF support
@ 2021-08-05  7:24   ` Bharata B Rao
  0 siblings, 0 replies; 29+ messages in thread
From: Bharata B Rao @ 2021-08-05  7:36 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao, Bharata B Rao

Add asynchronous page fault support for PowerKVM by making
use of the Expropriation/Subvention Notification (ESN) option
defined by the PAPR specification.

1. When a guest-accessed page isn't immediately available in the
host, update the vcpu's VPA with a unique expropriation correlation
number and inject a DSI to the guest with the SRR1_PROGTRAP bit set
in SRR1. This tells the guest vcpu to put the faulting process to
sleep and schedule a different process.
   - Async PF is supported for data pages in this implementation
     though PAPR allows it for code pages too.
   - Async PF is supported only for user pages here.
   - The feature is currently limited only to radix guests.

2. When the page becomes available, update the Subvention Notification
Structure with the corresponding expropriation correlation number
and inform the guest via a subvention interrupt.
   - Subvention Notification Structure (SNS) is a region of memory
     shared between host and guest via which the communication related
     to expropriated and subvened pages happens between guest and host.
   - SNS region is registered by the guest via H_REG_SNS hcall which
     is implemented in QEMU.
   - H_REG_SNS implementation in QEMU needs a new ioctl KVM_PPC_SET_SNS.
     This ioctl is used to map and pin the guest page containing SNS
     in the host.
   - Subvention notification interrupt is raised to the guest by
     QEMU in response to the guest exit via KVM_REQ_ESN_EXIT. This
     interrupt informs the guest about the availability of the
     pages.
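
On the guest side (patch 5/5), the expropriation DSI is told apart
from an ordinary fault by the SRR1_PROGTRAP bit; roughly (a sketch,
with wait_for_subvention() standing in for the swait logic there):

	/* in the guest page fault path */
	if (mfspr(SPRN_SRR1) & SRR1_PROGTRAP) {
		/* expropriation notification, not a real fault */
		wait_for_subvention(be16_to_cpu(get_lppaca()->exp_corr_nr));
		return;
	}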

TODO:
- H_REG_SNS is implemented in QEMU because this hcall needs to return
  the interrupt source number associated with the subvention interrupt.
  Claiming the IRQ line and raising an external interrupt seem to be
  straightforward from QEMU. Figure out the in-kernel equivalents for
  these two so that we can save a guest exit for each expropriated
  page and move the entire hcall implementation into the host kernel.
- The code is pretty much experimental and is barely able to boot a
  guest. I do see some requests for expropriated pages not getting
  fulfilled by the host, leading to long delays in the guest. This
  needs some debugging.
- A few other aspects recommended by PAPR around this feature(like
  setting of page state flags) need to be evaluated and incorporated
  into the implementation if found appropriate.
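
For reference, a sketch of how a VMM's H_REG_SNS implementation might
invoke the new ioctl (illustrative only; sns_gpa, sns_len and vm_fd
are assumed names, not existing QEMU symbols):

	struct kvm_ppc_sns_reg reg = {
		.addr = sns_gpa,	/* guest real address from the hcall */
		.len  = sns_len,	/* size of the SNS buffer (4K) */
	};

	if (ioctl(vm_fd, KVM_PPC_SET_SNS, &reg) < 0)
		return H_PARAMETER;	/* fail the hcall */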

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst            |  15 ++
 arch/powerpc/include/asm/hvcall.h         |   1 +
 arch/powerpc/include/asm/kvm_book3s_esn.h |  24 +++
 arch/powerpc/include/asm/kvm_host.h       |  21 +++
 arch/powerpc/include/asm/kvm_ppc.h        |   1 +
 arch/powerpc/include/asm/lppaca.h         |  12 +-
 arch/powerpc/include/uapi/asm/kvm.h       |   6 +
 arch/powerpc/kvm/Kconfig                  |   2 +
 arch/powerpc/kvm/Makefile                 |   5 +-
 arch/powerpc/kvm/book3s_64_mmu_radix.c    |   3 +
 arch/powerpc/kvm/book3s_hv.c              |  25 +++
 arch/powerpc/kvm/book3s_hv_esn.c          | 189 ++++++++++++++++++++++
 include/uapi/linux/kvm.h                  |   1 +
 tools/include/uapi/linux/kvm.h            |   1 +
 14 files changed, 303 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_book3s_esn.h
 create mode 100644 arch/powerpc/kvm/book3s_hv_esn.c

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index dae68e68ca23..512f078b9d02 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5293,6 +5293,21 @@ the trailing ``'\0'``, is indicated by ``name_size`` in the header.
 The Stats Data block contains an array of 64-bit values in the same order
 as the descriptors in Descriptors block.
 
+4.134 KVM_PPC_SET_SNS
+---------------------
+
+:Capability: basic
+:Architectures: powerpc
+:Type: vm ioctl
+:Parameters: struct kvm_ppc_sns_reg (in)
+:Returns: 0 on success, negative errno on error
+
+As part of the H_REG_SNS hypercall, this ioctl is used to map and pin
+the guest-provided SNS structure in the host.
+
+This is used for providing asynchronous page fault support for
+powerpc pseries KVM guests.
+
 5. The kvm_run structure
 ============
 
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 9bcf345cb208..9e33500c1723 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -321,6 +321,7 @@
 #define H_SCM_UNBIND_ALL        0x3FC
 #define H_SCM_HEALTH            0x400
 #define H_SCM_PERFORMANCE_STATS 0x418
+#define H_REG_SNS		0x41C
 #define H_RPT_INVALIDATE	0x448
 #define H_SCM_FLUSH		0x44C
 #define MAX_HCALL_OPCODE	H_SCM_FLUSH
diff --git a/arch/powerpc/include/asm/kvm_book3s_esn.h b/arch/powerpc/include/asm/kvm_book3s_esn.h
new file mode 100644
index 000000000000..d79a441ea31d
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_book3s_esn.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_KVM_BOOK3S_ESN_H__
+#define __ASM_KVM_BOOK3S_ESN_H__
+
+/* SNS buffer EQ state flags */
+#define SNS_EQ_STATE_OPERATIONAL 0x0
+#define SNS_EQ_STATE_OVERFLOW 0x1
+
+/* SNS buffer Notification control bits */
+#define SNS_EQ_CNTRL_TRIGGER 0x1
+
+struct kvmppc_sns {
+	unsigned long gpa;
+	unsigned long len;
+	void *hva;
+	uint16_t exp_corr_nr;
+	uint16_t *eq;
+	uint8_t *eq_cntrl;
+	uint8_t *eq_state;
+	unsigned long next_eq_entry;
+	unsigned long nr_eq_entries;
+};
+
+#endif /* __ASM_KVM_BOOK3S_ESN_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 204dc2d91388..8d7f73085ef5 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include <asm/cacheflush.h>
 #include <asm/hvcall.h>
 #include <asm/mce.h>
+#include <asm/kvm_book3s_esn.h>
 
 #define KVM_MAX_VCPUS		NR_CPUS
 #define KVM_MAX_VCORES		NR_CPUS
@@ -325,6 +326,7 @@ struct kvm_arch {
 #endif
 	struct kvmppc_ops *kvm_ops;
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	struct kvmppc_sns sns;
 	struct mutex uvmem_lock;
 	struct list_head uvmem_pfns;
 	struct mutex mmu_setup_lock;	/* nests inside vcpu mutexes */
@@ -855,6 +857,25 @@ struct kvm_vcpu_arch {
 #define __KVM_HAVE_ARCH_WQP
 #define __KVM_HAVE_CREATE_DEVICE
 
+/* Async pf */
+#define ASYNC_PF_PER_VCPU       64
+struct kvm_arch_async_pf {
+	unsigned long exp_token;
+};
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+			       unsigned long gpa, unsigned long hva);
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+			       struct kvm_async_pf *work);
+
+bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work);
+
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+				 struct kvm_async_pf *work);
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
+static inline void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu) {}
+
 static inline void kvm_arch_hardware_disable(void) {}
 static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 09235bdfd4ac..c14a84041d0e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -228,6 +228,7 @@ extern long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
 int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);
 
 extern int kvm_vm_ioctl_rtas_define_token(struct kvm *kvm, void __user *argp);
+long kvm_vm_ioctl_set_sns(struct kvm *kvm, struct kvm_ppc_sns_reg *sns_reg);
 extern int kvmppc_rtas_hcall(struct kvm_vcpu *vcpu);
 extern void kvmppc_rtas_tokens_free(struct kvm *kvm);
 
diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index 57e432766f3e..17e89c3865e8 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -104,7 +104,17 @@ struct lppaca {
 	volatile __be32 dispersion_count; /* dispatch changed physical cpu */
 	volatile __be64 cmo_faults;	/* CMO page fault count */
 	volatile __be64 cmo_fault_time;	/* CMO page fault time */
-	u8	reserved10[104];
+
+	/*
+	 * TODO: Insert this at correct offset
+	 * 0x17D - Exp flags (1 byte)
+	 * 0x17E - Exp corr number (2 bytes)
+	 *
+	 * Here I am using only exp corr number at an easy to insert
+	 * offset.
+	 */
+	__be16 exp_corr_nr; /* Expropriation correlation number */
+	u8	reserved10[102];
 
 	/* cacheline 4-5 */
 
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 9f18fa090f1f..d72739126ae5 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -470,6 +470,12 @@ struct kvm_ppc_cpu_char {
 #define KVM_PPC_CPU_BEHAV_BNDS_CHK_SPEC_BAR	(1ULL << 61)
 #define KVM_PPC_CPU_BEHAV_FLUSH_COUNT_CACHE	(1ull << 58)
 
+/* For KVM_PPC_SET_SNS */
+struct kvm_ppc_sns_reg {
+	__u64 addr;
+	__u64 len;
+};
+
 /* Per-vcpu XICS interrupt controller state */
 #define KVM_REG_PPC_ICP_STATE	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x8c)
 
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index e45644657d49..4f552649a4b2 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -85,6 +85,8 @@ config KVM_BOOK3S_64_HV
 	depends on KVM_BOOK3S_64 && PPC_POWERNV
 	select KVM_BOOK3S_HV_POSSIBLE
 	select MMU_NOTIFIER
+	select KVM_ASYNC_PF
+	select KVM_ASYNC_PF_SYNC
 	select CMA
 	help
 	  Support running unmodified book3s_64 guest kernels in
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 583c14ef596e..603ab382d021 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -6,7 +6,7 @@
 ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm
 KVM := ../../../virt/kvm
 
-common-objs-y = $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+common-objs-y = $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o $(KVM)/async_pf.o
 common-objs-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o
 common-objs-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
 
@@ -70,7 +70,8 @@ kvm-hv-y += \
 	book3s_hv_interrupts.o \
 	book3s_64_mmu_hv.o \
 	book3s_64_mmu_radix.o \
-	book3s_hv_nested.o
+	book3s_hv_nested.o \
+	book3s_hv_esn.o
 
 kvm-hv-$(CONFIG_PPC_UV) += \
 	book3s_hv_uvmem.o
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 618206a504b0..1985f84bfebe 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -837,6 +837,9 @@ int kvmppc_book3s_instantiate_page(struct kvm_vcpu *vcpu,
 	} else {
 		unsigned long pfn;
 
+		if (kvm_arch_setup_async_pf(vcpu, gpa, hva))
+			return RESUME_GUEST;
+
 		/* Call KVM generic code to do the slow-path check */
 		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
 					   writing, upgrade_p, NULL);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d07e9065f7c1..5cc564321521 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -77,6 +77,7 @@
 #include <asm/ultravisor.h>
 #include <asm/dtl.h>
 #include <asm/plpar_wrappers.h>
+#include <asm/kvm_book3s_esn.h>
 
 #include "book3s.h"
 
@@ -4570,6 +4571,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 		return -EINTR;
 	}
 
+	if (kvm_request_pending(vcpu)) {
+		if (!kvmppc_core_check_requests(vcpu))
+			return 0;
+	}
+
 	kvm = vcpu->kvm;
 	atomic_inc(&kvm->arch.vcpus_running);
 	/* Order vcpus_running vs. mmu_ready, see kvmppc_alloc_reset_hpt */
@@ -4591,6 +4597,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
 	vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
 
 	do {
+		kvm_check_async_pf_completion(vcpu);
 		if (cpu_has_feature(CPU_FTR_ARCH_300))
 			r = kvmhv_run_single_vcpu(vcpu, ~(u64)0,
 						  vcpu->arch.vcore->lpcr);
@@ -5257,6 +5264,8 @@ static void kvmppc_free_vcores(struct kvm *kvm)
 
 static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 {
+	struct kvm_ppc_sns_reg sns_reg;
+
 	debugfs_remove_recursive(kvm->arch.debugfs_dir);
 
 	if (!cpu_has_feature(CPU_FTR_ARCH_300))
@@ -5283,6 +5292,11 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 	kvmppc_free_lpid(kvm->arch.lpid);
 
 	kvmppc_free_pimap(kvm);
+
+	/* Needed for de-registering SNS buffer */
+	sns_reg.addr = -1;
+	sns_reg.len = 0;
+	kvm_vm_ioctl_set_sns(kvm, &sns_reg);
 }
 
 /* We don't need to emulate any privileged instructions or dcbz */
@@ -5561,6 +5575,17 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
 		break;
 	}
 
+	case KVM_PPC_SET_SNS: {
+		struct kvm_ppc_sns_reg sns_reg;
+
+		r = -EFAULT;
+		if (copy_from_user(&sns_reg, argp, sizeof(sns_reg)))
+			break;
+
+		r = kvm_vm_ioctl_set_sns(kvm, &sns_reg);
+		break;
+	}
+
 	default:
 		r = -ENOTTY;
 	}
diff --git a/arch/powerpc/kvm/book3s_hv_esn.c b/arch/powerpc/kvm/book3s_hv_esn.c
new file mode 100644
index 000000000000..b322a14c1f83
--- /dev/null
+++ b/arch/powerpc/kvm/book3s_hv_esn.c
@@ -0,0 +1,189 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Async page fault support via PAPR Expropriation/Subvention Notification
+ * option(ESN)
+ *
+ * Copyright 2020 Bharata B Rao, IBM Corp. <bharata@linux.ibm.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s_esn.h>
+
+static DEFINE_SPINLOCK(async_exp_lock); /* for updating exp_corr_nr */
+static DEFINE_SPINLOCK(async_sns_lock); /* SNS buffer updated under this lock */
+
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+			       unsigned long gpa, unsigned long hva)
+{
+	struct kvm_arch_async_pf arch;
+	struct lppaca *vpa = vcpu->arch.vpa.pinned_addr;
+	u64 msr = kvmppc_get_msr(vcpu);
+	struct kvmppc_sns *sns = &vcpu->kvm->arch.sns;
+
+	/*
+	 * If VPA hasn't been registered yet, can't support
+	 * async pf.
+	 */
+	if (!vpa)
+		return 0;
+
+	/*
+	 * If SNS memory area hasn't been registered yet,
+	 * can't support async pf.
+	 */
+	if (!vcpu->kvm->arch.sns.eq)
+		return 0;
+
+	/*
+	 * If guest hasn't enabled expropriation interrupt,
+	 * don't try async pf.
+	 */
+	if (!(vpa->byte_b9 & LPPACA_EXP_INT_ENABLED))
+		return 0;
+
+	/*
+	 * If the fault is in the guest kernel, don't
+	 * try async pf.
+	 */
+	if (!(msr & MSR_PR) && !(msr & MSR_HV))
+		return 0;
+
+	spin_lock(&async_sns_lock);
+	/*
+	 * If the subvention event queue would overflow,
+	 * don't try async pf.
+	 */
+	if (*(sns->eq + sns->next_eq_entry)) {
+		pr_err("%s: SNS buffer overflow\n", __func__);
+		spin_unlock(&async_sns_lock);
+		return 0;
+	}
+	spin_unlock(&async_sns_lock);
+
+	/*
+	 * TODO:
+	 *
+	 * 1. Update exp flags bit 7 to 1
+	 * ("The Subvened page data will be restored")
+	 *
+	 * 2. Check if request to this page has been
+	 * notified to guest earlier, if so send back
+	 * the same exp corr number.
+	 *
+	 * 3. exp_corr_nr could be a random but non-zero
+	 * number. Not taking care of wrapping here. Fix
+	 * it.
+	 */
+	spin_lock(&async_exp_lock);
+	vpa->exp_corr_nr = cpu_to_be16(vcpu->kvm->arch.sns.exp_corr_nr);
+	arch.exp_token = vcpu->kvm->arch.sns.exp_corr_nr++;
+	spin_unlock(&async_exp_lock);
+
+	return kvm_setup_async_pf(vcpu, gpa, hva, &arch);
+}
+
+bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work)
+{
+	/* Inject DSI to guest with SRR1 bit 46 set */
+	kvmppc_core_queue_data_storage(vcpu, kvmppc_get_dar(vcpu), DSISR_NOHPTE, SRR1_PROGTRAP);
+	return true;
+}
+
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+				 struct kvm_async_pf *work)
+{
+	struct kvmppc_sns *sns = &vcpu->kvm->arch.sns;
+
+	spin_lock(&async_sns_lock);
+	if (*sns->eq_cntrl != SNS_EQ_CNTRL_TRIGGER) {
+		pr_err("%s: SNS Notification Trigger not set by guest\n", __func__);
+		spin_unlock(&async_sns_lock);
+		/* TODO: Terminate the guest? */
+		return;
+	}
+
+	if (arch_cmpxchg(sns->eq + sns->next_eq_entry, 0,
+	    work->arch.exp_token)) {
+		*sns->eq_state |= SNS_EQ_STATE_OVERFLOW;
+		pr_err("%s: SNS buffer overflow\n", __func__);
+		spin_unlock(&async_sns_lock);
+		/* TODO: Terminate the guest? */
+		return;
+	}
+
+	sns->next_eq_entry = (sns->next_eq_entry + 1) % sns->nr_eq_entries;
+	spin_unlock(&async_sns_lock);
+
+	/*
+	 * Request a guest exit so that ESN virtual interrupt can
+	 * be injected by QEMU.
+	 */
+	kvm_make_request(KVM_REQ_ESN_EXIT, vcpu);
+}
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
+{
+	/* We will inject the page directly */
+}
+
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * PowerPC will always inject the page directly,
+	 * but we still want kvm_check_async_pf_completion() to clean up
+	 */
+	return true;
+}
+
+long kvm_vm_ioctl_set_sns(struct kvm *kvm, struct kvm_ppc_sns_reg *sns_reg)
+{
+	unsigned long nb;
+
+	/* Deregister */
+	if (sns_reg->addr == -1) {
+		if (!kvm->arch.sns.hva)
+			return 0;
+
+		pr_info("%s: Deregistering SNS buffer for LPID %d\n",
+			__func__, kvm->arch.lpid);
+		kvmppc_unpin_guest_page(kvm, kvm->arch.sns.hva, kvm->arch.sns.gpa, false);
+		kvm->arch.sns.gpa = -1;
+		kvm->arch.sns.hva = 0;
+		return 0;
+	}
+
+	/*
+	 * Already registered with the same address?
+	 */
+	if (sns_reg->addr == kvm->arch.sns.gpa)
+		return 0;
+
+	/* If previous registration exists, free it */
+	if (kvm->arch.sns.hva) {
+		pr_info("%s: Deregistering Previous SNS buffer for LPID %d\n",
+			__func__, kvm->arch.lpid);
+		kvmppc_unpin_guest_page(kvm, kvm->arch.sns.hva, kvm->arch.sns.gpa, false);
+		kvm->arch.sns.gpa = -1;
+		kvm->arch.sns.hva = 0;
+	}
+
+	kvm->arch.sns.gpa = sns_reg->addr;
+	kvm->arch.sns.hva = kvmppc_pin_guest_page(kvm, kvm->arch.sns.gpa, &nb);
+	kvm->arch.sns.len = sns_reg->len;
+	kvm->arch.sns.nr_eq_entries = (kvm->arch.sns.len - 2) / sizeof(uint16_t);
+	kvm->arch.sns.next_eq_entry = 0;
+	kvm->arch.sns.eq = kvm->arch.sns.hva + 2;
+	kvm->arch.sns.eq_cntrl = kvm->arch.sns.hva;
+	kvm->arch.sns.eq_state = kvm->arch.sns.hva + 1;
+	kvm->arch.sns.exp_corr_nr = 1; /* Should be non-zero */
+
+	*(kvm->arch.sns.eq_state) = SNS_EQ_STATE_OPERATIONAL;
+
+	pr_info("%s: Registering SNS buffer for LPID %d sns_addr %llx eq %lx\n",
+		__func__, kvm->arch.lpid, sns_reg->addr,
+		(unsigned long)kvm->arch.sns.eq);
+
+	return 0;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 47be532ed14b..dbe65e8d68d8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1459,6 +1459,7 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
+#define KVM_PPC_SET_SNS		  _IOR(KVMIO, 0xb5, struct kvm_ppc_sns_reg)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index d9e4aabcb31a..e9dea164498f 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1458,6 +1458,7 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
+#define KVM_PPC_SET_SNS		  _IOR(KVMIO, 0xb5, struct kvm_ppc_sns_reg)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
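
A hypothetical userspace sketch of driving the new vm ioctl (not part
of this series; vm_fd is assumed to be an open KVM VM file descriptor
and sns_gpa a valid guest physical address, with field usage following
kvm_vm_ioctl_set_sns() above):

	/* assumes <sys/ioctl.h> and the updated <linux/kvm.h> are included */
	int register_sns(int vm_fd, __u64 sns_gpa)
	{
		struct kvm_ppc_sns_reg sns_reg = {
			.addr = sns_gpa,	/* guest physical address of SNS buffer */
			.len  = 4096,		/* buffer length in bytes */
		};

		return ioctl(vm_fd, KVM_PPC_SET_SNS, &sns_reg);
	}

	/* deregistration, mirroring kvmppc_core_destroy_vm_hv() above */
	int unregister_sns(int vm_fd)
	{
		struct kvm_ppc_sns_reg sns_reg = { .addr = -1, .len = 0 };

		return ioctl(vm_fd, KVM_PPC_SET_SNS, &sns_reg);
	}
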
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [RFC PATCH v0 5/5] pseries: Asynchronous page fault support
@ 2021-08-05  7:24   ` Bharata B Rao
  0 siblings, 0 replies; 29+ messages in thread
From: Bharata B Rao @ 2021-08-05  7:36 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao, Bharata B Rao

Add asynchronous page fault support for pseries guests.

1. Setup the guest to handle async-pf
   - Issue H_REG_SNS hcall to register the SNS region (buffer
     layout sketched below).
   - Setup the subvention interrupt irq.
   - Enable async-pf by updating byte_b9 of the VPA for each
     CPU.
2. Check if the page fault is an expropriation notification
   (SRR1_PROGTRAP set in SRR1) and if so put the task on a
   wait queue based on the expropriation correlation number
   read from the VPA.
3. Handle the subvention interrupt to wake any waiting tasks.
   The wait and wakeup mechanism from the x86 async-pf
   implementation is being reused here.
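
A minimal sketch of the SNS buffer layout this series works with
(hypothetical struct for illustration only; the actual code uses raw
byte offsets into the PAGE_SIZE (4K) buffer registered via H_REG_SNS):

	struct sns_buffer {
		uint8_t  eq_cntrl;		/* byte 0: guest sets bit 0 to
						 * enable subvention notifications */
		uint8_t  eq_state;		/* byte 1: operational/overflow
						 * state, maintained by the host */
		uint16_t eq[(4096 - 2) / 2];	/* event queue: non-zero entries
						 * are pending expropriation
						 * correlation numbers */
	};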

TODO:
- Check how to keep this feature together with other CMO features.
- The async-pf check in the page fault handler path is guarded only
  by a CONFIG_PPC_PSERIES #ifdef. This isn't sufficient and hence
  needs to be replaced by an appropriate runtime check.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/async-pf.h       |  12 ++
 arch/powerpc/mm/fault.c                   |   7 +-
 arch/powerpc/platforms/pseries/Makefile   |   2 +-
 arch/powerpc/platforms/pseries/async-pf.c | 219 ++++++++++++++++++++++
 4 files changed, 238 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/asm/async-pf.h
 create mode 100644 arch/powerpc/platforms/pseries/async-pf.c

diff --git a/arch/powerpc/include/asm/async-pf.h b/arch/powerpc/include/asm/async-pf.h
new file mode 100644
index 000000000000..95d6c3da9f50
--- /dev/null
+++ b/arch/powerpc/include/asm/async-pf.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Async page fault support via PAPR Expropriation/Subvention Notification
+ * option (ESN)
+ *
+ * Copyright 2020 Bharata B Rao, IBM Corp. <bharata@linux.ibm.com>
+ */
+
+#ifndef _ASM_POWERPC_ASYNC_PF_H
+#define _ASM_POWERPC_ASYNC_PF_H
+int handle_async_page_fault(struct pt_regs *regs, unsigned long addr);
+#endif
diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index a8d0ce85d39a..bbdc61605885 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -44,7 +44,7 @@
 #include <asm/debug.h>
 #include <asm/kup.h>
 #include <asm/inst.h>
-
+#include <asm/async-pf.h>
 
 /*
  * do_page_fault error handling helpers
@@ -395,6 +395,11 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 	vm_fault_t fault, major = 0;
 	bool kprobe_fault = kprobe_page_fault(regs, 11);
 
+#ifdef CONFIG_PPC_PSERIES
+	if (handle_async_page_fault(regs, address))
+		return 0;
+#endif
+
 	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
 		return 0;
 
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 4cda0ef87be0..e0ada605ef20 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -6,7 +6,7 @@ obj-y			:= lpar.o hvCall.o nvram.o reconfig.o \
 			   of_helpers.o \
 			   setup.o iommu.o event_sources.o ras.o \
 			   firmware.o power.o dlpar.o mobility.o rng.o \
-			   pci.o pci_dlpar.o eeh_pseries.o msi.o
+			   pci.o pci_dlpar.o eeh_pseries.o msi.o async-pf.o
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_SCANLOG)	+= scanlog.o
 obj-$(CONFIG_KEXEC_CORE)	+= kexec.o
diff --git a/arch/powerpc/platforms/pseries/async-pf.c b/arch/powerpc/platforms/pseries/async-pf.c
new file mode 100644
index 000000000000..c2f3bbc0d674
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/async-pf.c
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Async page fault support via PAPR Expropriation/Subvention Notification
+ * option (ESN)
+ *
+ * Copyright 2020 Bharata B Rao, IBM Corp. <bharata@linux.ibm.com>
+ */
+
+#include <linux/interrupt.h>
+#include <linux/swait.h>
+#include <linux/irqdomain.h>
+#include <asm/machdep.h>
+#include <asm/hvcall.h>
+#include <asm/paca.h>
+
+static char sns_buffer[PAGE_SIZE] __aligned(4096);
+static uint16_t *esn_q = (uint16_t *)sns_buffer + 1;
+static unsigned long next_eq_entry, nr_eq_entries;
+
+#define ASYNC_PF_SLEEP_HASHBITS 8
+#define ASYNC_PF_SLEEP_HASHSIZE (1<<ASYNC_PF_SLEEP_HASHBITS)
+
+/* Controls access to SNS buffer */
+static DEFINE_RAW_SPINLOCK(async_sns_guest_lock);
+
+/* Wait queue handling is from the x86 async-pf implementation */
+struct async_pf_sleep_node {
+	struct hlist_node link;
+	struct swait_queue_head wq;
+	u64 token;
+	int cpu;
+};
+
+static struct async_pf_sleep_head {
+	raw_spinlock_t lock;
+	struct hlist_head list;
+} async_pf_sleepers[ASYNC_PF_SLEEP_HASHSIZE];
+
+static struct async_pf_sleep_node *_find_apf_task(struct async_pf_sleep_head *b,
+						  u64 token)
+{
+	struct hlist_node *p;
+
+	hlist_for_each(p, &b->list) {
+		struct async_pf_sleep_node *n =
+			hlist_entry(p, typeof(*n), link);
+		if (n->token == token)
+			return n;
+	}
+
+	return NULL;
+}
+static int async_pf_queue_task(u64 token, struct async_pf_sleep_node *n)
+{
+	u64 key = hash_64(token, ASYNC_PF_SLEEP_HASHBITS);
+	struct async_pf_sleep_head *b = &async_pf_sleepers[key];
+	struct async_pf_sleep_node *e;
+
+	raw_spin_lock(&b->lock);
+	e = _find_apf_task(b, token);
+	if (e) {
+		/* dummy entry exists -> wake up was delivered ahead of PF */
+		hlist_del(&e->link);
+		raw_spin_unlock(&b->lock);
+		kfree(e);
+		return false;
+	}
+
+	n->token = token;
+	n->cpu = smp_processor_id();
+	init_swait_queue_head(&n->wq);
+	hlist_add_head(&n->link, &b->list);
+	raw_spin_unlock(&b->lock);
+	return true;
+}
+
+/*
+ * Handle Expropriation notification.
+ */
+int handle_async_page_fault(struct pt_regs *regs, unsigned long addr)
+{
+	struct async_pf_sleep_node n;
+	DECLARE_SWAITQUEUE(wait);
+	unsigned long exp_corr_nr;
+
+	/* Is this Expropriation notification? */
+	if (!(mfspr(SPRN_SRR1) & SRR1_PROGTRAP))
+		return 0;
+
+	if (unlikely(!user_mode(regs)))
+		panic("Host injected async PF in kernel mode\n");
+
+	exp_corr_nr = be16_to_cpu(get_lppaca()->exp_corr_nr);
+	if (!async_pf_queue_task(exp_corr_nr, &n))
+		return 0;
+
+	for (;;) {
+		prepare_to_swait_exclusive(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
+		if (hlist_unhashed(&n.link))
+			break;
+
+		local_irq_enable();
+		schedule();
+		local_irq_disable();
+	}
+
+	finish_swait(&n.wq, &wait);
+	return 1;
+}
+
+static void apf_task_wake_one(struct async_pf_sleep_node *n)
+{
+	hlist_del_init(&n->link);
+	if (swq_has_sleeper(&n->wq))
+		swake_up_one(&n->wq);
+}
+
+static void async_pf_wake_task(u64 token)
+{
+	u64 key = hash_64(token, ASYNC_PF_SLEEP_HASHBITS);
+	struct async_pf_sleep_head *b = &async_pf_sleepers[key];
+	struct async_pf_sleep_node *n;
+
+again:
+	raw_spin_lock(&b->lock);
+	n = _find_apf_task(b, token);
+	if (!n) {
+		/*
+		 * async PF was not yet handled.
+		 * Add dummy entry for the token.
+		 */
+		n = kzalloc(sizeof(*n), GFP_ATOMIC);
+		if (!n) {
+			/*
+			 * Allocation failed! Busy wait while other cpu
+			 * handles async PF.
+			 */
+			raw_spin_unlock(&b->lock);
+			cpu_relax();
+			goto again;
+		}
+		n->token = token;
+		n->cpu = smp_processor_id();
+		init_swait_queue_head(&n->wq);
+		hlist_add_head(&n->link, &b->list);
+	} else {
+		apf_task_wake_one(n);
+	}
+	raw_spin_unlock(&b->lock);
+}
+
+/*
+ * Handle Subvention notification.
+ */
+static irqreturn_t async_pf_handler(int irq, void *dev_id)
+{
+	uint16_t exp_token, old;
+
+	raw_spin_lock(&async_sns_guest_lock);
+	do {
+		exp_token = *(esn_q + next_eq_entry);
+		if (!exp_token)
+			break;
+
+		old = arch_cmpxchg(esn_q + next_eq_entry, exp_token, 0);
+		BUG_ON(old != exp_token);
+
+		async_pf_wake_task(exp_token);
+		next_eq_entry = (next_eq_entry + 1) % nr_eq_entries;
+	} while (1);
+	raw_spin_unlock(&async_sns_guest_lock);
+	return IRQ_HANDLED;
+}
+
+static int __init pseries_async_pf_init(void)
+{
+	long rc;
+	unsigned long ret[PLPAR_HCALL_BUFSIZE];
+	unsigned int irq, cpu;
+	int i;
+
+	/* Register buffer via H_REG_SNS */
+	rc = plpar_hcall(H_REG_SNS, ret, __pa(sns_buffer), PAGE_SIZE);
+	if (rc != H_SUCCESS)
+		return -1;
+
+	nr_eq_entries = (PAGE_SIZE - 2) / sizeof(uint16_t);
+
+	/* Register irq handler */
+	irq = irq_create_mapping(NULL, ret[1]);
+	if (!irq) {
+		plpar_hcall(H_REG_SNS, ret, -1, PAGE_SIZE);
+		return -1;
+	}
+
+	rc = request_irq(irq, async_pf_handler, 0, "sns-interrupt", NULL);
+	if (rc < 0) {
+		plpar_hcall(H_REG_SNS, ret, -1, PAGE_SIZE);
+		return -1;
+	}
+
+	for (i = 0; i < ASYNC_PF_SLEEP_HASHSIZE; i++)
+		raw_spin_lock_init(&async_pf_sleepers[i].lock);
+
+	/*
+	 * Enable subvention notifications from the hypervisor
+	 * by setting bit 0, byte 0 of SNS buffer
+	 */
+	*sns_buffer |= 0x1;
+
+	/* Enable LPPACA_EXP_INT_ENABLED in VPA */
+	for_each_possible_cpu(cpu)
+		lppaca_of(cpu).byte_b9 |= LPPACA_EXP_INT_ENABLED;
+
+	pr_info("%s: Enabled Async PF\n", __func__);
+	return 0;
+}
+
+machine_arch_initcall(pseries, pseries_async_pf_init);
-- 
2.31.1

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH v0 0/5] PPC: KVM: pseries: Asynchronous page fault
@ 2021-08-05  7:35   ` Bharata B Rao
  0 siblings, 0 replies; 29+ messages in thread
From: Bharata B Rao @ 2021-08-05  7:47 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: kvm, aneesh.kumar, bharata.rao

On Thu, Aug 05, 2021 at 12:54:34PM +0530, Bharata B Rao wrote:
> Hi,
> 
> This series adds asynchronous page fault support for pseries guests
> and enables the support for the same in powerpc KVM. This is an
> early RFC with details and multiple TODOs listed in patch descriptions.
> 
> This patch needs supporting enablement in QEMU too which will be
> posted separately.

QEMU part is posted here:
https://lore.kernel.org/qemu-devel/20210805073228.502292-2-bharata@linux.ibm.com/T/#u

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH v0 5/5] pseries: Asynchronous page fault support
  2021-08-05  7:24   ` Bharata B Rao
  (?)
  (?)
@ 2021-08-05 11:44   ` kernel test robot
  -1 siblings, 0 replies; 29+ messages in thread
From: kernel test robot @ 2021-08-05 11:44 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2985 bytes --]

Hi Bharata,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on powerpc/next]
[also build test WARNING on kvm/queue v5.14-rc4 next-20210804]
[cannot apply to kvm-ppc/kvm-ppc-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Bharata-B-Rao/PPC-KVM-pseries-Asynchronous-page-fault/20210805-152622
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 10.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/d3e7bf525224f9cc414b949855c681e6eea3b7db
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Bharata-B-Rao/PPC-KVM-pseries-Asynchronous-page-fault/20210805-152622
        git checkout d3e7bf525224f9cc414b949855c681e6eea3b7db
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/powerpc/platforms/pseries/async-pf.c:80:5: warning: no previous prototype for 'handle_async_page_fault' [-Wmissing-prototypes]
      80 | int handle_async_page_fault(struct pt_regs *regs, unsigned long addr)
         |     ^~~~~~~~~~~~~~~~~~~~~~~


vim +/handle_async_page_fault +80 arch/powerpc/platforms/pseries/async-pf.c

    76	
    77	/*
    78	 * Handle Expropriation notification.
    79	 */
  > 80	int handle_async_page_fault(struct pt_regs *regs, unsigned long addr)
    81	{
    82		struct async_pf_sleep_node n;
    83		DECLARE_SWAITQUEUE(wait);
    84		unsigned long exp_corr_nr;
    85	
    86		/* Is this Expropriation notification? */
    87		if (!(mfspr(SPRN_SRR1) & SRR1_PROGTRAP))
    88			return 0;
    89	
    90		if (unlikely(!user_mode(regs)))
    91			panic("Host injected async PF in kernel mode\n");
    92	
    93		exp_corr_nr = be16_to_cpu(get_lppaca()->exp_corr_nr);
    94		if (!async_pf_queue_task(exp_corr_nr, &n))
    95			return 0;
    96	
    97		for (;;) {
    98			prepare_to_swait_exclusive(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
    99			if (hlist_unhashed(&n.link))
   100				break;
   101	
   102			local_irq_enable();
   103			schedule();
   104			local_irq_disable();
   105		}
   106	
   107		finish_swait(&n.wq, &wait);
   108		return 1;
   109	}
   110	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 73338 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH v0 5/5] pseries: Asynchronous page fault support
  2021-08-05  7:24   ` Bharata B Rao
                     ` (2 preceding siblings ...)
  (?)
@ 2021-08-05 11:49   ` kernel test robot
  -1 siblings, 0 replies; 29+ messages in thread
From: kernel test robot @ 2021-08-05 11:49 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2990 bytes --]

Hi Bharata,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on powerpc/next]
[also build test ERROR on kvm/queue v5.14-rc4 next-20210804]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Bharata-B-Rao/PPC-KVM-pseries-Asynchronous-page-fault/20210805-152622
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 10.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/d3e7bf525224f9cc414b949855c681e6eea3b7db
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Bharata-B-Rao/PPC-KVM-pseries-Asynchronous-page-fault/20210805-152622
        git checkout d3e7bf525224f9cc414b949855c681e6eea3b7db
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/powerpc/platforms/pseries/async-pf.c:80:5: error: no previous prototype for 'handle_async_page_fault' [-Werror=missing-prototypes]
      80 | int handle_async_page_fault(struct pt_regs *regs, unsigned long addr)
         |     ^~~~~~~~~~~~~~~~~~~~~~~
   cc1: all warnings being treated as errors


vim +/handle_async_page_fault +80 arch/powerpc/platforms/pseries/async-pf.c

    76	
    77	/*
    78	 * Handle Expropriation notification.
    79	 */
  > 80	int handle_async_page_fault(struct pt_regs *regs, unsigned long addr)
    81	{
    82		struct async_pf_sleep_node n;
    83		DECLARE_SWAITQUEUE(wait);
    84		unsigned long exp_corr_nr;
    85	
    86		/* Is this Expropriation notification? */
    87		if (!(mfspr(SPRN_SRR1) & SRR1_PROGTRAP))
    88			return 0;
    89	
    90		if (unlikely(!user_mode(regs)))
    91			panic("Host injected async PF in kernel mode\n");
    92	
    93		exp_corr_nr = be16_to_cpu(get_lppaca()->exp_corr_nr);
    94		if (!async_pf_queue_task(exp_corr_nr, &n))
    95			return 0;
    96	
    97		for (;;) {
    98			prepare_to_swait_exclusive(&n.wq, &wait, TASK_UNINTERRUPTIBLE);
    99			if (hlist_unhashed(&n.link))
   100				break;
   101	
   102			local_irq_enable();
   103			schedule();
   104			local_irq_disable();
   105		}
   106	
   107		finish_swait(&n.wq, &wait);
   108		return 1;
   109	}
   110	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 26971 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH v0 5/5] pseries: Asynchronous page fault support
  2021-08-05  7:24   ` Bharata B Rao
  (?)
@ 2021-08-13  4:06     ` Nicholas Piggin
  -1 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2021-08-13  4:06 UTC (permalink / raw)
  To: Bharata B Rao, kvm-ppc, linuxppc-dev; +Cc: aneesh.kumar, bharata.rao, kvm

Excerpts from Bharata B Rao's message of August 5, 2021 5:24 pm:
> Add asynchronous page fault support for pseries guests.
> 
> 1. Setup the guest to handle async-pf
>    - Issue H_REG_SNS hcall to register the SNS region.
>    - Setup the subvention interrupt irq.
>    - Enable async-pf by updating the byte_b9 of VPA for each
>      CPU.
> 2. Check if the page fault is an expropriation notification
>    (SRR1_PROGTRAP set in SRR1) and if so put the task on
>    wait queue based on the expropriation correlation number
>    read from the VPA.
> 3. Handle subvention interrupt to wake any waiting tasks.
>    The wait and wakeup mechanism from x86 async-pf implementation
>    is being reused here.

I don't know too much about the background of this.

How much benefit does this give? What situations? Does PowerVM implement 
it? Do other architectures KVM have something similar?

The SRR1 setting for the DSI is in PAPR? In that case it should be okay,
it might be good to add a small comment in exceptions-64s.S.

[...]

> @@ -395,6 +395,11 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>  	vm_fault_t fault, major = 0;
>  	bool kprobe_fault = kprobe_page_fault(regs, 11);
>  
> +#ifdef CONFIG_PPC_PSERIES
> +	if (handle_async_page_fault(regs, address))
> +		return 0;
> +#endif
> +
>  	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
>  		return 0;

[...]

> +int handle_async_page_fault(struct pt_regs *regs, unsigned long addr)
> +{
> +	struct async_pf_sleep_node n;
> +	DECLARE_SWAITQUEUE(wait);
> +	unsigned long exp_corr_nr;
> +
> +	/* Is this Expropriation notification? */
> +	if (!(mfspr(SPRN_SRR1) & SRR1_PROGTRAP))
> +		return 0;

Yep this should be an inline that is guarded by a static key, and then 
probably have an inline check for SRR1_PROGTRAP. You shouldn't need to
mfspr here, but just use regs->msr.
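
A minimal sketch of that suggestion (hypothetical names; assumes a
static key, defined in the pseries code and enabled by
pseries_async_pf_init() once H_REG_SNS succeeds, with the slow path
moved out of line):

	DECLARE_STATIC_KEY_FALSE(async_pf_enabled);

	static inline int handle_async_page_fault(struct pt_regs *regs,
						  unsigned long addr)
	{
		if (!static_branch_unlikely(&async_pf_enabled))
			return 0;
		/* SRR1 was saved into regs->msr on interrupt entry */
		if (!(regs->msr & SRR1_PROGTRAP))
			return 0;
		return __handle_async_page_fault(regs, addr);
	}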

> +
> +	if (unlikely(!user_mode(regs)))
> +		panic("Host injected async PF in kernel mode\n");

Hmm. Is there anything in the PAPR interface that specifies that the
OS can only deal with problem state access faults here? Or is that
inherent in the expropriation feature?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH v0 5/5] pseries: Asynchronous page fault support
  2021-08-13  4:06     ` Nicholas Piggin
  (?)
@ 2021-08-13  4:54       ` Bharata B Rao
  -1 siblings, 0 replies; 29+ messages in thread
From: Bharata B Rao @ 2021-08-13  4:54 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: kvm-ppc, linuxppc-dev, aneesh.kumar, bharata.rao, kvm

On Fri, Aug 13, 2021 at 02:06:40PM +1000, Nicholas Piggin wrote:
> Excerpts from Bharata B Rao's message of August 5, 2021 5:24 pm:
> > Add asynchronous page fault support for pseries guests.
> > 
> > 1. Setup the guest to handle async-pf
> >    - Issue H_REG_SNS hcall to register the SNS region.
> >    - Setup the subvention interrupt irq.
> >    - Enable async-pf by updating the byte_b9 of VPA for each
> >      CPU.
> > 2. Check if the page fault is an expropriation notification
> >    (SRR1_PROGTRAP set in SRR1) and if so put the task on
> >    wait queue based on the expropriation correlation number
> >    read from the VPA.
> > 3. Handle subvention interrupt to wake any waiting tasks.
> >    The wait and wakeup mechanism from x86 async-pf implementation
> >    is being reused here.
> 
> I don't know too much about the background of this.
> 
> How much benefit does this give? What situations?

I haven't yet gotten into measuring the benefit of this. Once
the patches are a bit more stable than they are currently,
we need to measure and evaluate the benefits.

> Does PowerVM implement it?

I suppose so, need to check though.

> Do other architectures KVM have something similar?

Yes, x86 and s390 KVM have had this feature for a while now
and generic KVM interfaces exist to support it.
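
For reference, a sketch of the generic hooks this series implements
and calls (from the generic KVM async-pf code of this era; exact
signatures may differ slightly by tree):

	bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
				unsigned long hva,
				struct kvm_arch_async_pf *arch);
	void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);

	/* arch callbacks invoked by the generic code */
	bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
					     struct kvm_async_pf *work);
	void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
					 struct kvm_async_pf *work);
	void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
				       struct kvm_async_pf *work);
	bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);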

> 
> The SRR1 setting for the DSI is in PAPR? In that case it should be okay,
> it might be good to add a small comment in exceptions-64s.S.

Yes, SRR1 setting is part of PAPR.

> 
> [...]
> 
> > @@ -395,6 +395,11 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
> >  	vm_fault_t fault, major = 0;
> >  	bool kprobe_fault = kprobe_page_fault(regs, 11);
> >  
> > +#ifdef CONFIG_PPC_PSERIES
> > +	if (handle_async_page_fault(regs, address))
> > +		return 0;
> > +#endif
> > +
> >  	if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
> >  		return 0;
> 
> [...]
> 
> > +int handle_async_page_fault(struct pt_regs *regs, unsigned long addr)
> > +{
> > +	struct async_pf_sleep_node n;
> > +	DECLARE_SWAITQUEUE(wait);
> > +	unsigned long exp_corr_nr;
> > +
> > +	/* Is this Expropriation notification? */
> > +	if (!(mfspr(SPRN_SRR1) & SRR1_PROGTRAP))
> > +		return 0;
> 
> Yep this should be an inline that is guarded by a static key, and then 
> probably have an inline check for SRR1_PROGTRAP. You shouldn't need to
> mfspr here, but just use regs->msr.

Right.

> 
> > +
> > +	if (unlikely(!user_mode(regs)))
> > +		panic("Host injected async PF in kernel mode\n");
> 
> Hmm. Is there anything in the PAPR interface that specifies that the
> OS can only deal with problem state access faults here? Or is that
> inherent in the expropriation feature?

Didn't see anything specific to that effect in PAPR. However, since
this puts the faulting guest process to sleep until the page
becomes ready in the host, I have limited it to guest user space
faults.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2021-08-13  4:55 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-05  7:24 [RFC PATCH v0 0/5] PPC: KVM: pseries: Asynchronous page fault Bharata B Rao
2021-08-05  7:36 ` Bharata B Rao
2021-08-05  7:24 ` Bharata B Rao
2021-08-05  7:24 ` [RFC PATCH v0 1/5] powerpc: Define Expropriation interrupt bit to VPA byte offset 0xB9 Bharata B Rao
2021-08-05  7:36   ` Bharata B Rao
2021-08-05  7:24   ` Bharata B Rao
2021-08-05  7:24 ` [RFC PATCH v0 2/5] KVM: PPC: Add support for KVM_REQ_ESN_EXIT Bharata B Rao
2021-08-05  7:36   ` Bharata B Rao
2021-08-05  7:24   ` Bharata B Rao
2021-08-05  7:24 ` [RFC PATCH v0 3/5] KVM: PPC: Book3S: Enable setting SRR1 flags for DSI Bharata B Rao
2021-08-05  7:36   ` Bharata B Rao
2021-08-05  7:24   ` Bharata B Rao
2021-08-05  7:24 ` [RFC PATCH v0 4/5] KVM: PPC: BOOK3S HV: Async PF support Bharata B Rao
2021-08-05  7:36   ` Bharata B Rao
2021-08-05  7:24   ` Bharata B Rao
2021-08-05  7:24 ` [RFC PATCH v0 5/5] pseries: Asynchronous page fault support Bharata B Rao
2021-08-05  7:36   ` Bharata B Rao
2021-08-05  7:24   ` Bharata B Rao
2021-08-05 11:44   ` kernel test robot
2021-08-05 11:49   ` kernel test robot
2021-08-13  4:06   ` Nicholas Piggin
2021-08-13  4:06     ` Nicholas Piggin
2021-08-13  4:06     ` Nicholas Piggin
2021-08-13  4:54     ` Bharata B Rao
2021-08-13  4:54       ` Bharata B Rao
2021-08-13  4:54       ` Bharata B Rao
2021-08-05  7:35 ` [RFC PATCH v0 0/5] PPC: KVM: pseries: Asynchronous page fault Bharata B Rao
2021-08-05  7:47   ` Bharata B Rao
2021-08-05  7:35   ` Bharata B Rao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.