* [PATCH v4 00/15] Support Asynchronous Page Fault
@ 2021-08-15  0:59 Gavin Shan
  2021-08-15  0:59 ` [PATCH v4 01/15] KVM: async_pf: Move struct kvm_async_pf around Gavin Shan
                   ` (14 more replies)
  0 siblings, 15 replies; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

There are two stages of page faults. The guest kernel is responsible for
handling stage-1 page faults, while the host kernel takes care of stage-2
page faults. When the guest traps to the host because of a stage-2 page
fault, the guest is suspended until the requested memory (page) is
populated. Populating the requested page isn't always cheap and can take
hundreds of milliseconds in extreme cases. Similarly, the guest has to
wait until the requested memory is ready in the post-copy live migration
scenario.

This series introduces the asynchronous page fault feature to improve the
situation, so that the guest doesn't have to wait in these scenarios. With
it, the overall performance of the guest is improved. This series depends
on the "SDEI virtualization" feature and QEMU changes. All code changes
can be found on github:

 https://github.com/gwshan/linux ("kvm/arm64_sdei") # SDEI virtualization
 https://github.com/gwshan/linux ("kvm/arm64_apf")  # This series + "sdei"
 https://github.com/gwshan/qemu  ("kvm/arm64_apf")  # QEMU code changes

The details of the design can be found in the last patch. Generally, it's
driven by two notifications: page-not-present and page-ready. They are
delivered from the host to the guest via an SDEI event and a PPI
respectively. Each notification is always associated with a token, which
is used to identify the notification. The token is passed through memory
shared between the host and the guest. Besides, SMCCC and ioctl
interfaces are used by the guest and the VMM to configure, enable,
disable and even migrate the functionality.

When the guest traps to the host because of a stage-2 page fault, a
page-not-present notification is raised by the host and sent to the
guest through a dedicated SDEI event (0x40400001) if the requested page
can't be populated immediately. In the meanwhile, a (background) worker
is started to populate the requested page. On receiving the SDEI event,
the guest marks the currently running process with a special flag
(TIF_ASYNC_PF) and associates it with a pre-allocated waitqueue. At the
same time, a (reschedule) IPI is sent to the current CPU. After the SDEI
event is acknowledged by the guest, the (reschedule) IPI is delivered and
it causes a context switch from the process tagged with TIF_ASYNC_PF to
another process.

Later on, a page-ready notification is sent to the guest after the
requested page has been populated by the (background) worker. On
receiving the interrupt, the guest uses the associated token to locate
the process which was previously suspended because of page-not-present.
The flag (TIF_ASYNC_PF) is cleared from the suspended process and it is
woken up.
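
The guest-side handling boils down to tagging the suspended process with
the token from the shared region and later waking it up by that token. A
rough user-space model of just that bookkeeping (hypothetical code, not
the actual arm64 guest implementation in PATCH[12-14]) looks like:

  #include <stdbool.h>
  #include <stdio.h>

  #define NR_TASKS 4

  struct task {
          unsigned int token;    /* token taken from the shared region */
          bool         async_pf; /* models TIF_ASYNC_PF                */
  };

  static struct task tasks[NR_TASKS];

  /* page-not-present: tag the current task and record its token */
  static void page_not_present(struct task *tsk, unsigned int token)
  {
          tsk->token = token;
          tsk->async_pf = true;
          /* in the guest, a reschedule IPI then switches it out */
  }

  /* page-ready: find the task by token, clear the tag, wake it up */
  static void page_ready(unsigned int token)
  {
          for (int i = 0; i < NR_TASKS; i++) {
                  if (tasks[i].async_pf && tasks[i].token == token) {
                          tasks[i].async_pf = false;
                          printf("task %d woken up by token %#x\n", i, token);
                          return;
                  }
          }
  }

  int main(void)
  {
          page_not_present(&tasks[0], 0x1001);
          page_not_present(&tasks[1], 0x2001);
          page_ready(0x2001);
          page_ready(0x1001);
          return 0;
  }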

The series is organized as below:

   PATCH[01-04] makes the GFN hash table management generic so that it
                can be shared by x86/arm64.
   PATCH[05-06] is preparatory work to support asynchronous page fault.
   PATCH[07-08] supports asynchronous page fault.
   PATCH[09-11] supports the ioctl and SMCCC interfaces for the functionality.
   PATCH[12-14] supports asynchronous page fault for the guest.
   PATCH[15]    adds a document to explain the design and internals.

Testing
=======

The tests are done using the program "testsuite", which I wrote myself.
The program basically does two things: (a) it starts a thread to allocate
all the available memory and write to it the specified number of times.
(b) a parallel thread is optionally started to do calculation while the
memory is being written the specified number of times.
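
The program itself isn't included in this posting. For clarity, a rough
user-space sketch of the workload it generates (hypothetical sizes and
pass counts, not the actual "testsuite" source) would be:

  #include <pthread.h>
  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define MEM_SIZE  (256UL << 20)  /* memory to allocate and dirty    */
  #define NR_PASSES 4              /* write passes over the memory    */

  static volatile bool done;
  static unsigned long calculations;

  /* (a) allocate memory and write to it the specified number of times */
  static void *memory_thread(void *arg)
  {
          char *buf = malloc(MEM_SIZE);

          if (!buf) {
                  done = true;
                  return NULL;
          }

          for (int pass = 0; pass < NR_PASSES; pass++)
                  memset(buf, pass, MEM_SIZE);

          free(buf);
          done = true;
          return NULL;
  }

  /* (b) do calculation in parallel while the memory is being written */
  static void *calc_thread(void *arg)
  {
          while (!done)
                  calculations++;
          return NULL;
  }

  int main(void)
  {
          pthread_t mem, calc;

          pthread_create(&mem, NULL, memory_thread, NULL);
          pthread_create(&calc, NULL, calc_thread, NULL);
          pthread_join(mem, NULL);
          pthread_join(calc, NULL);
          printf("calculations: %lu\n", calculations);
          return 0;
  }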

Besides, there are two testing scenarios: (a) the QEMU process is put into
a cgroup where its memory is limited. With this, we should see
asynchronous page fault activity. The total time for "testsuite" to
finish is measured. The calculation capacity is also measured if the
corresponding thread is started. (b) the total time and calculation
capacity are measured while the workload is being migrated.

(a) Running "testsuite" with a (cgroup) memory limit on the QEMU process.
    When the calculation thread isn't started, the consumed time increases
    slightly because of the overhead introduced by asynchronous page
    fault. The total time drops by ~40% when the calculation thread is
    started, which means the parallelism is greatly improved by
    asynchronous page fault.

  vCPU: 1  Memory: 1024MB   cgroupv2.limit: 512MB
  Command: testsuite test async_pf -l 1 [-t] -q

  Time-     Calculation-   Time+     Calculation+    Output
  --------------------------------------------------------------------
  14.592s                  15.010s                   +2.8%
  15.726s                  15.185s                   +3.4%
  15.742s                  15.192s                   +3.4%
  15.827s                  15.270s                   +3.5%
  15.831s                  15.291s                   +3.4%
  27.880s   2108m          16.539s    1104m          -40.6%  -47.6%
  27.972s   2111m          16.588s    1110m          -40.6%  -47.4%
  28.020s   2114m          16.656s    1117m          -40.5%  -47.1%
  28.227s   2135m          16.722s    1105m          -40.7%  -48.2%
  28.918s   2194m          16.767s    1113m          -42.0%  -49.2%

  Asynchronous page faults:  55000

(b) Migrating the workload ("testsuite"). The total time drops a bit
    since the migration completes in a very short period (~1.5s) and
    there isn't much asynchronous page fault activity during the
    migration. It's overall beneficial to migration performance if the
    guest has a high workload. However, the total time increases by ~14%
    because of the overhead introduced by asynchronous page fault when
    the guest doesn't have a high workload.

  vCPU: 1  Memory: 1024MB   cgroupv2.limit: unlimited
  Command: testsuite test async_pf -l 50 [-t] -q

  Time-     Calculation-   Time+     Calculation+    Output
  --------------------------------------------------------------------
  11.132s                  12.655s                   +13.6%
  11.135s                  12.707s                   +14.1%
  11.143s                  12.728s                   +14.2%
  11.167s                  12.746s                   +14.1%
  11.172s                  12.821s                   +14.7%
  27.308s   2252m          25.827s   2131m           -5.4%
  27.440s   2275m          26.517s   2333m           -3.3%
  28.069s   2364m          26.520s   2356m           -5.5%
  28.777s   2427m          26.726s   2383m           -7.1%
  28.915s   2452m          27.632s   2508m           -4.4%

  migrate.total_time:       ~1.6s
  Asynchronous page faults: ~100 times

Changelog
=========
v4:
   * Rebase to v5.14.rc5 and retest                               (Gavin)
v3:
   * Rebase to v5.13.rc1                                          (Gavin)
   * Drop patches from Will to detect the SMCCC KVM service       (Gavin)
   * Retest and recapture the benchmarks                          (Gavin)
v2:
   * Rebase to v5.11.rc6                                          (Gavin)
   * Split the patches                                            (James)
   * Allocate "struct kvm_arch_async_control" dynamically and use
     it to check if the feature has been enabled. The kernel
     option (CONFIG_KVM_ASYNC_PF) isn't used.                     (James)
   * Add document to explain the design                           (James)
   * Make GFN hash table management generic                       (James)
   * Add ioctl commands to support migration                      (Gavin)

Gavin Shan (15):
  KVM: async_pf: Move struct kvm_async_pf around
  KVM: async_pf: Add helper function to check completion queue
  KVM: async_pf: Make GFN slot management generic
  KVM: x86: Use generic async PF slot management
  KVM: arm64: Export kvm_handle_user_mem_abort()
  KVM: arm64: Add paravirtualization header files
  KVM: arm64: Support page-not-present notification
  KVM: arm64: Support page-ready notification
  KVM: arm64: Support async PF hypercalls
  KVM: arm64: Support async PF ioctl commands
  KVM: arm64: Export async PF capability
  arm64: Detect async PF para-virtualization feature
  arm64: Reschedule process on async PF
  arm64: Enable async PF
  KVM: arm64: Add async PF document

 Documentation/virt/kvm/arm/apf.rst     | 143 +++++++
 Documentation/virt/kvm/arm/index.rst   |   1 +
 arch/arm64/Kconfig                     |  11 +
 arch/arm64/include/asm/esr.h           |   6 +
 arch/arm64/include/asm/kvm_emulate.h   |  27 +-
 arch/arm64/include/asm/kvm_host.h      |  85 ++++
 arch/arm64/include/asm/kvm_para.h      |  37 ++
 arch/arm64/include/asm/processor.h     |   1 +
 arch/arm64/include/asm/thread_info.h   |   4 +-
 arch/arm64/include/uapi/asm/Kbuild     |   2 -
 arch/arm64/include/uapi/asm/kvm.h      |  19 +
 arch/arm64/include/uapi/asm/kvm_para.h |  23 ++
 arch/arm64/include/uapi/asm/kvm_sdei.h |   1 +
 arch/arm64/kernel/Makefile             |   1 +
 arch/arm64/kernel/kvm.c                | 452 +++++++++++++++++++++
 arch/arm64/kernel/signal.c             |  17 +
 arch/arm64/kvm/Kconfig                 |   2 +
 arch/arm64/kvm/Makefile                |   1 +
 arch/arm64/kvm/arm.c                   |  37 +-
 arch/arm64/kvm/async_pf.c              | 533 +++++++++++++++++++++++++
 arch/arm64/kvm/hypercalls.c            |   5 +
 arch/arm64/kvm/mmu.c                   |  76 +++-
 arch/arm64/kvm/sdei.c                  |   5 +
 arch/x86/include/asm/kvm_host.h        |   2 -
 arch/x86/kvm/Kconfig                   |   1 +
 arch/x86/kvm/mmu/mmu.c                 |   2 +-
 arch/x86/kvm/x86.c                     |  88 +---
 include/linux/arm-smccc.h              |  15 +
 include/linux/kvm_host.h               |  72 +++-
 include/uapi/linux/kvm.h               |   3 +
 virt/kvm/Kconfig                       |   3 +
 virt/kvm/async_pf.c                    |  95 ++++-
 virt/kvm/kvm_main.c                    |   4 +-
 33 files changed, 1621 insertions(+), 153 deletions(-)
 create mode 100644 Documentation/virt/kvm/arm/apf.rst
 create mode 100644 arch/arm64/include/asm/kvm_para.h
 create mode 100644 arch/arm64/include/uapi/asm/kvm_para.h
 create mode 100644 arch/arm64/kernel/kvm.c
 create mode 100644 arch/arm64/kvm/async_pf.c

-- 
2.23.0



* [PATCH v4 01/15] KVM: async_pf: Move struct kvm_async_pf around
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-11-10 15:37   ` Eric Auger
  2021-08-15  0:59 ` [PATCH v4 02/15] KVM: async_pf: Add helper function to check completion queue Gavin Shan
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

This moves the definition of "struct kvm_async_pf" and the related
functions after "struct kvm_vcpu" so that the inline functions newly
added in the subsequent patches can dereference "struct kvm_vcpu"
properly. Otherwise, an unexpected build error is raised:

   error: dereferencing pointer to incomplete type ‘struct kvm_vcpu’
   return !list_empty_careful(&vcpu->async_pf.done);
                                   ^~

Since we're here, the separator between type and field in "struct
kvm_async_pf" is replaced by a tab. The empty stub
kvm_check_async_pf_completion() is also added on !CONFIG_KVM_ASYNC_PF,
which is needed by subsequent patches to support asynchronous page fault
on ARM64.
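
For illustration only (not code from this series), the ordering
requirement is the usual C rule that an inline function may only
dereference a struct after its full definition:

   #include <stdbool.h>
   #include <stdio.h>

   struct vcpu {
           bool done;              /* stands in for vcpu->async_pf.done */
   };

   /* must come after the struct definition, hence the move in this patch */
   static inline bool has_pending(struct vcpu *vcpu)
   {
           return vcpu->done;
   }

   int main(void)
   {
           struct vcpu v = { .done = true };

           printf("%d\n", has_pending(&v));
           return 0;
   }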

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 include/linux/kvm_host.h | 44 +++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae7735b490b4..85b61a456f1c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -199,27 +199,6 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 					 gpa_t addr);
 
-#ifdef CONFIG_KVM_ASYNC_PF
-struct kvm_async_pf {
-	struct work_struct work;
-	struct list_head link;
-	struct list_head queue;
-	struct kvm_vcpu *vcpu;
-	struct mm_struct *mm;
-	gpa_t cr2_or_gpa;
-	unsigned long addr;
-	struct kvm_arch_async_pf arch;
-	bool   wakeup_all;
-	bool notpresent_injected;
-};
-
-void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
-void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
-bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-			unsigned long hva, struct kvm_arch_async_pf *arch);
-int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
-#endif
-
 #ifdef KVM_ARCH_WANT_MMU_NOTIFIER
 struct kvm_gfn_range {
 	struct kvm_memory_slot *slot;
@@ -346,6 +325,29 @@ struct kvm_vcpu {
 	struct kvm_dirty_ring dirty_ring;
 };
 
+#ifdef CONFIG_KVM_ASYNC_PF
+struct kvm_async_pf {
+	struct work_struct		work;
+	struct list_head		link;
+	struct list_head		queue;
+	struct kvm_vcpu			*vcpu;
+	struct mm_struct		*mm;
+	gpa_t				cr2_or_gpa;
+	unsigned long			addr;
+	struct kvm_arch_async_pf	arch;
+	bool				wakeup_all;
+	bool				notpresent_injected;
+};
+
+void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
+void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
+bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
+			unsigned long hva, struct kvm_arch_async_pf *arch);
+int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
+#else
+static inline void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) { }
+#endif
+
 /* must be called with irqs disabled */
 static __always_inline void guest_enter_irqoff(void)
 {
-- 
2.23.0



* [PATCH v4 02/15] KVM: async_pf: Add helper function to check completion queue
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
  2021-08-15  0:59 ` [PATCH v4 01/15] KVM: async_pf: Move struct kvm_async_pf around Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-08-16 16:53   ` Vitaly Kuznetsov
  2021-11-10 15:37   ` Eric Auger
  2021-08-15  0:59 ` [PATCH v4 03/15] KVM: async_pf: Make GFN slot management generic Gavin Shan
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

This adds the inline helper kvm_check_async_pf_completion_queue() to
check if there are pending completions in the queue. An empty stub
is also added on !CONFIG_KVM_ASYNC_PF so that the callers needn't
consider whether CONFIG_KVM_ASYNC_PF is enabled.

All checks on the completion queue are done by the newly added inline
function, since list_empty() and list_empty_careful() are interchangeable.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/x86/kvm/x86.c       |  2 +-
 include/linux/kvm_host.h | 10 ++++++++++
 virt/kvm/async_pf.c      | 10 +++++-----
 virt/kvm/kvm_main.c      |  4 +---
 4 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5d5c5ed7dd4..7f35d9324b99 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11591,7 +11591,7 @@ static inline bool kvm_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
 {
-	if (!list_empty_careful(&vcpu->async_pf.done))
+	if (kvm_check_async_pf_completion_queue(vcpu))
 		return true;
 
 	if (kvm_apic_has_events(vcpu))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 85b61a456f1c..a5f990f6dc35 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -339,12 +339,22 @@ struct kvm_async_pf {
 	bool				notpresent_injected;
 };
 
+static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
+{
+	return !list_empty_careful(&vcpu->async_pf.done);
+}
+
 void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
 void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
 bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			unsigned long hva, struct kvm_arch_async_pf *arch);
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #else
+static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
 static inline void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) { }
 #endif
 
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index dd777688d14a..d145a61a046a 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -70,7 +70,7 @@ static void async_pf_execute(struct work_struct *work)
 		kvm_arch_async_page_present(vcpu, apf);
 
 	spin_lock(&vcpu->async_pf.lock);
-	first = list_empty(&vcpu->async_pf.done);
+	first = !kvm_check_async_pf_completion_queue(vcpu);
 	list_add_tail(&apf->link, &vcpu->async_pf.done);
 	apf->vcpu = NULL;
 	spin_unlock(&vcpu->async_pf.lock);
@@ -122,7 +122,7 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 		spin_lock(&vcpu->async_pf.lock);
 	}
 
-	while (!list_empty(&vcpu->async_pf.done)) {
+	while (kvm_check_async_pf_completion_queue(vcpu)) {
 		struct kvm_async_pf *work =
 			list_first_entry(&vcpu->async_pf.done,
 					 typeof(*work), link);
@@ -138,7 +138,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
 {
 	struct kvm_async_pf *work;
 
-	while (!list_empty_careful(&vcpu->async_pf.done) &&
+	while (kvm_check_async_pf_completion_queue(vcpu) &&
 	      kvm_arch_can_dequeue_async_page_present(vcpu)) {
 		spin_lock(&vcpu->async_pf.lock);
 		work = list_first_entry(&vcpu->async_pf.done, typeof(*work),
@@ -205,7 +205,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
 	struct kvm_async_pf *work;
 	bool first;
 
-	if (!list_empty_careful(&vcpu->async_pf.done))
+	if (kvm_check_async_pf_completion_queue(vcpu))
 		return 0;
 
 	work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
@@ -216,7 +216,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
 	INIT_LIST_HEAD(&work->queue); /* for list_del to work */
 
 	spin_lock(&vcpu->async_pf.lock);
-	first = list_empty(&vcpu->async_pf.done);
+	first = !kvm_check_async_pf_completion_queue(vcpu);
 	list_add_tail(&work->link, &vcpu->async_pf.done);
 	spin_unlock(&vcpu->async_pf.lock);
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b50dbe269f4b..8795503651b1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3282,10 +3282,8 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu)
 	if (kvm_arch_dy_runnable(vcpu))
 		return true;
 
-#ifdef CONFIG_KVM_ASYNC_PF
-	if (!list_empty_careful(&vcpu->async_pf.done))
+	if (kvm_check_async_pf_completion_queue(vcpu))
 		return true;
-#endif
 
 	return false;
 }
-- 
2.23.0



* [PATCH v4 03/15] KVM: async_pf: Make GFN slot management generic
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
  2021-08-15  0:59 ` [PATCH v4 01/15] KVM: async_pf: Move struct kvm_async_pf around Gavin Shan
  2021-08-15  0:59 ` [PATCH v4 02/15] KVM: async_pf: Add helper function to check completion queue Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-11-10 17:00   ` Eric Auger
  2021-11-10 17:00   ` Eric Auger
  2021-08-15  0:59 ` [PATCH v4 04/15] KVM: x86: Use generic async PF slot management Gavin Shan
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

On the x86 platform, a hash table is used to avoid firing duplicate
notifications for the same GFN. This mechanism is going to be used by
arm64 as well, so this makes the code generic and shareable by multiple
platforms.

   * As this mechanism isn't needed by all platforms, a new kernel
     config option (CONFIG_KVM_ASYNC_PF_SLOT) is introduced so that it
     can be disabled at compile time.

   * The code is basically copied from the x86 platform and the functions
     are renamed to reflect the facts: (a) the input parameters are
     vCPU and GFN. (b) The operations are resetting, searching, adding
     and removing.

   * Helper stubs are also added on !CONFIG_KVM_ASYNC_PF because we're
     going to use IS_ENABLED() instead of #ifdef on arm64 when
     asynchronous page fault is supported.

This is preparatory work for using the newly introduced functions on the
x86 platform and arm64 in subsequent patches.
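
For illustration, a small stand-alone user-space model of how the slot
table behaves is shown below. The hash function is a simple stand-in for
the kernel's hash_32(), and the removal path with its cyclic
back-shifting is omitted for brevity:

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>
   #include <stdio.h>

   #define ASYNC_PF_PER_VCPU 64    /* must be a power of two */

   static uint64_t gfns[ASYNC_PF_PER_VCPU];

   static uint32_t slot_hash(uint64_t gfn)
   {
           /* stand-in for hash_32(gfn, order_base_2(ASYNC_PF_PER_VCPU)) */
           return (uint32_t)(gfn * 2654435761u) & (ASYNC_PF_PER_VCPU - 1);
   }

   static uint32_t next_slot(uint32_t key)
   {
           return (key + 1) & (ASYNC_PF_PER_VCPU - 1);
   }

   static void reset_slots(void)
   {
           for (int i = 0; i < ASYNC_PF_PER_VCPU; i++)
                   gfns[i] = UINT64_MAX;
   }

   /* linear probing: insert the GFN into the first free slot */
   static void add_slot(uint64_t gfn)
   {
           uint32_t key = slot_hash(gfn);

           while (gfns[key] != UINT64_MAX)
                   key = next_slot(key);
           gfns[key] = gfn;
   }

   /* true if a notification for this GFN is already outstanding */
   static bool find_slot(uint64_t gfn)
   {
           uint32_t key = slot_hash(gfn);

           for (int i = 0; i < ASYNC_PF_PER_VCPU &&
                gfns[key] != gfn && gfns[key] != UINT64_MAX; i++)
                   key = next_slot(key);

           return gfns[key] == gfn;
   }

   int main(void)
   {
           reset_slots();
           add_slot(0x1000);           /* page-not-present fired for GFN  */
           assert(find_slot(0x1000));  /* a duplicate would be suppressed */
           assert(!find_slot(0x2000)); /* unrelated GFN, no hit           */
           printf("ok\n");
           return 0;
   }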

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 include/linux/kvm_host.h | 18 +++++++++
 virt/kvm/Kconfig         |  3 ++
 virt/kvm/async_pf.c      | 85 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 106 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a5f990f6dc35..a9685c2b2250 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -298,6 +298,9 @@ struct kvm_vcpu {
 
 #ifdef CONFIG_KVM_ASYNC_PF
 	struct {
+#ifdef CONFIG_KVM_ASYNC_PF_SLOT
+		gfn_t gfns[ASYNC_PF_PER_VCPU];
+#endif
 		u32 queued;
 		struct list_head queue;
 		struct list_head done;
@@ -339,6 +342,13 @@ struct kvm_async_pf {
 	bool				notpresent_injected;
 };
 
+#ifdef CONFIG_KVM_ASYNC_PF_SLOT
+void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu);
+void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
+void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
+bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
+#endif
+
 static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 {
 	return !list_empty_careful(&vcpu->async_pf.done);
@@ -350,6 +360,14 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			unsigned long hva, struct kvm_arch_async_pf *arch);
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #else
+static inline void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu) { }
+static inline void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn) { }
+static inline void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn) { }
+static inline bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	return false;
+}
+
 static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
 {
 	return false;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 62b39149b8c8..59b518c8c205 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -23,6 +23,9 @@ config KVM_MMIO
 config KVM_ASYNC_PF
        bool
 
+config KVM_ASYNC_PF_SLOT
+	bool
+
 # Toggle to switch between direct notification and batch job
 config KVM_ASYNC_PF_SYNC
        bool
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index d145a61a046a..0d1fdb2932af 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -13,12 +13,97 @@
 #include <linux/module.h>
 #include <linux/mmu_context.h>
 #include <linux/sched/mm.h>
+#ifdef CONFIG_KVM_ASYNC_PF_SLOT
+#include <linux/hash.h>
+#endif
 
 #include "async_pf.h"
 #include <trace/events/kvm.h>
 
 static struct kmem_cache *async_pf_cache;
 
+#ifdef CONFIG_KVM_ASYNC_PF_SLOT
+static inline u32 kvm_async_pf_hash(gfn_t gfn)
+{
+	BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));
+
+	return hash_32(gfn & 0xffffffff, order_base_2(ASYNC_PF_PER_VCPU));
+}
+
+static inline u32 kvm_async_pf_next_slot(u32 key)
+{
+	return (key + 1) & (ASYNC_PF_PER_VCPU - 1);
+}
+
+static u32 kvm_async_pf_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	u32 key = kvm_async_pf_hash(gfn);
+	int i;
+
+	for (i = 0; i < ASYNC_PF_PER_VCPU &&
+		(vcpu->async_pf.gfns[key] != gfn &&
+		vcpu->async_pf.gfns[key] != ~0); i++)
+		key = kvm_async_pf_next_slot(key);
+
+	return key;
+}
+
+void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu)
+{
+	int i;
+
+	for (i = 0; i < ASYNC_PF_PER_VCPU; i++)
+		vcpu->async_pf.gfns[i] = ~0;
+}
+
+void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	u32 key = kvm_async_pf_hash(gfn);
+
+	while (vcpu->async_pf.gfns[key] != ~0)
+		key = kvm_async_pf_next_slot(key);
+
+	vcpu->async_pf.gfns[key] = gfn;
+}
+
+void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	u32 i, j, k;
+
+	i = j = kvm_async_pf_slot(vcpu, gfn);
+
+	if (WARN_ON_ONCE(vcpu->async_pf.gfns[i] != gfn))
+		return;
+
+	while (true) {
+		vcpu->async_pf.gfns[i] = ~0;
+
+		do {
+			j = kvm_async_pf_next_slot(j);
+			if (vcpu->async_pf.gfns[j] == ~0)
+				return;
+
+			k = kvm_async_pf_hash(vcpu->async_pf.gfns[j]);
+			/*
+			 * k lies cyclically in ]i,j]
+			 * |    i.k.j |
+			 * |....j i.k.| or  |.k..j i...|
+			 */
+		} while ((i <= j) ? (i < k && k <= j) : (i < k || k <= j));
+
+		vcpu->async_pf.gfns[i] = vcpu->async_pf.gfns[j];
+		i = j;
+	}
+}
+
+bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	u32 key = kvm_async_pf_slot(vcpu, gfn);
+
+	return vcpu->async_pf.gfns[key] == gfn;
+}
+#endif /* CONFIG_KVM_ASYNC_PF_SLOT */
+
 int kvm_async_pf_init(void)
 {
 	async_pf_cache = KMEM_CACHE(kvm_async_pf, 0);
-- 
2.23.0



* [PATCH v4 04/15] KVM: x86: Use generic async PF slot management
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (2 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 03/15] KVM: async_pf: Make GFN slot management generic Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-11-10 17:03   ` Eric Auger
  2021-08-15  0:59 ` [PATCH v4 05/15] KVM: arm64: Export kvm_handle_user_mem_abort() Gavin Shan
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

This switches to the generic slot management mechanism for asynchronous
page fault by enabling CONFIG_KVM_ASYNC_PF_SLOT, because the private
implementation is a duplicate of the generic one.

The changes introduced by this are pretty mechanical and shouldn't
cause any change in logic.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/x86/include/asm/kvm_host.h |  2 -
 arch/x86/kvm/Kconfig            |  1 +
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 arch/x86/kvm/x86.c              | 86 +++------------------------------
 4 files changed, 8 insertions(+), 83 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 974cbfb1eefe..409c1e7137cd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -810,7 +810,6 @@ struct kvm_vcpu_arch {
 
 	struct {
 		bool halted;
-		gfn_t gfns[ASYNC_PF_PER_VCPU];
 		struct gfn_to_hva_cache data;
 		u64 msr_en_val; /* MSR_KVM_ASYNC_PF_EN */
 		u64 msr_int_val; /* MSR_KVM_ASYNC_PF_INT */
@@ -1878,7 +1877,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 			       struct kvm_async_pf *work);
 void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
 bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
-extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index ac69894eab88..53a6ef30b6ee 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -32,6 +32,7 @@ config KVM
 	select HAVE_KVM_IRQ_ROUTING
 	select HAVE_KVM_EVENTFD
 	select KVM_ASYNC_PF
+	select KVM_ASYNC_PF_SLOT
 	select USER_RETURN_NOTIFIER
 	select KVM_MMIO
 	select SCHED_INFO
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c4f4fa23320e..cd8aaa662ac2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3799,7 +3799,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
 
 	if (!prefault && kvm_can_do_async_pf(vcpu)) {
 		trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
-		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
+		if (kvm_async_pf_find_slot(vcpu, gfn)) {
 			trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
 			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
 			return true;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7f35d9324b99..a5f7d6122178 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -332,13 +332,6 @@ static struct kmem_cache *kvm_alloc_emulator_cache(void)
 
 static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
 
-static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
-{
-	int i;
-	for (i = 0; i < ASYNC_PF_PER_VCPU; i++)
-		vcpu->arch.apf.gfns[i] = ~0;
-}
-
 static void kvm_on_user_return(struct user_return_notifier *urn)
 {
 	unsigned slot;
@@ -854,7 +847,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
 {
 	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
 		kvm_clear_async_pf_completion_queue(vcpu);
-		kvm_async_pf_hash_reset(vcpu);
+		kvm_async_pf_reset_slot(vcpu);
 	}
 
 	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
@@ -3118,7 +3111,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
 
 	if (!kvm_pv_async_pf_enabled(vcpu)) {
 		kvm_clear_async_pf_completion_queue(vcpu);
-		kvm_async_pf_hash_reset(vcpu);
+		kvm_async_pf_reset_slot(vcpu);
 		return 0;
 	}
 
@@ -10704,7 +10697,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
 
-	kvm_async_pf_hash_reset(vcpu);
+	kvm_async_pf_reset_slot(vcpu);
 	kvm_pmu_init(vcpu);
 
 	vcpu->arch.pending_external_vector = -1;
@@ -10828,7 +10821,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	kvmclock_reset(vcpu);
 
 	kvm_clear_async_pf_completion_queue(vcpu);
-	kvm_async_pf_hash_reset(vcpu);
+	kvm_async_pf_reset_slot(vcpu);
 	vcpu->arch.apf.halted = false;
 
 	if (vcpu->arch.guest_fpu && kvm_mpx_supported()) {
@@ -11737,73 +11730,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
 }
 
-static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
-{
-	BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));
-
-	return hash_32(gfn & 0xffffffff, order_base_2(ASYNC_PF_PER_VCPU));
-}
-
-static inline u32 kvm_async_pf_next_probe(u32 key)
-{
-	return (key + 1) & (ASYNC_PF_PER_VCPU - 1);
-}
-
-static void kvm_add_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
-	u32 key = kvm_async_pf_hash_fn(gfn);
-
-	while (vcpu->arch.apf.gfns[key] != ~0)
-		key = kvm_async_pf_next_probe(key);
-
-	vcpu->arch.apf.gfns[key] = gfn;
-}
-
-static u32 kvm_async_pf_gfn_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
-	int i;
-	u32 key = kvm_async_pf_hash_fn(gfn);
-
-	for (i = 0; i < ASYNC_PF_PER_VCPU &&
-		     (vcpu->arch.apf.gfns[key] != gfn &&
-		      vcpu->arch.apf.gfns[key] != ~0); i++)
-		key = kvm_async_pf_next_probe(key);
-
-	return key;
-}
-
-bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
-	return vcpu->arch.apf.gfns[kvm_async_pf_gfn_slot(vcpu, gfn)] == gfn;
-}
-
-static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
-{
-	u32 i, j, k;
-
-	i = j = kvm_async_pf_gfn_slot(vcpu, gfn);
-
-	if (WARN_ON_ONCE(vcpu->arch.apf.gfns[i] != gfn))
-		return;
-
-	while (true) {
-		vcpu->arch.apf.gfns[i] = ~0;
-		do {
-			j = kvm_async_pf_next_probe(j);
-			if (vcpu->arch.apf.gfns[j] == ~0)
-				return;
-			k = kvm_async_pf_hash_fn(vcpu->arch.apf.gfns[j]);
-			/*
-			 * k lies cyclically in ]i,j]
-			 * |    i.k.j |
-			 * |....j i.k.| or  |.k..j i...|
-			 */
-		} while ((i <= j) ? (i < k && k <= j) : (i < k || k <= j));
-		vcpu->arch.apf.gfns[i] = vcpu->arch.apf.gfns[j];
-		i = j;
-	}
-}
-
 static inline int apf_put_user_notpresent(struct kvm_vcpu *vcpu)
 {
 	u32 reason = KVM_PV_REASON_PAGE_NOT_PRESENT;
@@ -11867,7 +11793,7 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 	struct x86_exception fault;
 
 	trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa);
-	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
+	kvm_async_pf_add_slot(vcpu, work->arch.gfn);
 
 	if (kvm_can_deliver_async_pf(vcpu) &&
 	    !apf_put_user_notpresent(vcpu)) {
@@ -11904,7 +11830,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 	if (work->wakeup_all)
 		work->arch.token = ~0; /* broadcast wakeup */
 	else
-		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
+		kvm_async_pf_remove_slot(vcpu, work->arch.gfn);
 	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
 
 	if ((work->wakeup_all || work->notpresent_injected) &&
-- 
2.23.0



* [PATCH v4 05/15] KVM: arm64: Export kvm_handle_user_mem_abort()
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (3 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 04/15] KVM: x86: Use generic async PF slot management Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-11-10 18:02   ` Eric Auger
  2021-08-15  0:59 ` [PATCH v4 06/15] KVM: arm64: Add paravirtualization header files Gavin Shan
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

The main work of stage-2 page fault handling is done by user_mem_abort().
When asynchronous page fault is supported, one page fault needs to
be handled with two calls to this function. It means the page fault
needs to be replayed asynchronously in that case.

   * This renames the function to kvm_handle_user_mem_abort() and
     exports it.

   * Add arguments @esr and @prefault to user_mem_abort(). @esr is
     the cached value of ESR_EL2 instead of being fetched from the
     current vCPU when the page fault is replayed in the asynchronous
     page fault scenario. @prefault indicates whether the page fault
     is a replayed one or not.

   * Define helper functions esr_dabt_*() in asm/esr.h to extract
     or check various fields of the passed ESR_EL2 value, because
     the helper functions defined in asm/kvm_emulate.h assume
     the ESR_EL2 value has been cached in the vCPU struct. That won't
     be true when handling the replayed page fault in the asynchronous
     page fault scenario.

   * Some helper functions defined in asm/kvm_emulate.h are used
     by mmu.c only and don't seem to be needed by other source files
     in the near future. They are moved to mmu.c and renamed
     accordingly:

     is_exec_fault: kvm_vcpu_trap_is_exec_fault
     is_write_fault: kvm_is_write_fault
     esr_dabt_fault_level: kvm_vcpu_trap_get_fault_level

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/asm/esr.h         |  6 ++++
 arch/arm64/include/asm/kvm_emulate.h | 27 ++---------------
 arch/arm64/include/asm/kvm_host.h    |  4 +++
 arch/arm64/kvm/mmu.c                 | 43 ++++++++++++++++++++++------
 4 files changed, 48 insertions(+), 32 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 29f97eb3dad4..0f2cb27691de 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -321,8 +321,14 @@
 					 ESR_ELx_CP15_32_ISS_DIR_READ)
 
 #ifndef __ASSEMBLY__
+#include <linux/bitfield.h>
 #include <asm/types.h>
 
+#define esr_dabt_fault_type(esr)	(esr & ESR_ELx_FSC_TYPE)
+#define esr_dabt_fault_level(esr)	(FIELD_GET(ESR_ELx_FSC_LEVEL, esr))
+#define esr_dabt_is_wnr(esr)		(!!(FIELD_GET(ESR_ELx_WNR, esr)))
+#define esr_dabt_is_s1ptw(esr)		(!!(FIELD_GET(ESR_ELx_S1PTW, esr)))
+
 static inline bool esr_is_data_abort(u32 esr)
 {
 	const u32 ec = ESR_ELx_EC(esr);
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 923b4d08ea9a..90742f4b1acd 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -285,13 +285,13 @@ static __always_inline int kvm_vcpu_dabt_get_rd(const struct kvm_vcpu *vcpu)
 
 static __always_inline bool kvm_vcpu_abt_iss1tw(const struct kvm_vcpu *vcpu)
 {
-	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_S1PTW);
+	return esr_dabt_is_s1ptw(kvm_vcpu_get_esr(vcpu));
 }
 
 /* Always check for S1PTW *before* using this. */
 static __always_inline bool kvm_vcpu_dabt_iswrite(const struct kvm_vcpu *vcpu)
 {
-	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_WNR;
+	return esr_dabt_is_wnr(kvm_vcpu_get_esr(vcpu));
 }
 
 static inline bool kvm_vcpu_dabt_is_cm(const struct kvm_vcpu *vcpu)
@@ -320,11 +320,6 @@ static inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu)
 	return kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW;
 }
 
-static inline bool kvm_vcpu_trap_is_exec_fault(const struct kvm_vcpu *vcpu)
-{
-	return kvm_vcpu_trap_is_iabt(vcpu) && !kvm_vcpu_abt_iss1tw(vcpu);
-}
-
 static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
 {
 	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC;
@@ -332,12 +327,7 @@ static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
 
 static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
 {
-	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
-}
-
-static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
-{
-	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
+	return esr_dabt_fault_type(kvm_vcpu_get_esr(vcpu));
 }
 
 static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
@@ -365,17 +355,6 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
 	return ESR_ELx_SYS64_ISS_RT(esr);
 }
 
-static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
-{
-	if (kvm_vcpu_abt_iss1tw(vcpu))
-		return true;
-
-	if (kvm_vcpu_trap_is_iabt(vcpu))
-		return false;
-
-	return kvm_vcpu_dabt_iswrite(vcpu);
-}
-
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
 	return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1824f7e1f9ab..581825b9df77 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -606,6 +606,10 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 
+int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
+			      struct kvm_memory_slot *memslot,
+			      phys_addr_t fault_ipa, unsigned long hva,
+			      unsigned int esr, bool prefault);
 void kvm_arm_halt_guest(struct kvm *kvm);
 void kvm_arm_resume_guest(struct kvm *kvm);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 0625bf2353c2..e4038c5e931d 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -892,9 +892,34 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 	return 0;
 }
 
-static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  unsigned long fault_status)
+static inline bool is_exec_fault(unsigned int esr)
+{
+	if (ESR_ELx_EC(esr) != ESR_ELx_EC_IABT_LOW)
+		return false;
+
+	if (esr_dabt_is_s1ptw(esr))
+		return false;
+
+	return true;
+}
+
+static inline bool is_write_fault(unsigned int esr)
+{
+	if (esr_dabt_is_s1ptw(esr))
+		return true;
+
+	if (ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW)
+		return false;
+
+	return esr_dabt_is_wnr(esr);
+}
+
+int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
+			      struct kvm_memory_slot *memslot,
+			      phys_addr_t fault_ipa,
+			      unsigned long hva,
+			      unsigned int esr,
+			      bool prefault)
 {
 	int ret = 0;
 	bool write_fault, writable, force_pte = false;
@@ -909,14 +934,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	gfn_t gfn;
 	kvm_pfn_t pfn;
 	bool logging_active = memslot_is_logging(memslot);
-	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
+	unsigned int fault_status = esr_dabt_fault_type(esr);
+	unsigned long fault_level = esr_dabt_fault_level(esr);
 	unsigned long vma_pagesize, fault_granule;
 	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
 	struct kvm_pgtable *pgt;
 
 	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
-	write_fault = kvm_is_write_fault(vcpu);
-	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
+	write_fault = is_write_fault(kvm_vcpu_get_esr(vcpu));
+	exec_fault = is_exec_fault(kvm_vcpu_get_esr(vcpu));
 	VM_BUG_ON(write_fault && exec_fault);
 
 	if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
@@ -1176,7 +1202,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 	gfn = fault_ipa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
-	write_fault = kvm_is_write_fault(vcpu);
+	write_fault = is_write_fault(kvm_vcpu_get_esr(vcpu));
 	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
 		/*
 		 * The guest has put either its instructions or its page-tables
@@ -1231,7 +1257,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+	ret = kvm_handle_user_mem_abort(vcpu, memslot, fault_ipa, hva,
+					kvm_vcpu_get_esr(vcpu), false);
 	if (ret == 0)
 		ret = 1;
 out:
-- 
2.23.0



* [PATCH v4 06/15] KVM: arm64: Add paravirtualization header files
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (4 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 05/15] KVM: arm64: Export kvm_handle_user_mem_abort() Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-11-10 18:06   ` Eric Auger
  2021-08-15  0:59 ` [PATCH v4 07/15] KVM: arm64: Support page-not-present notification Gavin Shan
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

We need to put more stuff in the paravirtualization header files when
asynchronous page fault is supported, and the generic header files can't
meet that goal. This duplicates the generic header files to create our
platform-specific header files. It's preparatory work for supporting
asynchronous page fault in the subsequent patches:

   include/uapi/asm-generic/kvm_para.h
   include/asm-generic/kvm_para.h

   arch/arm64/include/uapi/asm/kvm_para.h
   arch/arm64/include/asm/kvm_para.h

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/asm/kvm_para.h      | 27 ++++++++++++++++++++++++++
 arch/arm64/include/uapi/asm/Kbuild     |  2 --
 arch/arm64/include/uapi/asm/kvm_para.h |  5 +++++
 3 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_para.h
 create mode 100644 arch/arm64/include/uapi/asm/kvm_para.h

diff --git a/arch/arm64/include/asm/kvm_para.h b/arch/arm64/include/asm/kvm_para.h
new file mode 100644
index 000000000000..0ea481dd1c7a
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_para.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM_KVM_PARA_H
+#define _ASM_ARM_KVM_PARA_H
+
+#include <uapi/asm/kvm_para.h>
+
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+	return false;
+}
+
+static inline unsigned int kvm_arch_para_features(void)
+{
+	return 0;
+}
+
+static inline unsigned int kvm_arch_para_hints(void)
+{
+	return 0;
+}
+
+static inline bool kvm_para_available(void)
+{
+	return false;
+}
+
+#endif /* _ASM_ARM_KVM_PARA_H */
diff --git a/arch/arm64/include/uapi/asm/Kbuild b/arch/arm64/include/uapi/asm/Kbuild
index 602d137932dc..f66554cd5c45 100644
--- a/arch/arm64/include/uapi/asm/Kbuild
+++ b/arch/arm64/include/uapi/asm/Kbuild
@@ -1,3 +1 @@
 # SPDX-License-Identifier: GPL-2.0
-
-generic-y += kvm_para.h
diff --git a/arch/arm64/include/uapi/asm/kvm_para.h b/arch/arm64/include/uapi/asm/kvm_para.h
new file mode 100644
index 000000000000..cd212282b90c
--- /dev/null
+++ b/arch/arm64/include/uapi/asm/kvm_para.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_ASM_ARM_KVM_PARA_H
+#define _UAPI_ASM_ARM_KVM_PARA_H
+
+#endif /* _UAPI_ASM_ARM_KVM_PARA_H */
-- 
2.23.0



* [PATCH v4 07/15] KVM: arm64: Support page-not-present notification
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (5 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 06/15] KVM: arm64: Add paravirtualization header files Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-11-12 15:01   ` Eric Auger
  2021-08-15  0:59 ` [PATCH v4 08/15] KVM: arm64: Support page-ready notification Gavin Shan
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

The requested page might not be resident in memory during the stage-2
page fault. For example, the requested page could be resident in the swap
device (file). In this case, disk I/O is issued in order to fetch the
requested page and it could take tens of milliseconds, even hundreds
of milliseconds in extreme situations. During this period, the guest's
vCPU is suspended until the requested page becomes ready. Actually,
something else could be scheduled on the guest's vCPU during this
period, so that the time slice isn't wasted from the guest's point of
view. This is the primary goal of the asynchronous page fault feature.

This supports delivery of the page-not-present notification through an
SDEI event when the requested page isn't present. When the notification
is received on the guest's vCPU, something else (another process) can be
scheduled. The design is highlighted as below:

   * There is a dedicated memory region shared between host and guest.
     It's represented by "struct kvm_vcpu_pv_apf_data". The field @reason
     indicates the reason why the SDEI event is triggered, while the
     unique @token is used by the guest to associate the event with the
     suspended process.

   * One control block is associated with each guest vCPU and it's
     represented by "struct kvm_arch_async_pf_control". It allows the
     guest to configure the functionality, i.e. to indicate the
     situations where the host can deliver the page-not-present
     notification to kick off asynchronous page fault. Besides, runtime
     state is also maintained in this struct.

   * Before the page-not-present notification is sent to the guest's
     vCPU, a worker is started and executed asynchronously on the host
     to fetch the requested page. "struct kvm_async_pf" and
     "struct kvm_arch_async_pf" are associated with the worker to track
     the work.

The feature isn't enabled by CONFIG_KVM_ASYNC_PF yet. Also, the
page-ready notification delivery and the control path aren't implemented
yet and will be done in the subsequent patches.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/asm/kvm_host.h      |  52 +++++++++
 arch/arm64/include/uapi/asm/kvm_para.h |  15 +++
 arch/arm64/kvm/Makefile                |   1 +
 arch/arm64/kvm/arm.c                   |   3 +
 arch/arm64/kvm/async_pf.c              | 145 +++++++++++++++++++++++++
 arch/arm64/kvm/mmu.c                   |  33 +++++-
 6 files changed, 247 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kvm/async_pf.c

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 581825b9df77..6b98aef936b4 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -283,6 +283,31 @@ struct vcpu_reset_state {
 	bool		reset;
 };
 
+/* Should be a power of two number */
+#define ASYNC_PF_PER_VCPU	64
+
+/*
+ * The association of gfn and token. The token will be sent to guest as
+ * page fault address. Also, the guest could be in aarch32 mode. So its
+ * length should be 32-bits.
+ */
+struct kvm_arch_async_pf {
+	u32	token;
+	gfn_t	gfn;
+	u32	esr;
+};
+
+struct kvm_arch_async_pf_control {
+		struct gfn_to_hva_cache	cache;
+		u64			control_block;
+		bool			send_user_only;
+		u64			sdei_event_num;
+
+		u16			id;
+		bool			notpresent_pending;
+		u32			notpresent_token;
+};
+
 struct kvm_vcpu_arch {
 	struct kvm_cpu_context ctxt;
 	void *sve_state;
@@ -346,6 +371,9 @@ struct kvm_vcpu_arch {
 	/* SDEI support */
 	struct kvm_sdei_vcpu *sdei;
 
+	/* Asynchronous page fault support */
+	struct kvm_arch_async_pf_control *apf;
+
 	/*
 	 * Guest registers we preserve during guest debugging.
 	 *
@@ -741,6 +769,30 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
 				struct kvm_arm_copy_mte_tags *copy_tags);
 
+#ifdef CONFIG_KVM_ASYNC_PF
+void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu);
+bool kvm_arch_async_not_present_allowed(struct kvm_vcpu *vcpu);
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+			     u32 esr, gpa_t gpa, gfn_t gfn);
+bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work);
+void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu);
+#else
+static inline void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu) { }
+static inline void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu) { }
+
+static inline bool kvm_arch_async_not_present_allowed(struct kvm_vcpu *vcpu)
+{
+	return false;
+}
+
+static inline bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+					   u32 esr, gpa_t gpa, gfn_t gfn)
+{
+	return false;
+}
+#endif
+
 /* Guest/host FPSIMD coordination helpers */
 int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/uapi/asm/kvm_para.h b/arch/arm64/include/uapi/asm/kvm_para.h
index cd212282b90c..3fa04006714e 100644
--- a/arch/arm64/include/uapi/asm/kvm_para.h
+++ b/arch/arm64/include/uapi/asm/kvm_para.h
@@ -2,4 +2,19 @@
 #ifndef _UAPI_ASM_ARM_KVM_PARA_H
 #define _UAPI_ASM_ARM_KVM_PARA_H
 
+#include <linux/types.h>
+
+/* Async PF */
+#define KVM_ASYNC_PF_ENABLED		(1 << 0)
+#define KVM_ASYNC_PF_SEND_ALWAYS	(1 << 1)
+
+#define KVM_PV_REASON_PAGE_NOT_PRESENT	1
+
+struct kvm_vcpu_pv_apf_data {
+	__u32	reason;
+	__u32	token;
+	__u8	pad[56];
+	__u32	enabled;
+};
+
 #endif /* _UAPI_ASM_ARM_KVM_PARA_H */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index eefca8ca394d..c9aa307ea542 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,3 +25,4 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
 	 vgic/vgic-its.o vgic/vgic-debug.o
 
 kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o
+kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o async_pf.o
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7d9bbc888ae5..af251896b41d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -342,6 +342,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	kvm_sdei_create_vcpu(vcpu);
 
+	kvm_arch_async_pf_create_vcpu(vcpu);
+
 	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
 
 	err = kvm_vgic_vcpu_init(vcpu);
@@ -363,6 +365,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
 	kvm_timer_vcpu_terminate(vcpu);
 	kvm_pmu_vcpu_destroy(vcpu);
+	kvm_arch_async_pf_destroy_vcpu(vcpu);
 	kvm_sdei_destroy_vcpu(vcpu);
 
 	kvm_arm_vcpu_destroy(vcpu);
diff --git a/arch/arm64/kvm/async_pf.c b/arch/arm64/kvm/async_pf.c
new file mode 100644
index 000000000000..742bb8a0a8c0
--- /dev/null
+++ b/arch/arm64/kvm/async_pf.c
@@ -0,0 +1,145 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Asynchronous page fault support.
+ *
+ * Copyright (C) 2021 Red Hat, Inc.
+ *
+ * Author(s): Gavin Shan <gshan@redhat.com>
+ */
+
+#include <linux/arm-smccc.h>
+#include <linux/kvm_host.h>
+#include <asm/kvm_emulate.h>
+#include <kvm/arm_hypercalls.h>
+#include <kvm/arm_vgic.h>
+#include <asm/kvm_sdei.h>
+
+static inline int read_cache(struct kvm_vcpu *vcpu, u32 offset, u32 *val)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+
+	return kvm_read_guest_offset_cached(kvm, &apf->cache,
+					    val, offset, sizeof(*val));
+}
+
+static inline int write_cache(struct kvm_vcpu *vcpu, u32 offset, u32 val)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+
+	return kvm_write_guest_offset_cached(kvm, &apf->cache,
+					     &val, offset, sizeof(val));
+}
+
+void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.apf = kzalloc(sizeof(*(vcpu->arch.apf)), GFP_KERNEL);
+}
+
+bool kvm_arch_async_not_present_allowed(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+	struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
+	u32 reason, token;
+	int ret;
+
+	if (!apf || !(apf->control_block & KVM_ASYNC_PF_ENABLED))
+		return false;
+
+	if (apf->send_user_only && vcpu_mode_priv(vcpu))
+		return false;
+
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return false;
+
+	if (!vsdei || vsdei->critical_event || vsdei->normal_event)
+		return false;
+
+	/* Pending page fault, which isn't acknowledged by guest */
+	ret = read_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason),
+			 &reason);
+	if (ret) {
+		kvm_err("%s: Error %d to read reason (%d-%d)\n",
+			__func__, ret, kvm->userspace_pid, vcpu->vcpu_idx);
+		return false;
+	}
+
+	ret = read_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token),
+			 &token);
+	if (ret) {
+		kvm_err("%s: Error %d to read token %d-%d\n",
+			__func__, ret, kvm->userspace_pid, vcpu->vcpu_idx);
+		return false;
+	}
+
+	if (reason || token)
+		return false;
+
+	return true;
+}
+
+bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
+			     u32 esr, gpa_t gpa, gfn_t gfn)
+{
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+	struct kvm_arch_async_pf arch;
+	unsigned long hva = kvm_vcpu_gfn_to_hva(vcpu, gfn);
+
+	arch.token = (apf->id++ << 12) | vcpu->vcpu_id;
+	arch.gfn = gfn;
+	arch.esr = esr;
+
+	return kvm_setup_async_pf(vcpu, gpa, hva, &arch);
+}
+
+/*
+ * It's guaranteed that no pending asynchronous page fault when this is
+ * called. It means all previous issued asynchronous page faults have
+ * been acknowledged.
+ */
+bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+	int ret;
+
+	kvm_async_pf_add_slot(vcpu, work->arch.gfn);
+
+	ret = write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token),
+			  work->arch.token);
+	if (ret) {
+		kvm_err("%s: Error %d to write token (%d-%d %08x)\n",
+			__func__, ret, kvm->userspace_pid,
+			vcpu->vcpu_idx, work->arch.token);
+		goto fail;
+	}
+
+	ret = write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason),
+			  KVM_PV_REASON_PAGE_NOT_PRESENT);
+	if (ret) {
+		kvm_err("%s: Error %d to write reason (%d-%d %08x)\n",
+			__func__, ret, kvm->userspace_pid,
+			vcpu->vcpu_idx, work->arch.token);
+		goto fail;
+	}
+
+	apf->notpresent_pending = true;
+	apf->notpresent_token = work->arch.token;
+
+	return !kvm_sdei_inject(vcpu, apf->sdei_event_num, true);
+
+fail:
+	write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token), 0);
+	write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason), 0);
+	kvm_async_pf_remove_slot(vcpu, work->arch.gfn);
+	return false;
+}
+
+void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu)
+{
+	kfree(vcpu->arch.apf);
+	vcpu->arch.apf = NULL;
+}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e4038c5e931d..4ba78bd1f18c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -914,6 +914,33 @@ static inline bool is_write_fault(unsigned int esr)
 	return esr_dabt_is_wnr(esr);
 }
 
+static bool try_async_pf(struct kvm_vcpu *vcpu, unsigned int esr,
+			 gpa_t gpa, gfn_t gfn, kvm_pfn_t *pfn,
+			 bool write, bool *writable, bool prefault)
+{
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	bool async = false;
+
+	if (apf) {
+		/* Bail if *pfn has correct page */
+		*pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async,
+					    write, writable, NULL);
+		if (!async)
+			return false;
+
+		if (!prefault && kvm_arch_async_not_present_allowed(vcpu)) {
+			if (kvm_async_pf_find_slot(vcpu, gfn) ||
+			    kvm_arch_setup_async_pf(vcpu, esr, gpa, gfn))
+				return true;
+		}
+	}
+
+	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
+				    write, writable, NULL);
+	return false;
+}
+
 int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
 			      struct kvm_memory_slot *memslot,
 			      phys_addr_t fault_ipa,
@@ -1035,8 +1062,10 @@ int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
 	 */
 	smp_rmb();
 
-	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
-				   write_fault, &writable, NULL);
+	if (try_async_pf(vcpu, esr, fault_ipa, gfn, &pfn,
+			 write_fault, &writable, prefault))
+		return 1;
+
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 08/15] KVM: arm64: Support page-ready notification
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (6 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 07/15] KVM: arm64: Support page-not-present notification Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-08-15  0:59 ` [PATCH v4 09/15] KVM: arm64: Support async PF hypercalls Gavin Shan
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

An asynchronous page fault starts with a worker when the requested
page isn't present. The worker makes the requested page present
in the background and, once that is done, the worker, together with
its associated information, is queued to the completion queue. The
worker and the completion queue are then processed as below.

   * A request (KVM_REQ_ASYNC_PF) is raised if the worker is the
     first one enqueued to the completion queue. When the request
     is handled, the completion queue is checked and the worker is
     dequeued. A PPI is sent to the guest as the page-ready
     notification and the guest acknowledges the interrupt through
     the SMCCC interface.

   * When the notification (PPI) is acknowledged by the guest, the
     completion queue is checked again and the next worker, if any,
     is dequeued. For this worker, another notification (PPI) is
     sent to the guest without raising the request. Once that
     notification is acknowledged by the guest, the completion queue
     is checked again to process the next queued worker.

Similar to the page-not-present notification, the shared memory region
is used to convey the reason and token associated with the page-ready
notification. The region is represented by "struct kvm_vcpu_pv_apf_data".
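
For reference, a minimal sketch of how the shared region is laid out, as
inferred from how this series accesses it (the exact field order and any
padding are assumptions here, not the authoritative definition):

    struct kvm_vcpu_pv_apf_data {
            __u32   reason;    /* KVM_PV_REASON_PAGE_NOT_PRESENT or _PAGE_READY */
            __u32   token;     /* identifies the outstanding asynchronous fault */
            __u32   enabled;   /* guest-side enablement state                   */
    };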

The feature isn't enabled by CONFIG_KVM_ASYNC_PF yet. The control path
isn't implemented either; it will be added in subsequent patches.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/asm/kvm_host.h      |  15 ++
 arch/arm64/include/uapi/asm/kvm_para.h |   1 +
 arch/arm64/kvm/arm.c                   |  24 ++-
 arch/arm64/kvm/async_pf.c              | 205 +++++++++++++++++++++++++
 arch/arm64/kvm/hypercalls.c            |   5 +
 include/linux/arm-smccc.h              |  10 ++
 6 files changed, 257 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6b98aef936b4..bec95e263f93 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -48,6 +48,7 @@
 #define KVM_REQ_RELOAD_GICv4	KVM_ARCH_REQ(4)
 #define KVM_REQ_RELOAD_PMU	KVM_ARCH_REQ(5)
 #define KVM_REQ_SDEI		KVM_ARCH_REQ(6)
+#define KVM_REQ_ASYNC_PF	KVM_ARCH_REQ(7)
 
 #define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
 				     KVM_DIRTY_LOG_INITIALLY_SET)
@@ -302,10 +303,12 @@ struct kvm_arch_async_pf_control {
 		u64			control_block;
 		bool			send_user_only;
 		u64			sdei_event_num;
+		u32			irq;
 
 		u16			id;
 		bool			notpresent_pending;
 		u32			notpresent_token;
+		bool			pageready_pending;
 };
 
 struct kvm_vcpu_arch {
@@ -776,6 +779,13 @@ bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
 			     u32 esr, gpa_t gpa, gfn_t gfn);
 bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 				     struct kvm_async_pf *work);
+void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+			       struct kvm_async_pf *work);
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+				 struct kvm_async_pf *work);
+void kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu, u64 *val);
 void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu);
 #else
 static inline void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu) { }
@@ -791,6 +801,11 @@ static inline bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
 {
 	return false;
 }
+
+static inline void kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu, u64 *val)
+{
+	val[0] = SMCCC_RET_NOT_SUPPORTED;
+}
 #endif
 
 /* Guest/host FPSIMD coordination helpers */
diff --git a/arch/arm64/include/uapi/asm/kvm_para.h b/arch/arm64/include/uapi/asm/kvm_para.h
index 3fa04006714e..162325e2638f 100644
--- a/arch/arm64/include/uapi/asm/kvm_para.h
+++ b/arch/arm64/include/uapi/asm/kvm_para.h
@@ -9,6 +9,7 @@
 #define KVM_ASYNC_PF_SEND_ALWAYS	(1 << 1)
 
 #define KVM_PV_REASON_PAGE_NOT_PRESENT	1
+#define KVM_PV_REASON_PAGE_READY	2
 
 struct kvm_vcpu_pv_apf_data {
 	__u32	reason;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index af251896b41d..84f11c6b790c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -503,9 +503,23 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
  */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
+	struct kvm_arch_async_pf_control *apf = v->arch.apf;
 	bool irq_lines = *vcpu_hcr(v) & (HCR_VI | HCR_VF);
-	return ((irq_lines || kvm_vgic_vcpu_pending_irq(v))
-		&& !v->arch.power_off && !v->arch.pause);
+
+	if ((irq_lines || kvm_vgic_vcpu_pending_irq(v)) &&
+	    !v->arch.power_off && !v->arch.pause)
+		return true;
+
+	if (apf && (apf->control_block & KVM_ASYNC_PF_ENABLED)) {
+		if (kvm_check_async_pf_completion_queue(v))
+			return true;
+
+		if (apf->notpresent_pending ||
+		    apf->pageready_pending)
+			return true;
+	}
+
+	return false;
 }
 
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
@@ -695,6 +709,9 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
 			kvm_reset_vcpu(vcpu);
 
+		if (kvm_check_request(KVM_REQ_ASYNC_PF, vcpu))
+			kvm_check_async_pf_completion(vcpu);
+
 		if (kvm_check_request(KVM_REQ_SDEI, vcpu))
 			kvm_sdei_deliver(vcpu);
 
@@ -825,7 +842,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		smp_store_mb(vcpu->mode, IN_GUEST_MODE);
 
 		if (ret <= 0 || need_new_vmid_gen(&vcpu->arch.hw_mmu->vmid) ||
-		    kvm_request_pending(vcpu)) {
+		    (kvm_request_pending(vcpu) &&
+		     READ_ONCE(vcpu->requests) != (1UL << KVM_REQ_ASYNC_PF))) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
 			isb(); /* Ensure work in x_flush_hwstate is committed */
 			kvm_pmu_sync_hwstate(vcpu);
diff --git a/arch/arm64/kvm/async_pf.c b/arch/arm64/kvm/async_pf.c
index 742bb8a0a8c0..0d2393e24ce6 100644
--- a/arch/arm64/kvm/async_pf.c
+++ b/arch/arm64/kvm/async_pf.c
@@ -138,6 +138,211 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
 	return false;
 }
 
+void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu)
+{
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+
+	kvm_make_request(KVM_REQ_ASYNC_PF, vcpu);
+	if (apf && !apf->pageready_pending)
+		kvm_vcpu_kick(vcpu);
+}
+
+bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+	struct kvm_async_pf *work;
+	u32 reason, token;
+	int ret;
+
+	if (!apf || !(apf->control_block & KVM_ASYNC_PF_ENABLED))
+		return true;
+
+	if (apf->pageready_pending)
+		goto fail;
+
+	ret = read_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason),
+			 &reason);
+	if (ret) {
+		kvm_err("%s: Error %d to read reason (%d-%d)\n",
+			__func__, ret, kvm->userspace_pid, vcpu->vcpu_idx);
+		goto fail;
+	}
+
+	ret = read_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token),
+			 &token);
+	if (ret) {
+		kvm_err("%s: Error %d to read token (%d-%d)\n",
+			__func__, ret, kvm->userspace_pid, vcpu->vcpu_idx);
+		goto fail;
+	}
+
+	/*
+	 * There might be a pending page-not-present notification (SDEI)
+	 * to be delivered while the corresponding work has already been
+	 * completed. In this case, we need to cancel the notification
+	 * early to avoid the overhead of the injected SDEI event and
+	 * interrupt.
+	 */
+	if (apf->notpresent_pending) {
+		spin_lock(&vcpu->async_pf.lock);
+		work = list_first_entry_or_null(&vcpu->async_pf.done,
+						typeof(*work), link);
+		spin_unlock(&vcpu->async_pf.lock);
+		if (!work)
+			goto fail;
+
+		if (reason == KVM_PV_REASON_PAGE_NOT_PRESENT &&
+		    work->arch.token == apf->notpresent_token &&
+		    token == apf->notpresent_token) {
+			kvm_make_request(KVM_REQ_ASYNC_PF, vcpu);
+			return true;
+		}
+	}
+
+	if (reason || token)
+		goto fail;
+
+	return true;
+
+fail:
+	kvm_make_request(KVM_REQ_ASYNC_PF, vcpu);
+	return false;
+}
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+			       struct kvm_async_pf *work)
+{
+	struct kvm_memory_slot *memslot;
+	unsigned int esr = work->arch.esr;
+	phys_addr_t gpa = work->cr2_or_gpa;
+	gfn_t gfn = gpa >> PAGE_SHIFT;
+	unsigned long hva;
+	bool write_fault, writable;
+	int idx;
+
+	/*
+	 * We shouldn't issue a prefault for the special work that wakes
+	 * up all pending tasks because the associated token (address)
+	 * is invalid.
+	 */
+	if (work->wakeup_all)
+		return;
+
+	/*
+	 * The gpa was validated before the work was started. However, the
+	 * memory slots might have changed since then. So we need to redo
+	 * the validation here.
+	 */
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+	if (esr_dabt_is_s1ptw(esr))
+		write_fault = true;
+	else if (ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW)
+		write_fault = false;
+	else
+		write_fault = esr_dabt_is_wnr(esr);
+
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
+	if (kvm_is_error_hva(hva) || (write_fault && !writable))
+		goto out;
+
+	kvm_handle_user_mem_abort(vcpu, memslot, gpa, hva, esr, true);
+
+out:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+}
+
+/*
+ * It's guaranteed that there is no pending asynchronous page fault
+ * when this is called, meaning all previously issued asynchronous
+ * page faults have been acknowledged.
+ */
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+				 struct kvm_async_pf *work)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+	int ret;
+
+	/*
+	 * The work could be completed prior to the page-not-present
+	 * notification delivery. In this case, all we need to do is
+	 * cancel the page-not-present notification to avoid the overhead.
+	 */
+	if (work->wakeup_all) {
+		work->arch.token = ~0;
+	} else {
+		kvm_async_pf_remove_slot(vcpu, work->arch.gfn);
+
+		if (apf->notpresent_pending &&
+		    apf->notpresent_token == work->arch.token &&
+		    !kvm_sdei_cancel(vcpu, apf->sdei_event_num)) {
+			apf->notpresent_pending = false;
+			apf->notpresent_token = 0;
+			goto done;
+		}
+	}
+
+	ret = write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token),
+			  work->arch.token);
+	if (ret) {
+		kvm_err("%s: Error %d to write token (%d-%d %08x)\n",
+			__func__, ret, kvm->userspace_pid,
+			vcpu->vcpu_idx, work->arch.token);
+		goto done;
+	}
+
+	ret = write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason),
+			  KVM_PV_REASON_PAGE_READY);
+	if (ret) {
+		kvm_err("%s: Error %d to write reason (%d-%d %08x)\n",
+			__func__, ret, kvm->userspace_pid,
+			vcpu->vcpu_idx, work->arch.token);
+		goto done;
+	}
+
+	apf->pageready_pending = true;
+	kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_idx,
+			    apf->irq, true, NULL);
+	return;
+
+done:
+	write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason), 0);
+	write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token), 0);
+}
+
+void kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu, u64 *val)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+	u32 func;
+	long ret = SMCCC_RET_SUCCESS;
+
+	if (!apf) {
+		val[0] = SMCCC_RET_NOT_SUPPORTED;
+		return;
+	}
+
+	func = smccc_get_arg1(vcpu);
+	switch (func) {
+	case ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ_ACK:
+		if (!apf->pageready_pending)
+			break;
+
+		kvm_vgic_inject_irq(kvm, vcpu->vcpu_idx,
+				    apf->irq, false, NULL);
+		apf->pageready_pending = false;
+		kvm_check_async_pf_completion(vcpu);
+		break;
+	default:
+		ret = SMCCC_RET_NOT_SUPPORTED;
+	}
+
+	val[0] = ret;
+}
+
 void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu)
 {
 	kfree(vcpu->arch.apf);
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index d3fc893a4f58..bf423cb27280 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -129,10 +129,15 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
 		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
 		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
+		if (vcpu->arch.apf)
+			val[0] |= BIT(ARM_SMCCC_KVM_FUNC_ASYNC_PF);
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
 		kvm_ptp_get_time(vcpu, val);
 		break;
+	case ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID:
+		kvm_arch_async_pf_hypercall(vcpu, val);
+		break;
 	case ARM_SMCCC_TRNG_VERSION:
 	case ARM_SMCCC_TRNG_FEATURES:
 	case ARM_SMCCC_TRNG_GET_UUID:
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 7d1cabe15262..e7d8ade1b3dd 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -107,6 +107,7 @@
 /* KVM "vendor specific" services */
 #define ARM_SMCCC_KVM_FUNC_FEATURES		0
 #define ARM_SMCCC_KVM_FUNC_PTP			1
+#define ARM_SMCCC_KVM_FUNC_ASYNC_PF		2
 #define ARM_SMCCC_KVM_FUNC_FEATURES_2		127
 #define ARM_SMCCC_KVM_NUM_FUNCS			128
 
@@ -133,6 +134,15 @@
 #define KVM_PTP_VIRT_COUNTER			0
 #define KVM_PTP_PHYS_COUNTER			1
 
+/* Asynchronous page fault service */
+#define ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ_ACK	5
+
+#define ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID		\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
+			   ARM_SMCCC_SMC_32,			\
+			   ARM_SMCCC_OWNER_VENDOR_HYP,		\
+			   ARM_SMCCC_KVM_FUNC_ASYNC_PF)
+
 /* Paravirtualised time calls (defined by ARM DEN0057A) */
 #define ARM_SMCCC_HV_PV_TIME_FEATURES				\
 	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 09/15] KVM: arm64: Support async PF hypercalls
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (7 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 08/15] KVM: arm64: Support page-ready notification Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-08-15  0:59 ` [PATCH v4 10/15] KVM: arm64: Support async PF ioctl commands Gavin Shan
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

This introduces (SMCCC) KVM vendor specific services to configure
the asynchronous page fault functionality. The following services
are added:

   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_VERSION
     Returns the version, which can be used to identify ABI changes
     in the future.
   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_SLOTS
     Returns the maximal number of tokens that the current vCPU can
     have. It's used by the guest to allocate the required resources.
   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_{SDEI, IRQ}
     Returns the associated SDEI or (PPI) IRQ number, configured
     through the vCPU ioctl command.
   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE
     Enables or disables asynchronous page fault on the current vCPU.

The corresponding SDEI event and (PPI) IRQ are owned by the VMM, so
they are configured through the vCPU ioctl interface. That interface
will be implemented when the asynchronous page fault capability is
exported in subsequent patches.
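
For illustration, querying the version from the guest could look roughly
like the snippet below. It mirrors how the guest enablement patch later in
this series invokes the service; it is a sketch, not part of this patch:

	struct arm_smccc_res res;

	/* Ask the host for the async PF ABI version */
	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_VERSION, &res);
	if (res.a0 == SMCCC_RET_SUCCESS)
		pr_info("Async PF ABI version: 0x%lx\n", res.a1);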

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/kvm/async_pf.c | 119 ++++++++++++++++++++++++++++++++++++++
 include/linux/arm-smccc.h |   5 ++
 2 files changed, 124 insertions(+)

diff --git a/arch/arm64/kvm/async_pf.c b/arch/arm64/kvm/async_pf.c
index 0d2393e24ce6..3bc69a631996 100644
--- a/arch/arm64/kvm/async_pf.c
+++ b/arch/arm64/kvm/async_pf.c
@@ -313,11 +313,114 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 	write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token), 0);
 }
 
+static void kvm_arch_async_sdei_notifier(struct kvm_vcpu *vcpu,
+					 unsigned long num,
+					 unsigned int state)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+
+	if (!apf)
+		return;
+
+	if (num != apf->sdei_event_num) {
+		kvm_err("%s: Invalid event number (%d-%d %lx-%llx)\n",
+			__func__, kvm->userspace_pid, vcpu->vcpu_idx,
+			num, apf->sdei_event_num);
+		return;
+	}
+
+	switch (state) {
+	case KVM_SDEI_NOTIFY_DELIVERED:
+		if (!apf->notpresent_pending)
+			break;
+
+		apf->notpresent_token = 0;
+		apf->notpresent_pending = false;
+		break;
+	case KVM_SDEI_NOTIFY_COMPLETED:
+		break;
+	default:
+		kvm_err("%s: Invalid state (%d-%d %lx-%d)\n",
+			__func__, kvm->userspace_pid, vcpu->vcpu_idx,
+			num, state);
+	}
+}
+
+static long kvm_arch_async_enable(struct kvm_vcpu *vcpu, u64 data)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+	gpa_t gpa = (data & ~0x3FUL);
+	bool enabled, enable;
+	int ret;
+
+	if (!apf || !irqchip_in_kernel(kvm))
+		return SMCCC_RET_NOT_SUPPORTED;
+
+	/* Bail if the state transition isn't allowed */
+	enabled = !!(apf->control_block & KVM_ASYNC_PF_ENABLED);
+	enable = !!(data & KVM_ASYNC_PF_ENABLED);
+	if (enable == enabled) {
+		kvm_debug("%s: Async PF has been %s on (%d-%d %llx-%llx)\n",
+			  __func__, enabled ? "enabled" : "disabled",
+			  kvm->userspace_pid, vcpu->vcpu_idx,
+			  apf->control_block, data);
+		return SMCCC_RET_NOT_REQUIRED;
+	}
+
+	/* To disable the functionality */
+	if (!enable) {
+		kvm_clear_async_pf_completion_queue(vcpu);
+		apf->control_block = data;
+		return SMCCC_RET_SUCCESS;
+	}
+
+	/*
+	 * The SDEI event and IRQ number should have been given
+	 * prior to enablement.
+	 */
+	if (!apf->sdei_event_num || !apf->irq) {
+		kvm_err("%s: Invalid SDEI event or IRQ (%d-%d %llx-%d)\n",
+			__func__, kvm->userspace_pid, vcpu->vcpu_idx,
+			apf->sdei_event_num, apf->irq);
+		return SMCCC_RET_INVALID_PARAMETER;
+	}
+
+	/* Register SDEI event notifier */
+	ret = kvm_sdei_register_notifier(kvm, apf->sdei_event_num,
+					 kvm_arch_async_sdei_notifier);
+	if (ret) {
+		kvm_err("%s: Error %d registering SDEI notifier (%d-%d %llx)\n",
+			__func__, ret, kvm->userspace_pid, vcpu->vcpu_idx,
+			apf->sdei_event_num);
+		return SMCCC_RET_NOT_SUPPORTED;
+	}
+
+	/* Initialize cache shared by host and guest */
+	ret = kvm_gfn_to_hva_cache_init(kvm, &apf->cache, gpa,
+			offsetofend(struct kvm_vcpu_pv_apf_data, token));
+	if (ret) {
+		kvm_err("%s: Error %d initializing cache (%d-%d)\n",
+			__func__, ret, kvm->userspace_pid, vcpu->vcpu_idx);
+		return SMCCC_RET_NOT_SUPPORTED;
+	}
+
+	/* Flush the token table */
+	kvm_async_pf_reset_slot(vcpu);
+	apf->send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
+	kvm_async_pf_wakeup_all(vcpu);
+	apf->control_block = data;
+
+	return SMCCC_RET_SUCCESS;
+}
+
 void kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu, u64 *val)
 {
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
 	u32 func;
+	u64 data;
 	long ret = SMCCC_RET_SUCCESS;
 
 	if (!apf) {
@@ -327,6 +430,22 @@ void kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu, u64 *val)
 
 	func = smccc_get_arg1(vcpu);
 	switch (func) {
+	case ARM_SMCCC_KVM_FUNC_ASYNC_PF_VERSION:
+		val[1] = 0x010000; /* v1.0.0 */
+		break;
+	case ARM_SMCCC_KVM_FUNC_ASYNC_PF_SLOTS:
+		val[1] = ASYNC_PF_PER_VCPU;
+		break;
+	case ARM_SMCCC_KVM_FUNC_ASYNC_PF_SDEI:
+		val[1] = apf->sdei_event_num;
+		break;
+	case ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ:
+		val[1] = apf->irq;
+		break;
+	case ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE:
+		data = (smccc_get_arg3(vcpu) << 32) | smccc_get_arg2(vcpu);
+		ret = kvm_arch_async_enable(vcpu, data);
+		break;
 	case ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ_ACK:
 		if (!apf->pageready_pending)
 			break;
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index e7d8ade1b3dd..979424f620d5 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -135,6 +135,11 @@
 #define KVM_PTP_PHYS_COUNTER			1
 
 /* Asynchronous page fault service */
+#define ARM_SMCCC_KVM_FUNC_ASYNC_PF_VERSION	0
+#define ARM_SMCCC_KVM_FUNC_ASYNC_PF_SLOTS	1
+#define ARM_SMCCC_KVM_FUNC_ASYNC_PF_SDEI	2
+#define ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ		3
+#define ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE	4
 #define ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ_ACK	5
 
 #define ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID		\
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 10/15] KVM: arm64: Support async PF ioctl commands
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (8 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 09/15] KVM: arm64: Support async PF hypercalls Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-08-15  0:59 ` [PATCH v4 11/15] KVM: arm64: Export async PF capability Gavin Shan
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

This supports ioctl commands for configuration and migration, as
illustrated by the sketch after the list:

   KVM_ARM_ASYNC_PF_CMD_GET_VERSION
      Return implementation version
   KVM_ARM_ASYNC_PF_CMD_GET_SDEI
      Return SDEI event number used for page-not-present notification
   KVM_ARM_ASYNC_PF_CMD_GET_IRQ
      Return IRQ number used for page-ready notification
   KVM_ARM_ASYNC_PF_CMD_GET_CONTROL
      Get control block when VM is migrated
   KVM_ARM_ASYNC_PF_CMD_SET_SDEI
      Set SDEI event number when VM is started or migrated
   KVM_ARM_ASYNC_PF_CMD_SET_IRQ
      Set IRQ number when VM is started or migrated
   KVM_ARM_ASYNC_PF_CMD_SET_CONTROL
      Set control block when VM is migrated
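
A minimal VMM-side sketch of driving one of these commands (vcpu_fd is a
hypothetical vCPU file descriptor, headers and error handling are trimmed,
and the SDEI event number shown is the one standardized later in the series):

	struct kvm_arm_async_pf_cmd cmd = {
		.cmd  = KVM_ARM_ASYNC_PF_CMD_SET_SDEI,
		.sdei = 0x40400001,	/* KVM_SDEI_ASYNC_PF_NUM */
	};

	if (ioctl(vcpu_fd, KVM_ARM_ASYNC_PF_COMMAND, &cmd) < 0)
		perror("KVM_ARM_ASYNC_PF_COMMAND");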

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/asm/kvm_host.h | 14 +++++++
 arch/arm64/include/uapi/asm/kvm.h | 19 +++++++++
 arch/arm64/kvm/arm.c              |  6 +++
 arch/arm64/kvm/async_pf.c         | 64 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |  3 ++
 5 files changed, 106 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index bec95e263f93..8c91a5599081 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -786,6 +786,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
 				 struct kvm_async_pf *work);
 void kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu, u64 *val);
+long kvm_arch_async_pf_vm_ioctl(struct kvm *kvm, unsigned long arg);
+long kvm_arch_async_pf_vcpu_ioctl(struct kvm_vcpu *vcpu, unsigned long arg);
 void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu);
 #else
 static inline void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu) { }
@@ -806,6 +808,18 @@ static inline void kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu, u64 *val)
 {
 	val[0] = SMCCC_RET_NOT_SUPPORTED;
 }
+
+static inline long kvm_arch_async_pf_vm_ioctl(struct kvm *kvm,
+					      unsigned long arg)
+{
+	return -EPERM;
+}
+
+static inline long kvm_arch_async_pf_vcpu_ioctl(struct kvm_vcpu *vcpu,
+						unsigned long arg)
+{
+	return -EPERM;
+}
 #endif
 
 /* Guest/host FPSIMD coordination helpers */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index e1b200bb6482..068d1e0c4e5b 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -414,6 +414,25 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
 #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
 
+/* Asynchronous page fault */
+#define KVM_ARM_ASYNC_PF_CMD_GET_VERSION	0
+#define KVM_ARM_ASYNC_PF_CMD_GET_SDEI		1
+#define KVM_ARM_ASYNC_PF_CMD_GET_IRQ		2
+#define KVM_ARM_ASYNC_PF_CMD_GET_CONTROL	3
+#define KVM_ARM_ASYNC_PF_CMD_SET_SDEI		4
+#define KVM_ARM_ASYNC_PF_CMD_SET_IRQ		5
+#define KVM_ARM_ASYNC_PF_CMD_SET_CONTROL	6
+
+struct kvm_arm_async_pf_cmd {
+	__u32		cmd;
+	union {
+		__u32	version;
+		__u64	sdei;
+		__u32	irq;
+		__u64	control;
+	};
+};
+
 #endif
 
 #endif /* __ARM_KVM_H__ */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 84f11c6b790c..74ca5ec51e53 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1335,6 +1335,9 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	case KVM_ARM_SDEI_COMMAND: {
 		return kvm_sdei_vcpu_ioctl(vcpu, arg);
 	}
+	case KVM_ARM_ASYNC_PF_COMMAND: {
+		return kvm_arch_async_pf_vcpu_ioctl(vcpu, arg);
+	}
 	default:
 		r = -EINVAL;
 	}
@@ -1419,6 +1422,9 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	case KVM_ARM_SDEI_COMMAND: {
 		return kvm_sdei_vm_ioctl(kvm, arg);
 	}
+	case KVM_ARM_ASYNC_PF_COMMAND: {
+		return kvm_arch_async_pf_vm_ioctl(kvm, arg);
+	}
 	default:
 		return -EINVAL;
 	}
diff --git a/arch/arm64/kvm/async_pf.c b/arch/arm64/kvm/async_pf.c
index 3bc69a631996..3aaed516540f 100644
--- a/arch/arm64/kvm/async_pf.c
+++ b/arch/arm64/kvm/async_pf.c
@@ -462,6 +462,70 @@ void kvm_arch_async_pf_hypercall(struct kvm_vcpu *vcpu, u64 *val)
 	val[0] = ret;
 }
 
+long kvm_arch_async_pf_vm_ioctl(struct kvm *kvm, unsigned long arg)
+{
+	struct kvm_arm_async_pf_cmd cmd;
+	unsigned int version = 0x010000; /* v1.0.0 */
+	void __user *argp = (void __user *)arg;
+
+	if (copy_from_user(&cmd, argp, sizeof(cmd)))
+		return -EFAULT;
+
+	if (cmd.cmd != KVM_ARM_ASYNC_PF_CMD_GET_VERSION)
+		return -EINVAL;
+
+	cmd.version = version;
+	if (copy_to_user(argp, &cmd, sizeof(cmd)))
+		return -EFAULT;
+
+	return 0;
+}
+
+long kvm_arch_async_pf_vcpu_ioctl(struct kvm_vcpu *vcpu, unsigned long arg)
+{
+	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
+	struct kvm_arm_async_pf_cmd cmd;
+	void __user *argp = (void __user *)arg;
+	long ret = 0;
+
+	if (!apf)
+		return -EPERM;
+
+	if (copy_from_user(&cmd, argp, sizeof(cmd)))
+		return -EFAULT;
+
+	switch (cmd.cmd) {
+	case KVM_ARM_ASYNC_PF_CMD_GET_SDEI:
+		cmd.sdei = apf->sdei_event_num;
+		break;
+	case KVM_ARM_ASYNC_PF_CMD_GET_IRQ:
+		cmd.irq = apf->irq;
+		break;
+	case KVM_ARM_ASYNC_PF_CMD_GET_CONTROL:
+		cmd.control = apf->control_block;
+		break;
+	case KVM_ARM_ASYNC_PF_CMD_SET_SDEI:
+		apf->sdei_event_num = cmd.sdei;
+		break;
+	case KVM_ARM_ASYNC_PF_CMD_SET_IRQ:
+		apf->irq = cmd.irq;
+		break;
+	case KVM_ARM_ASYNC_PF_CMD_SET_CONTROL:
+		if (kvm_arch_async_enable(vcpu, cmd.control) !=
+		    SMCCC_RET_SUCCESS)
+			ret = -EIO;
+
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (!ret && copy_to_user(argp, &cmd, sizeof(cmd)))
+		ret = -EFAULT;
+
+	return ret;
+}
+
 void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu)
 {
 	kfree(vcpu->arch.apf);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2aa748fd89c7..bb058bf73840 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1683,6 +1683,9 @@ struct kvm_xen_vcpu_attr {
 /* Available with KVM_CAP_ARM_SDEI */
 #define KVM_ARM_SDEI_COMMAND	_IOWR(KVMIO, 0xce, struct kvm_sdei_cmd)
 
+/* Available with KVM_CAP_ASYNC_PF or KVM_CAP_ASYNC_PF_INT */
+#define KVM_ARM_ASYNC_PF_COMMAND _IOWR(KVMIO, 0xcf, struct kvm_arm_async_pf_cmd)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 11/15] KVM: arm64: Export async PF capability
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (9 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 10/15] KVM: arm64: Support async PF ioctl commands Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-08-15  0:59 ` [PATCH v4 12/15] arm64: Detect async PF para-virtualization feature Gavin Shan
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

This exports the asynchronous page fault capability:

    * Identify capability KVM_CAP_ASYNC_{PF, PF_INT}.

    * Standardize SDEI event for asynchronous page fault.

    * Enable kernel config CONFIG_KVM_ASYNC_{PF, PF_SLOT}.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/uapi/asm/kvm_sdei.h | 1 +
 arch/arm64/kvm/Kconfig                 | 2 ++
 arch/arm64/kvm/arm.c                   | 4 ++++
 arch/arm64/kvm/sdei.c                  | 5 +++++
 4 files changed, 12 insertions(+)

diff --git a/arch/arm64/include/uapi/asm/kvm_sdei.h b/arch/arm64/include/uapi/asm/kvm_sdei.h
index f7a6b2b22b50..cbe8be3d0a25 100644
--- a/arch/arm64/include/uapi/asm/kvm_sdei.h
+++ b/arch/arm64/include/uapi/asm/kvm_sdei.h
@@ -16,6 +16,7 @@
 #define KVM_SDEI_MAX_VCPUS	512
 #define KVM_SDEI_INVALID_NUM	0
 #define KVM_SDEI_DEFAULT_NUM	0x40400000
+#define KVM_SDEI_ASYNC_PF_NUM	0x40400001
 
 struct kvm_sdei_event_state {
 	__u64	num;
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a4eba0908bfa..3c6f89b4c9a0 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -29,6 +29,8 @@ menuconfig KVM
 	select SRCU
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
+	select KVM_ASYNC_PF
+	select KVM_ASYNC_PF_SLOT
 	select HAVE_KVM_IRQFD
 	select HAVE_KVM_MSI
 	select HAVE_KVM_IRQCHIP
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 74ca5ec51e53..2692bd24df86 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -281,6 +281,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_SDEI:
 		r = 1;
 		break;
+	case KVM_CAP_ASYNC_PF:
+	case KVM_CAP_ASYNC_PF_INT:
+		r = IS_ENABLED(CONFIG_KVM_ASYNC_PF) ? 1 : 0;
+		break;
 	default:
 		r = 0;
 	}
diff --git a/arch/arm64/kvm/sdei.c b/arch/arm64/kvm/sdei.c
index 4f5a582daa97..437303bfafba 100644
--- a/arch/arm64/kvm/sdei.c
+++ b/arch/arm64/kvm/sdei.c
@@ -19,6 +19,11 @@ static struct kvm_sdei_event_state defined_kse[] = {
 	  1,
 	  SDEI_EVENT_PRIORITY_CRITICAL
 	},
+	{ KVM_SDEI_ASYNC_PF_NUM,
+	  SDEI_EVENT_TYPE_PRIVATE,
+	  1,
+	  SDEI_EVENT_PRIORITY_CRITICAL
+	},
 };
 
 static struct kvm_sdei_event *kvm_sdei_find_event(struct kvm *kvm,
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 12/15] arm64: Detect async PF para-virtualization feature
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (10 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 11/15] KVM: arm64: Export async PF capability Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-08-15  0:59 ` [PATCH v4 13/15] arm64: Reschedule process on async PF Gavin Shan
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

This implements kvm_para_available() to check whether para-virtualization
features are available. Besides, kvm_para_has_feature() is enhanced to
detect the asynchronous page fault para-virtualization feature. These two
functions are going to be used by the guest kernel to enable the
asynchronous page fault support.
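
A minimal guest-side usage sketch (assuming the check is done during early
init, as the later enablement patch in this series does):

	if (!kvm_para_available() ||
	    !kvm_para_has_feature(KVM_FEATURE_ASYNC_PF))
		return -EPERM;	/* async PF isn't offered by the host */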

This also adds kernel option (CONFIG_KVM_GUEST), which is the umbrella
for the optimizations related to KVM para-virtualization.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/Kconfig                     | 11 +++++++++++
 arch/arm64/include/asm/kvm_para.h      | 12 +++++++++++-
 arch/arm64/include/uapi/asm/kvm_para.h |  2 ++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fdcd54d39c1e..6dceae6ed7d3 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1081,6 +1081,17 @@ config PARAVIRT_TIME_ACCOUNTING
 
 	  If in doubt, say N here.
 
+config KVM_GUEST
+	bool "KVM Guest Support"
+	depends on PARAVIRT
+	default y
+	help
+	  This option enables various optimizations for running under the KVM
+	  hypervisor. Overhead for the kernel when not running inside KVM should
+	  be minimal.
+
+	  In case of doubt, say Y
+
 config KEXEC
 	depends on PM_SLEEP_SMP
 	select KEXEC_CORE
diff --git a/arch/arm64/include/asm/kvm_para.h b/arch/arm64/include/asm/kvm_para.h
index 0ea481dd1c7a..8f39c60a6619 100644
--- a/arch/arm64/include/asm/kvm_para.h
+++ b/arch/arm64/include/asm/kvm_para.h
@@ -3,6 +3,8 @@
 #define _ASM_ARM_KVM_PARA_H
 
 #include <uapi/asm/kvm_para.h>
+#include <linux/of.h>
+#include <asm/hypervisor.h>
 
 static inline bool kvm_check_and_clear_guest_paused(void)
 {
@@ -11,7 +13,12 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 
 static inline unsigned int kvm_arch_para_features(void)
 {
-	return 0;
+	unsigned int features = 0;
+
+	if (kvm_arm_hyp_service_available(ARM_SMCCC_KVM_FUNC_ASYNC_PF))
+		features |= (1 << KVM_FEATURE_ASYNC_PF);
+
+	return features;
 }
 
 static inline unsigned int kvm_arch_para_hints(void)
@@ -21,6 +28,9 @@ static inline unsigned int kvm_arch_para_hints(void)
 
 static inline bool kvm_para_available(void)
 {
+	if (IS_ENABLED(CONFIG_KVM_GUEST))
+		return true;
+
 	return false;
 }
 
diff --git a/arch/arm64/include/uapi/asm/kvm_para.h b/arch/arm64/include/uapi/asm/kvm_para.h
index 162325e2638f..70bbc7d1ec75 100644
--- a/arch/arm64/include/uapi/asm/kvm_para.h
+++ b/arch/arm64/include/uapi/asm/kvm_para.h
@@ -4,6 +4,8 @@
 
 #include <linux/types.h>
 
+#define KVM_FEATURE_ASYNC_PF		0
+
 /* Async PF */
 #define KVM_ASYNC_PF_ENABLED		(1 << 0)
 #define KVM_ASYNC_PF_SEND_ALWAYS	(1 << 1)
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 13/15] arm64: Reschedule process on async PF
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (11 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 12/15] arm64: Detect async PF para-virtualization feature Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-08-15  0:59 ` [PATCH v4 14/15] arm64: Enable async PF Gavin Shan
  2021-08-15  0:59 ` [PATCH v4 15/15] KVM: arm64: Add async PF document Gavin Shan
  14 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

The page-not-present notification is delivered by SDEI event. The
guest reschedules the current process to another one when the SDEI
event is received. It's not safe to do so in the SDEI event handler
because the SDEI event should be acknowledged as soon as possible.

So the rescheduling is postponed until the current process switches
from kernel to user mode. In order to trigger the switch, the SDEI
event handler sends a (reschedule) IPI to the current CPU, which is
delivered promptly once the SDEI event has been acknowledged.

A new thread flag (TIF_ASYNC_PF) is introduced to track the state of
the process to be rescheduled. While the flag is set, a wait-queue
head is associated with the process. The process keeps rescheduling
itself until the flag is cleared when the page-ready notification is
received through the (PPI) interrupt.
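
The SDEI handler side (added by the guest enablement patch later in this
series) boils down to the sketch below; declarations and error handling
are omitted:

	token = __this_cpu_read(apf_data.token);	/* from the shared buffer */
	kvm_async_pf_add_task(current, token);		/* sets TIF_ASYNC_PF      */
	__this_cpu_write(apf_data.reason, 0);		/* hand the buffer back   */
	smp_send_reschedule(smp_processor_id());	/* IPI to the current CPU */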

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/asm/processor.h   |  1 +
 arch/arm64/include/asm/thread_info.h |  4 +++-
 arch/arm64/kernel/signal.c           | 17 +++++++++++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index b6517fd03d7b..4d05d292baa1 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -156,6 +156,7 @@ struct thread_struct {
 	u64			gcr_user_excl;
 #endif
 	u64			sctlr_user;
+	void			*data;
 };
 
 #define SCTLR_USER_MASK                                                        \
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 6623c99f0984..38567adb26be 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -67,6 +67,7 @@ int arch_dup_task_struct(struct task_struct *dst,
 #define TIF_UPROBE		4	/* uprobe breakpoint or singlestep */
 #define TIF_MTE_ASYNC_FAULT	5	/* MTE Asynchronous Tag Check Fault */
 #define TIF_NOTIFY_SIGNAL	6	/* signal notifications exist */
+#define TIF_ASYNC_PF		7	/* Asynchronous page fault */
 #define TIF_SYSCALL_TRACE	8	/* syscall trace active */
 #define TIF_SYSCALL_AUDIT	9	/* syscall auditing */
 #define TIF_SYSCALL_TRACEPOINT	10	/* syscall tracepoint for ftrace */
@@ -97,11 +98,12 @@ int arch_dup_task_struct(struct task_struct *dst,
 #define _TIF_SVE		(1 << TIF_SVE)
 #define _TIF_MTE_ASYNC_FAULT	(1 << TIF_MTE_ASYNC_FAULT)
 #define _TIF_NOTIFY_SIGNAL	(1 << TIF_NOTIFY_SIGNAL)
+#define _TIF_ASYNC_PF		(1 << TIF_ASYNC_PF)
 
 #define _TIF_WORK_MASK		(_TIF_NEED_RESCHED | _TIF_SIGPENDING | \
 				 _TIF_NOTIFY_RESUME | _TIF_FOREIGN_FPSTATE | \
 				 _TIF_UPROBE | _TIF_MTE_ASYNC_FAULT | \
-				 _TIF_NOTIFY_SIGNAL)
+				 _TIF_NOTIFY_SIGNAL | _TIF_ASYNC_PF)
 
 #define _TIF_SYSCALL_WORK	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
 				 _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 23036334f4dc..15c7d115aa5d 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -929,6 +929,23 @@ asmlinkage void do_notify_resume(struct pt_regs *regs,
 				 unsigned long thread_flags)
 {
 	do {
+		if (thread_flags & _TIF_ASYNC_PF) {
+			struct swait_queue_head *wq =
+				READ_ONCE(current->thread.data);
+			DECLARE_SWAITQUEUE(wait);
+
+			local_daif_restore(DAIF_PROCCTX_NOIRQ);
+
+			do {
+				prepare_to_swait_exclusive(wq,
+					&wait, TASK_UNINTERRUPTIBLE);
+				if (!test_thread_flag(TIF_ASYNC_PF))
+					break;
+
+				schedule();
+			} while (test_thread_flag(TIF_ASYNC_PF));
+		}
+
 		if (thread_flags & _TIF_NEED_RESCHED) {
 			/* Unmask Debug and SError for the next task */
 			local_daif_restore(DAIF_PROCCTX_NOIRQ);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 14/15] arm64: Enable async PF
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (12 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 13/15] arm64: Reschedule process on async PF Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-08-16 17:05   ` Vitaly Kuznetsov
  2021-08-15  0:59 ` [PATCH v4 15/15] KVM: arm64: Add async PF document Gavin Shan
  14 siblings, 1 reply; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

This enables asynchronous page fault on the guest side. The design
is highlighted below:

   * The per-vCPU shared memory region, which is represented by
     "struct kvm_vcpu_pv_apf_data", is allocated. The reason and
     token associated with the received notifications of asynchronous
     page fault are delivered through it.

   * A per-vCPU table, which is represented by "struct kvm_apf_table",
     is allocated. The process on which the page-not-present notification
     is received is added to the table so that it can reschedule itself
     when switching from kernel to user mode. Afterwards, when the
     page-ready notification is received, the process, identified by its
     token, is removed from the table and made runnable again.

   * During CPU hotplug, the (private) SDEI event is expected to be
     enabled or disabled on the affected CPU by the SDEI client driver.
     The (PPI) interrupt is enabled or disabled on the affected CPU by
     this code. When the system is about to reboot, the SDEI event is
     disabled and unregistered and the (PPI) interrupt is disabled.

   * The SDEI event and (PPI) interrupt number are retrieved from the
     host through the SMCCC interface. Besides, the version of the
     asynchronous page fault ABI is validated when the feature is
     enabled on the guest.

   * The feature is disabled on the guest when the boot parameter
     "no-kvmapf" is specified, as in the example below.
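
An illustrative guest kernel command line (the other parameters are just
placeholders):

	console=ttyAMA0 root=/dev/vda rw no-kvmapf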

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/kernel/Makefile |   1 +
 arch/arm64/kernel/kvm.c    | 452 +++++++++++++++++++++++++++++++++++++
 2 files changed, 453 insertions(+)
 create mode 100644 arch/arm64/kernel/kvm.c

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 3f1490bfb938..f0c1a6a7eaa7 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_ACPI)			+= acpi.o
 obj-$(CONFIG_ACPI_NUMA)			+= acpi_numa.o
 obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 obj-$(CONFIG_PARAVIRT)			+= paravirt.o
+obj-$(CONFIG_KVM_GUEST)			+= kvm.o
 obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o
 obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
diff --git a/arch/arm64/kernel/kvm.c b/arch/arm64/kernel/kvm.c
new file mode 100644
index 000000000000..effe8dc7e921
--- /dev/null
+++ b/arch/arm64/kernel/kvm.c
@@ -0,0 +1,452 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Asynchronous page fault support.
+ *
+ * Copyright (C) 2021 Red Hat, Inc.
+ *
+ * Author(s): Gavin Shan <gshan@redhat.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/spinlock.h>
+#include <linux/slab.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+#include <linux/arm-smccc.h>
+#include <linux/kvm_para.h>
+#include <linux/arm_sdei.h>
+#include <linux/acpi.h>
+#include <linux/cpuhotplug.h>
+#include <linux/reboot.h>
+
+struct kvm_apf_task {
+	unsigned int		token;
+	struct task_struct	*task;
+	struct swait_queue_head	wq;
+};
+
+struct kvm_apf_table {
+	raw_spinlock_t		lock;
+	unsigned int		count;
+	struct kvm_apf_task	tasks[0];
+};
+
+static bool async_pf_available = true;
+static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_data) __aligned(64);
+static struct kvm_apf_table __percpu *apf_tables;
+static unsigned int apf_tasks;
+static unsigned int apf_sdei_num;
+static unsigned int apf_ppi_num;
+static int apf_irq;
+
+static bool kvm_async_pf_add_task(struct task_struct *task,
+				  unsigned int token)
+{
+	struct kvm_apf_table *table = this_cpu_ptr(apf_tables);
+	unsigned int i, index = apf_tasks;
+	bool ret = false;
+
+	raw_spin_lock(&table->lock);
+
+	if (WARN_ON(table->count >= apf_tasks))
+		goto unlock;
+
+	for (i = 0; i < apf_tasks; i++) {
+		if (!table->tasks[i].task) {
+			if (index == apf_tasks) {
+				ret = true;
+				index = i;
+			}
+		} else if (table->tasks[i].task == task) {
+			WARN_ON(table->tasks[i].token != token);
+			ret = false;
+			break;
+		}
+	}
+
+	if (!ret)
+		goto unlock;
+
+	task->thread.data = &table->tasks[index].wq;
+	set_tsk_thread_flag(task, TIF_ASYNC_PF);
+
+	table->count++;
+	table->tasks[index].task = task;
+	table->tasks[index].token = token;
+
+unlock:
+	raw_spin_unlock(&table->lock);
+	return ret;
+}
+
+static inline void kvm_async_pf_remove_one_task(struct kvm_apf_table *table,
+						unsigned int index)
+{
+	clear_tsk_thread_flag(table->tasks[index].task, TIF_ASYNC_PF);
+	WRITE_ONCE(table->tasks[index].task->thread.data, NULL);
+
+	table->count--;
+	table->tasks[index].task = NULL;
+	table->tasks[index].token = 0;
+
+	swake_up_one(&table->tasks[index].wq);
+}
+
+static bool kvm_async_pf_remove_task(unsigned int token)
+{
+	struct kvm_apf_table *table = this_cpu_ptr(apf_tables);
+	unsigned int i;
+	bool ret = (token == UINT_MAX);
+
+	raw_spin_lock(&table->lock);
+
+	for (i = 0; i < apf_tasks; i++) {
+		if (!table->tasks[i].task)
+			continue;
+
+		/* Wakeup all */
+		if (token == UINT_MAX) {
+			kvm_async_pf_remove_one_task(table, i);
+			continue;
+		}
+
+		if (table->tasks[i].token == token) {
+			kvm_async_pf_remove_one_task(table, i);
+			ret = true;
+			break;
+		}
+	}
+
+	raw_spin_unlock(&table->lock);
+
+	return ret;
+}
+
+static int kvm_async_pf_sdei_handler(unsigned int event,
+				     struct pt_regs *regs,
+				     void *arg)
+{
+	unsigned int reason = __this_cpu_read(apf_data.reason);
+	unsigned int token = __this_cpu_read(apf_data.token);
+	bool ret;
+
+	if (reason != KVM_PV_REASON_PAGE_NOT_PRESENT) {
+		pr_warn("%s: Bogus notification (%d, 0x%08x)\n",
+			__func__, reason, token);
+		return -EINVAL;
+	}
+
+	ret = kvm_async_pf_add_task(current, token);
+	__this_cpu_write(apf_data.token, 0);
+	__this_cpu_write(apf_data.reason, 0);
+
+	if (!ret)
+		return -ENOSPC;
+
+	smp_send_reschedule(smp_processor_id());
+
+	return 0;
+}
+
+static irqreturn_t kvm_async_pf_irq_handler(int irq, void *dev_id)
+{
+	unsigned int reason = __this_cpu_read(apf_data.reason);
+	unsigned int token = __this_cpu_read(apf_data.token);
+	struct arm_smccc_res res;
+
+	if (reason != KVM_PV_REASON_PAGE_READY) {
+		pr_warn("%s: Bogus interrupt %d (%d, 0x%08x)\n",
+			__func__, irq, reason, token);
+		return IRQ_HANDLED;
+	}
+
+	kvm_async_pf_remove_task(token);
+
+	__this_cpu_write(apf_data.token, 0);
+	__this_cpu_write(apf_data.reason, 0);
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
+			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ_ACK, &res);
+
+	return IRQ_HANDLED;
+}
+
+static int __init kvm_async_pf_available(char *arg)
+{
+	async_pf_available = false;
+
+	return 0;
+}
+early_param("no-kvmapf", kvm_async_pf_available);
+
+static void kvm_async_pf_disable(void)
+{
+	struct arm_smccc_res res;
+	u32 enabled = __this_cpu_read(apf_data.enabled);
+
+	if (!enabled)
+		return;
+
+	/* Disable the functionality */
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
+			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE,
+			     0, 0, &res);
+	if (res.a0 != SMCCC_RET_SUCCESS) {
+		pr_warn("%s: Error %ld to disable on CPU%d\n",
+			__func__, res.a0, smp_processor_id());
+		return;
+	}
+
+	__this_cpu_write(apf_data.enabled, 0);
+
+	pr_info("Async PF disabled on CPU%d\n", smp_processor_id());
+}
+
+static void kvm_async_pf_enable(void)
+{
+	struct arm_smccc_res res;
+	u32 enabled = __this_cpu_read(apf_data.enabled);
+	u64 val = virt_to_phys(this_cpu_ptr(&apf_data));
+
+	if (enabled)
+		return;
+
+	val |= KVM_ASYNC_PF_ENABLED;
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
+			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE,
+			     (u32)val, (u32)(val >> 32), &res);
+	if (res.a0 != SMCCC_RET_SUCCESS) {
+		pr_warn("%s: Error %ld to enable CPU%d\n",
+			__func__, res.a0, smp_processor_id());
+		return;
+	}
+
+	__this_cpu_write(apf_data.enabled, 1);
+
+	pr_info("Async PF enabled on CPU%d\n", smp_processor_id());
+}
+
+static void kvm_async_pf_cpu_disable(void *info)
+{
+	disable_percpu_irq(apf_irq);
+	kvm_async_pf_disable();
+}
+
+static void kvm_async_pf_cpu_enable(void *info)
+{
+	enable_percpu_irq(apf_irq, IRQ_TYPE_LEVEL_HIGH);
+	kvm_async_pf_enable();
+}
+
+static int kvm_async_pf_cpu_reboot_notify(struct notifier_block *nb,
+					  unsigned long code,
+					  void *unused)
+{
+	if (code == SYS_RESTART) {
+		sdei_event_disable(apf_sdei_num);
+		sdei_event_unregister(apf_sdei_num);
+
+		on_each_cpu(kvm_async_pf_cpu_disable, NULL, 1);
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block kvm_async_pf_cpu_reboot_nb = {
+	.notifier_call = kvm_async_pf_cpu_reboot_notify,
+};
+
+static int kvm_async_pf_cpu_online(unsigned int cpu)
+{
+	kvm_async_pf_cpu_enable(NULL);
+
+	return 0;
+}
+
+static int kvm_async_pf_cpu_offline(unsigned int cpu)
+{
+	kvm_async_pf_cpu_disable(NULL);
+
+	return 0;
+}
+
+static int __init kvm_async_pf_check_version(void)
+{
+	struct arm_smccc_res res;
+
+	/*
+	 * Check the version. v1.0.0 or a higher version is required
+	 * to support the functionality.
+	 */
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
+			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_VERSION, &res);
+	if (res.a0 != SMCCC_RET_SUCCESS) {
+		pr_warn("%s: Error %ld to get version\n",
+			__func__, res.a0);
+		return -EPERM;
+	}
+
+	if ((res.a1 & 0xFFFFFFFFFF000000) ||
+	    ((res.a1 & 0xFF0000) >> 16) < 0x1) {
+		pr_warn("%s: Invalid version (0x%016lx)\n",
+			__func__, res.a1);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int __init kvm_async_pf_info(void)
+{
+	struct arm_smccc_res res;
+
+	/* Retrieve number of tokens */
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
+			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_SLOTS, &res);
+	if (res.a0 != SMCCC_RET_SUCCESS) {
+		pr_warn("%s: Error %ld to get token number\n",
+			__func__, res.a0);
+		return -EPERM;
+	}
+
+	apf_tasks = res.a1 * 2;
+
+	/* Retrieve SDEI event number */
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
+			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_SDEI, &res);
+	if (res.a0 != SMCCC_RET_SUCCESS) {
+		pr_warn("%s: Error %ld to get SDEI event number\n",
+			__func__, res.a0);
+		return -EPERM;
+	}
+
+	apf_sdei_num = res.a1;
+
+	/* Retrieve (PPI) interrupt number */
+	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
+			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ, &res);
+	if (res.a0 != SMCCC_RET_SUCCESS) {
+		pr_warn("%s: Error %ld to get IRQ\n",
+			__func__, res.a0);
+		return -EPERM;
+	}
+
+	apf_ppi_num = res.a1;
+
+	return 0;
+}
+
+static int __init kvm_async_pf_init(void)
+{
+	struct kvm_apf_table *table;
+	size_t size;
+	int cpu, i, ret;
+
+	if (!kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) ||
+	    !async_pf_available)
+		return -EPERM;
+
+	ret = kvm_async_pf_check_version();
+	if (ret)
+		return ret;
+
+	ret = kvm_async_pf_info();
+	if (ret)
+		return ret;
+
+	/* Allocate and initialize the sleeper table */
+	size = sizeof(struct kvm_apf_table) +
+	       apf_tasks * sizeof(struct kvm_apf_task);
+	apf_tables = __alloc_percpu(size, 0);
+	if (!apf_tables) {
+		pr_warn("%s: Unable to alloc async PF table\n",
+			__func__);
+		return -ENOMEM;
+	}
+
+	for_each_possible_cpu(cpu) {
+		table = per_cpu_ptr(apf_tables, cpu);
+		raw_spin_lock_init(&table->lock);
+		for (i = 0; i < apf_tasks; i++)
+			init_swait_queue_head(&table->tasks[i].wq);
+	}
+
+	/*
+	 * Initialize SDEI event for page-not-present notification.
+	 * The SDEI event number should have been retrieved from
+	 * the host.
+	 */
+	ret = sdei_event_register(apf_sdei_num,
+				  kvm_async_pf_sdei_handler, NULL);
+	if (ret) {
+		pr_warn("%s: Error %d to register SDEI event\n",
+			__func__, ret);
+		ret = -EIO;
+		goto release_tables;
+	}
+
+	ret = sdei_event_enable(apf_sdei_num);
+	if (ret) {
+		pr_warn("%s: Error %d to enable SDEI event\n",
+			__func__, ret);
+		goto unregister_event;
+	}
+
+	/*
+	 * Initialize the interrupt for page-ready notification. The
+	 * (PPI) interrupt number should have been retrieved from the
+	 * host through the SMCCC interface.
+	 */
+	apf_irq = acpi_register_gsi(NULL, apf_ppi_num,
+				    ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
+	if (apf_irq <= 0) {
+		ret = -EIO;
+		pr_warn("%s: Error %d to register IRQ\n",
+			__func__, apf_irq);
+		goto disable_event;
+	}
+
+	ret = request_percpu_irq(apf_irq, kvm_async_pf_irq_handler,
+				 "Asynchronous Page Fault", &apf_data);
+	if (ret) {
+		pr_warn("%s: Error %d to request IRQ\n",
+			__func__, ret);
+		goto unregister_irq;
+	}
+
+	register_reboot_notifier(&kvm_async_pf_cpu_reboot_nb);
+	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+			"arm/kvm:online", kvm_async_pf_cpu_online,
+			kvm_async_pf_cpu_offline);
+	if (ret < 0) {
+		pr_warn("%s: Error %d to install cpu hotplug callbacks\n",
+			__func__, ret);
+		goto release_irq;
+	}
+
+	/* Enable async PF on the online CPUs */
+	on_each_cpu(kvm_async_pf_cpu_enable, NULL, 1);
+
+	return 0;
+
+release_irq:
+	free_percpu_irq(apf_irq, &apf_data);
+unregister_irq:
+	acpi_unregister_gsi(apf_ppi_num);
+disable_event:
+	sdei_event_disable(apf_sdei_num);
+unregister_event:
+	sdei_event_unregister(apf_sdei_num);
+release_tables:
+	free_percpu(apf_tables);
+
+	return ret;
+}
+
+static int __init kvm_guest_init(void)
+{
+	return kvm_async_pf_init();
+}
+
+fs_initcall(kvm_guest_init);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v4 15/15] KVM: arm64: Add async PF document
  2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
                   ` (13 preceding siblings ...)
  2021-08-15  0:59 ` [PATCH v4 14/15] arm64: Enable async PF Gavin Shan
@ 2021-08-15  0:59 ` Gavin Shan
  2021-11-11 10:39   ` Eric Auger
  14 siblings, 1 reply; 36+ messages in thread
From: Gavin Shan @ 2021-08-15  0:59 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

This adds document to explain the interface for asynchronous page
fault and how it works in general.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/arm/apf.rst   | 143 +++++++++++++++++++++++++++
 Documentation/virt/kvm/arm/index.rst |   1 +
 2 files changed, 144 insertions(+)
 create mode 100644 Documentation/virt/kvm/arm/apf.rst

diff --git a/Documentation/virt/kvm/arm/apf.rst b/Documentation/virt/kvm/arm/apf.rst
new file mode 100644
index 000000000000..4f5c01b6699f
--- /dev/null
+++ b/Documentation/virt/kvm/arm/apf.rst
@@ -0,0 +1,143 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Asynchronous Page Fault Support for arm64
+=========================================
+
+There are two stages of page faults when the KVM module is enabled as an
+accelerator for the guest. The guest is responsible for handling the stage-1
+page faults, while the host handles the stage-2 page faults. While a stage-2
+page fault is being handled, the guest is suspended until the requested page
+is ready. It could take several milliseconds, even hundreds of milliseconds
+in extreme situations, because I/O might be required to move the requested
+page from disk to DRAM. The guest does not do any work while it is suspended.
+The feature (Asynchronous Page Fault) is introduced to take advantage of the
+suspension period and to improve the overall performance.
+
+There are two paths used to fulfil the asynchronous page fault, called the
+control path and the data path. The control path allows the VMM or guest to
+configure the functionality, while the notifications are delivered in the
+data path. The notifications are classified into page-not-present and
+page-ready notifications.
+
+Data Path
+---------
+
+There are two types of notifications delivered from host to guest in the
+data path: the page-not-present and page-ready notifications. They are
+delivered through an SDEI event and a (PPI) interrupt respectively. Besides,
+there is a shared buffer between host and guest to convey the reason and the
+sequential token, which is used to identify the asynchronous page fault. The
+reason and token residing in the shared buffer are written by the host, and
+read and cleared by the guest. An asynchronous page fault is delivered and
+completed as below.
+
+(1) When an asynchronous page fault starts, a (workqueue) worker is created
+    and queued to the vCPU's pending queue. The worker makes the requested
+    page ready and resident in DRAM in the background. The shared buffer is
+    updated with the reason and sequential token. After that, the SDEI event
+    is sent to the guest as the page-not-present notification.
+
+(2) When the SDEI event is received on the guest, the current process is
+    tagged with TIF_ASYNC_PF and associated with a wait queue. The process
+    is ready to keep rescheduling itself when switching from kernel to user
+    mode. After that, a reschedule IPI is sent to the current CPU and the
+    received SDEI event is acknowledged. Note that the IPI is delivered
+    after the acknowledgment of the SDEI event is received on the host.
+
+(3) On the host, the worker is dequeued from the vCPU's pending queue and
+    enqueued to its completion queue when the requested page becomes ready.
+    In the meanwhile, a KVM_REQ_ASYNC_PF request is sent to the vCPU if the
+    worker is the first element enqueued to the completion queue.
+
+(4) With a pending KVM_REQ_ASYNC_PF request, the first worker in the completion
+    queue is dequeued and destroyed. In the meanwhile, a (PPI) interrupt is
+    sent to the guest with the updated reason and token in the shared buffer.
+
+(5) When the (PPI) interrupt is received by the guest, the affected process is
+    located using the token and woken up after its TIF_ASYNC_PF tag is cleared.
+    After that, the interrupt is acknowledged through the SMCCC interface. If
+    any workers remain in the completion queue, the next one is dequeued and
+    destroyed, and another (PPI) interrupt is sent to the guest.
+
+Control Path
+------------
+
+The configurations are passed through the SMCCC or ioctl interface. The SDEI
+event and (PPI) interrupt are owned by the VMM, so the SDEI event and
+interrupt numbers are configured through ioctl commands on a per-vCPU basis.
+Besides, the functionality might be enabled and configured through the ioctl
+interface by the VMM during migration:
+
+   * KVM_ARM_ASYNC_PF_CMD_GET_VERSION
+
+     Returns the current version of the feature, supported by the host. It is
+     made up of major, minor and revision fields. Each field is one byte in
+     length.
+
+   * KVM_ARM_ASYNC_PF_CMD_GET_SDEI
+
+     Retrieve the SDEI event number, used for page-not-present notification,
+     so that it can be configured on the destination VM in the scenario of
+     migration.
+
+   * KVM_ARM_ASYNC_PF_CMD_GET_IRQ
+
+     Retrieve the IRQ (PPI) number, used for page-ready notification, so that
+     it can be configured on the destination VM in the scenario of migration.
+
+   * KVM_ARM_ASYNC_PF_CMD_GET_CONTROL
+
+     Retrieve the address of the control block, so that it can be configured
+     on the destination VM in the scenario of migration.
+
+   * KVM_ARM_ASYNC_PF_CMD_SET_SDEI
+
+     Used by the VMM to configure the SDEI event number, which is used by the
+     host to deliver the page-not-present notification. This is used when the
+     VM is started or migrated.
+
+   * KVM_ARM_ASYNC_PF_CMD_SET_IRQ
+
+     Used by the VMM to configure the (PPI) interrupt number, which is used by
+     the host to deliver the page-ready notification. This is used when the VM
+     is started or migrated.
+
+   * KVM_ARM_ASYNC_PF_CMD_SET_CONTROL
+
+     Set the control block on the destination VM in the scenario of migration.
+
+The other configurations are passed through the SMCCC interface. The host
+exports the capability through a KVM vendor-specific service, which is
+identified by ARM_SMCCC_KVM_FUNC_ASYNC_PF_FUNC_ID. There are several
+functions defined for this:
+
+   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_VERSION
+
+     Returns the current version of the feature, supported by the host. It is
+     made up of major, minor and revision fields. Each field is one byte in
+     length.
+
+   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_SLOTS
+
+     Returns the size of the hashed GFN table. It is used by the guest to
+     set up the capacity of the waiting process table.
+
+   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_SDEI
+   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ
+
+     Used by the guest to retrieve the SDEI event and (PPI) interrupt
+     numbers that are configured by the VMM.
+
+   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE
+
+     Used by the guest to enable or disable the feature on the specific vCPU.
+     The argument is made up of the shared buffer address and flags. The
+     shared buffer is written by the host to indicate the reason for the
+     delivered asynchronous page fault and the token (sequence number) that
+     identifies it. Two flags are supported: KVM_ASYNC_PF_ENABLED enables or
+     disables the feature. KVM_ASYNC_PF_SEND_ALWAYS allows the page-not-present
+     notification to be delivered regardless of the guest's state; otherwise,
+     the notification is delivered only when the guest is in user mode.
+
+   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ_ACK
+
+     Used by the guest to acknowledge the completion of page-ready notification.
diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
index 78a9b670aafe..f43b5fe25f61 100644
--- a/Documentation/virt/kvm/arm/index.rst
+++ b/Documentation/virt/kvm/arm/index.rst
@@ -7,6 +7,7 @@ ARM
 .. toctree::
    :maxdepth: 2
 
+   apf
    hyp-abi
    psci
    pvtime
-- 
2.23.0
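
As a condensed illustration of the control path described in the document
above: from the guest side, enabling the feature boils down to a single
SMCCC call that passes the physical address of the per-CPU shared buffer
together with the KVM_ASYNC_PF_ENABLED flag. The sketch below is a
simplified rendering of what patch 14 of this series implements, not part
of the posted patch; the names follow the guest code in that patch and the
error handling is trimmed.

	#include <linux/kernel.h>
	#include <linux/arm-smccc.h>
	#include <linux/kvm_para.h>
	#include <linux/mm.h>

	/*
	 * Sketch: enable async PF on the calling CPU. The shared buffer
	 * address is ORed with the enable flag and split into two 32-bit
	 * halves, as described in the "Control Path" section above.
	 */
	static void example_async_pf_enable(struct kvm_vcpu_pv_apf_data *buf)
	{
		struct arm_smccc_res res;
		u64 val = virt_to_phys(buf) | KVM_ASYNC_PF_ENABLED;

		arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
				     ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE,
				     (u32)val, (u32)(val >> 32), &res);
		if (res.a0 != SMCCC_RET_SUCCESS)
			pr_warn("async PF: enable failed (%ld)\n", res.a0);
	}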


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 02/15] KVM: async_pf: Add helper function to check completion queue
  2021-08-15  0:59 ` [PATCH v4 02/15] KVM: async_pf: Add helper function to check completion queue Gavin Shan
@ 2021-08-16 16:53   ` Vitaly Kuznetsov
  2021-08-17 10:44     ` Gavin Shan
  2021-11-10 15:37   ` Eric Auger
  1 sibling, 1 reply; 36+ messages in thread
From: Vitaly Kuznetsov @ 2021-08-16 16:53 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, shan.gavin, kvmarm

Gavin Shan <gshan@redhat.com> writes:

> This adds inline helper kvm_check_async_pf_completion_queue() to
> check if there are pending completion in the queue. The empty stub
> is also added on !CONFIG_KVM_ASYNC_PF so that the caller needn't
> consider if CONFIG_KVM_ASYNC_PF is enabled.
>
> All checks on the completion queue is done by the newly added inline
> function since list_empty() and list_empty_careful() are interchangeable.
>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  arch/x86/kvm/x86.c       |  2 +-
>  include/linux/kvm_host.h | 10 ++++++++++
>  virt/kvm/async_pf.c      | 10 +++++-----
>  virt/kvm/kvm_main.c      |  4 +---
>  4 files changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e5d5c5ed7dd4..7f35d9324b99 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11591,7 +11591,7 @@ static inline bool kvm_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
>  
>  static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>  {
> -	if (!list_empty_careful(&vcpu->async_pf.done))
> +	if (kvm_check_async_pf_completion_queue(vcpu))
>  		return true;
>  
>  	if (kvm_apic_has_events(vcpu))
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 85b61a456f1c..a5f990f6dc35 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -339,12 +339,22 @@ struct kvm_async_pf {
>  	bool				notpresent_injected;
>  };
>  
> +static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)

Nitpicking: When not reading the implementation, I'm not exactly sure
what this function returns as 'check' is too ambiguous ('true' when the
queue is full? when it's empty? when it's not empty? when it was
properly set up?). I'd suggest we go with a more specific:

kvm_async_pf_completion_queue_empty() or something like that instead
(we'll have to invert the logic everywhere then). 

Side note: x86 seems to already use a shortened 'apf' instead of
'async_pf' in a number of places (e.g. 'apf_put_user_ready()'), we may
want to either fight this practice or support the rebellion by renaming
all functions from below instead :-)

> +{
> +	return !list_empty_careful(&vcpu->async_pf.done);
> +}
> +
>  void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
>  void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
>  bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  			unsigned long hva, struct kvm_arch_async_pf *arch);
>  int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  #else
> +static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}
> +
>  static inline void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) { }
>  #endif
>  
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index dd777688d14a..d145a61a046a 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -70,7 +70,7 @@ static void async_pf_execute(struct work_struct *work)
>  		kvm_arch_async_page_present(vcpu, apf);
>  
>  	spin_lock(&vcpu->async_pf.lock);
> -	first = list_empty(&vcpu->async_pf.done);
> +	first = !kvm_check_async_pf_completion_queue(vcpu);
>  	list_add_tail(&apf->link, &vcpu->async_pf.done);
>  	apf->vcpu = NULL;
>  	spin_unlock(&vcpu->async_pf.lock);
> @@ -122,7 +122,7 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>  		spin_lock(&vcpu->async_pf.lock);
>  	}
>  
> -	while (!list_empty(&vcpu->async_pf.done)) {
> +	while (kvm_check_async_pf_completion_queue(vcpu)) {
>  		struct kvm_async_pf *work =
>  			list_first_entry(&vcpu->async_pf.done,
>  					 typeof(*work), link);
> @@ -138,7 +138,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_async_pf *work;
>  
> -	while (!list_empty_careful(&vcpu->async_pf.done) &&
> +	while (kvm_check_async_pf_completion_queue(vcpu) &&
>  	      kvm_arch_can_dequeue_async_page_present(vcpu)) {
>  		spin_lock(&vcpu->async_pf.lock);
>  		work = list_first_entry(&vcpu->async_pf.done, typeof(*work),
> @@ -205,7 +205,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
>  	struct kvm_async_pf *work;
>  	bool first;
>  
> -	if (!list_empty_careful(&vcpu->async_pf.done))
> +	if (kvm_check_async_pf_completion_queue(vcpu))
>  		return 0;
>  
>  	work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
> @@ -216,7 +216,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
>  	INIT_LIST_HEAD(&work->queue); /* for list_del to work */
>  
>  	spin_lock(&vcpu->async_pf.lock);
> -	first = list_empty(&vcpu->async_pf.done);
> +	first = !kvm_check_async_pf_completion_queue(vcpu);
>  	list_add_tail(&work->link, &vcpu->async_pf.done);
>  	spin_unlock(&vcpu->async_pf.lock);
>  
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b50dbe269f4b..8795503651b1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3282,10 +3282,8 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu)
>  	if (kvm_arch_dy_runnable(vcpu))
>  		return true;
>  
> -#ifdef CONFIG_KVM_ASYNC_PF
> -	if (!list_empty_careful(&vcpu->async_pf.done))
> +	if (kvm_check_async_pf_completion_queue(vcpu))
>  		return true;
> -#endif
>  
>  	return false;
>  }

-- 
Vitaly


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 14/15] arm64: Enable async PF
  2021-08-15  0:59 ` [PATCH v4 14/15] arm64: Enable async PF Gavin Shan
@ 2021-08-16 17:05   ` Vitaly Kuznetsov
  2021-08-17 10:49     ` Gavin Shan
  0 siblings, 1 reply; 36+ messages in thread
From: Vitaly Kuznetsov @ 2021-08-16 17:05 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, shan.gavin, kvmarm

Gavin Shan <gshan@redhat.com> writes:

> This enables asynchronous page fault from guest side. The design
> is highlighted as below:
>
>    * The per-vCPU shared memory region, which is represented by
>      "struct kvm_vcpu_pv_apf_data", is allocated. The reason and
>      token associated with the received notifications of asynchronous
>      page fault are delivered through it.
>
>    * A per-vCPU table, which is represented by "struct kvm_apf_table",
>      is allocated. The process, on which the page-not-present notification
>      is received, is added into the table so that it can reschedule
>      itself on switching from kernel to user mode. Afterwards, the
>      process, identified by token, is removed from the table and put
>      into runnable state when page-ready notification is received.
>
>    * During CPU hotplug, the (private) SDEI event is expected to be
>      enabled or disabled on the affected CPU by SDEI client driver.
>      The (PPI) interrupt is enabled or disabled on the affected CPU
>      by ourself. When the system is going to reboot, the SDEI event
>      is disabled and unregistered and the (PPI) interrupt is disabled.
>
>    * The SDEI event and (PPI) interrupt number are retrieved from host
>      through SMCCC interface. Besides, the version of the asynchronous
>      page fault is validated when the feature is enabled on the guest.
>
>    * The feature is disabled on guest when boot parameter "no-kvmapf"
>      is specified.

Documentation/admin-guide/kernel-parameters.txt states this one is
x86-only:

        no-kvmapf       [X86,KVM] Disable paravirtualized asynchronous page
                        fault handling.

makes sense to update in this patch I believe.
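
Something along these lines, presumably (exact wording up to the docs
folks, of course):

        no-kvmapf       [X86,ARM64,KVM] Disable paravirtualized asynchronous
                        page fault handling.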

>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  arch/arm64/kernel/Makefile |   1 +
>  arch/arm64/kernel/kvm.c    | 452 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 453 insertions(+)
>  create mode 100644 arch/arm64/kernel/kvm.c
>
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 3f1490bfb938..f0c1a6a7eaa7 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -59,6 +59,7 @@ obj-$(CONFIG_ACPI)			+= acpi.o
>  obj-$(CONFIG_ACPI_NUMA)			+= acpi_numa.o
>  obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
>  obj-$(CONFIG_PARAVIRT)			+= paravirt.o
> +obj-$(CONFIG_KVM_GUEST)			+= kvm.o
>  obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o
>  obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
>  obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
> diff --git a/arch/arm64/kernel/kvm.c b/arch/arm64/kernel/kvm.c
> new file mode 100644
> index 000000000000..effe8dc7e921
> --- /dev/null
> +++ b/arch/arm64/kernel/kvm.c
> @@ -0,0 +1,452 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Asynchronous page fault support.
> + *
> + * Copyright (C) 2021 Red Hat, Inc.
> + *
> + * Author(s): Gavin Shan <gshan@redhat.com>
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/spinlock.h>
> +#include <linux/slab.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/of.h>
> +#include <linux/of_fdt.h>
> +#include <linux/arm-smccc.h>
> +#include <linux/kvm_para.h>
> +#include <linux/arm_sdei.h>
> +#include <linux/acpi.h>
> +#include <linux/cpuhotplug.h>
> +#include <linux/reboot.h>
> +
> +struct kvm_apf_task {
> +	unsigned int		token;
> +	struct task_struct	*task;
> +	struct swait_queue_head	wq;
> +};
> +
> +struct kvm_apf_table {
> +	raw_spinlock_t		lock;
> +	unsigned int		count;
> +	struct kvm_apf_task	tasks[0];
> +};
> +
> +static bool async_pf_available = true;
> +static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_data) __aligned(64);
> +static struct kvm_apf_table __percpu *apf_tables;
> +static unsigned int apf_tasks;
> +static unsigned int apf_sdei_num;
> +static unsigned int apf_ppi_num;
> +static int apf_irq;
> +
> +static bool kvm_async_pf_add_task(struct task_struct *task,
> +				  unsigned int token)
> +{
> +	struct kvm_apf_table *table = this_cpu_ptr(apf_tables);
> +	unsigned int i, index = apf_tasks;
> +	bool ret = false;
> +
> +	raw_spin_lock(&table->lock);
> +
> +	if (WARN_ON(table->count >= apf_tasks))
> +		goto unlock;
> +
> +	for (i = 0; i < apf_tasks; i++) {
> +		if (!table->tasks[i].task) {
> +			if (index == apf_tasks) {
> +				ret = true;
> +				index = i;
> +			}
> +		} else if (table->tasks[i].task == task) {
> +			WARN_ON(table->tasks[i].token != token);
> +			ret = false;
> +			break;
> +		}
> +	}
> +
> +	if (!ret)
> +		goto unlock;
> +
> +	task->thread.data = &table->tasks[index].wq;
> +	set_tsk_thread_flag(task, TIF_ASYNC_PF);
> +
> +	table->count++;
> +	table->tasks[index].task = task;
> +	table->tasks[index].token = token;
> +
> +unlock:
> +	raw_spin_unlock(&table->lock);
> +	return ret;
> +}
> +
> +static inline void kvm_async_pf_remove_one_task(struct kvm_apf_table *table,
> +						unsigned int index)
> +{
> +	clear_tsk_thread_flag(table->tasks[index].task, TIF_ASYNC_PF);
> +	WRITE_ONCE(table->tasks[index].task->thread.data, NULL);
> +
> +	table->count--;
> +	table->tasks[index].task = NULL;
> +	table->tasks[index].token = 0;
> +
> +	swake_up_one(&table->tasks[index].wq);
> +}
> +
> +static bool kvm_async_pf_remove_task(unsigned int token)
> +{
> +	struct kvm_apf_table *table = this_cpu_ptr(apf_tables);
> +	unsigned int i;
> +	bool ret = (token == UINT_MAX);
> +
> +	raw_spin_lock(&table->lock);
> +
> +	for (i = 0; i < apf_tasks; i++) {
> +		if (!table->tasks[i].task)
> +			continue;
> +
> +		/* Wakeup all */
> +		if (token == UINT_MAX) {
> +			kvm_async_pf_remove_one_task(table, i);
> +			continue;
> +		}
> +
> +		if (table->tasks[i].token == token) {
> +			kvm_async_pf_remove_one_task(table, i);
> +			ret = true;
> +			break;
> +		}
> +	}
> +
> +	raw_spin_unlock(&table->lock);
> +
> +	return ret;
> +}
> +
> +static int kvm_async_pf_sdei_handler(unsigned int event,
> +				     struct pt_regs *regs,
> +				     void *arg)
> +{
> +	unsigned int reason = __this_cpu_read(apf_data.reason);
> +	unsigned int token = __this_cpu_read(apf_data.token);
> +	bool ret;
> +
> +	if (reason != KVM_PV_REASON_PAGE_NOT_PRESENT) {
> +		pr_warn("%s: Bogus notification (%d, 0x%08x)\n",
> +			__func__, reason, token);
> +		return -EINVAL;
> +	}
> +
> +	ret = kvm_async_pf_add_task(current, token);
> +	__this_cpu_write(apf_data.token, 0);
> +	__this_cpu_write(apf_data.reason, 0);
> +
> +	if (!ret)
> +		return -ENOSPC;
> +
> +	smp_send_reschedule(smp_processor_id());
> +
> +	return 0;
> +}
> +
> +static irqreturn_t kvm_async_pf_irq_handler(int irq, void *dev_id)
> +{
> +	unsigned int reason = __this_cpu_read(apf_data.reason);
> +	unsigned int token = __this_cpu_read(apf_data.token);
> +	struct arm_smccc_res res;
> +
> +	if (reason != KVM_PV_REASON_PAGE_READY) {
> +		pr_warn("%s: Bogus interrupt %d (%d, 0x%08x)\n",
> +			__func__, irq, reason, token);

Spurious interrupt or bogus APF reason set? Could be both I believe.

> +		return IRQ_HANDLED;
> +	}
> +
> +	kvm_async_pf_remove_task(token);
> +
> +	__this_cpu_write(apf_data.token, 0);
> +	__this_cpu_write(apf_data.reason, 0);
> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ_ACK, &res);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static int __init kvm_async_pf_available(char *arg)
> +{
> +	async_pf_available = false;
> +
> +	return 0;
> +}
> +early_param("no-kvmapf", kvm_async_pf_available);
> +
> +static void kvm_async_pf_disable(void)
> +{
> +	struct arm_smccc_res res;
> +	u32 enabled = __this_cpu_read(apf_data.enabled);
> +
> +	if (!enabled)
> +		return;
> +
> +	/* Disable the functionality */
> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE,
> +			     0, 0, &res);
> +	if (res.a0 != SMCCC_RET_SUCCESS) {
> +		pr_warn("%s: Error %ld to disable on CPU%d\n",
> +			__func__, res.a0, smp_processor_id());
> +		return;
> +	}
> +
> +	__this_cpu_write(apf_data.enabled, 0);
> +
> +	pr_info("Async PF disabled on CPU%d\n", smp_processor_id());

Nitpicking: x86 uses 

"setup async PF for cpu %d\n" and
"disable async PF for cpu %d\n"

which are not ideal maybe but in any case it would probably make sense
to be consistent across arches.
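
i.e. the arm64 messages would presumably become something like:

	/* in kvm_async_pf_enable() */
	pr_info("setup async PF for cpu %d\n", smp_processor_id());

	/* in kvm_async_pf_disable() */
	pr_info("disable async PF for cpu %d\n", smp_processor_id());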


> +}
> +
> +static void kvm_async_pf_enable(void)
> +{
> +	struct arm_smccc_res res;
> +	u32 enabled = __this_cpu_read(apf_data.enabled);
> +	u64 val = virt_to_phys(this_cpu_ptr(&apf_data));
> +
> +	if (enabled)
> +		return;
> +
> +	val |= KVM_ASYNC_PF_ENABLED;
> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE,
> +			     (u32)val, (u32)(val >> 32), &res);
> +	if (res.a0 != SMCCC_RET_SUCCESS) {
> +		pr_warn("%s: Error %ld to enable CPU%d\n",
> +			__func__, res.a0, smp_processor_id());
> +		return;
> +	}
> +
> +	__this_cpu_write(apf_data.enabled, 1);
> +
> +	pr_info("Async PF enabled on CPU%d\n", smp_processor_id());
> +}
> +
> +static void kvm_async_pf_cpu_disable(void *info)
> +{
> +	disable_percpu_irq(apf_irq);
> +	kvm_async_pf_disable();
> +}
> +
> +static void kvm_async_pf_cpu_enable(void *info)
> +{
> +	enable_percpu_irq(apf_irq, IRQ_TYPE_LEVEL_HIGH);
> +	kvm_async_pf_enable();
> +}
> +
> +static int kvm_async_pf_cpu_reboot_notify(struct notifier_block *nb,
> +					  unsigned long code,
> +					  void *unused)
> +{
> +	if (code == SYS_RESTART) {
> +		sdei_event_disable(apf_sdei_num);
> +		sdei_event_unregister(apf_sdei_num);
> +
> +		on_each_cpu(kvm_async_pf_cpu_disable, NULL, 1);
> +	}
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block kvm_async_pf_cpu_reboot_nb = {
> +	.notifier_call = kvm_async_pf_cpu_reboot_notify,
> +};
> +
> +static int kvm_async_pf_cpu_online(unsigned int cpu)
> +{
> +	kvm_async_pf_cpu_enable(NULL);
> +
> +	return 0;
> +}
> +
> +static int kvm_async_pf_cpu_offline(unsigned int cpu)
> +{
> +	kvm_async_pf_cpu_disable(NULL);
> +
> +	return 0;
> +}
> +
> +static int __init kvm_async_pf_check_version(void)
> +{
> +	struct arm_smccc_res res;
> +
> +	/*
> +	 * Check the version and v1.0.0 or higher version is required
> +	 * to support the functionality.
> +	 */
> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_VERSION, &res);
> +	if (res.a0 != SMCCC_RET_SUCCESS) {
> +		pr_warn("%s: Error %ld to get version\n",
> +			__func__, res.a0);
> +		return -EPERM;
> +	}
> +
> +	if ((res.a1 & 0xFFFFFFFFFF000000) ||
> +	    ((res.a1 & 0xFF0000) >> 16) < 0x1) {
> +		pr_warn("%s: Invalid version (0x%016lx)\n",
> +			__func__, res.a1);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int __init kvm_async_pf_info(void)
> +{
> +	struct arm_smccc_res res;
> +
> +	/* Retrieve number of tokens */
> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_SLOTS, &res);
> +	if (res.a0 != SMCCC_RET_SUCCESS) {
> +		pr_warn("%s: Error %ld to get token number\n",
> +			__func__, res.a0);
> +		return -EPERM;
> +	}
> +
> +	apf_tasks = res.a1 * 2;
> +
> +	/* Retrieve SDEI event number */
> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_SDEI, &res);
> +	if (res.a0 != SMCCC_RET_SUCCESS) {
> +		pr_warn("%s: Error %ld to get SDEI event number\n",
> +			__func__, res.a0);
> +		return -EPERM;
> +	}
> +
> +	apf_sdei_num = res.a1;
> +
> +	/* Retrieve (PPI) interrupt number */
> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ, &res);
> +	if (res.a0 != SMCCC_RET_SUCCESS) {
> +		pr_warn("%s: Error %ld to get IRQ\n",
> +			__func__, res.a0);
> +		return -EPERM;
> +	}
> +
> +	apf_ppi_num = res.a1;
> +
> +	return 0;
> +}
> +
> +static int __init kvm_async_pf_init(void)
> +{
> +	struct kvm_apf_table *table;
> +	size_t size;
> +	int cpu, i, ret;
> +
> +	if (!kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) ||
> +	    !async_pf_available)
> +		return -EPERM;
> +
> +	ret = kvm_async_pf_check_version();
> +	if (ret)
> +		return ret;
> +
> +	ret = kvm_async_pf_info();
> +	if (ret)
> +		return ret;
> +
> +	/* Allocate and initialize the sleeper table */
> +	size = sizeof(struct kvm_apf_table) +
> +	       apf_tasks * sizeof(struct kvm_apf_task);
> +	apf_tables = __alloc_percpu(size, 0);
> +	if (!apf_tables) {
> +		pr_warn("%s: Unable to alloc async PF table\n",
> +			__func__);
> +		return -ENOMEM;
> +	}
> +
> +	for_each_possible_cpu(cpu) {
> +		table = per_cpu_ptr(apf_tables, cpu);
> +		raw_spin_lock_init(&table->lock);
> +		for (i = 0; i < apf_tasks; i++)
> +			init_swait_queue_head(&table->tasks[i].wq);
> +	}
> +
> +	/*
> +	 * Initialize SDEI event for page-not-present notification.
> +	 * The SDEI event number should have been retrieved from
> +	 * the host.
> +	 */
> +	ret = sdei_event_register(apf_sdei_num,
> +				  kvm_async_pf_sdei_handler, NULL);
> +	if (ret) {
> +		pr_warn("%s: Error %d to register SDEI event\n",
> +			__func__, ret);
> +		ret = -EIO;
> +		goto release_tables;
> +	}
> +
> +	ret = sdei_event_enable(apf_sdei_num);
> +	if (ret) {
> +		pr_warn("%s: Error %d to enable SDEI event\n",
> +			__func__, ret);
> +		goto unregister_event;
> +	}
> +
> +	/*
> +	 * Initialize interrupt for page-ready notification. The
> +	 * interrupt number and its properties should have been
> +	 * retrieved from the ACPI:APFT table.
> +	 */
> +	apf_irq = acpi_register_gsi(NULL, apf_ppi_num,
> +				    ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
> +	if (apf_irq <= 0) {
> +		ret = -EIO;
> +		pr_warn("%s: Error %d to register IRQ\n",
> +			__func__, apf_irq);
> +		goto disable_event;
> +	}
> +
> +	ret = request_percpu_irq(apf_irq, kvm_async_pf_irq_handler,
> +				 "Asynchronous Page Fault", &apf_data);
> +	if (ret) {
> +		pr_warn("%s: Error %d to request IRQ\n",
> +			__func__, ret);
> +		goto unregister_irq;
> +	}
> +
> +	register_reboot_notifier(&kvm_async_pf_cpu_reboot_nb);
> +	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> +			"arm/kvm:online", kvm_async_pf_cpu_online,
> +			kvm_async_pf_cpu_offline);
> +	if (ret < 0) {
> +		pr_warn("%s: Error %d to install cpu hotplug callbacks\n",
> +			__func__, ret);
> +		goto release_irq;
> +	}
> +
> +	/* Enable async PF on the online CPUs */
> +	on_each_cpu(kvm_async_pf_cpu_enable, NULL, 1);
> +
> +	return 0;
> +
> +release_irq:
> +	free_percpu_irq(apf_irq, &apf_data);
> +unregister_irq:
> +	acpi_unregister_gsi(apf_ppi_num);
> +disable_event:
> +	sdei_event_disable(apf_sdei_num);
> +unregister_event:
> +	sdei_event_unregister(apf_sdei_num);
> +release_tables:
> +	free_percpu(apf_tables);
> +
> +	return ret;
> +}
> +
> +static int __init kvm_guest_init(void)
> +{
> +	return kvm_async_pf_init();
> +}
> +
> +fs_initcall(kvm_guest_init);

-- 
Vitaly


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 02/15] KVM: async_pf: Add helper function to check completion queue
  2021-08-16 16:53   ` Vitaly Kuznetsov
@ 2021-08-17 10:44     ` Gavin Shan
  0 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2021-08-17 10:44 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, shan.gavin, kvmarm

Hi Vitaly,

On 8/17/21 2:53 AM, Vitaly Kuznetsov wrote:
> Gavin Shan <gshan@redhat.com> writes:
> 
>> This adds inline helper kvm_check_async_pf_completion_queue() to
>> check if there are pending completion in the queue. The empty stub
>> is also added on !CONFIG_KVM_ASYNC_PF so that the caller needn't
>> consider if CONFIG_KVM_ASYNC_PF is enabled.
>>
>> All checks on the completion queue is done by the newly added inline
>> function since list_empty() and list_empty_careful() are interchangeable.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   arch/x86/kvm/x86.c       |  2 +-
>>   include/linux/kvm_host.h | 10 ++++++++++
>>   virt/kvm/async_pf.c      | 10 +++++-----
>>   virt/kvm/kvm_main.c      |  4 +---
>>   4 files changed, 17 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e5d5c5ed7dd4..7f35d9324b99 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -11591,7 +11591,7 @@ static inline bool kvm_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
>>   
>>   static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>>   {
>> -	if (!list_empty_careful(&vcpu->async_pf.done))
>> +	if (kvm_check_async_pf_completion_queue(vcpu))
>>   		return true;
>>   
>>   	if (kvm_apic_has_events(vcpu))
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 85b61a456f1c..a5f990f6dc35 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -339,12 +339,22 @@ struct kvm_async_pf {
>>   	bool				notpresent_injected;
>>   };
>>   
>> +static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> 
> Nitpicking: When not reading the implementation, I'm not exactly sure
> what this function returns as 'check' is too ambiguous ('true' when the
> queue is full? when it's empty? when it's not empty? when it was
> properly set up?). I'd suggest we go with a more specific:
> 
> kvm_async_pf_completion_queue_empty() or something like that instead
> (we'll have to invert the logic everywhere then).
> 
> Side note: x86 seems to already use a shortened 'apf' instead of
> 'async_pf' in a number of places (e.g. 'apf_put_user_ready()'), we may
> want to either fight this practice or support the rebelion by renaming
> all functions from below instead :-)
> 

Yeah, I was wondering if the name is ambiguous when I had it. The
reason why I had the name is to be consistent with the existing
one, which is kvm_check_async_pf_completion().

Yes, kvm_async_pf_completion_queue_empty() is much better and I
will include this in next revision.

It's correct that x86 functions include 'apf', but the generic
functions, shared by multiple architectures, use 'async_pf' if
my understanding is correct. So I wouldn't bother to change
the generic function names in this series :)
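
For the record, the renamed helper would just invert what this patch adds,
something like:

	static inline bool kvm_async_pf_completion_queue_empty(struct kvm_vcpu *vcpu)
	{
		return list_empty_careful(&vcpu->async_pf.done);
	}

with the callers flipped to test !kvm_async_pf_completion_queue_empty().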

>> +{
>> +	return !list_empty_careful(&vcpu->async_pf.done);
>> +}
>> +
>>   void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
>>   void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
>>   bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>>   			unsigned long hva, struct kvm_arch_async_pf *arch);
>>   int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>>   #else
>> +static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>> +{
>> +	return false;
>> +}
>> +
>>   static inline void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) { }
>>   #endif
>>   
>> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
>> index dd777688d14a..d145a61a046a 100644
>> --- a/virt/kvm/async_pf.c
>> +++ b/virt/kvm/async_pf.c
>> @@ -70,7 +70,7 @@ static void async_pf_execute(struct work_struct *work)
>>   		kvm_arch_async_page_present(vcpu, apf);
>>   
>>   	spin_lock(&vcpu->async_pf.lock);
>> -	first = list_empty(&vcpu->async_pf.done);
>> +	first = !kvm_check_async_pf_completion_queue(vcpu);
>>   	list_add_tail(&apf->link, &vcpu->async_pf.done);
>>   	apf->vcpu = NULL;
>>   	spin_unlock(&vcpu->async_pf.lock);
>> @@ -122,7 +122,7 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>>   		spin_lock(&vcpu->async_pf.lock);
>>   	}
>>   
>> -	while (!list_empty(&vcpu->async_pf.done)) {
>> +	while (kvm_check_async_pf_completion_queue(vcpu)) {
>>   		struct kvm_async_pf *work =
>>   			list_first_entry(&vcpu->async_pf.done,
>>   					 typeof(*work), link);
>> @@ -138,7 +138,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
>>   {
>>   	struct kvm_async_pf *work;
>>   
>> -	while (!list_empty_careful(&vcpu->async_pf.done) &&
>> +	while (kvm_check_async_pf_completion_queue(vcpu) &&
>>   	      kvm_arch_can_dequeue_async_page_present(vcpu)) {
>>   		spin_lock(&vcpu->async_pf.lock);
>>   		work = list_first_entry(&vcpu->async_pf.done, typeof(*work),
>> @@ -205,7 +205,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
>>   	struct kvm_async_pf *work;
>>   	bool first;
>>   
>> -	if (!list_empty_careful(&vcpu->async_pf.done))
>> +	if (kvm_check_async_pf_completion_queue(vcpu))
>>   		return 0;
>>   
>>   	work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
>> @@ -216,7 +216,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
>>   	INIT_LIST_HEAD(&work->queue); /* for list_del to work */
>>   
>>   	spin_lock(&vcpu->async_pf.lock);
>> -	first = list_empty(&vcpu->async_pf.done);
>> +	first = !kvm_check_async_pf_completion_queue(vcpu);
>>   	list_add_tail(&work->link, &vcpu->async_pf.done);
>>   	spin_unlock(&vcpu->async_pf.lock);
>>   
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index b50dbe269f4b..8795503651b1 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -3282,10 +3282,8 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu)
>>   	if (kvm_arch_dy_runnable(vcpu))
>>   		return true;
>>   
>> -#ifdef CONFIG_KVM_ASYNC_PF
>> -	if (!list_empty_careful(&vcpu->async_pf.done))
>> +	if (kvm_check_async_pf_completion_queue(vcpu))
>>   		return true;
>> -#endif
>>   
>>   	return false;
>>   }
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 14/15] arm64: Enable async PF
  2021-08-16 17:05   ` Vitaly Kuznetsov
@ 2021-08-17 10:49     ` Gavin Shan
  0 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2021-08-17 10:49 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, shan.gavin, kvmarm

Hi Vitaly,

On 8/17/21 3:05 AM, Vitaly Kuznetsov wrote:
> Gavin Shan <gshan@redhat.com> writes:
> 
>> This enables asynchronous page fault from guest side. The design
>> is highlighted as below:
>>
>>     * The per-vCPU shared memory region, which is represented by
>>       "struct kvm_vcpu_pv_apf_data", is allocated. The reason and
>>       token associated with the received notifications of asynchronous
>>       page fault are delivered through it.
>>
>>     * A per-vCPU table, which is represented by "struct kvm_apf_table",
>>       is allocated. The process, on which the page-not-present notification
>>       is received, is added into the table so that it can reschedule
>>       itself on switching from kernel to user mode. Afterwards, the
>>       process, identified by token, is removed from the table and put
>>       into runnable state when page-ready notification is received.
>>
>>     * During CPU hotplug, the (private) SDEI event is expected to be
>>       enabled or disabled on the affected CPU by SDEI client driver.
>>       The (PPI) interrupt is enabled or disabled on the affected CPU
>>       by ourself. When the system is going to reboot, the SDEI event
>>       is disabled and unregistered and the (PPI) interrupt is disabled.
>>
>>     * The SDEI event and (PPI) interrupt number are retrieved from host
>>       through SMCCC interface. Besides, the version of the asynchronous
>>       page fault is validated when the feature is enabled on the guest.
>>
>>     * The feature is disabled on guest when boot parameter "no-kvmapf"
>>       is specified.
> 
> Documentation/admin-guide/kernel-parameters.txt states this one is
> x86-only:
> 
>          no-kvmapf       [X86,KVM] Disable paravirtualized asynchronous page
>                          fault handling.
> 
> makes sense to update in this patch I believe.
> 

Yes, I will update in next revision.

>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   arch/arm64/kernel/Makefile |   1 +
>>   arch/arm64/kernel/kvm.c    | 452 +++++++++++++++++++++++++++++++++++++
>>   2 files changed, 453 insertions(+)
>>   create mode 100644 arch/arm64/kernel/kvm.c
>>
>> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
>> index 3f1490bfb938..f0c1a6a7eaa7 100644
>> --- a/arch/arm64/kernel/Makefile
>> +++ b/arch/arm64/kernel/Makefile
>> @@ -59,6 +59,7 @@ obj-$(CONFIG_ACPI)			+= acpi.o
>>   obj-$(CONFIG_ACPI_NUMA)			+= acpi_numa.o
>>   obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
>>   obj-$(CONFIG_PARAVIRT)			+= paravirt.o
>> +obj-$(CONFIG_KVM_GUEST)			+= kvm.o
>>   obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o
>>   obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
>>   obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
>> diff --git a/arch/arm64/kernel/kvm.c b/arch/arm64/kernel/kvm.c
>> new file mode 100644
>> index 000000000000..effe8dc7e921
>> --- /dev/null
>> +++ b/arch/arm64/kernel/kvm.c
>> @@ -0,0 +1,452 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Asynchronous page fault support.
>> + *
>> + * Copyright (C) 2021 Red Hat, Inc.
>> + *
>> + * Author(s): Gavin Shan <gshan@redhat.com>
>> + */
>> +
>> +#include <linux/kernel.h>
>> +#include <linux/spinlock.h>
>> +#include <linux/slab.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/irq.h>
>> +#include <linux/of.h>
>> +#include <linux/of_fdt.h>
>> +#include <linux/arm-smccc.h>
>> +#include <linux/kvm_para.h>
>> +#include <linux/arm_sdei.h>
>> +#include <linux/acpi.h>
>> +#include <linux/cpuhotplug.h>
>> +#include <linux/reboot.h>
>> +
>> +struct kvm_apf_task {
>> +	unsigned int		token;
>> +	struct task_struct	*task;
>> +	struct swait_queue_head	wq;
>> +};
>> +
>> +struct kvm_apf_table {
>> +	raw_spinlock_t		lock;
>> +	unsigned int		count;
>> +	struct kvm_apf_task	tasks[0];
>> +};
>> +
>> +static bool async_pf_available = true;
>> +static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_data) __aligned(64);
>> +static struct kvm_apf_table __percpu *apf_tables;
>> +static unsigned int apf_tasks;
>> +static unsigned int apf_sdei_num;
>> +static unsigned int apf_ppi_num;
>> +static int apf_irq;
>> +
>> +static bool kvm_async_pf_add_task(struct task_struct *task,
>> +				  unsigned int token)
>> +{
>> +	struct kvm_apf_table *table = this_cpu_ptr(apf_tables);
>> +	unsigned int i, index = apf_tasks;
>> +	bool ret = false;
>> +
>> +	raw_spin_lock(&table->lock);
>> +
>> +	if (WARN_ON(table->count >= apf_tasks))
>> +		goto unlock;
>> +
>> +	for (i = 0; i < apf_tasks; i++) {
>> +		if (!table->tasks[i].task) {
>> +			if (index == apf_tasks) {
>> +				ret = true;
>> +				index = i;
>> +			}
>> +		} else if (table->tasks[i].task == task) {
>> +			WARN_ON(table->tasks[i].token != token);
>> +			ret = false;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!ret)
>> +		goto unlock;
>> +
>> +	task->thread.data = &table->tasks[index].wq;
>> +	set_tsk_thread_flag(task, TIF_ASYNC_PF);
>> +
>> +	table->count++;
>> +	table->tasks[index].task = task;
>> +	table->tasks[index].token = token;
>> +
>> +unlock:
>> +	raw_spin_unlock(&table->lock);
>> +	return ret;
>> +}
>> +
>> +static inline void kvm_async_pf_remove_one_task(struct kvm_apf_table *table,
>> +						unsigned int index)
>> +{
>> +	clear_tsk_thread_flag(table->tasks[index].task, TIF_ASYNC_PF);
>> +	WRITE_ONCE(table->tasks[index].task->thread.data, NULL);
>> +
>> +	table->count--;
>> +	table->tasks[index].task = NULL;
>> +	table->tasks[index].token = 0;
>> +
>> +	swake_up_one(&table->tasks[index].wq);
>> +}
>> +
>> +static bool kvm_async_pf_remove_task(unsigned int token)
>> +{
>> +	struct kvm_apf_table *table = this_cpu_ptr(apf_tables);
>> +	unsigned int i;
>> +	bool ret = (token == UINT_MAX);
>> +
>> +	raw_spin_lock(&table->lock);
>> +
>> +	for (i = 0; i < apf_tasks; i++) {
>> +		if (!table->tasks[i].task)
>> +			continue;
>> +
>> +		/* Wakeup all */
>> +		if (token == UINT_MAX) {
>> +			kvm_async_pf_remove_one_task(table, i);
>> +			continue;
>> +		}
>> +
>> +		if (table->tasks[i].token == token) {
>> +			kvm_async_pf_remove_one_task(table, i);
>> +			ret = true;
>> +			break;
>> +		}
>> +	}
>> +
>> +	raw_spin_unlock(&table->lock);
>> +
>> +	return ret;
>> +}
>> +
>> +static int kvm_async_pf_sdei_handler(unsigned int event,
>> +				     struct pt_regs *regs,
>> +				     void *arg)
>> +{
>> +	unsigned int reason = __this_cpu_read(apf_data.reason);
>> +	unsigned int token = __this_cpu_read(apf_data.token);
>> +	bool ret;
>> +
>> +	if (reason != KVM_PV_REASON_PAGE_NOT_PRESENT) {
>> +		pr_warn("%s: Bogus notification (%d, 0x%08x)\n",
>> +			__func__, reason, token);
>> +		return -EINVAL;
>> +	}
>> +
>> +	ret = kvm_async_pf_add_task(current, token);
>> +	__this_cpu_write(apf_data.token, 0);
>> +	__this_cpu_write(apf_data.reason, 0);
>> +
>> +	if (!ret)
>> +		return -ENOSPC;
>> +
>> +	smp_send_reschedule(smp_processor_id());
>> +
>> +	return 0;
>> +}
>> +
>> +static irqreturn_t kvm_async_pf_irq_handler(int irq, void *dev_id)
>> +{
>> +	unsigned int reason = __this_cpu_read(apf_data.reason);
>> +	unsigned int token = __this_cpu_read(apf_data.token);
>> +	struct arm_smccc_res res;
>> +
>> +	if (reason != KVM_PV_REASON_PAGE_READY) {
>> +		pr_warn("%s: Bogus interrupt %d (%d, 0x%08x)\n",
>> +			__func__, irq, reason, token);
> 
> Spurious interrupt or bogus APF reason set? Could be both I believe.
> 

Yes, it could be both and the message can be more specific, like below:

                 pr_warn("%s: Wrong interrupt (%d) or state (%d 0x%08x) received\n",
                         __func__, irq, reason, token);

>> +		return IRQ_HANDLED;
>> +	}
>> +
>> +	kvm_async_pf_remove_task(token);
>> +
>> +	__this_cpu_write(apf_data.token, 0);
>> +	__this_cpu_write(apf_data.reason, 0);
>> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
>> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ_ACK, &res);
>> +
>> +	return IRQ_HANDLED;
>> +}
>> +
>> +static int __init kvm_async_pf_available(char *arg)
>> +{
>> +	async_pf_available = false;
>> +
>> +	return 0;
>> +}
>> +early_param("no-kvmapf", kvm_async_pf_available);
>> +
>> +static void kvm_async_pf_disable(void)
>> +{
>> +	struct arm_smccc_res res;
>> +	u32 enabled = __this_cpu_read(apf_data.enabled);
>> +
>> +	if (!enabled)
>> +		return;
>> +
>> +	/* Disable the functionality */
>> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
>> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE,
>> +			     0, 0, &res);
>> +	if (res.a0 != SMCCC_RET_SUCCESS) {
>> +		pr_warn("%s: Error %ld to disable on CPU%d\n",
>> +			__func__, res.a0, smp_processor_id());
>> +		return;
>> +	}
>> +
>> +	__this_cpu_write(apf_data.enabled, 0);
>> +
>> +	pr_info("Async PF disabled on CPU%d\n", smp_processor_id());
> 
> Nitpicking: x86 uses
> 
> "setup async PF for cpu %d\n" and
> "disable async PF for cpu %d\n"
> 
> which are not ideal maybe but in any case it would probably make sense
> to be consistent across arches.
> 

Yes, it's worthwhile to do so :)

> 
>> +}
>> +
>> +static void kvm_async_pf_enable(void)
>> +{
>> +	struct arm_smccc_res res;
>> +	u32 enabled = __this_cpu_read(apf_data.enabled);
>> +	u64 val = virt_to_phys(this_cpu_ptr(&apf_data));
>> +
>> +	if (enabled)
>> +		return;
>> +
>> +	val |= KVM_ASYNC_PF_ENABLED;
>> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
>> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE,
>> +			     (u32)val, (u32)(val >> 32), &res);
>> +	if (res.a0 != SMCCC_RET_SUCCESS) {
>> +		pr_warn("%s: Error %ld to enable CPU%d\n",
>> +			__func__, res.a0, smp_processor_id());
>> +		return;
>> +	}
>> +
>> +	__this_cpu_write(apf_data.enabled, 1);
>> +
>> +	pr_info("Async PF enabled on CPU%d\n", smp_processor_id());
>> +}
>> +
>> +static void kvm_async_pf_cpu_disable(void *info)
>> +{
>> +	disable_percpu_irq(apf_irq);
>> +	kvm_async_pf_disable();
>> +}
>> +
>> +static void kvm_async_pf_cpu_enable(void *info)
>> +{
>> +	enable_percpu_irq(apf_irq, IRQ_TYPE_LEVEL_HIGH);
>> +	kvm_async_pf_enable();
>> +}
>> +
>> +static int kvm_async_pf_cpu_reboot_notify(struct notifier_block *nb,
>> +					  unsigned long code,
>> +					  void *unused)
>> +{
>> +	if (code == SYS_RESTART) {
>> +		sdei_event_disable(apf_sdei_num);
>> +		sdei_event_unregister(apf_sdei_num);
>> +
>> +		on_each_cpu(kvm_async_pf_cpu_disable, NULL, 1);
>> +	}
>> +
>> +	return NOTIFY_DONE;
>> +}
>> +
>> +static struct notifier_block kvm_async_pf_cpu_reboot_nb = {
>> +	.notifier_call = kvm_async_pf_cpu_reboot_notify,
>> +};
>> +
>> +static int kvm_async_pf_cpu_online(unsigned int cpu)
>> +{
>> +	kvm_async_pf_cpu_enable(NULL);
>> +
>> +	return 0;
>> +}
>> +
>> +static int kvm_async_pf_cpu_offline(unsigned int cpu)
>> +{
>> +	kvm_async_pf_cpu_disable(NULL);
>> +
>> +	return 0;
>> +}
>> +
>> +static int __init kvm_async_pf_check_version(void)
>> +{
>> +	struct arm_smccc_res res;
>> +
>> +	/*
>> +	 * Check the version and v1.0.0 or higher version is required
>> +	 * to support the functionality.
>> +	 */
>> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
>> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_VERSION, &res);
>> +	if (res.a0 != SMCCC_RET_SUCCESS) {
>> +		pr_warn("%s: Error %ld to get version\n",
>> +			__func__, res.a0);
>> +		return -EPERM;
>> +	}
>> +
>> +	if ((res.a1 & 0xFFFFFFFFFF000000) ||
>> +	    ((res.a1 & 0xFF0000) >> 16) < 0x1) {
>> +		pr_warn("%s: Invalid version (0x%016lx)\n",
>> +			__func__, res.a1);
>> +		return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int __init kvm_async_pf_info(void)
>> +{
>> +	struct arm_smccc_res res;
>> +
>> +	/* Retrieve number of tokens */
>> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
>> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_SLOTS, &res);
>> +	if (res.a0 != SMCCC_RET_SUCCESS) {
>> +		pr_warn("%s: Error %ld to get token number\n",
>> +			__func__, res.a0);
>> +		return -EPERM;
>> +	}
>> +
>> +	apf_tasks = res.a1 * 2;
>> +
>> +	/* Retrieve SDEI event number */
>> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
>> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_SDEI, &res);
>> +	if (res.a0 != SMCCC_RET_SUCCESS) {
>> +		pr_warn("%s: Error %ld to get SDEI event number\n",
>> +			__func__, res.a0);
>> +		return -EPERM;
>> +	}
>> +
>> +	apf_sdei_num = res.a1;
>> +
>> +	/* Retrieve (PPI) interrupt number */
>> +	arm_smccc_1_1_invoke(ARM_SMCCC_VENDOR_HYP_KVM_ASYNC_PF_FUNC_ID,
>> +			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ, &res);
>> +	if (res.a0 != SMCCC_RET_SUCCESS) {
>> +		pr_warn("%s: Error %ld to get IRQ\n",
>> +			__func__, res.a0);
>> +		return -EPERM;
>> +	}
>> +
>> +	apf_ppi_num = res.a1;
>> +
>> +	return 0;
>> +}
>> +
>> +static int __init kvm_async_pf_init(void)
>> +{
>> +	struct kvm_apf_table *table;
>> +	size_t size;
>> +	int cpu, i, ret;
>> +
>> +	if (!kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) ||
>> +	    !async_pf_available)
>> +		return -EPERM;
>> +
>> +	ret = kvm_async_pf_check_version();
>> +	if (ret)
>> +		return ret;
>> +
>> +	ret = kvm_async_pf_info();
>> +	if (ret)
>> +		return ret;
>> +
>> +	/* Allocate and initialize the sleeper table */
>> +	size = sizeof(struct kvm_apf_table) +
>> +	       apf_tasks * sizeof(struct kvm_apf_task);
>> +	apf_tables = __alloc_percpu(size, 0);
>> +	if (!apf_tables) {
>> +		pr_warn("%s: Unable to alloc async PF table\n",
>> +			__func__);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		table = per_cpu_ptr(apf_tables, cpu);
>> +		raw_spin_lock_init(&table->lock);
>> +		for (i = 0; i < apf_tasks; i++)
>> +			init_swait_queue_head(&table->tasks[i].wq);
>> +	}
>> +
>> +	/*
>> +	 * Initialize SDEI event for page-not-present notification.
>> +	 * The SDEI event number should have been retrieved from
>> +	 * the host.
>> +	 */
>> +	ret = sdei_event_register(apf_sdei_num,
>> +				  kvm_async_pf_sdei_handler, NULL);
>> +	if (ret) {
>> +		pr_warn("%s: Error %d to register SDEI event\n",
>> +			__func__, ret);
>> +		ret = -EIO;
>> +		goto release_tables;
>> +	}
>> +
>> +	ret = sdei_event_enable(apf_sdei_num);
>> +	if (ret) {
>> +		pr_warn("%s: Error %d to enable SDEI event\n",
>> +			__func__, ret);
>> +		goto unregister_event;
>> +	}
>> +
>> +	/*
>> +	 * Initialize interrupt for page-ready notification. The
>> +	 * interrupt number and its properties should have been
>> +	 * retrieved from the ACPI:APFT table.
>> +	 */
>> +	apf_irq = acpi_register_gsi(NULL, apf_ppi_num,
>> +				    ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
>> +	if (apf_irq <= 0) {
>> +		ret = -EIO;
>> +		pr_warn("%s: Error %d to register IRQ\n",
>> +			__func__, apf_irq);
>> +		goto disable_event;
>> +	}
>> +
>> +	ret = request_percpu_irq(apf_irq, kvm_async_pf_irq_handler,
>> +				 "Asynchronous Page Fault", &apf_data);
>> +	if (ret) {
>> +		pr_warn("%s: Error %d to request IRQ\n",
>> +			__func__, ret);
>> +		goto unregister_irq;
>> +	}
>> +
>> +	register_reboot_notifier(&kvm_async_pf_cpu_reboot_nb);
>> +	ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>> +			"arm/kvm:online", kvm_async_pf_cpu_online,
>> +			kvm_async_pf_cpu_offline);
>> +	if (ret < 0) {
>> +		pr_warn("%s: Error %d to install cpu hotplug callbacks\n",
>> +			__func__, ret);
>> +		goto release_irq;
>> +	}
>> +
>> +	/* Enable async PF on the online CPUs */
>> +	on_each_cpu(kvm_async_pf_cpu_enable, NULL, 1);
>> +
>> +	return 0;
>> +
>> +release_irq:
>> +	free_percpu_irq(apf_irq, &apf_data);
>> +unregister_irq:
>> +	acpi_unregister_gsi(apf_ppi_num);
>> +disable_event:
>> +	sdei_event_disable(apf_sdei_num);
>> +unregister_event:
>> +	sdei_event_unregister(apf_sdei_num);
>> +release_tables:
>> +	free_percpu(apf_tables);
>> +
>> +	return ret;
>> +}
>> +
>> +static int __init kvm_guest_init(void)
>> +{
>> +	return kvm_async_pf_init();
>> +}
>> +
>> +fs_initcall(kvm_guest_init);
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 02/15] KVM: async_pf: Add helper function to check completion queue
  2021-08-15  0:59 ` [PATCH v4 02/15] KVM: async_pf: Add helper function to check completion queue Gavin Shan
  2021-08-16 16:53   ` Vitaly Kuznetsov
@ 2021-11-10 15:37   ` Eric Auger
  2022-01-13  7:38     ` Gavin Shan
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Auger @ 2021-11-10 15:37 UTC (permalink / raw)
  To: Gavin Shan, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Gavin,

On 8/15/21 2:59 AM, Gavin Shan wrote:
> This adds inline helper kvm_check_async_pf_completion_queue() to
> check if there are pending completion in the queue. The empty stub
> is also added on !CONFIG_KVM_ASYNC_PF so that the caller needn't
> consider if CONFIG_KVM_ASYNC_PF is enabled.
> 
> All checks on the completion queue is done by the newly added inline
> function since list_empty() and list_empty_careful() are interchangeable.
why is it interchangeable?

> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  arch/x86/kvm/x86.c       |  2 +-
>  include/linux/kvm_host.h | 10 ++++++++++
>  virt/kvm/async_pf.c      | 10 +++++-----
>  virt/kvm/kvm_main.c      |  4 +---
>  4 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e5d5c5ed7dd4..7f35d9324b99 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11591,7 +11591,7 @@ static inline bool kvm_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
>  
>  static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>  {
> -	if (!list_empty_careful(&vcpu->async_pf.done))
> +	if (kvm_check_async_pf_completion_queue(vcpu))
>  		return true;
>  
>  	if (kvm_apic_has_events(vcpu))
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 85b61a456f1c..a5f990f6dc35 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -339,12 +339,22 @@ struct kvm_async_pf {
>  	bool				notpresent_injected;
>  };
>  
> +static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> +{
> +	return !list_empty_careful(&vcpu->async_pf.done);
> +}
> +
>  void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
>  void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
>  bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  			unsigned long hva, struct kvm_arch_async_pf *arch);
>  int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  #else
> +static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}
> +
>  static inline void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) { }
>  #endif
>  
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index dd777688d14a..d145a61a046a 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -70,7 +70,7 @@ static void async_pf_execute(struct work_struct *work)
>  		kvm_arch_async_page_present(vcpu, apf);
>  
>  	spin_lock(&vcpu->async_pf.lock);
> -	first = list_empty(&vcpu->async_pf.done);
> +	first = !kvm_check_async_pf_completion_queue(vcpu);
>  	list_add_tail(&apf->link, &vcpu->async_pf.done);
>  	apf->vcpu = NULL;
>  	spin_unlock(&vcpu->async_pf.lock);
> @@ -122,7 +122,7 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>  		spin_lock(&vcpu->async_pf.lock);
>  	}
>  
> -	while (!list_empty(&vcpu->async_pf.done)) {
> +	while (kvm_check_async_pf_completion_queue(vcpu)) {
this is replaced by a stronger check. Please can you explain why it is
equivalent?
>  		struct kvm_async_pf *work =
>  			list_first_entry(&vcpu->async_pf.done,
>  					 typeof(*work), link);
> @@ -138,7 +138,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_async_pf *work;
>  
> -	while (!list_empty_careful(&vcpu->async_pf.done) &&
> +	while (kvm_check_async_pf_completion_queue(vcpu) &&
>  	      kvm_arch_can_dequeue_async_page_present(vcpu)) {
>  		spin_lock(&vcpu->async_pf.lock);
>  		work = list_first_entry(&vcpu->async_pf.done, typeof(*work),
> @@ -205,7 +205,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
>  	struct kvm_async_pf *work;
>  	bool first;
>  
> -	if (!list_empty_careful(&vcpu->async_pf.done))
> +	if (kvm_check_async_pf_completion_queue(vcpu))
>  		return 0;
>  
>  	work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
> @@ -216,7 +216,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
>  	INIT_LIST_HEAD(&work->queue); /* for list_del to work */
>  
>  	spin_lock(&vcpu->async_pf.lock);
> -	first = list_empty(&vcpu->async_pf.done);
> +	first = !kvm_check_async_pf_completion_queue(vcpu);
>  	list_add_tail(&work->link, &vcpu->async_pf.done);
>  	spin_unlock(&vcpu->async_pf.lock);
>  
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b50dbe269f4b..8795503651b1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3282,10 +3282,8 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu)
>  	if (kvm_arch_dy_runnable(vcpu))
>  		return true;
>  
> -#ifdef CONFIG_KVM_ASYNC_PF
> -	if (!list_empty_careful(&vcpu->async_pf.done))
> +	if (kvm_check_async_pf_completion_queue(vcpu))
>  		return true;
> -#endif
>  
>  	return false;
>  }
> 
Thanks

Eric


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 01/15] KVM: async_pf: Move struct kvm_async_pf around
  2021-08-15  0:59 ` [PATCH v4 01/15] KVM: async_pf: Move struct kvm_async_pf around Gavin Shan
@ 2021-11-10 15:37   ` Eric Auger
  2022-01-13  7:21     ` Gavin Shan
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Auger @ 2021-11-10 15:37 UTC (permalink / raw)
  To: Gavin Shan, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Gavin,

On 8/15/21 2:59 AM, Gavin Shan wrote:
> This moves the definition of "struct kvm_async_pf" and the related
> functions after "struct kvm_vcpu" so that newly added inline functions
> in the subsequent patches can dereference "struct kvm_vcpu" properly.
> Otherwise, the unexpected build error will be raised:
> 
>    error: dereferencing pointer to incomplete type ‘struct kvm_vcpu’
>    return !list_empty_careful(&vcpu->async_pf.done);
>                                    ^~
> Since we're here, the sepator between type and field in "struct kvm_vcpu"
separator
> is replaced by tab. The empty stub kvm_check_async_pf_completion() is also
> added on !CONFIG_KVM_ASYNC_PF, which is needed by subsequent patches to
> support asynchronous page fault on ARM64.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  include/linux/kvm_host.h | 44 +++++++++++++++++++++-------------------
>  1 file changed, 23 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ae7735b490b4..85b61a456f1c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -199,27 +199,6 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>  struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>  					 gpa_t addr);
>  
> -#ifdef CONFIG_KVM_ASYNC_PF
> -struct kvm_async_pf {
> -	struct work_struct work;
> -	struct list_head link;
> -	struct list_head queue;
> -	struct kvm_vcpu *vcpu;
> -	struct mm_struct *mm;
> -	gpa_t cr2_or_gpa;
> -	unsigned long addr;
> -	struct kvm_arch_async_pf arch;
> -	bool   wakeup_all;
> -	bool notpresent_injected;
> -};
> -
> -void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
> -void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
> -bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> -			unsigned long hva, struct kvm_arch_async_pf *arch);
> -int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
> -#endif
> -
>  #ifdef KVM_ARCH_WANT_MMU_NOTIFIER
>  struct kvm_gfn_range {
>  	struct kvm_memory_slot *slot;
> @@ -346,6 +325,29 @@ struct kvm_vcpu {
>  	struct kvm_dirty_ring dirty_ring;
>  };
>  
> +#ifdef CONFIG_KVM_ASYNC_PF
> +struct kvm_async_pf {
> +	struct work_struct		work;
> +	struct list_head		link;
> +	struct list_head		queue;
> +	struct kvm_vcpu			*vcpu;
> +	struct mm_struct		*mm;
> +	gpa_t				cr2_or_gpa;
> +	unsigned long			addr;
> +	struct kvm_arch_async_pf	arch;
> +	bool				wakeup_all;
> +	bool				notpresent_injected;
> +};
> +
> +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
> +void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
> +bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> +			unsigned long hva, struct kvm_arch_async_pf *arch);
> +int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
> +#else
> +static inline void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) { }
why is that stub needed on ARM64 and not on the other archs?

Eric
> +#endif
> +
>  /* must be called with irqs disabled */
>  static __always_inline void guest_enter_irqoff(void)
>  {
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 03/15] KVM: async_pf: Make GFN slot management generic
  2021-08-15  0:59 ` [PATCH v4 03/15] KVM: async_pf: Make GFN slot management generic Gavin Shan
@ 2021-11-10 17:00   ` Eric Auger
  2022-01-13  7:42     ` Gavin Shan
  2021-11-10 17:00   ` Eric Auger
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Auger @ 2021-11-10 17:00 UTC (permalink / raw)
  To: Gavin Shan, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Gavin,

On 8/15/21 2:59 AM, Gavin Shan wrote:
> It's not allowed to fire duplicate notification for same GFN on
> x86 platform, with help of a hash table. This mechanism is going
s/, with help of a hash table/this is achieved through a hash table
> to be used by arm64 and this makes the code generic and shareable
s/and this makes/.\n Turn the code generic
> by multiple platforms.
> 
>    * As this mechanism isn't needed by all platforms, a new kernel
>      config option (CONFIG_ASYNC_PF_SLOT) is introduced so that it
>      can be disabled at compiling time.
compile time
> 
>    * The code is basically copied from x86 platform and the functions
>      are renamed to reflect the fact: (a) the input parameters are
>      vCPU and GFN. 
not for reset
(b) The operations are resetting, searching, adding
>      and removing.
find, add, remove ops are renamed with _slot suffix
> 
>    * Helper stub is also added on !CONFIG_KVM_ASYNC_PF because we're
>      going to use IS_ENABLED() instead of #ifdef on arm64 when the
>      asynchronous page fault is supported.
> 
> This is preparatory work to use the newly introduced functions on x86
> platform and arm64 in subsequent patches.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  include/linux/kvm_host.h | 18 +++++++++
>  virt/kvm/Kconfig         |  3 ++
>  virt/kvm/async_pf.c      | 85 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 106 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index a5f990f6dc35..a9685c2b2250 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -298,6 +298,9 @@ struct kvm_vcpu {
>  
>  #ifdef CONFIG_KVM_ASYNC_PF
>  	struct {
> +#ifdef CONFIG_KVM_ASYNC_PF_SLOT
> +		gfn_t gfns[ASYNC_PF_PER_VCPU];
> +#endif
>  		u32 queued;
>  		struct list_head queue;
>  		struct list_head done;
> @@ -339,6 +342,13 @@ struct kvm_async_pf {
>  	bool				notpresent_injected;
>  };
>  
> +#ifdef CONFIG_KVM_ASYNC_PF_SLOT
> +void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu);
this does not reset a "slot" but the whole hash table. So to me this
shouldn't be renamed with _slot suffix. reset_hash or reset_all_slots?
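For instance (naming sketch only, no functional change implied):

void kvm_async_pf_reset_all_slots(struct kvm_vcpu *vcpu);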
> +void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
> +void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
> +bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
> +#endif
> +
>  static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>  {
>  	return !list_empty_careful(&vcpu->async_pf.done);
> @@ -350,6 +360,14 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  			unsigned long hva, struct kvm_arch_async_pf *arch);
>  int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>  #else
> +static inline void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu) { }
> +static inline void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn) { }
> +static inline void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn) { }
> +static inline bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
> +{
> +	return false;
> +}
> +
>  static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>  {
>  	return false;
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 62b39149b8c8..59b518c8c205 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -23,6 +23,9 @@ config KVM_MMIO
>  config KVM_ASYNC_PF
>         bool
>  
> +config KVM_ASYNC_PF_SLOT
> +	bool
> +
>  # Toggle to switch between direct notification and batch job
>  config KVM_ASYNC_PF_SYNC
>         bool
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index d145a61a046a..0d1fdb2932af 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -13,12 +13,97 @@
>  #include <linux/module.h>
>  #include <linux/mmu_context.h>
>  #include <linux/sched/mm.h>
> +#ifdef CONFIG_KVM_ASYNC_PF_SLOT
> +#include <linux/hash.h>
> +#endif
>  
>  #include "async_pf.h"
>  #include <trace/events/kvm.h>
>  
>  static struct kmem_cache *async_pf_cache;
>  
> +#ifdef CONFIG_KVM_ASYNC_PF_SLOT
> +static inline u32 kvm_async_pf_hash(gfn_t gfn)
> +{
> +	BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));
> +
> +	return hash_32(gfn & 0xffffffff, order_base_2(ASYNC_PF_PER_VCPU));
> +}
> +
> +static inline u32 kvm_async_pf_next_slot(u32 key)
> +{
> +	return (key + 1) & (ASYNC_PF_PER_VCPU - 1);
> +}
> +
> +static u32 kvm_async_pf_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
> +{
> +	u32 key = kvm_async_pf_hash(gfn);
> +	int i;
> +
> +	for (i = 0; i < ASYNC_PF_PER_VCPU &&
> +		(vcpu->async_pf.gfns[key] != gfn &&
> +		vcpu->async_pf.gfns[key] != ~0); i++)
> +		key = kvm_async_pf_next_slot(key);
> +
> +	return key;
> +}
> +
> +void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu)
> +{
> +	int i;
> +
> +	for (i = 0; i < ASYNC_PF_PER_VCPU; i++)
> +		vcpu->async_pf.gfns[i] = ~0;
> +}
> +
> +void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
> +{
> +	u32 key = kvm_async_pf_hash(gfn);
> +
> +	while (vcpu->async_pf.gfns[key] != ~0)
> +		key = kvm_async_pf_next_slot(key);
> +
> +	vcpu->async_pf.gfns[key] = gfn;
> +}
> +
> +void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
> +{
> +	u32 i, j, k;
> +
> +	i = j = kvm_async_pf_slot(vcpu, gfn);
> +
> +	if (WARN_ON_ONCE(vcpu->async_pf.gfns[i] != gfn))
> +		return;
> +
> +	while (true) {
> +		vcpu->async_pf.gfns[i] = ~0;
> +
> +		do {
> +			j = kvm_async_pf_next_slot(j);
> +			if (vcpu->async_pf.gfns[j] == ~0)
> +				return;
> +
> +			k = kvm_async_pf_hash(vcpu->async_pf.gfns[j]);
> +			/*
> +			 * k lies cyclically in ]i,j]
> +			 * |    i.k.j |
> +			 * |....j i.k.| or  |.k..j i...|
> +			 */
> +		} while ((i <= j) ? (i < k && k <= j) : (i < k || k <= j));
> +
> +		vcpu->async_pf.gfns[i] = vcpu->async_pf.gfns[j];
> +		i = j;
> +	}
> +}
> +
> +bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
> +{
> +	u32 key = kvm_async_pf_slot(vcpu, gfn);
> +
> +	return vcpu->async_pf.gfns[key] == gfn;
> +}
> +#endif /* CONFIG_KVM_ASYNC_PF_SLOT */
> +
>  int kvm_async_pf_init(void)
>  {
>  	async_pf_cache = KMEM_CACHE(kvm_async_pf, 0);
> 
Thanks

Eric


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 04/15] KVM: x86: Use generic async PF slot management
  2021-08-15  0:59 ` [PATCH v4 04/15] KVM: x86: Use generic async PF slot management Gavin Shan
@ 2021-11-10 17:03   ` Eric Auger
  2022-01-13  7:44     ` Gavin Shan
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Auger @ 2021-11-10 17:03 UTC (permalink / raw)
  To: Gavin Shan, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Gavin,

On 8/15/21 2:59 AM, Gavin Shan wrote:
> This uses the generic slot management mechanism for asynchronous
Now that we have moved the hash table management into the generic code, use
the latter ...
> page fault by enabling CONFIG_KVM_ASYNC_PF_SLOT because the private
> implementation is totally duplicate to the generic one.
> 
> The changes introduced by this is pretty mechanical and shouldn't
> cause any logical changes.
suggest: No functional change intended.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  arch/x86/include/asm/kvm_host.h |  2 -
>  arch/x86/kvm/Kconfig            |  1 +
>  arch/x86/kvm/mmu/mmu.c          |  2 +-
>  arch/x86/kvm/x86.c              | 86 +++------------------------------
>  4 files changed, 8 insertions(+), 83 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 974cbfb1eefe..409c1e7137cd 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -810,7 +810,6 @@ struct kvm_vcpu_arch {
>  
>  	struct {
>  		bool halted;
> -		gfn_t gfns[ASYNC_PF_PER_VCPU];
>  		struct gfn_to_hva_cache data;
>  		u64 msr_en_val; /* MSR_KVM_ASYNC_PF_EN */
>  		u64 msr_int_val; /* MSR_KVM_ASYNC_PF_INT */
> @@ -1878,7 +1877,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
>  			       struct kvm_async_pf *work);
>  void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
>  bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
> -extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
>  
>  int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
>  int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index ac69894eab88..53a6ef30b6ee 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -32,6 +32,7 @@ config KVM
>  	select HAVE_KVM_IRQ_ROUTING
>  	select HAVE_KVM_EVENTFD
>  	select KVM_ASYNC_PF
> +	select KVM_ASYNC_PF_SLOT
>  	select USER_RETURN_NOTIFIER
>  	select KVM_MMIO
>  	select SCHED_INFO
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index c4f4fa23320e..cd8aaa662ac2 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3799,7 +3799,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
>  
>  	if (!prefault && kvm_can_do_async_pf(vcpu)) {
>  		trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
> -		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
> +		if (kvm_async_pf_find_slot(vcpu, gfn)) {
>  			trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
>  			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
>  			return true;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7f35d9324b99..a5f7d6122178 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -332,13 +332,6 @@ static struct kmem_cache *kvm_alloc_emulator_cache(void)
>  
>  static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
>  
> -static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
> -{
> -	int i;
> -	for (i = 0; i < ASYNC_PF_PER_VCPU; i++)
> -		vcpu->arch.apf.gfns[i] = ~0;
> -}
> -
>  static void kvm_on_user_return(struct user_return_notifier *urn)
>  {
>  	unsigned slot;
> @@ -854,7 +847,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
>  {
>  	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
>  		kvm_clear_async_pf_completion_queue(vcpu);
> -		kvm_async_pf_hash_reset(vcpu);
> +		kvm_async_pf_reset_slot(vcpu);
>  	}
>  
>  	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
> @@ -3118,7 +3111,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>  
>  	if (!kvm_pv_async_pf_enabled(vcpu)) {
>  		kvm_clear_async_pf_completion_queue(vcpu);
> -		kvm_async_pf_hash_reset(vcpu);
> +		kvm_async_pf_reset_slot(vcpu);
>  		return 0;
>  	}
>  
> @@ -10704,7 +10697,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  
>  	vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
>  
> -	kvm_async_pf_hash_reset(vcpu);
> +	kvm_async_pf_reset_slot(vcpu);
>  	kvm_pmu_init(vcpu);
>  
>  	vcpu->arch.pending_external_vector = -1;
> @@ -10828,7 +10821,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  	kvmclock_reset(vcpu);
>  
>  	kvm_clear_async_pf_completion_queue(vcpu);
> -	kvm_async_pf_hash_reset(vcpu);
> +	kvm_async_pf_reset_slot(vcpu);
>  	vcpu->arch.apf.halted = false;
>  
>  	if (vcpu->arch.guest_fpu && kvm_mpx_supported()) {
> @@ -11737,73 +11730,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>  	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
>  }
>  
> -static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
> -{
> -	BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));
> -
> -	return hash_32(gfn & 0xffffffff, order_base_2(ASYNC_PF_PER_VCPU));
> -}
> -
> -static inline u32 kvm_async_pf_next_probe(u32 key)
> -{
> -	return (key + 1) & (ASYNC_PF_PER_VCPU - 1);
> -}
> -
> -static void kvm_add_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
> -{
> -	u32 key = kvm_async_pf_hash_fn(gfn);
> -
> -	while (vcpu->arch.apf.gfns[key] != ~0)
> -		key = kvm_async_pf_next_probe(key);
> -
> -	vcpu->arch.apf.gfns[key] = gfn;
> -}
> -
> -static u32 kvm_async_pf_gfn_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
> -{
> -	int i;
> -	u32 key = kvm_async_pf_hash_fn(gfn);
> -
> -	for (i = 0; i < ASYNC_PF_PER_VCPU &&
> -		     (vcpu->arch.apf.gfns[key] != gfn &&
> -		      vcpu->arch.apf.gfns[key] != ~0); i++)
> -		key = kvm_async_pf_next_probe(key);
> -
> -	return key;
> -}
> -
> -bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
> -{
> -	return vcpu->arch.apf.gfns[kvm_async_pf_gfn_slot(vcpu, gfn)] == gfn;
> -}
> -
> -static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
> -{
> -	u32 i, j, k;
> -
> -	i = j = kvm_async_pf_gfn_slot(vcpu, gfn);
> -
> -	if (WARN_ON_ONCE(vcpu->arch.apf.gfns[i] != gfn))
> -		return;
> -
> -	while (true) {
> -		vcpu->arch.apf.gfns[i] = ~0;
> -		do {
> -			j = kvm_async_pf_next_probe(j);
> -			if (vcpu->arch.apf.gfns[j] == ~0)
> -				return;
> -			k = kvm_async_pf_hash_fn(vcpu->arch.apf.gfns[j]);
> -			/*
> -			 * k lies cyclically in ]i,j]
> -			 * |    i.k.j |
> -			 * |....j i.k.| or  |.k..j i...|
> -			 */
> -		} while ((i <= j) ? (i < k && k <= j) : (i < k || k <= j));
> -		vcpu->arch.apf.gfns[i] = vcpu->arch.apf.gfns[j];
> -		i = j;
> -	}
> -}
> -
>  static inline int apf_put_user_notpresent(struct kvm_vcpu *vcpu)
>  {
>  	u32 reason = KVM_PV_REASON_PAGE_NOT_PRESENT;
> @@ -11867,7 +11793,7 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
>  	struct x86_exception fault;
>  
>  	trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa);
> -	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
> +	kvm_async_pf_add_slot(vcpu, work->arch.gfn);
>  
>  	if (kvm_can_deliver_async_pf(vcpu) &&
>  	    !apf_put_user_notpresent(vcpu)) {
> @@ -11904,7 +11830,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>  	if (work->wakeup_all)
>  		work->arch.token = ~0; /* broadcast wakeup */
>  	else
> -		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
> +		kvm_async_pf_remove_slot(vcpu, work->arch.gfn);
>  	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
>  
>  	if ((work->wakeup_all || work->notpresent_injected) &&
> 
Looks good to me

Eric


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 05/15] KVM: arm64: Export kvm_handle_user_mem_abort()
  2021-08-15  0:59 ` [PATCH v4 05/15] KVM: arm64: Export kvm_handle_user_mem_abort() Gavin Shan
@ 2021-11-10 18:02   ` Eric Auger
  2022-01-13  7:55     ` Gavin Shan
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Auger @ 2021-11-10 18:02 UTC (permalink / raw)
  To: Gavin Shan, kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

Hi Gavin,
On 8/15/21 2:59 AM, Gavin Shan wrote:
> The main work of stage-2 page fault is handled by user_mem_abort().
> When asynchronous page fault is supported, one page fault need to
> be handled with two calls to this function. It means the page fault
> needs to be replayed asynchronously in that case.
> 
>    * This renames the function to kvm_handle_user_mem_abort() and
>      exports it.
> 
>    * Add arguments @esr and @prefault to user_mem_abort(). @esr is
>      the cached value of ESR_EL2 instead of fetching from the current
>      vCPU when the page fault is replayed in scenario of asynchronous
>      page fault. @prefault is used to indicate the page fault is replayed
>      one or not.
Also explain that the fault_status arg is not needed anymore as it is
derived from @esr; otherwise, at first sight, a distracted reviewer like
me may have the impression you replaced fault_status with prefault, while
it is totally unrelated.
> 
>    * Define helper functions esr_dbat_*() in asm/esr.h to extract
>      or check various fields of the passed ESR_EL2 value because
>      those helper functions defined in asm/kvm_emulate.h assumes
>      the ESR_EL2 value has been cached in vCPU struct. It won't
>      be true on handling the replayed page fault in scenario of
>      asynchronous page fault.
I would introduce a separate preliminary patch with those esr macros and
changes to the call sites + changes below.
> 
>    * Some helper functions defined in asm/kvm_emulate.h are used
>      by mmu.c only and seem not to be used by other source file
>      in near future. They are moved to mmu.c and renamed accordingly.>
>      is_exec_fault: kvm_vcpu_trap_is_exec_fault
>      is_write_fault: kvm_is_write_fault()
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  arch/arm64/include/asm/esr.h         |  6 ++++
>  arch/arm64/include/asm/kvm_emulate.h | 27 ++---------------
>  arch/arm64/include/asm/kvm_host.h    |  4 +++
>  arch/arm64/kvm/mmu.c                 | 43 ++++++++++++++++++++++------
>  4 files changed, 48 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
> index 29f97eb3dad4..0f2cb27691de 100644
> --- a/arch/arm64/include/asm/esr.h
> +++ b/arch/arm64/include/asm/esr.h
> @@ -321,8 +321,14 @@
>  					 ESR_ELx_CP15_32_ISS_DIR_READ)
>  
>  #ifndef __ASSEMBLY__
> +#include <linux/bitfield.h>
>  #include <asm/types.h>
>  
> +#define esr_dabt_fault_type(esr)	(esr & ESR_ELx_FSC_TYPE)
> +#define esr_dabt_fault_level(esr)	(FIELD_GET(ESR_ELx_FSC_LEVEL, esr))
> +#define esr_dabt_is_wnr(esr)		(!!(FIELD_GET(ESR_ELx_WNR, esr)))
> +#define esr_dabt_is_s1ptw(esr)		(!!(FIELD_GET(ESR_ELx_S1PTW, esr)))
> +
>  static inline bool esr_is_data_abort(u32 esr)
>  {
>  	const u32 ec = ESR_ELx_EC(esr);
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 923b4d08ea9a..90742f4b1acd 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -285,13 +285,13 @@ static __always_inline int kvm_vcpu_dabt_get_rd(const struct kvm_vcpu *vcpu)
>  
>  static __always_inline bool kvm_vcpu_abt_iss1tw(const struct kvm_vcpu *vcpu)
>  {
> -	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_S1PTW);
> +	return esr_dabt_is_s1ptw(kvm_vcpu_get_esr(vcpu));
>  }
>  
>  /* Always check for S1PTW *before* using this. */
>  static __always_inline bool kvm_vcpu_dabt_iswrite(const struct kvm_vcpu *vcpu)
>  {
> -	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_WNR;
> +	return esr_dabt_is_wnr(kvm_vcpu_get_esr(vcpu));
>  }
>  
>  static inline bool kvm_vcpu_dabt_is_cm(const struct kvm_vcpu *vcpu)
> @@ -320,11 +320,6 @@ static inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu)
>  	return kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW;
>  }
>  
> -static inline bool kvm_vcpu_trap_is_exec_fault(const struct kvm_vcpu *vcpu)
> -{
> -	return kvm_vcpu_trap_is_iabt(vcpu) && !kvm_vcpu_abt_iss1tw(vcpu);
> -}
> -
>  static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
>  {
>  	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC;
> @@ -332,12 +327,7 @@ static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
>  
>  static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
>  {
> -	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
> -}
> -
> -static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
> -{
> -	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
> +	return esr_dabt_fault_type(kvm_vcpu_get_esr(vcpu));
>  }
>  
>  static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
> @@ -365,17 +355,6 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
>  	return ESR_ELx_SYS64_ISS_RT(esr);
>  }
>  
> -static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> -{
> -	if (kvm_vcpu_abt_iss1tw(vcpu))
> -		return true;
> -
> -	if (kvm_vcpu_trap_is_iabt(vcpu))
> -		return false;
> -
> -	return kvm_vcpu_dabt_iswrite(vcpu);
> -}
> -
>  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  {
>  	return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 1824f7e1f9ab..581825b9df77 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -606,6 +606,10 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
>  
>  #define KVM_ARCH_WANT_MMU_NOTIFIER
>  
> +int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
> +			      struct kvm_memory_slot *memslot,
> +			      phys_addr_t fault_ipa, unsigned long hva,
> +			      unsigned int esr, bool prefault);
>  void kvm_arm_halt_guest(struct kvm *kvm);
>  void kvm_arm_resume_guest(struct kvm *kvm);
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 0625bf2353c2..e4038c5e931d 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -892,9 +892,34 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>  	return 0;
>  }
>  
> -static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> -			  struct kvm_memory_slot *memslot, unsigned long hva,
> -			  unsigned long fault_status)
> +static inline bool is_exec_fault(unsigned int esr)
> +{
> +	if (ESR_ELx_EC(esr) != ESR_ELx_EC_IABT_LOW)
> +		return false;
> +
> +	if (esr_dabt_is_s1ptw(esr))
> +		return false;
> +
> +	return true;
> +}
> +
> +static inline bool is_write_fault(unsigned int esr)
> +{
> +	if (esr_dabt_is_s1ptw(esr))
> +		return true;
> +
> +	if (ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW)
> +		return false;
> +
> +	return esr_dabt_is_wnr(esr);
> +}
> +
> +int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
> +			      struct kvm_memory_slot *memslot,
> +			      phys_addr_t fault_ipa,
> +			      unsigned long hva,
> +			      unsigned int esr,
> +			      bool prefault)
you added the prefault arg but the latter is not used in the function?
To me you should introduce that change in a subsequent patch, when relevant.
>  {
>  	int ret = 0;
>  	bool write_fault, writable, force_pte = false;
> @@ -909,14 +934,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	gfn_t gfn;
>  	kvm_pfn_t pfn;
>  	bool logging_active = memslot_is_logging(memslot);
> -	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
> +	unsigned int fault_status = esr_dabt_fault_type(esr);
> +	unsigned long fault_level = esr_dabt_fault_level(esr);
>  	unsigned long vma_pagesize, fault_granule;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>  	struct kvm_pgtable *pgt;
>  
>  	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
> -	write_fault = kvm_is_write_fault(vcpu);
> -	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> +	write_fault = is_write_fault(kvm_vcpu_get_esr(vcpu));
> +	exec_fault = is_exec_fault(kvm_vcpu_get_esr(vcpu));
>  	VM_BUG_ON(write_fault && exec_fault);
>  
>  	if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
> @@ -1176,7 +1202,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  	gfn = fault_ipa >> PAGE_SHIFT;
>  	memslot = gfn_to_memslot(vcpu->kvm, gfn);
>  	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
> -	write_fault = kvm_is_write_fault(vcpu);
> +	write_fault = is_write_fault(kvm_vcpu_get_esr(vcpu));
>  	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
>  		/*
>  		 * The guest has put either its instructions or its page-tables
> @@ -1231,7 +1257,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  		goto out_unlock;
>  	}
>  
> -	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
> +	ret = kvm_handle_user_mem_abort(vcpu, memslot, fault_ipa, hva,
> +					kvm_vcpu_get_esr(vcpu), false);>  	if (ret == 0)
>  		ret = 1;
>  out:
> 
Thanks

Eric


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 06/15] KVM: arm64: Add paravirtualization header files
  2021-08-15  0:59 ` [PATCH v4 06/15] KVM: arm64: Add paravirtualization header files Gavin Shan
@ 2021-11-10 18:06   ` Eric Auger
  2022-01-13  8:00     ` Gavin Shan
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Auger @ 2021-11-10 18:06 UTC (permalink / raw)
  To: Gavin Shan, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Gavin,

On 8/15/21 2:59 AM, Gavin Shan wrote:
> We need put more stuff in the paravirtualization header files when
> the asynchronous page fault is supported. The generic header files
> can't meet the goal.
you need to explain why
 This duplicate the generic header files to be
s/This duplicate/Duplicate
> our platform specific header files. It's the preparatory work to
> support the asynchronous page fault in the subsequent patches:
why duplication and not a move? Shouldn't it be squashed with another
subsequent patch?

Eric
> 
>    include/uapi/asm-generic/kvm_para.h
>    include/asm-generic/kvm_para.h
> 
>    arch/arm64/include/uapi/asm/kvm_para.h
>    arch/arm64/include/asm/kvm_para.h
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  arch/arm64/include/asm/kvm_para.h      | 27 ++++++++++++++++++++++++++
>  arch/arm64/include/uapi/asm/Kbuild     |  2 --
>  arch/arm64/include/uapi/asm/kvm_para.h |  5 +++++
>  3 files changed, 32 insertions(+), 2 deletions(-)
>  create mode 100644 arch/arm64/include/asm/kvm_para.h
>  create mode 100644 arch/arm64/include/uapi/asm/kvm_para.h
> 
> diff --git a/arch/arm64/include/asm/kvm_para.h b/arch/arm64/include/asm/kvm_para.h
> new file mode 100644
> index 000000000000..0ea481dd1c7a
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_para.h
> @@ -0,0 +1,27 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_ARM_KVM_PARA_H
> +#define _ASM_ARM_KVM_PARA_H
> +
> +#include <uapi/asm/kvm_para.h>
> +
> +static inline bool kvm_check_and_clear_guest_paused(void)
> +{
> +	return false;
> +}
> +
> +static inline unsigned int kvm_arch_para_features(void)
> +{
> +	return 0;
> +}
> +
> +static inline unsigned int kvm_arch_para_hints(void)
> +{
> +	return 0;
> +}
> +
> +static inline bool kvm_para_available(void)
> +{
> +	return false;
> +}
> +
> +#endif /* _ASM_ARM_KVM_PARA_H */
> diff --git a/arch/arm64/include/uapi/asm/Kbuild b/arch/arm64/include/uapi/asm/Kbuild
> index 602d137932dc..f66554cd5c45 100644
> --- a/arch/arm64/include/uapi/asm/Kbuild
> +++ b/arch/arm64/include/uapi/asm/Kbuild
> @@ -1,3 +1 @@
>  # SPDX-License-Identifier: GPL-2.0
> -
> -generic-y += kvm_para.h
> diff --git a/arch/arm64/include/uapi/asm/kvm_para.h b/arch/arm64/include/uapi/asm/kvm_para.h
> new file mode 100644
> index 000000000000..cd212282b90c
> --- /dev/null
> +++ b/arch/arm64/include/uapi/asm/kvm_para.h
> @@ -0,0 +1,5 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _UAPI_ASM_ARM_KVM_PARA_H
> +#define _UAPI_ASM_ARM_KVM_PARA_H
> +
> +#endif /* _UAPI_ASM_ARM_KVM_PARA_H */
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 15/15] KVM: arm64: Add async PF document
  2021-08-15  0:59 ` [PATCH v4 15/15] KVM: arm64: Add async PF document Gavin Shan
@ 2021-11-11 10:39   ` Eric Auger
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Auger @ 2021-11-11 10:39 UTC (permalink / raw)
  To: Gavin Shan, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Gavin,

On 8/15/21 2:59 AM, Gavin Shan wrote:
> This adds document to explain the interface for asynchronous page
> fault and how it works in general.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  Documentation/virt/kvm/arm/apf.rst   | 143 +++++++++++++++++++++++++++
>  Documentation/virt/kvm/arm/index.rst |   1 +
>  2 files changed, 144 insertions(+)
>  create mode 100644 Documentation/virt/kvm/arm/apf.rst
> 
> diff --git a/Documentation/virt/kvm/arm/apf.rst b/Documentation/virt/kvm/arm/apf.rst
> new file mode 100644
> index 000000000000..4f5c01b6699f
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/apf.rst
> @@ -0,0 +1,143 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Asynchronous Page Fault Support for arm64
> +=========================================
> +
> +There are two stages of page faults when KVM module is enabled as accelerator
> +to the guest. The guest is responsible for handling the stage-1 page faults,
> +while the host handles the stage-2 page faults. During the period of handling
> +the stage-2 page faults, the guest is suspended until the requested page is
> +ready. It could take several milliseconds, even hundreds of milliseconds in
 s/It could take several milliseconds, even hundreds of milliseconds/It
could take up to hundreds of milliseconds
> +extreme situations because I/O might be required to move the requested page
> +from disk to DRAM. The guest does not do any work when it is suspended. The
> +feature (Asynchronous Page Fault) is introduced to take advantage of the
s/The feature (Asynchronous Page Fault)/The Asynchronous Page Fault
feature improves the overall performance by allowing the guest
to reschedule ... ?
> +suspending period and to improve the overall performance.
> +
> +There are two paths in order to fulfil the asynchronous page fault, called
> +as control path and data path.
The asynchronous page fault is implemented upon a control path and a
data path?
 The control path allows the VMM or guest to
> +configure the functionality, while the notifications are delivered in data
> +path. The notifications are classified into page-not-present and page-ready
> +notifications.
> +
> +Data Path
> +---------
> +
> +There are two types of notifications delivered from host to guest in the
> +data path: page-not-present and page-ready notification. They are delivered
> +through SDEI event and (PPI) interrupt separately.
s/separately/respectively
 Besides, there is a shared
> +buffer between host and guest to indicate the reason and sequential token,
s/to indicate/that indicates
Can you clarify 'reason'?
Also a sequential token is used ...
> +which is used to identify the asynchronous page fault. The reason and token
> +resident in the shared buffer is written by host, read and cleared by guest.
s/is/are
> +An asynchronous page fault is delivered and completed as below.
> +
> +(1) When an asynchronous page fault starts, a (workqueue) worker is created
> +    and queued to the vCPU's pending queue. The worker makes the requested
> +    page ready and resident to DRAM in the background. The shared buffer is
> +    updated with reason and sequential token. After that, SDEI event is sent
> +    to guest as page-not-present notification.
This gives the impression the SDEI event is sent after the worker
completes the job. I think you should rephrase.
> +
> +(2) When the SDEI event is received on guest, the current process is tagged
> +    with TIF_ASYNC_PF and associated with a wait queue. The process is ready
> +    to keep rescheduling itself on switching from kernel to user mode. After
the above sentence sounds a bit cryptic to me: ~ waits to be rescheduled
later?
> +    that, a reschedule IPI is sent to current CPU and the received SDEI event
> +    is acknowledged. Note that the IPI is delivered when the acknowledgment
> +    on the SDEI event is received on host.
> +
> +(3) On the host, the worker is dequeued from the vCPU's pending queue and
> +    enqueued to its completion queue when the requested page becomes ready.
> +    In the mean while, KVM_REQ_ASYNC_PF request is sent the vCPU if the
in the meanwhile here and below
> +    worker is the first element enqueued to the completion queue.
I think you should restate what the intent of this KVM_REQ_ASYNC_PF
request is, i.e. to notify that the page is ready.
> +
> +(4) With pending KVM_REQ_ASYNC_PF request, the first worker in the completion
> +    queue is dequeued and destroyed. In the mean while, a (PPI) interrupt is

> +    sent to guest with updated reason and token in the shared buffer.
> +
> +(5) When the (PPI) interrupt is received on guest, the affected process is
> +    located using the token and waken up after its TIF_ASYNC_PF tag is cleared.
> +    After that, the interrupt is acknowledged through SMCCC interface. The
> +    workers in the completion queue is dequeued and destroyed if any workers
the worker
Isn't it destroyed even if no other worker exists?
> +    exist, and another (PPI) interrupt is sent to the guest.

I think you should briefly restate the motivation for the SDEI and PPI
mechanisms for both notifications. Maybe by drawing an analogy with the
x86 implementation?
> +
> +Control Path
> +------------
> +
> +The configurations are passed through SMCCC or ioctl interface. The SDEI
> +event and (PPI) interrupt are owned by VMM, so the SDEI event and interrupt
> +numbers are configured through ioctl command on per-vCPU basis.
The "owned" terminology looks weird here. Do you mean the SDEI event
number and the PPI ID are defined by the VMM userspace?

 Besides,
> +the functionality might be enabled and configured through ioctl interface
> +by VMM during migration:
> +
> +   * KVM_ARM_ASYNC_PF_CMD_GET_VERSION
> +
> +     Returns the current version of the feature, supported by the host. It is
> +     made up of major, minor and revision fields. Each field is one byte in
> +     length.
> +
> +   * KVM_ARM_ASYNC_PF_CMD_GET_SDEI:
> +
> +     Retrieve the SDEI event number, used for page-not-present notification,
> +     so that it can be configured on destination VM in the scenario of
> +     migration.
> +
> +   * KVM_ARM_ASYNC_PF_GET_IRQ:
> +
> +     Retrieve the IRQ (PPI) number, used for page-ready notification, so that
> +     it can be configured on destination VM in the scenario of migration.
> +
> +   * KVM_ARM_ASYNC_PF_CMD_GET_CONTROL
> +
> +     Retrieve the address of control block, so that it can be configured on
> +     destination VM in the scenario of migration.
> +
> +   * KVM_ARM_ASYNC_PF_CMD_SET_SDEI:
> +
> +     Used by VMM to configure number of SDEI event, which is used to deliver
> +     page-not-present notification by host. This is used when VM is started
> +     or migrated.
> +
> +   * KVM_ARM_ASYNC_PF_CMD_SET_IRQ
> +
> +     Used by VMM to configure number of (PPI) interrupt, which is used to
> +     deliver page-ready notification by host. This is used when VM is started
> +     or migrated.
> +
> +   * KVM_ARM_ASYNC_PF_CMD_SET_CONTROL
> +
> +     Set the control block on the destination VM in the scenario of migration.
What is the size of this control block?
> +
> +The other configurations are passed through SMCCC interface. The host exports
> +the capability through KVM vendor specific service, which is identified by
> +ARM_SMCCC_KVM_FUNC_ASYNC_PF_FUNC_ID. There are several functions defined for
> +this:
> +
> +   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_VERSION
> +
> +     Returns the current version of the feature, supported by the host. It is
> +     made up of major, minor and revision fields. Each field is one byte in
> +     length.
> +
> +   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_SLOTS
> +
> +     Returns the size of the hashed GFN table. It is used by guest to set up
by the guest
> +     the capacity of waiting process table.
> +
> +   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_SDEI
> +   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ
> +
> +     Used by the guest to retrieve the SDEI event and (PPI) interrupt number
> +     that are configured by VMM.
How does the guest recognize which SDEI event number it shall register?
Same question for the PPI. What if we were to expose several SDEI events
to the guest?
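I guess the guest simply uses whatever ARM_SMCCC_KVM_FUNC_ASYNC_PF_SDEI
returns, e.g. something along these lines (rough sketch only; the argument
layout and the way the sub-function is encoded are my guesses, not
necessarily what this series implements):

	struct arm_smccc_res res;

	arm_smccc_1_1_invoke(ARM_SMCCC_KVM_FUNC_ASYNC_PF_FUNC_ID,
			     ARM_SMCCC_KVM_FUNC_ASYNC_PF_SDEI, &res);
	sdei_event_num = res.a0;

but it's not clear to me how that extends if several SDEI events were
exposed.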
> +
> +   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_ENABLE
> +
> +     Used by the guest to enable or disable the feature on the specific vCPU.
> +     The argument is made up of shared buffer and flags. The shared buffer
> +     is written by host to indicate the reason about the delivered asynchronous
> +     page fault and token (sequence number) to identify that. There are two
> +     flags are supported: KVM_ASYNC_PF_ENABLED is used to enable or disable
> +     the feature. KVM_ASYNC_PF_SEND_ALWAYS allows to deliver page-not-present
> +     notification regardless of the guest's state. Otherwise, the notification
> +     is delivered only when the guest is in user mode.
> +
> +   * ARM_SMCCC_KVM_FUNC_ASYNC_PF_IRQ_ACK

How does it compare to x86? I mean there is a huge number of IOCTLs
and SMCCC calls to achieve the functionality. Was the x86 implementation
as invasive as this one?

Thanks

Eric
> +
> +     Used by the guest to acknowledge the completion of page-ready notification.
> diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
> index 78a9b670aafe..f43b5fe25f61 100644
> --- a/Documentation/virt/kvm/arm/index.rst
> +++ b/Documentation/virt/kvm/arm/index.rst
> @@ -7,6 +7,7 @@ ARM
>  .. toctree::
>     :maxdepth: 2
>  
> +   apf
>     hyp-abi
>     psci
>     pvtime
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 07/15] KVM: arm64: Support page-not-present notification
  2021-08-15  0:59 ` [PATCH v4 07/15] KVM: arm64: Support page-not-present notification Gavin Shan
@ 2021-11-12 15:01   ` Eric Auger
  2022-01-13  8:43     ` Gavin Shan
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Auger @ 2021-11-12 15:01 UTC (permalink / raw)
  To: Gavin Shan, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Gavin,

On 8/15/21 2:59 AM, Gavin Shan wrote:
> The requested page might be not resident in memory during the stage-2
> page fault. For example, the requested page could be resident in swap
> device (file). In this case, disk I/O is issued in order to fetch the
> requested page and it could take tens of milliseconds, even hundreds
> of milliseconds in extreme situation. During the period, the guest's
> vCPU is suspended until the requested page becomes ready. Actually,
> the something else on the guest's vCPU could be rescheduled during
s/the//
> the period, so that the time slice isn't wasted as the guest's vCPU
> can see. This is the primary goal of the feature (Asynchronous Page
> Fault).
> 
> This supports delivery of page-not-present notification through SDEI
> event when the requested page isn't present. When the notification is
> received on the guest's vCPU, something else (another process) can be
> scheduled. The design is highlighted as below:
> 
>    * There is dedicated memory region shared by host and guest. It's
>      represented by "struct kvm_vcpu_pv_apf_data". The field @reason
>      indicates the reason why the SDEI event is triggered, while the
>      unique @token is used by guest to associate the event with the
>      suspended process.
> 
>    * One control block is associated with each guest's vCPU and it's
>      represented by "struct kvm_arch_async_pf_control". It allows the
>      guest to configure the functionality to indicate the situations
>      where the host can deliver the page-not-present notification to
>      kick off asyncrhonous page fault. Besides, runtime states are
asynchronous
>      also maintained in this struct.
> 
>    * Before the page-not-present notification is sent to the guest's
>      vCPU, a worker is started and executed asynchronously on host,
>      to fetch the requested page. "struct kvm{_,_arch}async_pf" is
>      associated with the worker, to track the work.
> 
> The feature isn't enabled by CONFIG_KVM_ASYNC_PF yet. Also, the
> page-ready notification delivery and control path isn't implemented
> and will be done in the subsequent patches.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  arch/arm64/include/asm/kvm_host.h      |  52 +++++++++
>  arch/arm64/include/uapi/asm/kvm_para.h |  15 +++
>  arch/arm64/kvm/Makefile                |   1 +
>  arch/arm64/kvm/arm.c                   |   3 +
>  arch/arm64/kvm/async_pf.c              | 145 +++++++++++++++++++++++++
>  arch/arm64/kvm/mmu.c                   |  33 +++++-
>  6 files changed, 247 insertions(+), 2 deletions(-)
>  create mode 100644 arch/arm64/kvm/async_pf.c
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 581825b9df77..6b98aef936b4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -283,6 +283,31 @@ struct vcpu_reset_state {
>  	bool		reset;
>  };
>  
> +/* Should be a power of two number */
> +#define ASYNC_PF_PER_VCPU	64
> +
> +/*
> + * The association of gfn and token. The token will be sent to guest as
> + * page fault address. Also, the guest could be in aarch32 mode. So its
s/as page fault address/together with page fault address?
> + * length should be 32-bits.
> + */
> +struct kvm_arch_async_pf {
> +	u32	token;
> +	gfn_t	gfn;
> +	u32	esr;
> +};
> +
> +struct kvm_arch_async_pf_control {
> +		struct gfn_to_hva_cache	cache;
> +		u64			control_block;
> +		bool			send_user_only;
> +		u64			sdei_event_num;
> +
nit: spare empty line
> +		u16			id;
> +		bool			notpresent_pending;
> +		u32			notpresent_token;
> +};
> +
>  struct kvm_vcpu_arch {
>  	struct kvm_cpu_context ctxt;
>  	void *sve_state;
> @@ -346,6 +371,9 @@ struct kvm_vcpu_arch {
>  	/* SDEI support */
>  	struct kvm_sdei_vcpu *sdei;
>  
> +	/* Asynchronous page fault support */
> +	struct kvm_arch_async_pf_control *apf;
> +
>  	/*
>  	 * Guest registers we preserve during guest debugging.
>  	 *
> @@ -741,6 +769,30 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>  				struct kvm_arm_copy_mte_tags *copy_tags);
>  
> +#ifdef CONFIG_KVM_ASYNC_PF
> +void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu);
> +bool kvm_arch_async_not_present_allowed(struct kvm_vcpu *vcpu);
> +bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
> +			     u32 esr, gpa_t gpa, gfn_t gfn);
> +bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
> +				     struct kvm_async_pf *work);
> +void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu);
> +#else
> +static inline void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu) { }
> +static inline void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu) { }
> +
> +static inline bool kvm_arch_async_not_present_allowed(struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}
> +
> +static inline bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
> +					   u32 esr, gpa_t gpa, gfn_t gfn)
> +{
> +	return false;
> +}
> +#endif
> +
>  /* Guest/host FPSIMD coordination helpers */
>  int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
>  void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/include/uapi/asm/kvm_para.h b/arch/arm64/include/uapi/asm/kvm_para.h
> index cd212282b90c..3fa04006714e 100644
> --- a/arch/arm64/include/uapi/asm/kvm_para.h
> +++ b/arch/arm64/include/uapi/asm/kvm_para.h
> @@ -2,4 +2,19 @@
>  #ifndef _UAPI_ASM_ARM_KVM_PARA_H
>  #define _UAPI_ASM_ARM_KVM_PARA_H
>  
> +#include <linux/types.h>
> +
> +/* Async PF */
> +#define KVM_ASYNC_PF_ENABLED		(1 << 0)
> +#define KVM_ASYNC_PF_SEND_ALWAYS	(1 << 1)
The above define is not used in this patch. Besides, can you explain what
it aims at?
> +
> +#define KVM_PV_REASON_PAGE_NOT_PRESENT	1
> +
> +struct kvm_vcpu_pv_apf_data {
> +	__u32	reason;
on x86 it was renamed into flags. Should we do the same right now?
> +	__u32	token;
> +	__u8	pad[56];
> +	__u32	enabled;
> +};
> +
>  #endif /* _UAPI_ASM_ARM_KVM_PARA_H */
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index eefca8ca394d..c9aa307ea542 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -25,3 +25,4 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
>  	 vgic/vgic-its.o vgic/vgic-debug.o
>  
>  kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o
> +kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o async_pf.o
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 7d9bbc888ae5..af251896b41d 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -342,6 +342,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  
>  	kvm_sdei_create_vcpu(vcpu);
>  
> +	kvm_arch_async_pf_create_vcpu(vcpu);
> +
>  	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
>  
>  	err = kvm_vgic_vcpu_init(vcpu);
> @@ -363,6 +365,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>  	kvm_timer_vcpu_terminate(vcpu);
>  	kvm_pmu_vcpu_destroy(vcpu);
> +	kvm_arch_async_pf_destroy_vcpu(vcpu);
>  	kvm_sdei_destroy_vcpu(vcpu);
>  
>  	kvm_arm_vcpu_destroy(vcpu);
> diff --git a/arch/arm64/kvm/async_pf.c b/arch/arm64/kvm/async_pf.c
> new file mode 100644
> index 000000000000..742bb8a0a8c0
> --- /dev/null
> +++ b/arch/arm64/kvm/async_pf.c
> @@ -0,0 +1,145 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Asynchronous page fault support.
> + *
> + * Copyright (C) 2021 Red Hat, Inc.
> + *
> + * Author(s): Gavin Shan <gshan@redhat.com>
> + */
> +
> +#include <linux/arm-smccc.h>
> +#include <linux/kvm_host.h>
> +#include <asm/kvm_emulate.h>
> +#include <kvm/arm_hypercalls.h>
> +#include <kvm/arm_vgic.h>
> +#include <asm/kvm_sdei.h>
> +
> +static inline int read_cache(struct kvm_vcpu *vcpu, u32 offset, u32 *val)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
> +
> +	return kvm_read_guest_offset_cached(kvm, &apf->cache,
> +					    val, offset, sizeof(*val));
> +}
> +
> +static inline int write_cache(struct kvm_vcpu *vcpu, u32 offset, u32 val)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
> +
> +	return kvm_write_guest_offset_cached(kvm, &apf->cache,
> +					     &val, offset, sizeof(val));
> +}
> +
> +void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.apf = kzalloc(sizeof(*(vcpu->arch.apf)), GFP_KERNEL);
shouldn't we escalate the alloc failure and fail the vCPU creation
instead of checking everywhere that apf is non-NULL, which is error prone?
By the way, I saw that on x86 this is a struct embedded in the vCPU one
instead of a pointer.
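Something like the below, maybe (untested sketch, just to illustrate the
idea; kvm_arch_vcpu_create() would need to check the return value and
unwind accordingly):

int kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu)
{
	vcpu->arch.apf = kzalloc(sizeof(*vcpu->arch.apf), GFP_KERNEL);
	if (!vcpu->arch.apf)
		return -ENOMEM;

	return 0;
}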
> +}
> +
> +bool kvm_arch_async_not_present_allowed(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
> +	struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
> +	u32 reason, token;
> +	int ret;
> +
> +	if (!apf || !(apf->control_block & KVM_ASYNC_PF_ENABLED))
> +		return false;
> +
> +	if (apf->send_user_only && vcpu_mode_priv(vcpu))
> +		return false;
> +
> +	if (!irqchip_in_kernel(vcpu->kvm))
> +		return false;
can you explain why this is needed?
> +
> +	if (!vsdei || vsdei->critical_event || vsdei->normal_event)
> +		return false;
don't you need some locking mechanism to avoid the vsdei fields changing
after that check? At the moment we may have a single SDEI event number, but
nothing prevents adding others in the future, right?
> +
> +	/* Pending page fault, which isn't acknowledged by guest */
> +	ret = read_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason),
> +			 &reason);
> +	if (ret) {
> +		kvm_err("%s: Error %d to read reason (%d-%d)\n",
> +			__func__, ret, kvm->userspace_pid, vcpu->vcpu_idx);
> +		return false;
> +	}
x86 code does not have those kvm_err(). You may simply drop them.
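i.e. keep just the bail-out, something like (sketch):

	ret = read_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason),
			 &reason);
	if (ret)
		return false;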
> +
> +	ret = read_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token),
> +			 &token);
> +	if (ret) {
> +		kvm_err("%s: Error %d to read token %d-%d\n",
> +			__func__, ret, kvm->userspace_pid, vcpu->vcpu_idx);
> +		return false;
> +	}
> +
> +	if (reason || token)
can't the token be null?
> +		return false;
> +
> +	return true;
> +}
> +
> +bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
> +			     u32 esr, gpa_t gpa, gfn_t gfn)
> +{
> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
> +	struct kvm_arch_async_pf arch;
> +	unsigned long hva = kvm_vcpu_gfn_to_hva(vcpu, gfn);
> +
> +	arch.token = (apf->id++ << 12) | vcpu->vcpu_id;
> +	arch.gfn = gfn;
> +	arch.esr = esr;
> +
> +	return kvm_setup_async_pf(vcpu, gpa, hva, &arch);
> +}
> +
> +/*
> + * It's guaranteed that no pending asynchronous page fault when this is
that no APF is pending
> + * called. It means all previous issued asynchronous page faults have
> + * been acknowledged.
> + */
> +bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
> +				     struct kvm_async_pf *work)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
> +	int ret;
> +
> +	kvm_async_pf_add_slot(vcpu, work->arch.gfn);
> +
> +	ret = write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token),
> +			  work->arch.token);
> +	if (ret) {
> +		kvm_err("%s: Error %d to write token (%d-%d %08x)\n",
kvm_err's may be dropped
> +			__func__, ret, kvm->userspace_pid,
> +			vcpu->vcpu_idx, work->arch.token);
> +		goto fail;
> +	}
> +
> +	ret = write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason),
> +			  KVM_PV_REASON_PAGE_NOT_PRESENT);
> +	if (ret) {
> +		kvm_err("%s: Error %d to write reason (%d-%d %08x)\n",
> +			__func__, ret, kvm->userspace_pid,
> +			vcpu->vcpu_idx, work->arch.token);
> +		goto fail;
> +	}
> +
> +	apf->notpresent_pending = true;
> +	apf->notpresent_token = work->arch.token;
> +
> +	return !kvm_sdei_inject(vcpu, apf->sdei_event_num, true);
> +
> +fail:
> +	write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token), 0);
> +	write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason), 0);
> +	kvm_async_pf_remove_slot(vcpu, work->arch.gfn);
> +	return false;
> +}
> +
> +void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu)
> +{
> +	kfree(vcpu->arch.apf);
> +	vcpu->arch.apf = NULL;
> +}
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index e4038c5e931d..4ba78bd1f18c 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -914,6 +914,33 @@ static inline bool is_write_fault(unsigned int esr)
>  	return esr_dabt_is_wnr(esr);
>  }
>  
> +static bool try_async_pf(struct kvm_vcpu *vcpu, unsigned int esr,
> +			 gpa_t gpa, gfn_t gfn, kvm_pfn_t *pfn,
> +			 bool write, bool *writable, bool prefault)
> +{
> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
> +	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
> +	bool async = false;
> +
> +	if (apf) {
checking apf each time for a potential lack of resources at vCPU creation
looks heavy to me.
> +		/* Bail if *pfn has correct page */
s/bail/bail out? The comment is rather related to the !async check.
> +		*pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async,
> +					    write, writable, NULL);
> +		if (!async)
> +			return false;
> +
> +		if (!prefault && kvm_arch_async_not_present_allowed(vcpu)) {
x86 kvm_can_do_async_pf() naming looks more straightforward than
kvm_arch_async_not_present_allowed
> +			if (kvm_async_pf_find_slot(vcpu, gfn) ||
x86 has some trace points. You may envision adding some, maybe later on.
> +			    kvm_arch_setup_async_pf(vcpu, esr, gpa, gfn))
> +				return true;
> +		}
> +	}
> +
> +	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
> +				    write, writable, NULL);
> +	return false;
> +}
> +
>  int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
>  			      struct kvm_memory_slot *memslot,
>  			      phys_addr_t fault_ipa,
> @@ -1035,8 +1062,10 @@ int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
>  	 */
>  	smp_rmb();
>  
> -	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
> -				   write_fault, &writable, NULL);
> +	if (try_async_pf(vcpu, esr, fault_ipa, gfn, &pfn,
> +			 write_fault, &writable, prefault))
> +		return 1;
> +
>  	if (pfn == KVM_PFN_ERR_HWPOISON) {
>  		kvm_send_hwpoison_signal(hva, vma_shift);
>  		return 0;
> 
Eric


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 01/15] KVM: async_pf: Move struct kvm_async_pf around
  2021-11-10 15:37   ` Eric Auger
@ 2022-01-13  7:21     ` Gavin Shan
  0 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2022-01-13  7:21 UTC (permalink / raw)
  To: Eric Auger, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Eric,

On 11/10/21 11:37 PM, Eric Auger wrote:
> On 8/15/21 2:59 AM, Gavin Shan wrote:
>> This moves the definition of "struct kvm_async_pf" and the related
>> functions after "struct kvm_vcpu" so that newly added inline functions
>> in the subsequent patches can dereference "struct kvm_vcpu" properly.
>> Otherwise, the unexpected build error will be raised:
>>
>>     error: dereferencing pointer to incomplete type ‘struct kvm_vcpu’
>>     return !list_empty_careful(&vcpu->async_pf.done);
>>                                     ^~
>> Since we're here, the sepator between type and field in "struct kvm_vcpu"
> separator

Thanks, it will be fixed in the next respin.

>> is replaced by tab. The empty stub kvm_check_async_pf_completion() is also
>> added on !CONFIG_KVM_ASYNC_PF, which is needed by subsequent patches to
>> support asynchronous page fault on ARM64.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   include/linux/kvm_host.h | 44 +++++++++++++++++++++-------------------
>>   1 file changed, 23 insertions(+), 21 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index ae7735b490b4..85b61a456f1c 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -199,27 +199,6 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>>   struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>>   					 gpa_t addr);
>>   
>> -#ifdef CONFIG_KVM_ASYNC_PF
>> -struct kvm_async_pf {
>> -	struct work_struct work;
>> -	struct list_head link;
>> -	struct list_head queue;
>> -	struct kvm_vcpu *vcpu;
>> -	struct mm_struct *mm;
>> -	gpa_t cr2_or_gpa;
>> -	unsigned long addr;
>> -	struct kvm_arch_async_pf arch;
>> -	bool   wakeup_all;
>> -	bool notpresent_injected;
>> -};
>> -
>> -void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
>> -void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
>> -bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>> -			unsigned long hva, struct kvm_arch_async_pf *arch);
>> -int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>> -#endif
>> -
>>   #ifdef KVM_ARCH_WANT_MMU_NOTIFIER
>>   struct kvm_gfn_range {
>>   	struct kvm_memory_slot *slot;
>> @@ -346,6 +325,29 @@ struct kvm_vcpu {
>>   	struct kvm_dirty_ring dirty_ring;
>>   };
>>   
>> +#ifdef CONFIG_KVM_ASYNC_PF
>> +struct kvm_async_pf {
>> +	struct work_struct		work;
>> +	struct list_head		link;
>> +	struct list_head		queue;
>> +	struct kvm_vcpu			*vcpu;
>> +	struct mm_struct		*mm;
>> +	gpa_t				cr2_or_gpa;
>> +	unsigned long			addr;
>> +	struct kvm_arch_async_pf	arch;
>> +	bool				wakeup_all;
>> +	bool				notpresent_injected;
>> +};
>> +
>> +void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
>> +void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
>> +bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>> +			unsigned long hva, struct kvm_arch_async_pf *arch);
>> +int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>> +#else
>> +static inline void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) { }
> why is that stub needed on ARM64 and not on the other archs?
> 

We use the following pattern, suggested by James Morse.

int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
{
        int r;

        switch (ext) {
        ...
        case KVM_CAP_ASYNC_PF:
        case KVM_CAP_ASYNC_PF_INT:
                r = IS_ENABLED(CONFIG_KVM_ASYNC_PF) ? 1 : 0;
                break;
        default:
                r = 0;
        }

        return r;
}
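
To make the link to the stub explicit, here is a minimal sketch of a call
site (the function name below is made up for illustration, it is not code
from the series): because arm64 invokes the helper unconditionally instead
of wrapping the caller in #ifdef, an empty stub has to exist for
!CONFIG_KVM_ASYNC_PF.

    /* Illustrative caller only, not taken from the series */
    static void vcpu_handle_async_pf(struct kvm_vcpu *vcpu)
    {
            /*
             * Compiled even when CONFIG_KVM_ASYNC_PF=n; the empty stub
             * of kvm_check_async_pf_completion() keeps the build clean
             * without an #ifdef at the call site.
             */
            kvm_check_async_pf_completion(vcpu);
    }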

Thanks,
Gavin

>> +#endif
>> +
>>   /* must be called with irqs disabled */
>>   static __always_inline void guest_enter_irqoff(void)
>>   {
>>
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 02/15] KVM: async_pf: Add helper function to check completion queue
  2021-11-10 15:37   ` Eric Auger
@ 2022-01-13  7:38     ` Gavin Shan
  0 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2022-01-13  7:38 UTC (permalink / raw)
  To: Eric Auger, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Eric,

On 11/10/21 11:37 PM, Eric Auger wrote:
> On 8/15/21 2:59 AM, Gavin Shan wrote:
>> This adds inline helper kvm_check_async_pf_completion_queue() to
>> check if there are pending completion in the queue. The empty stub
>> is also added on !CONFIG_KVM_ASYNC_PF so that the caller needn't
>> consider if CONFIG_KVM_ASYNC_PF is enabled.
>>
>> All checks on the completion queue is done by the newly added inline
>> function since list_empty() and list_empty_careful() are interchangeable.
> why is it interchangeable?
>

I think the commit log is misleading. list_empty_careful() is more strict
than list_empty(). In this patch, we replace list_empty() with list_empty_careful().
I will correct the commit log in the next respin like below:

    All checks on the completion queue are done by the newly added inline
    function, where list_empty_careful() instead of list_empty() is used.
  
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   arch/x86/kvm/x86.c       |  2 +-
>>   include/linux/kvm_host.h | 10 ++++++++++
>>   virt/kvm/async_pf.c      | 10 +++++-----
>>   virt/kvm/kvm_main.c      |  4 +---
>>   4 files changed, 17 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index e5d5c5ed7dd4..7f35d9324b99 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -11591,7 +11591,7 @@ static inline bool kvm_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
>>   
>>   static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
>>   {
>> -	if (!list_empty_careful(&vcpu->async_pf.done))
>> +	if (kvm_check_async_pf_completion_queue(vcpu))
>>   		return true;
>>   
>>   	if (kvm_apic_has_events(vcpu))
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 85b61a456f1c..a5f990f6dc35 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -339,12 +339,22 @@ struct kvm_async_pf {
>>   	bool				notpresent_injected;
>>   };
>>   
>> +static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>> +{
>> +	return !list_empty_careful(&vcpu->async_pf.done);
>> +}
>> +
>>   void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu);
>>   void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu);
>>   bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>>   			unsigned long hva, struct kvm_arch_async_pf *arch);
>>   int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>>   #else
>> +static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>> +{
>> +	return false;
>> +}
>> +
>>   static inline void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu) { }
>>   #endif
>>   
>> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
>> index dd777688d14a..d145a61a046a 100644
>> --- a/virt/kvm/async_pf.c
>> +++ b/virt/kvm/async_pf.c
>> @@ -70,7 +70,7 @@ static void async_pf_execute(struct work_struct *work)
>>   		kvm_arch_async_page_present(vcpu, apf);
>>   
>>   	spin_lock(&vcpu->async_pf.lock);
>> -	first = list_empty(&vcpu->async_pf.done);
>> +	first = !kvm_check_async_pf_completion_queue(vcpu);
>>   	list_add_tail(&apf->link, &vcpu->async_pf.done);
>>   	apf->vcpu = NULL;
>>   	spin_unlock(&vcpu->async_pf.lock);
>> @@ -122,7 +122,7 @@ void kvm_clear_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>>   		spin_lock(&vcpu->async_pf.lock);
>>   	}
>>   
>> -	while (!list_empty(&vcpu->async_pf.done)) {
>> +	while (kvm_check_async_pf_completion_queue(vcpu)) {
> this is replaced by a stronger check. Please can you explain why it is
> equivalent?

Access to the completion queue is protected by a spinlock, so the situation
the additional check in list_empty_careful() guards against (the head's
prev/next being modified on the fly) can't happen. It means the two helpers
are the same in our case.
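
For reference, a simplified sketch of the two list helpers (paraphrased
from include/linux/list.h; the in-tree versions also carry READ_ONCE
style annotations):

    static inline int list_empty(const struct list_head *head)
    {
            return head->next == head;      /* looks at ->next only */
    }

    static inline int list_empty_careful(const struct list_head *head)
    {
            struct list_head *next = head->next;

            /* also checks ->prev, to tolerate a racing deletion */
            return (next == head) && (next == head->prev);
    }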

>>   		struct kvm_async_pf *work =
>>   			list_first_entry(&vcpu->async_pf.done,
>>   					 typeof(*work), link);
>> @@ -138,7 +138,7 @@ void kvm_check_async_pf_completion(struct kvm_vcpu *vcpu)
>>   {
>>   	struct kvm_async_pf *work;
>>   
>> -	while (!list_empty_careful(&vcpu->async_pf.done) &&
>> +	while (kvm_check_async_pf_completion_queue(vcpu) &&
>>   	      kvm_arch_can_dequeue_async_page_present(vcpu)) {
>>   		spin_lock(&vcpu->async_pf.lock);
>>   		work = list_first_entry(&vcpu->async_pf.done, typeof(*work),
>> @@ -205,7 +205,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
>>   	struct kvm_async_pf *work;
>>   	bool first;
>>   
>> -	if (!list_empty_careful(&vcpu->async_pf.done))
>> +	if (kvm_check_async_pf_completion_queue(vcpu))
>>   		return 0;
>>   
>>   	work = kmem_cache_zalloc(async_pf_cache, GFP_ATOMIC);
>> @@ -216,7 +216,7 @@ int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu)
>>   	INIT_LIST_HEAD(&work->queue); /* for list_del to work */
>>   
>>   	spin_lock(&vcpu->async_pf.lock);
>> -	first = list_empty(&vcpu->async_pf.done);
>> +	first = !kvm_check_async_pf_completion_queue(vcpu);
>>   	list_add_tail(&work->link, &vcpu->async_pf.done);
>>   	spin_unlock(&vcpu->async_pf.lock);
>>   
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index b50dbe269f4b..8795503651b1 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -3282,10 +3282,8 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu)
>>   	if (kvm_arch_dy_runnable(vcpu))
>>   		return true;
>>   
>> -#ifdef CONFIG_KVM_ASYNC_PF
>> -	if (!list_empty_careful(&vcpu->async_pf.done))
>> +	if (kvm_check_async_pf_completion_queue(vcpu))
>>   		return true;
>> -#endif
>>   
>>   	return false;
>>   }
>>

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 03/15] KVM: async_pf: Make GFN slot management generic
  2021-11-10 17:00   ` Eric Auger
@ 2022-01-13  7:42     ` Gavin Shan
  0 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2022-01-13  7:42 UTC (permalink / raw)
  To: Eric Auger, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Eric,

On 11/11/21 1:00 AM, Eric Auger wrote:
> On 8/15/21 2:59 AM, Gavin Shan wrote:
>> It's not allowed to fire duplicate notification for same GFN on
>> x86 platform, with help of a hash table. This mechanism is going
> s/, with help of a hash table/this is achieved through a hash table
>> to be used by arm64 and this makes the code generic and shareable
> s/and this makes/.\n Turn the code generic
>> by multiple platforms.
>>
>>     * As this mechanism isn't needed by all platforms, a new kernel
>>       config option (CONFIG_ASYNC_PF_SLOT) is introduced so that it
>>       can be disabled at compiling time.
> compile time

Ok.

>>
>>     * The code is basically copied from x86 platform and the functions
>>       are renamed to reflect the fact: (a) the input parameters are
>>       vCPU and GFN.
> not for reset
> (b) The operations are resetting, searching, adding

Ok.

>>       and removing.
> find, add, remove ops are renamed with _slot suffix

Ok. The commit log will be improved based on your suggestions in
next respin :)

>>
>>     * Helper stub is also added on !CONFIG_KVM_ASYNC_PF because we're
>>       going to use IS_ENABLED() instead of #ifdef on arm64 when the
>>       asynchronous page fault is supported.
>>
>> This is preparatory work to use the newly introduced functions on x86
>> platform and arm64 in subsequent patches.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   include/linux/kvm_host.h | 18 +++++++++
>>   virt/kvm/Kconfig         |  3 ++
>>   virt/kvm/async_pf.c      | 85 ++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 106 insertions(+)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index a5f990f6dc35..a9685c2b2250 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -298,6 +298,9 @@ struct kvm_vcpu {
>>   
>>   #ifdef CONFIG_KVM_ASYNC_PF
>>   	struct {
>> +#ifdef CONFIG_KVM_ASYNC_PF_SLOT
>> +		gfn_t gfns[ASYNC_PF_PER_VCPU];
>> +#endif
>>   		u32 queued;
>>   		struct list_head queue;
>>   		struct list_head done;
>> @@ -339,6 +342,13 @@ struct kvm_async_pf {
>>   	bool				notpresent_injected;
>>   };
>>   
>> +#ifdef CONFIG_KVM_ASYNC_PF_SLOT
>> +void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu);
> this does not reset a "slot" but the whole hash table. So to me this
> shouldn't be renamed with _slot suffix. reset_hash or reset_all_slots?

Sure, let's have kvm_async_pf_reset_all_slots() in the next respin.

>> +void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
>> +void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
>> +bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn);
>> +#endif
>> +
>>   static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>>   {
>>   	return !list_empty_careful(&vcpu->async_pf.done);
>> @@ -350,6 +360,14 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>>   			unsigned long hva, struct kvm_arch_async_pf *arch);
>>   int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
>>   #else
>> +static inline void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu) { }
>> +static inline void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn) { }
>> +static inline void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn) { }
>> +static inline bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
>> +{
>> +	return false;
>> +}
>> +
>>   static inline bool kvm_check_async_pf_completion_queue(struct kvm_vcpu *vcpu)
>>   {
>>   	return false;
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index 62b39149b8c8..59b518c8c205 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -23,6 +23,9 @@ config KVM_MMIO
>>   config KVM_ASYNC_PF
>>          bool
>>   
>> +config KVM_ASYNC_PF_SLOT
>> +	bool
>> +
>>   # Toggle to switch between direct notification and batch job
>>   config KVM_ASYNC_PF_SYNC
>>          bool
>> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
>> index d145a61a046a..0d1fdb2932af 100644
>> --- a/virt/kvm/async_pf.c
>> +++ b/virt/kvm/async_pf.c
>> @@ -13,12 +13,97 @@
>>   #include <linux/module.h>
>>   #include <linux/mmu_context.h>
>>   #include <linux/sched/mm.h>
>> +#ifdef CONFIG_KVM_ASYNC_PF_SLOT
>> +#include <linux/hash.h>
>> +#endif
>>   
>>   #include "async_pf.h"
>>   #include <trace/events/kvm.h>
>>   
>>   static struct kmem_cache *async_pf_cache;
>>   
>> +#ifdef CONFIG_KVM_ASYNC_PF_SLOT
>> +static inline u32 kvm_async_pf_hash(gfn_t gfn)
>> +{
>> +	BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));
>> +
>> +	return hash_32(gfn & 0xffffffff, order_base_2(ASYNC_PF_PER_VCPU));
>> +}
>> +
>> +static inline u32 kvm_async_pf_next_slot(u32 key)
>> +{
>> +	return (key + 1) & (ASYNC_PF_PER_VCPU - 1);
>> +}
>> +
>> +static u32 kvm_async_pf_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
>> +{
>> +	u32 key = kvm_async_pf_hash(gfn);
>> +	int i;
>> +
>> +	for (i = 0; i < ASYNC_PF_PER_VCPU &&
>> +		(vcpu->async_pf.gfns[key] != gfn &&
>> +		vcpu->async_pf.gfns[key] != ~0); i++)
>> +		key = kvm_async_pf_next_slot(key);
>> +
>> +	return key;
>> +}
>> +
>> +void kvm_async_pf_reset_slot(struct kvm_vcpu *vcpu)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < ASYNC_PF_PER_VCPU; i++)
>> +		vcpu->async_pf.gfns[i] = ~0;
>> +}
>> +
>> +void kvm_async_pf_add_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
>> +{
>> +	u32 key = kvm_async_pf_hash(gfn);
>> +
>> +	while (vcpu->async_pf.gfns[key] != ~0)
>> +		key = kvm_async_pf_next_slot(key);
>> +
>> +	vcpu->async_pf.gfns[key] = gfn;
>> +}
>> +
>> +void kvm_async_pf_remove_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
>> +{
>> +	u32 i, j, k;
>> +
>> +	i = j = kvm_async_pf_slot(vcpu, gfn);
>> +
>> +	if (WARN_ON_ONCE(vcpu->async_pf.gfns[i] != gfn))
>> +		return;
>> +
>> +	while (true) {
>> +		vcpu->async_pf.gfns[i] = ~0;
>> +
>> +		do {
>> +			j = kvm_async_pf_next_slot(j);
>> +			if (vcpu->async_pf.gfns[j] == ~0)
>> +				return;
>> +
>> +			k = kvm_async_pf_hash(vcpu->async_pf.gfns[j]);
>> +			/*
>> +			 * k lies cyclically in ]i,j]
>> +			 * |    i.k.j |
>> +			 * |....j i.k.| or  |.k..j i...|
>> +			 */
>> +		} while ((i <= j) ? (i < k && k <= j) : (i < k || k <= j));
>> +
>> +		vcpu->async_pf.gfns[i] = vcpu->async_pf.gfns[j];
>> +		i = j;
>> +	}
>> +}
>> +
>> +bool kvm_async_pf_find_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
>> +{
>> +	u32 key = kvm_async_pf_slot(vcpu, gfn);
>> +
>> +	return vcpu->async_pf.gfns[key] == gfn;
>> +}
>> +#endif /* CONFIG_KVM_ASYNC_PF_SLOT */
>> +
>>   int kvm_async_pf_init(void)
>>   {
>>   	async_pf_cache = KMEM_CACHE(kvm_async_pf, 0);
>>

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 04/15] KVM: x86: Use generic async PF slot management
  2021-11-10 17:03   ` Eric Auger
@ 2022-01-13  7:44     ` Gavin Shan
  0 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2022-01-13  7:44 UTC (permalink / raw)
  To: Eric Auger, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Eric,

On 11/11/21 1:03 AM, Eric Auger wrote:
> On 8/15/21 2:59 AM, Gavin Shan wrote:
>> This uses the generic slot management mechanism for asynchronous
> Now that we have moved the hash table management into the generic code, use
> the latter ...

Ok.

>> page fault by enabling CONFIG_KVM_ASYNC_PF_SLOT because the private
>> implementation is totally duplicate to the generic one.
>>
>> The changes introduced by this is pretty mechanical and shouldn't
>> cause any logical changes.
> suggest: No functional change intended.

Ok. The commit log will be improved accordingly in next respin.

>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h |  2 -
>>   arch/x86/kvm/Kconfig            |  1 +
>>   arch/x86/kvm/mmu/mmu.c          |  2 +-
>>   arch/x86/kvm/x86.c              | 86 +++------------------------------
>>   4 files changed, 8 insertions(+), 83 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 974cbfb1eefe..409c1e7137cd 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -810,7 +810,6 @@ struct kvm_vcpu_arch {
>>   
>>   	struct {
>>   		bool halted;
>> -		gfn_t gfns[ASYNC_PF_PER_VCPU];
>>   		struct gfn_to_hva_cache data;
>>   		u64 msr_en_val; /* MSR_KVM_ASYNC_PF_EN */
>>   		u64 msr_int_val; /* MSR_KVM_ASYNC_PF_INT */
>> @@ -1878,7 +1877,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
>>   			       struct kvm_async_pf *work);
>>   void kvm_arch_async_page_present_queued(struct kvm_vcpu *vcpu);
>>   bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu);
>> -extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
>>   
>>   int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
>>   int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
>> index ac69894eab88..53a6ef30b6ee 100644
>> --- a/arch/x86/kvm/Kconfig
>> +++ b/arch/x86/kvm/Kconfig
>> @@ -32,6 +32,7 @@ config KVM
>>   	select HAVE_KVM_IRQ_ROUTING
>>   	select HAVE_KVM_EVENTFD
>>   	select KVM_ASYNC_PF
>> +	select KVM_ASYNC_PF_SLOT
>>   	select USER_RETURN_NOTIFIER
>>   	select KVM_MMIO
>>   	select SCHED_INFO
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index c4f4fa23320e..cd8aaa662ac2 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -3799,7 +3799,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
>>   
>>   	if (!prefault && kvm_can_do_async_pf(vcpu)) {
>>   		trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
>> -		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
>> +		if (kvm_async_pf_find_slot(vcpu, gfn)) {
>>   			trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
>>   			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
>>   			return true;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 7f35d9324b99..a5f7d6122178 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -332,13 +332,6 @@ static struct kmem_cache *kvm_alloc_emulator_cache(void)
>>   
>>   static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
>>   
>> -static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
>> -{
>> -	int i;
>> -	for (i = 0; i < ASYNC_PF_PER_VCPU; i++)
>> -		vcpu->arch.apf.gfns[i] = ~0;
>> -}
>> -
>>   static void kvm_on_user_return(struct user_return_notifier *urn)
>>   {
>>   	unsigned slot;
>> @@ -854,7 +847,7 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
>>   {
>>   	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
>>   		kvm_clear_async_pf_completion_queue(vcpu);
>> -		kvm_async_pf_hash_reset(vcpu);
>> +		kvm_async_pf_reset_slot(vcpu);
>>   	}
>>   
>>   	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
>> @@ -3118,7 +3111,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
>>   
>>   	if (!kvm_pv_async_pf_enabled(vcpu)) {
>>   		kvm_clear_async_pf_completion_queue(vcpu);
>> -		kvm_async_pf_hash_reset(vcpu);
>> +		kvm_async_pf_reset_slot(vcpu);
>>   		return 0;
>>   	}
>>   
>> @@ -10704,7 +10697,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>>   
>>   	vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
>>   
>> -	kvm_async_pf_hash_reset(vcpu);
>> +	kvm_async_pf_reset_slot(vcpu);
>>   	kvm_pmu_init(vcpu);
>>   
>>   	vcpu->arch.pending_external_vector = -1;
>> @@ -10828,7 +10821,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>   	kvmclock_reset(vcpu);
>>   
>>   	kvm_clear_async_pf_completion_queue(vcpu);
>> -	kvm_async_pf_hash_reset(vcpu);
>> +	kvm_async_pf_reset_slot(vcpu);
>>   	vcpu->arch.apf.halted = false;
>>   
>>   	if (vcpu->arch.guest_fpu && kvm_mpx_supported()) {
>> @@ -11737,73 +11730,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
>>   	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
>>   }
>>   
>> -static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
>> -{
>> -	BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));
>> -
>> -	return hash_32(gfn & 0xffffffff, order_base_2(ASYNC_PF_PER_VCPU));
>> -}
>> -
>> -static inline u32 kvm_async_pf_next_probe(u32 key)
>> -{
>> -	return (key + 1) & (ASYNC_PF_PER_VCPU - 1);
>> -}
>> -
>> -static void kvm_add_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
>> -{
>> -	u32 key = kvm_async_pf_hash_fn(gfn);
>> -
>> -	while (vcpu->arch.apf.gfns[key] != ~0)
>> -		key = kvm_async_pf_next_probe(key);
>> -
>> -	vcpu->arch.apf.gfns[key] = gfn;
>> -}
>> -
>> -static u32 kvm_async_pf_gfn_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
>> -{
>> -	int i;
>> -	u32 key = kvm_async_pf_hash_fn(gfn);
>> -
>> -	for (i = 0; i < ASYNC_PF_PER_VCPU &&
>> -		     (vcpu->arch.apf.gfns[key] != gfn &&
>> -		      vcpu->arch.apf.gfns[key] != ~0); i++)
>> -		key = kvm_async_pf_next_probe(key);
>> -
>> -	return key;
>> -}
>> -
>> -bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
>> -{
>> -	return vcpu->arch.apf.gfns[kvm_async_pf_gfn_slot(vcpu, gfn)] == gfn;
>> -}
>> -
>> -static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
>> -{
>> -	u32 i, j, k;
>> -
>> -	i = j = kvm_async_pf_gfn_slot(vcpu, gfn);
>> -
>> -	if (WARN_ON_ONCE(vcpu->arch.apf.gfns[i] != gfn))
>> -		return;
>> -
>> -	while (true) {
>> -		vcpu->arch.apf.gfns[i] = ~0;
>> -		do {
>> -			j = kvm_async_pf_next_probe(j);
>> -			if (vcpu->arch.apf.gfns[j] == ~0)
>> -				return;
>> -			k = kvm_async_pf_hash_fn(vcpu->arch.apf.gfns[j]);
>> -			/*
>> -			 * k lies cyclically in ]i,j]
>> -			 * |    i.k.j |
>> -			 * |....j i.k.| or  |.k..j i...|
>> -			 */
>> -		} while ((i <= j) ? (i < k && k <= j) : (i < k || k <= j));
>> -		vcpu->arch.apf.gfns[i] = vcpu->arch.apf.gfns[j];
>> -		i = j;
>> -	}
>> -}
>> -
>>   static inline int apf_put_user_notpresent(struct kvm_vcpu *vcpu)
>>   {
>>   	u32 reason = KVM_PV_REASON_PAGE_NOT_PRESENT;
>> @@ -11867,7 +11793,7 @@ bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
>>   	struct x86_exception fault;
>>   
>>   	trace_kvm_async_pf_not_present(work->arch.token, work->cr2_or_gpa);
>> -	kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
>> +	kvm_async_pf_add_slot(vcpu, work->arch.gfn);
>>   
>>   	if (kvm_can_deliver_async_pf(vcpu) &&
>>   	    !apf_put_user_notpresent(vcpu)) {
>> @@ -11904,7 +11830,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
>>   	if (work->wakeup_all)
>>   		work->arch.token = ~0; /* broadcast wakeup */
>>   	else
>> -		kvm_del_async_pf_gfn(vcpu, work->arch.gfn);
>> +		kvm_async_pf_remove_slot(vcpu, work->arch.gfn);
>>   	trace_kvm_async_pf_ready(work->arch.token, work->cr2_or_gpa);
>>   
>>   	if ((work->wakeup_all || work->notpresent_injected) &&
>>
> Looks good to me
> 

Ok.

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 05/15] KVM: arm64: Export kvm_handle_user_mem_abort()
  2021-11-10 18:02   ` Eric Auger
@ 2022-01-13  7:55     ` Gavin Shan
  0 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2022-01-13  7:55 UTC (permalink / raw)
  To: Eric Auger, kvmarm
  Cc: linux-kernel, kvm, james.morse, mark.rutland, Jonathan.Cameron,
	will, maz, pbonzini, vkuznets, shan.gavin

Hi Eric,

On 11/11/21 2:02 AM, Eric Auger wrote:
> On 8/15/21 2:59 AM, Gavin Shan wrote:
>> The main work of stage-2 page fault is handled by user_mem_abort().
>> When asynchronous page fault is supported, one page fault need to
>> be handled with two calls to this function. It means the page fault
>> needs to be replayed asynchronously in that case.
>>
>>     * This renames the function to kvm_handle_user_mem_abort() and
>>       exports it.
>>
>>     * Add arguments @esr and @prefault to user_mem_abort(). @esr is
>>       the cached value of ESR_EL2 instead of fetching from the current
>>       vCPU when the page fault is replayed in scenario of asynchronous
>>       page fault. @prefault is used to indicate the page fault is replayed
>>       one or not.
> Also explain that the fault_status arg is not needed anymore, as it is
> derived from @esr; otherwise, at first sight, a distracted reviewer like
> me may have the impression you replaced fault_status with prefault while
> it is totally unrelated

Yep, good point. Will do in next respin.

>>
>>     * Define helper functions esr_dbat_*() in asm/esr.h to extract
>>       or check various fields of the passed ESR_EL2 value because
>>       those helper functions defined in asm/kvm_emulate.h assumes
>>       the ESR_EL2 value has been cached in vCPU struct. It won't
>>       be true on handling the replayed page fault in scenario of
>>       asynchronous page fault.
> I would introduce a separate preliminary patch with those esr macros and
> changes to the call sites + changes below.

Ok. I will split this patch into two.

>>
>>     * Some helper functions defined in asm/kvm_emulate.h are used
>>       by mmu.c only and seem not to be used by other source file
>>       in near future. They are moved to mmu.c and renamed accordingly.>
>>       is_exec_fault: kvm_vcpu_trap_is_exec_fault
>>       is_write_fault: kvm_is_write_fault()
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   arch/arm64/include/asm/esr.h         |  6 ++++
>>   arch/arm64/include/asm/kvm_emulate.h | 27 ++---------------
>>   arch/arm64/include/asm/kvm_host.h    |  4 +++
>>   arch/arm64/kvm/mmu.c                 | 43 ++++++++++++++++++++++------
>>   4 files changed, 48 insertions(+), 32 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
>> index 29f97eb3dad4..0f2cb27691de 100644
>> --- a/arch/arm64/include/asm/esr.h
>> +++ b/arch/arm64/include/asm/esr.h
>> @@ -321,8 +321,14 @@
>>   					 ESR_ELx_CP15_32_ISS_DIR_READ)
>>   
>>   #ifndef __ASSEMBLY__
>> +#include <linux/bitfield.h>
>>   #include <asm/types.h>
>>   
>> +#define esr_dabt_fault_type(esr)	(esr & ESR_ELx_FSC_TYPE)
>> +#define esr_dabt_fault_level(esr)	(FIELD_GET(ESR_ELx_FSC_LEVEL, esr))
>> +#define esr_dabt_is_wnr(esr)		(!!(FIELD_GET(ESR_ELx_WNR, esr)))
>> +#define esr_dabt_is_s1ptw(esr)		(!!(FIELD_GET(ESR_ELx_S1PTW, esr)))
>> +
>>   static inline bool esr_is_data_abort(u32 esr)
>>   {
>>   	const u32 ec = ESR_ELx_EC(esr);
>> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
>> index 923b4d08ea9a..90742f4b1acd 100644
>> --- a/arch/arm64/include/asm/kvm_emulate.h
>> +++ b/arch/arm64/include/asm/kvm_emulate.h
>> @@ -285,13 +285,13 @@ static __always_inline int kvm_vcpu_dabt_get_rd(const struct kvm_vcpu *vcpu)
>>   
>>   static __always_inline bool kvm_vcpu_abt_iss1tw(const struct kvm_vcpu *vcpu)
>>   {
>> -	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_S1PTW);
>> +	return esr_dabt_is_s1ptw(kvm_vcpu_get_esr(vcpu));
>>   }
>>   
>>   /* Always check for S1PTW *before* using this. */
>>   static __always_inline bool kvm_vcpu_dabt_iswrite(const struct kvm_vcpu *vcpu)
>>   {
>> -	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_WNR;
>> +	return esr_dabt_is_wnr(kvm_vcpu_get_esr(vcpu));
>>   }
>>   
>>   static inline bool kvm_vcpu_dabt_is_cm(const struct kvm_vcpu *vcpu)
>> @@ -320,11 +320,6 @@ static inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu)
>>   	return kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW;
>>   }
>>   
>> -static inline bool kvm_vcpu_trap_is_exec_fault(const struct kvm_vcpu *vcpu)
>> -{
>> -	return kvm_vcpu_trap_is_iabt(vcpu) && !kvm_vcpu_abt_iss1tw(vcpu);
>> -}
>> -
>>   static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
>>   {
>>   	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC;
>> @@ -332,12 +327,7 @@ static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
>>   
>>   static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
>>   {
>> -	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
>> -}
>> -
>> -static __always_inline u8 kvm_vcpu_trap_get_fault_level(const struct kvm_vcpu *vcpu)
>> -{
>> -	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_LEVEL;
>> +	return esr_dabt_fault_type(kvm_vcpu_get_esr(vcpu));
>>   }
>>   
>>   static __always_inline bool kvm_vcpu_abt_issea(const struct kvm_vcpu *vcpu)
>> @@ -365,17 +355,6 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
>>   	return ESR_ELx_SYS64_ISS_RT(esr);
>>   }
>>   
>> -static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>> -{
>> -	if (kvm_vcpu_abt_iss1tw(vcpu))
>> -		return true;
>> -
>> -	if (kvm_vcpu_trap_is_iabt(vcpu))
>> -		return false;
>> -
>> -	return kvm_vcpu_dabt_iswrite(vcpu);
>> -}
>> -
>>   static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>>   {
>>   	return vcpu_read_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 1824f7e1f9ab..581825b9df77 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -606,6 +606,10 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
>>   
>>   #define KVM_ARCH_WANT_MMU_NOTIFIER
>>   
>> +int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
>> +			      struct kvm_memory_slot *memslot,
>> +			      phys_addr_t fault_ipa, unsigned long hva,
>> +			      unsigned int esr, bool prefault);
>>   void kvm_arm_halt_guest(struct kvm *kvm);
>>   void kvm_arm_resume_guest(struct kvm *kvm);
>>   
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 0625bf2353c2..e4038c5e931d 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -892,9 +892,34 @@ static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
>>   	return 0;
>>   }
>>   
>> -static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> -			  struct kvm_memory_slot *memslot, unsigned long hva,
>> -			  unsigned long fault_status)
>> +static inline bool is_exec_fault(unsigned int esr)
>> +{
>> +	if (ESR_ELx_EC(esr) != ESR_ELx_EC_IABT_LOW)
>> +		return false;
>> +
>> +	if (esr_dabt_is_s1ptw(esr))
>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>> +static inline bool is_write_fault(unsigned int esr)
>> +{
>> +	if (esr_dabt_is_s1ptw(esr))
>> +		return true;
>> +
>> +	if (ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW)
>> +		return false;
>> +
>> +	return esr_dabt_is_wnr(esr);
>> +}
>> +
>> +int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
>> +			      struct kvm_memory_slot *memslot,
>> +			      phys_addr_t fault_ipa,
>> +			      unsigned long hva,
>> +			      unsigned int esr,
>> +			      bool prefault)
> you added the prefault arg but the latter is not used in the function?
> To me you should introduce that change in a subsequent patch when relevant.

Yep, it's the preparatory patch for the following one. The other changes
included in this patch are also preparatory work. Considering the
complexity of this patch, especially after we split it up into two patches,
I think it's fine to keep the change here.

[PATCH v4 08/15] KVM: arm64: Support page-ready notification

>>   {
>>   	int ret = 0;
>>   	bool write_fault, writable, force_pte = false;
>> @@ -909,14 +934,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	gfn_t gfn;
>>   	kvm_pfn_t pfn;
>>   	bool logging_active = memslot_is_logging(memslot);
>> -	unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu);
>> +	unsigned int fault_status = esr_dabt_fault_type(esr);
>> +	unsigned long fault_level = esr_dabt_fault_level(esr);
>>   	unsigned long vma_pagesize, fault_granule;
>>   	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
>>   	struct kvm_pgtable *pgt;
>>   
>>   	fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
>> -	write_fault = kvm_is_write_fault(vcpu);
>> -	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
>> +	write_fault = is_write_fault(kvm_vcpu_get_esr(vcpu));
>> +	exec_fault = is_exec_fault(kvm_vcpu_get_esr(vcpu));
>>   	VM_BUG_ON(write_fault && exec_fault);
>>   
>>   	if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
>> @@ -1176,7 +1202,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>>   	gfn = fault_ipa >> PAGE_SHIFT;
>>   	memslot = gfn_to_memslot(vcpu->kvm, gfn);
>>   	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
>> -	write_fault = kvm_is_write_fault(vcpu);
>> +	write_fault = is_write_fault(kvm_vcpu_get_esr(vcpu));
>>   	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
>>   		/*
>>   		 * The guest has put either its instructions or its page-tables
>> @@ -1231,7 +1257,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>>   		goto out_unlock;
>>   	}
>>   
>> -	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
>> +	ret = kvm_handle_user_mem_abort(vcpu, memslot, fault_ipa, hva,
>> +					kvm_vcpu_get_esr(vcpu), false);>  	if (ret == 0)
>>   		ret = 1;
>>   out:
>>

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 06/15] KVM: arm64: Add paravirtualization header files
  2021-11-10 18:06   ` Eric Auger
@ 2022-01-13  8:00     ` Gavin Shan
  0 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2022-01-13  8:00 UTC (permalink / raw)
  To: Eric Auger, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Eric,

On 11/11/21 2:06 AM, Eric Auger wrote:
> On 8/15/21 2:59 AM, Gavin Shan wrote:
>> We need put more stuff in the paravirtualization header files when
>> the asynchronous page fault is supported. The generic header files
>> can't meet the goal.
> you need to explain why
>   This duplicate the generic header files to be
> s/This duplicate/Duplicate

Ok.

>> our platform specific header files. It's the preparatory work to
>> support the asynchronous page fault in the subsequent patches:
> why duplication and not move. Shouldn't it be squashed with another
> subsequent patch?
> 

It's also fine to squash this one into PATCH[v4 07/15]. My intent was
to keep them separate to make PATCH[v4 07/15] a bit easier to review.
So let's keep it as a separate patch :)

>>
>>     include/uapi/asm-generic/kvm_para.h
>>     include/asm-generic/kvm_para.h
>>
>>     arch/arm64/include/uapi/asm/kvm_para.h
>>     arch/arm64/include/asm/kvm_para.h
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   arch/arm64/include/asm/kvm_para.h      | 27 ++++++++++++++++++++++++++
>>   arch/arm64/include/uapi/asm/Kbuild     |  2 --
>>   arch/arm64/include/uapi/asm/kvm_para.h |  5 +++++
>>   3 files changed, 32 insertions(+), 2 deletions(-)
>>   create mode 100644 arch/arm64/include/asm/kvm_para.h
>>   create mode 100644 arch/arm64/include/uapi/asm/kvm_para.h
>>
>> diff --git a/arch/arm64/include/asm/kvm_para.h b/arch/arm64/include/asm/kvm_para.h
>> new file mode 100644
>> index 000000000000..0ea481dd1c7a
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/kvm_para.h
>> @@ -0,0 +1,27 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _ASM_ARM_KVM_PARA_H
>> +#define _ASM_ARM_KVM_PARA_H
>> +
>> +#include <uapi/asm/kvm_para.h>
>> +
>> +static inline bool kvm_check_and_clear_guest_paused(void)
>> +{
>> +	return false;
>> +}
>> +
>> +static inline unsigned int kvm_arch_para_features(void)
>> +{
>> +	return 0;
>> +}
>> +
>> +static inline unsigned int kvm_arch_para_hints(void)
>> +{
>> +	return 0;
>> +}
>> +
>> +static inline bool kvm_para_available(void)
>> +{
>> +	return false;
>> +}
>> +
>> +#endif /* _ASM_ARM_KVM_PARA_H */
>> diff --git a/arch/arm64/include/uapi/asm/Kbuild b/arch/arm64/include/uapi/asm/Kbuild
>> index 602d137932dc..f66554cd5c45 100644
>> --- a/arch/arm64/include/uapi/asm/Kbuild
>> +++ b/arch/arm64/include/uapi/asm/Kbuild
>> @@ -1,3 +1 @@
>>   # SPDX-License-Identifier: GPL-2.0
>> -
>> -generic-y += kvm_para.h
>> diff --git a/arch/arm64/include/uapi/asm/kvm_para.h b/arch/arm64/include/uapi/asm/kvm_para.h
>> new file mode 100644
>> index 000000000000..cd212282b90c
>> --- /dev/null
>> +++ b/arch/arm64/include/uapi/asm/kvm_para.h
>> @@ -0,0 +1,5 @@
>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>> +#ifndef _UAPI_ASM_ARM_KVM_PARA_H
>> +#define _UAPI_ASM_ARM_KVM_PARA_H
>> +
>> +#endif /* _UAPI_ASM_ARM_KVM_PARA_H */
>>

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 07/15] KVM: arm64: Support page-not-present notification
  2021-11-12 15:01   ` Eric Auger
@ 2022-01-13  8:43     ` Gavin Shan
  0 siblings, 0 replies; 36+ messages in thread
From: Gavin Shan @ 2022-01-13  8:43 UTC (permalink / raw)
  To: Eric Auger, kvmarm
  Cc: kvm, maz, linux-kernel, shan.gavin, Jonathan.Cameron, pbonzini,
	vkuznets, will

Hi Eric,

On 11/12/21 11:01 PM, Eric Auger wrote:
> On 8/15/21 2:59 AM, Gavin Shan wrote:
>> The requested page might be not resident in memory during the stage-2
>> page fault. For example, the requested page could be resident in swap
>> device (file). In this case, disk I/O is issued in order to fetch the
>> requested page and it could take tens of milliseconds, even hundreds
>> of milliseconds in extreme situation. During the period, the guest's
>> vCPU is suspended until the requested page becomes ready. Actually,
>> the something else on the guest's vCPU could be rescheduled during
> s/the//

ok.

>> the period, so that the time slice isn't wasted as the guest's vCPU
>> can see. This is the primary goal of the feature (Asynchronous Page
>> Fault).
>>
>> This supports delivery of page-not-present notification through SDEI
>> event when the requested page isn't present. When the notification is
>> received on the guest's vCPU, something else (another process) can be
>> scheduled. The design is highlighted as below:
>>
>>     * There is dedicated memory region shared by host and guest. It's
>>       represented by "struct kvm_vcpu_pv_apf_data". The field @reason
>>       indicates the reason why the SDEI event is triggered, while the
>>       unique @token is used by guest to associate the event with the
>>       suspended process.
>>
>>     * One control block is associated with each guest's vCPU and it's
>>       represented by "struct kvm_arch_async_pf_control". It allows the
>>       guest to configure the functionality to indicate the situations
>>       where the host can deliver the page-not-present notification to
>>       kick off asyncrhonous page fault. Besides, runtime states are
> asynchronous

ok.

>>       also maintained in this struct.
>>
>>     * Before the page-not-present notification is sent to the guest's
>>       vCPU, a worker is started and executed asynchronously on host,
>>       to fetch the requested page. "struct kvm{_,_arch}async_pf" is
>>       associated with the worker, to track the work.
>>
>> The feature isn't enabled by CONFIG_KVM_ASYNC_PF yet. Also, the
>> page-ready notification delivery and control path isn't implemented
>> and will be done in the subsequent patches.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   arch/arm64/include/asm/kvm_host.h      |  52 +++++++++
>>   arch/arm64/include/uapi/asm/kvm_para.h |  15 +++
>>   arch/arm64/kvm/Makefile                |   1 +
>>   arch/arm64/kvm/arm.c                   |   3 +
>>   arch/arm64/kvm/async_pf.c              | 145 +++++++++++++++++++++++++
>>   arch/arm64/kvm/mmu.c                   |  33 +++++-
>>   6 files changed, 247 insertions(+), 2 deletions(-)
>>   create mode 100644 arch/arm64/kvm/async_pf.c
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 581825b9df77..6b98aef936b4 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -283,6 +283,31 @@ struct vcpu_reset_state {
>>   	bool		reset;
>>   };
>>   
>> +/* Should be a power of two number */
>> +#define ASYNC_PF_PER_VCPU	64
>> +
>> +/*
>> + * The association of gfn and token. The token will be sent to guest as
>> + * page fault address. Also, the guest could be in aarch32 mode. So its
> s/as page fault address/together with page fault address?
>> + * length should be 32-bits.
>> + */
>> +struct kvm_arch_async_pf {
>> +	u32	token;
>> +	gfn_t	gfn;
>> +	u32	esr;
>> +};
>> +
>> +struct kvm_arch_async_pf_control {
>> +		struct gfn_to_hva_cache	cache;
>> +		u64			control_block;
>> +		bool			send_user_only;
>> +		u64			sdei_event_num;
>> +
> nit: spare empty line

It's intended because I want to keep two blocks separate
due to their usages :)

>> +		u16			id;
>> +		bool			notpresent_pending;
>> +		u32			notpresent_token;
>> +};
>> +
>>   struct kvm_vcpu_arch {
>>   	struct kvm_cpu_context ctxt;
>>   	void *sve_state;
>> @@ -346,6 +371,9 @@ struct kvm_vcpu_arch {
>>   	/* SDEI support */
>>   	struct kvm_sdei_vcpu *sdei;
>>   
>> +	/* Asynchronous page fault support */
>> +	struct kvm_arch_async_pf_control *apf;
>> +
>>   	/*
>>   	 * Guest registers we preserve during guest debugging.
>>   	 *
>> @@ -741,6 +769,30 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>>   long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>>   				struct kvm_arm_copy_mte_tags *copy_tags);
>>   
>> +#ifdef CONFIG_KVM_ASYNC_PF
>> +void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu);
>> +bool kvm_arch_async_not_present_allowed(struct kvm_vcpu *vcpu);
>> +bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
>> +			     u32 esr, gpa_t gpa, gfn_t gfn);
>> +bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
>> +				     struct kvm_async_pf *work);
>> +void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu);
>> +#else
>> +static inline void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu) { }
>> +static inline void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu) { }
>> +
>> +static inline bool kvm_arch_async_not_present_allowed(struct kvm_vcpu *vcpu)
>> +{
>> +	return false;
>> +}
>> +
>> +static inline bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
>> +					   u32 esr, gpa_t gpa, gfn_t gfn)
>> +{
>> +	return false;
>> +}
>> +#endif
>> +
>>   /* Guest/host FPSIMD coordination helpers */
>>   int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
>>   void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
>> diff --git a/arch/arm64/include/uapi/asm/kvm_para.h b/arch/arm64/include/uapi/asm/kvm_para.h
>> index cd212282b90c..3fa04006714e 100644
>> --- a/arch/arm64/include/uapi/asm/kvm_para.h
>> +++ b/arch/arm64/include/uapi/asm/kvm_para.h
>> @@ -2,4 +2,19 @@
>>   #ifndef _UAPI_ASM_ARM_KVM_PARA_H
>>   #define _UAPI_ASM_ARM_KVM_PARA_H
>>   
>> +#include <linux/types.h>
>> +
>> +/* Async PF */
>> +#define KVM_ASYNC_PF_ENABLED		(1 << 0)
>> +#define KVM_ASYNC_PF_SEND_ALWAYS	(1 << 1)
> The above define is not used in this patch. Besides can you explain what
> it aims at?

Yep, this should be dropped. When the host receives a stage-2 page fault,
it's unaware of what context the guest is in. The guest can be in user or
kernel mode. It's safe to preempt (interrupt) the guest for async PF if
it's in user mode. However, it's not safe to do so when the guest is in
kernel mode.

It seems to be supported on x86, but I didn't look into the details. So
we don't support it on ARM64 for now. It can be supported in the future
if needed.

>> +
>> +#define KVM_PV_REASON_PAGE_NOT_PRESENT	1
>> +
>> +struct kvm_vcpu_pv_apf_data {
>> +	__u32	reason;
> on x86 it was renamed into flags. Should we do the same right now?
>> +	__u32	token;
>> +	__u8	pad[56];
>> +	__u32	enabled;
>> +};
>> +
>>   #endif /* _UAPI_ASM_ARM_KVM_PARA_H */
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index eefca8ca394d..c9aa307ea542 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -25,3 +25,4 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
>>   	 vgic/vgic-its.o vgic/vgic-debug.o
>>   
>>   kvm-$(CONFIG_HW_PERF_EVENTS)  += pmu-emul.o
>> +kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o async_pf.o
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 7d9bbc888ae5..af251896b41d 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -342,6 +342,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>>   
>>   	kvm_sdei_create_vcpu(vcpu);
>>   
>> +	kvm_arch_async_pf_create_vcpu(vcpu);
>> +
>>   	vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
>>   
>>   	err = kvm_vgic_vcpu_init(vcpu);
>> @@ -363,6 +365,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>   	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
>>   	kvm_timer_vcpu_terminate(vcpu);
>>   	kvm_pmu_vcpu_destroy(vcpu);
>> +	kvm_arch_async_pf_destroy_vcpu(vcpu);
>>   	kvm_sdei_destroy_vcpu(vcpu);
>>   
>>   	kvm_arm_vcpu_destroy(vcpu);
>> diff --git a/arch/arm64/kvm/async_pf.c b/arch/arm64/kvm/async_pf.c
>> new file mode 100644
>> index 000000000000..742bb8a0a8c0
>> --- /dev/null
>> +++ b/arch/arm64/kvm/async_pf.c
>> @@ -0,0 +1,145 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Asynchronous page fault support.
>> + *
>> + * Copyright (C) 2021 Red Hat, Inc.
>> + *
>> + * Author(s): Gavin Shan <gshan@redhat.com>
>> + */
>> +
>> +#include <linux/arm-smccc.h>
>> +#include <linux/kvm_host.h>
>> +#include <asm/kvm_emulate.h>
>> +#include <kvm/arm_hypercalls.h>
>> +#include <kvm/arm_vgic.h>
>> +#include <asm/kvm_sdei.h>
>> +
>> +static inline int read_cache(struct kvm_vcpu *vcpu, u32 offset, u32 *val)
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
>> +
>> +	return kvm_read_guest_offset_cached(kvm, &apf->cache,
>> +					    val, offset, sizeof(*val));
>> +}
>> +
>> +static inline int write_cache(struct kvm_vcpu *vcpu, u32 offset, u32 val)
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
>> +
>> +	return kvm_write_guest_offset_cached(kvm, &apf->cache,
>> +					     &val, offset, sizeof(val));
>> +}
>> +
>> +void kvm_arch_async_pf_create_vcpu(struct kvm_vcpu *vcpu)
>> +{
>> +	vcpu->arch.apf = kzalloc(sizeof(*(vcpu->arch.apf)), GFP_KERNEL);
> shouldn't we escalate the alloc failure and fail the vcpu creation
> instead of checking everywhere that apf is !null which is error prone.
> By the way I saw that on x86 this is a struct included in the vcpu one
> instead of a pointer.

Ok. Let's embed this struct into kvm_vcpu_arch, to avoid the memory allocation
here. Async PF is an auxiliary function and everything should work just
fine when it's disabled. That's why I didn't escalate the memory allocation
failure.
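
Something like the following is what I have in mind (just a rough sketch
of the direction, not the final code):

    struct kvm_vcpu_arch {
            ...
            /* Async PF control block, embedded instead of allocated */
            struct kvm_arch_async_pf_control apf;
            ...
    };

With that, kvm_arch_async_pf_create_vcpu() only initializes the fields
and can no longer fail, and the NULL checks go away.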

>> +}
>> +
>> +bool kvm_arch_async_not_present_allowed(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
>> +	struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
>> +	u32 reason, token;
>> +	int ret;
>> +
>> +	if (!apf || !(apf->control_block & KVM_ASYNC_PF_ENABLED))
>> +		return false;
>> +
>> +	if (apf->send_user_only && vcpu_mode_priv(vcpu))
>> +		return false;
>> +
>> +	if (!irqchip_in_kernel(vcpu->kvm))
>> +		return false;
> can you explain why this is needed?

Async PF uses an SDEI event to deliver the page-not-present notification.
When the guest receives the SDEI event, the associated handler is invoked.
After that, the hypercall (COMPLETE_AND_RESUME) is issued from guest
to host. When the host receives this hypercall, all pending interrupts
should be delivered immediately. Without an in-kernel IRQ chip, we're
unable to do that.
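
Roughly, the flow is (a simplified sketch, not lifted verbatim from the
patches):

    host : page not present -> inject the dedicated SDEI event
    guest: SDEI handler runs, then issues the COMPLETE_AND_RESUME
           hypercall (SMCCC) back to the host
    host : on COMPLETE_AND_RESUME, any pending interrupt (e.g. the
           page-ready notification) must be delivered to the vCPU
           right away, which is only possible with the in-kernel
           irqchip (vGIC)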

>> +
>> +	if (!vsdei || vsdei->critical_event || vsdei->normal_event)
>> +		return false;
> don't you need some locking mechanism to avoid that the vsdei fields change
> after that check? At the moment we may have a single SDEI num but
> nothing prevents adding others in the future, right?

You're right that we currently have only one SDEI event. However, an
additional inline helper with locking will be needed for this in the future.
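
If more SDEI events are added later, the check could move into a small
locked helper. A rough sketch, assuming the vCPU SDEI state grows a lock
(neither the helper name nor the lock exist in the posted series):

    static inline bool kvm_sdei_event_allowed(struct kvm_vcpu *vcpu)
    {
            struct kvm_sdei_vcpu *vsdei = vcpu->arch.sdei;
            bool allowed;

            if (!vsdei)
                    return false;

            spin_lock(&vsdei->lock);        /* assumed lock */
            allowed = !vsdei->critical_event && !vsdei->normal_event;
            spin_unlock(&vsdei->lock);

            return allowed;
    }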

>> +
>> +	/* Pending page fault, which isn't acknowledged by guest */
>> +	ret = read_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason),
>> +			 &reason);
>> +	if (ret) {
>> +		kvm_err("%s: Error %d to read reason (%d-%d)\n",
>> +			__func__, ret, kvm->userspace_pid, vcpu->vcpu_idx);
>> +		return false;
>> +	}
> x86 code does not have those kvm_err(). You may simply drop them.

It's useful for debugging. I may introduce a macro for this and it's
disabled by default.

>> +
>> +	ret = read_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token),
>> +			 &token);
>> +	if (ret) {
>> +		kvm_err("%s: Error %d to read token %d-%d\n",
>> +			__func__, ret, kvm->userspace_pid, vcpu->vcpu_idx);
>> +		return false;
>> +	}
>> +
>> +	if (reason || token)
> can't the token be null?

Nice catch! My intent is that @token can't be NULL. However, it's not true
in the current implementation. Please refer to the explanation below.

>> +		return false;
>> +
>> +	return true;
>> +}
>> +
>> +bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu,
>> +			     u32 esr, gpa_t gpa, gfn_t gfn)
>> +{
>> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
>> +	struct kvm_arch_async_pf arch;
>> +	unsigned long hva = kvm_vcpu_gfn_to_hva(vcpu, gfn);
>> +
>> +	arch.token = (apf->id++ << 12) | vcpu->vcpu_id;
>> +	arch.gfn = gfn;
>> +	arch.esr = esr;

@arch.token is supposed to be non-NULL. However, "apf->id++" can overflow.
So we need to change the code into:

        /*
         * The token is invalid when it's zero. Avoid that by checking
         * whether the id is about to overflow and wrapping it.
         */
        if (apf->id == USHRT_MAX)
                apf->id = 0;

        arch.token = (apf->id++ << 12) | vcpu->vcpu_id;
        arch.gfn = gfn;
        arch.esr = esr;

>> +
>> +	return kvm_setup_async_pf(vcpu, gpa, hva, &arch);
>> +}
>> +
>> +/*
>> + * It's guaranteed that no pending asynchronous page fault when this is
> that no APF is pending

Ok.

>> + * called. It means all previous issued asynchronous page faults have
>> + * been acknowledged.
>> + */
>> +bool kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
>> +				     struct kvm_async_pf *work)
>> +{
>> +	struct kvm *kvm = vcpu->kvm;
>> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
>> +	int ret;
>> +
>> +	kvm_async_pf_add_slot(vcpu, work->arch.gfn);
>> +
>> +	ret = write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token),
>> +			  work->arch.token);
>> +	if (ret) {
>> +		kvm_err("%s: Error %d to write token (%d-%d %08x)\n",
> kvm_err's may be dropped

Ok. I will introduce a macro, which is disabled by default.

>> +			__func__, ret, kvm->userspace_pid,
>> +			vcpu->vcpu_idx, work->arch.token);
>> +		goto fail;
>> +	}
>> +
>> +	ret = write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason),
>> +			  KVM_PV_REASON_PAGE_NOT_PRESENT);
>> +	if (ret) {
>> +		kvm_err("%s: Error %d to write reason (%d-%d %08x)\n",
>> +			__func__, ret, kvm->userspace_pid,
>> +			vcpu->vcpu_idx, work->arch.token);
>> +		goto fail;
>> +	}
>> +
>> +	apf->notpresent_pending = true;
>> +	apf->notpresent_token = work->arch.token;
>> +
>> +	return !kvm_sdei_inject(vcpu, apf->sdei_event_num, true);
>> +
>> +fail:
>> +	write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, token), 0);
>> +	write_cache(vcpu, offsetof(struct kvm_vcpu_pv_apf_data, reason), 0);
>> +	kvm_async_pf_remove_slot(vcpu, work->arch.gfn);
>> +	return false;
>> +}
>> +
>> +void kvm_arch_async_pf_destroy_vcpu(struct kvm_vcpu *vcpu)
>> +{
>> +	kfree(vcpu->arch.apf);
>> +	vcpu->arch.apf = NULL;
>> +}
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index e4038c5e931d..4ba78bd1f18c 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -914,6 +914,33 @@ static inline bool is_write_fault(unsigned int esr)
>>   	return esr_dabt_is_wnr(esr);
>>   }
>>   
>> +static bool try_async_pf(struct kvm_vcpu *vcpu, unsigned int esr,
>> +			 gpa_t gpa, gfn_t gfn, kvm_pfn_t *pfn,
>> +			 bool write, bool *writable, bool prefault)
>> +{
>> +	struct kvm_arch_async_pf_control *apf = vcpu->arch.apf;
>> +	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
>> +	bool async = false;
>> +
>> +	if (apf) {
> checking apf each time for a potential lack of resources at vcpu creation
> looks heavy to me.

Yep, as replied before, let's embed the struct into kvm_vcpu_arch to
avoid the memory allocation. In that case, we needn't check it anymore.

>> +		/* Bail if *pfn has correct page */
> s/bail/bail out? Comment rather related to !async check.

Ok.

>> +		*pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async,
>> +					    write, writable, NULL);
>> +		if (!async)
>> +			return false;
>> +
>> +		if (!prefault && kvm_arch_async_not_present_allowed(vcpu)) {
> x86 kvm_can_do_async_pf() naming looks more straightforward than
> kvm_arch_async_not_present_allowed

I would keep the naming scheme, as I intend to have all function names
carry the "kvm_arch_async" prefix. So let's rename it to kvm_arch_async_allowed()
in the next revision.

>> +			if (kvm_async_pf_find_slot(vcpu, gfn) ||
> x86 has some trace points. You may envision adding some, maybe later on.

Yeah, let's ignore this one for now and we can add it in the future.
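
If we do, the generic async PF tracepoints from include/trace/events/kvm.h
that x86 already uses could probably be reused as-is, for example
(hypothetical placement only):

    trace_kvm_try_async_get_page(gpa, gfn);
    ...
    trace_kvm_async_pf_not_present(arch.token, gpa);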

>> +			    kvm_arch_setup_async_pf(vcpu, esr, gpa, gfn))
>> +				return true;
>> +		}
>> +	}
>> +
>> +	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
>> +				    write, writable, NULL);
>> +	return false;
>> +}
>> +
>>   int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
>>   			      struct kvm_memory_slot *memslot,
>>   			      phys_addr_t fault_ipa,
>> @@ -1035,8 +1062,10 @@ int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu,
>>   	 */
>>   	smp_rmb();
>>   
>> -	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
>> -				   write_fault, &writable, NULL);
>> +	if (try_async_pf(vcpu, esr, fault_ipa, gfn, &pfn,
>> +			 write_fault, &writable, prefault))
>> +		return 1;
>> +
>>   	if (pfn == KVM_PFN_ERR_HWPOISON) {
>>   		kvm_send_hwpoison_signal(hva, vma_shift);
>>   		return 0;
>>

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2022-01-13  8:44 UTC | newest]

Thread overview: 36+ messages
2021-08-15  0:59 [PATCH v4 00/15] Support Asynchronous Page Fault Gavin Shan
2021-08-15  0:59 ` [PATCH v4 01/15] KVM: async_pf: Move struct kvm_async_pf around Gavin Shan
2021-11-10 15:37   ` Eric Auger
2022-01-13  7:21     ` Gavin Shan
2021-08-15  0:59 ` [PATCH v4 02/15] KVM: async_pf: Add helper function to check completion queue Gavin Shan
2021-08-16 16:53   ` Vitaly Kuznetsov
2021-08-17 10:44     ` Gavin Shan
2021-11-10 15:37   ` Eric Auger
2022-01-13  7:38     ` Gavin Shan
2021-08-15  0:59 ` [PATCH v4 03/15] KVM: async_pf: Make GFN slot management generic Gavin Shan
2021-11-10 17:00   ` Eric Auger
2022-01-13  7:42     ` Gavin Shan
2021-11-10 17:00   ` Eric Auger
2021-08-15  0:59 ` [PATCH v4 04/15] KVM: x86: Use generic async PF slot management Gavin Shan
2021-11-10 17:03   ` Eric Auger
2022-01-13  7:44     ` Gavin Shan
2021-08-15  0:59 ` [PATCH v4 05/15] KVM: arm64: Export kvm_handle_user_mem_abort() Gavin Shan
2021-11-10 18:02   ` Eric Auger
2022-01-13  7:55     ` Gavin Shan
2021-08-15  0:59 ` [PATCH v4 06/15] KVM: arm64: Add paravirtualization header files Gavin Shan
2021-11-10 18:06   ` Eric Auger
2022-01-13  8:00     ` Gavin Shan
2021-08-15  0:59 ` [PATCH v4 07/15] KVM: arm64: Support page-not-present notification Gavin Shan
2021-11-12 15:01   ` Eric Auger
2022-01-13  8:43     ` Gavin Shan
2021-08-15  0:59 ` [PATCH v4 08/15] KVM: arm64: Support page-ready notification Gavin Shan
2021-08-15  0:59 ` [PATCH v4 09/15] KVM: arm64: Support async PF hypercalls Gavin Shan
2021-08-15  0:59 ` [PATCH v4 10/15] KVM: arm64: Support async PF ioctl commands Gavin Shan
2021-08-15  0:59 ` [PATCH v4 11/15] KVM: arm64: Export async PF capability Gavin Shan
2021-08-15  0:59 ` [PATCH v4 12/15] arm64: Detect async PF para-virtualization feature Gavin Shan
2021-08-15  0:59 ` [PATCH v4 13/15] arm64: Reschedule process on aync PF Gavin Shan
2021-08-15  0:59 ` [PATCH v4 14/15] arm64: Enable async PF Gavin Shan
2021-08-16 17:05   ` Vitaly Kuznetsov
2021-08-17 10:49     ` Gavin Shan
2021-08-15  0:59 ` [PATCH v4 15/15] KVM: arm64: Add async PF document Gavin Shan
2021-11-11 10:39   ` Eric Auger
