* [PATCH v7 0/9] KVM: arm64: Enable ring-based dirty memory tracking
From: Gavin Shan @ 2022-10-31  0:36 UTC
  To: kvmarm
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, will, kvmarm, ajones

This series enables ring-based dirty memory tracking for ARM64. The
feature has been available and enabled on x86 for a while. It is
beneficial when the number of dirty pages is small, as in a
checkpointing system or live migration scenario. More details can be
found in commit fb04a1eddb1a ("KVM: X86: Implement ring-based dirty
memory tracking").
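
For context, below is a minimal userspace sketch of how the ring is
consumed, following the documented UAPI. Error handling is omitted, and
the helper names and fetch-index bookkeeping are illustrative, not part
of this series:

  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <linux/kvm.h>

  /* Enable a per-VCPU dirty ring of 'bytes' bytes, then map one VCPU's
   * ring, which lives at a fixed offset in the VCPU mmap area. */
  static struct kvm_dirty_gfn *setup_ring(int vm_fd, int vcpu_fd,
                                          uint32_t bytes, long page_size)
  {
          struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_DIRTY_LOG_RING_ACQ_REL, /* _RING on x86 */
                  .args[0] = bytes,
          };

          ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
          return mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED,
                      vcpu_fd, page_size * KVM_DIRTY_LOG_PAGE_OFFSET);
  }

  /* Harvest published entries, hand them back, and let KVM recycle. */
  static void harvest_ring(int vm_fd, struct kvm_dirty_gfn *ring,
                           uint32_t nentries, uint32_t *fetch)
  {
          for (;;) {
                  struct kvm_dirty_gfn *e = &ring[*fetch % nentries];

                  /* Pairs with KVM's release store publishing the entry. */
                  if (!(__atomic_load_n(&e->flags, __ATOMIC_ACQUIRE) &
                        KVM_DIRTY_GFN_F_DIRTY))
                          break;

                  /* e->slot is (as_id << 16) | slot_id and e->offset is
                   * the gfn offset in that memslot: record it here. */
                  e->flags = KVM_DIRTY_GFN_F_RESET;
                  (*fetch)++;
          }

          ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);
  }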

v6: https://lore.kernel.org/kvmarm/20221011061447.131531-1-gshan@redhat.com/
v5: https://lore.kernel.org/all/20221005004154.83502-1-gshan@redhat.com/
v4: https://lore.kernel.org/kvmarm/20220927005439.21130-1-gshan@redhat.com/
v3: https://lore.kernel.org/r/20220922003214.276736-1-gshan@redhat.com
v2: https://lore.kernel.org/lkml/YyiV%2Fl7O23aw5aaO@xz-m1.local/T/
v1: https://lore.kernel.org/lkml/20220819005601.198436-1-gshan@redhat.com

Testing
=======
(1) kvm/selftests/dirty_log_test
(2) Live migration by QEMU

Changelog
=========
v7:
  * Cut down #ifdef, avoid using 'container_of()', move the
    dirty-ring check after KVM_REQ_VM_DEAD, add comments
    for kvm_dirty_ring_check_request(), use tab character
    for KVM event definitions in kvm_host.h in PATCH[v7 01]    (Sean)
  * Add PATCH[v7 03] to recheck if the capability has
    been advertised prior to enabling RING/RING_ACQ_REL       (Sean)
  * Improve the description about capability RING_WITH_BITMAP,
    rename kvm_dirty_ring_exclusive() to kvm_use_dirty_bitmap()
    in PATCH[v7 04/09]                                         (Peter/Oliver/Sean)
  * Add PATCH[v7 05/09] to improve no-running-vcpu report      (Marc/Sean)
  * Improve commit messages                                    (Sean/Oliver)
v6:
  * Add CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP, for arm64
    to advertise KVM_CAP_DIRTY_RING_WITH_BITMAP in
    PATCH[v6 3/8]                                              (Oliver/Peter)
  * Add helper kvm_dirty_ring_exclusive() to check if
    traditional bitmap-based dirty log tracking is
    exclusive to dirty-ring in PATCH[v6 3/8]                   (Peter)
  * Enable KVM_CAP_DIRTY_RING_WITH_BITMAP in PATCH[v6 5/8]     (Gavin)
v5:
  * Drop empty stub kvm_dirty_ring_check_request()             (Marc/Peter)
  * Add PATCH[v5 3/7] to allow using bitmap, indicated by
    KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP                        (Marc/Peter)
v4:
  * Commit log improvement                                     (Marc)
  * Add helper kvm_dirty_ring_check_request()                  (Marc)
  * Drop ifdef for kvm_cpu_dirty_log_size()                    (Marc)
v3:
  * Check KVM_REQ_RING_SOFT_FULL inside kvm_request_pending()  (Peter)
  * Move declaration of kvm_cpu_dirty_log_size()               (test-robot)
v2:
  * Introduce KVM_REQ_RING_SOFT_FULL                           (Marc)
  * Changelog improvement                                      (Marc)
  * Fix dirty_log_test without knowing host page size          (Drew)

Gavin Shan (9):
  KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
  KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling
    them
  KVM: Support dirty ring in conjunction with bitmap
  KVM: arm64: Improve no-running-vcpu report for dirty ring
  KVM: arm64: Enable ring-based dirty memory tracking
  KVM: selftests: Use host page size to map ring buffer in
    dirty_log_test
  KVM: selftests: Clear dirty ring states between two modes in
    dirty_log_test
  KVM: selftests: Automate choosing dirty ring size in dirty_log_test

 Documentation/virt/kvm/api.rst               | 33 +++++++++---
 arch/arm64/include/uapi/asm/kvm.h            |  1 +
 arch/arm64/kvm/Kconfig                       |  2 +
 arch/arm64/kvm/arm.c                         |  3 ++
 arch/arm64/kvm/mmu.c                         | 14 ++++++
 arch/arm64/kvm/vgic/vgic-init.c              |  4 +-
 arch/arm64/kvm/vgic/vgic-irqfd.c             |  4 +-
 arch/arm64/kvm/vgic/vgic-its.c               |  2 +-
 arch/arm64/kvm/vgic/vgic-mmio-v3.c           | 18 ++-----
 arch/arm64/kvm/vgic/vgic.c                   | 10 ++++
 arch/arm64/kvm/vgic/vgic.h                   |  1 -
 arch/x86/include/asm/kvm_host.h              |  2 -
 arch/x86/kvm/x86.c                           | 15 +++---
 include/kvm/arm_vgic.h                       |  1 +
 include/linux/kvm_dirty_ring.h               | 25 +++++----
 include/linux/kvm_host.h                     | 10 ++--
 include/uapi/linux/kvm.h                     |  1 +
 tools/testing/selftests/kvm/dirty_log_test.c | 53 ++++++++++++++------
 tools/testing/selftests/kvm/lib/kvm_util.c   |  2 +-
 virt/kvm/Kconfig                             |  8 +++
 virt/kvm/dirty_ring.c                        | 44 ++++++++++++++--
 virt/kvm/kvm_main.c                          | 39 +++++++++-----
 22 files changed, 208 insertions(+), 84 deletions(-)

-- 
2.23.0

* [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
From: Gavin Shan @ 2022-10-31  0:36 UTC
  To: kvmarm
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, will, kvmarm, ajones

The VCPU isn't expected to be runnable when the dirty ring becomes soft
full, until the dirty pages are harvested and the dirty ring is reset
by userspace. So there is a check at each guest entry to see if the
dirty ring is soft full or not, and the VCPU is prevented from running
if it is. A similar check will be needed when the feature is supported
on ARM64. As Marc Zyngier suggested, a new event avoids the pointless
overhead of checking the size of the dirty ring
('vcpu->kvm->dirty_ring_size') at each guest entry.

Add KVM_REQ_DIRTY_RING_SOFT_FULL. The event is raised when the dirty ring
becomes soft full in kvm_dirty_ring_push(). The event is cleared in the
check, done in the newly added helper kvm_dirty_ring_check_request(), or
when the dirty ring is reset by userspace. Since the VCPU is not runnable
when the dirty ring becomes soft full, the KVM_REQ_DIRTY_RING_SOFT_FULL
event is always set to prevent the VCPU from running until the dirty pages
are harvested and the dirty ring is reset by userspace.

kvm_dirty_ring_soft_full() becomes a private function with the newly
added helper kvm_dirty_ring_check_request(). Along the way, the
alignment of the various event definitions in kvm_host.h is switched
to tabs. In order to avoid using 'container_of()', the argument @ring
is replaced by @vcpu in kvm_dirty_ring_push() and
kvm_dirty_ring_reset(). The @kvm argument of kvm_dirty_ring_reset() is
dropped since it can be retrieved from the VCPU.
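
For reference, the userspace half of this contract doesn't change; a
hedged sketch of how a VMM reacts to the soft-full exit (the harvest
helper is hypothetical):

  ioctl(vcpu_fd, KVM_RUN, 0);
  switch (run->exit_reason) {             /* run: mmap'ed kvm_run area */
  case KVM_EXIT_DIRTY_RING_FULL:
          /*
           * Drain every VCPU's ring, then reset them so the
           * KVM_REQ_DIRTY_RING_SOFT_FULL request is cleared and the
           * VCPU can enter the guest again.
           */
          harvest_all_rings();                 /* hypothetical helper */
          ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);
          break;
  }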

Link: https://lore.kernel.org/kvmarm/87lerkwtm5.wl-maz@kernel.org
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/kvm/x86.c             | 15 ++++++---------
 include/linux/kvm_dirty_ring.h | 17 ++++++-----------
 include/linux/kvm_host.h       |  9 +++++----
 virt/kvm/dirty_ring.c          | 34 +++++++++++++++++++++++++++++++---
 virt/kvm/kvm_main.c            |  5 ++---
 5 files changed, 50 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9cf1ba865562..d0d32e67ebf3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10499,20 +10499,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
-	/* Forbid vmenter if vcpu dirty ring is soft-full */
-	if (unlikely(vcpu->kvm->dirty_ring_size &&
-		     kvm_dirty_ring_soft_full(&vcpu->dirty_ring))) {
-		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
-		trace_kvm_dirty_ring_exit(vcpu);
-		r = 0;
-		goto out;
-	}
-
 	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
 			r = -EIO;
 			goto out;
 		}
+
+		if (kvm_dirty_ring_check_request(vcpu)) {
+			r = 0;
+			goto out;
+		}
+
 		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
 			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
 				r = 0;
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 906f899813dc..53a36f38d15e 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -43,13 +43,12 @@ static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
 	return 0;
 }
 
-static inline int kvm_dirty_ring_reset(struct kvm *kvm,
-				       struct kvm_dirty_ring *ring)
+static inline int kvm_dirty_ring_reset(struct kvm_vcpu *vcpu)
 {
 	return 0;
 }
 
-static inline void kvm_dirty_ring_push(struct kvm_dirty_ring *ring,
+static inline void kvm_dirty_ring_push(struct kvm_vcpu *vcpu,
 				       u32 slot, u64 offset)
 {
 }
@@ -64,11 +63,6 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 {
 }
 
-static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
-{
-	return true;
-}
-
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 u32 kvm_dirty_ring_get_rsvd_entries(void);
@@ -78,19 +72,20 @@ int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
  * called with kvm->slots_lock held, returns the number of
  * processed pages.
  */
-int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring);
+int kvm_dirty_ring_reset(struct kvm_vcpu *vcpu);
 
 /*
  * returns =0: successfully pushed
  *         <0: unable to push, need to wait
  */
-void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset);
+void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
+
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu);
 
 /* for use in vm_operations_struct */
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset);
 
 void kvm_dirty_ring_free(struct kvm_dirty_ring *ring);
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring);
 
 #endif /* CONFIG_HAVE_KVM_DIRTY_RING */
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 00c3448ba7f8..648d663f32c4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -153,10 +153,11 @@ static inline bool is_error_page(struct page *page)
  * Architecture-independent vcpu->requests bit members
  * Bits 3-7 are reserved for more arch-independent bits.
  */
-#define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_UNBLOCK           2
-#define KVM_REQUEST_ARCH_BASE     8
+#define KVM_REQ_TLB_FLUSH		(0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VM_DEAD			(1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UNBLOCK			2
+#define KVM_REQ_DIRTY_RING_SOFT_FULL	3
+#define KVM_REQUEST_ARCH_BASE		8
 
 /*
  * KVM_REQ_OUTSIDE_GUEST_MODE exists is purely as way to force the vCPU to
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index d6fabf238032..6091e1403bc8 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -26,7 +26,7 @@ static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
 	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
 }
 
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
+static bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
 {
 	return kvm_dirty_ring_used(ring) >= ring->soft_limit;
 }
@@ -87,8 +87,10 @@ static inline bool kvm_dirty_gfn_harvested(struct kvm_dirty_gfn *gfn)
 	return smp_load_acquire(&gfn->flags) & KVM_DIRTY_GFN_F_RESET;
 }
 
-int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
+int kvm_dirty_ring_reset(struct kvm_vcpu *vcpu)
 {
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_dirty_ring *ring = &vcpu->dirty_ring;
 	u32 cur_slot, next_slot;
 	u64 cur_offset, next_offset;
 	unsigned long mask;
@@ -142,13 +144,17 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
 
 	kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
 
+	if (!kvm_dirty_ring_soft_full(ring))
+		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
+
 	trace_kvm_dirty_ring_reset(ring);
 
 	return count;
 }
 
-void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
+void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset)
 {
+	struct kvm_dirty_ring *ring = &vcpu->dirty_ring;
 	struct kvm_dirty_gfn *entry;
 
 	/* It should never get full */
@@ -166,6 +172,28 @@ void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
 	kvm_dirty_gfn_set_dirtied(entry);
 	ring->dirty_index++;
 	trace_kvm_dirty_ring_push(ring, slot, offset);
+
+	if (kvm_dirty_ring_soft_full(ring))
+		kvm_make_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
+}
+
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * The VCPU isn't runnable when the dirty ring becomes soft full.
+	 * The KVM_REQ_DIRTY_RING_SOFT_FULL event is always set to prevent
+	 * the VCPU from running until the dirty pages are harvested and
+	 * the dirty ring is reset by userspace.
+	 */
+	if (kvm_check_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu) &&
+	    kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
+		kvm_make_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
+		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
+		trace_kvm_dirty_ring_exit(vcpu);
+		return true;
+	}
+
+	return false;
 }
 
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1376a47fedee..30ff73931e1c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3314,8 +3314,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
 		if (kvm->dirty_ring_size)
-			kvm_dirty_ring_push(&vcpu->dirty_ring,
-					    slot, rel_gfn);
+			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
 		else
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
 	}
@@ -4543,7 +4542,7 @@ static int kvm_vm_ioctl_reset_dirty_pages(struct kvm *kvm)
 	mutex_lock(&kvm->slots_lock);
 
 	kvm_for_each_vcpu(i, vcpu, kvm)
-		cleared += kvm_dirty_ring_reset(vcpu->kvm, &vcpu->dirty_ring);
+		cleared += kvm_dirty_ring_reset(vcpu);
 
 	mutex_unlock(&kvm->slots_lock);
 
-- 
2.23.0

* [PATCH v7 2/9] KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
From: Gavin Shan @ 2022-10-31  0:36 UTC
  To: kvmarm
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, will, kvmarm, ajones

Not all architectures need to override the function; ARM64, for
example, doesn't. Move its declaration to kvm_dirty_ring.h to avoid
the following compile warning on ARM64 when the feature is enabled.

  arch/arm64/kvm/../../../virt/kvm/dirty_ring.c:14:12:        \
  warning: no previous prototype for 'kvm_cpu_dirty_log_size' \
  [-Wmissing-prototypes]                                      \
  int __weak kvm_cpu_dirty_log_size(void)

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 --
 include/linux/kvm_dirty_ring.h  | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7551b6f9c31c..b4dbde7d9eb1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2090,8 +2090,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 #define GET_SMSTATE(type, buf, offset)		\
 	(*(type *)((buf) + (offset) - 0x7e00))
 
-int kvm_cpu_dirty_log_size(void);
-
 int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 
 #define KVM_CLOCK_VALID_FLAGS						\
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 53a36f38d15e..04290eda0852 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -65,6 +65,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
+int kvm_cpu_dirty_log_size(void);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
 
-- 
2.23.0

* [PATCH v7 3/9] KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them
From: Gavin Shan @ 2022-10-31  0:36 UTC
  To: kvmarm
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, will, kvmarm, ajones

There are two capabilities related to ring-based dirty page tracking:
KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Both are
supported by x86. However, arm64 will advertise only
KVM_CAP_DIRTY_LOG_RING_ACQ_REL, and only when the feature is supported.
Userspace isn't restricted to enabling the advertised capability,
meaning KVM_CAP_DIRTY_LOG_RING can be wrongly enabled on arm64 by
userspace.

Fix it by double-checking that the capability has been advertised prior
to enabling it. Enabling a capability that hasn't been advertised is
rejected.
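
For illustration, the expected userspace pattern is check-then-enable
(error handling omitted; 'ring_bytes' is a caller-chosen size):

  int max = ioctl(vm_fd, KVM_CHECK_EXTENSION,
                  KVM_CAP_DIRTY_LOG_RING_ACQ_REL);

  if (max > 0) {  /* advertised: 'max' is the maximum ring size in bytes */
          struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_DIRTY_LOG_RING_ACQ_REL,
                  .args[0] = ring_bytes,  /* power of two, <= max */
          };
          ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
  }
  /* With this fix, enabling KVM_CAP_DIRTY_LOG_RING instead would fail
   * with -EINVAL on arm64, since that capability is never advertised. */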

Fixes: 17601bfed909 ("KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option")
Reported-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 virt/kvm/kvm_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 30ff73931e1c..91cf51a25394 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4584,6 +4584,9 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 	}
 	case KVM_CAP_DIRTY_LOG_RING:
 	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
+		if (!kvm_vm_ioctl_check_extension_generic(kvm, cap->cap))
+			return -EINVAL;
+
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
-- 
2.23.0

* [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
From: Gavin Shan @ 2022-10-31  0:36 UTC
  To: kvmarm
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, will, kvmarm, ajones

ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
enabled. This conflicts with the fact that ring-based dirty page
tracking always requires a running VCPU context.

Introduce a new flavor of dirty ring that requires the use of both VCPU
dirty rings and a dirty bitmap. The expectation is that for non-VCPU
sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
the dirty bitmap. Userspace should scan the dirty bitmap before migrating
the VM to the target.

Use an additional capability to advertise this behavior. The newly added
capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
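
For illustration, a sketch of the intended userspace flow on arm64.
Ordering matters: the ring capability must be enabled first. Error
handling is omitted, and 'bitmap_buf' is a caller-allocated buffer:

  struct kvm_enable_cap ring = {
          .cap = KVM_CAP_DIRTY_LOG_RING_ACQ_REL,
          .args[0] = ring_bytes,
  };
  struct kvm_enable_cap with_bitmap = {
          .cap = KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP,
  };

  ioctl(vm_fd, KVM_ENABLE_CAP, &ring);
  ioctl(vm_fd, KVM_ENABLE_CAP, &with_bitmap);  /* -EINVAL if done first */

  /* Last step of migration, after saving the vgic/its tables: collect
   * the few leftover dirty bits from the per-slot backup bitmap. */
  struct kvm_dirty_log log = {
          .slot = (as_id << 16) | slot_id,
          .dirty_bitmap = bitmap_buf,
  };
  ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);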

Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Peter Xu <peterx@redhat.com>
Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst | 31 ++++++++++++++++++++++++-------
 include/linux/kvm_dirty_ring.h |  6 ++++++
 include/linux/kvm_host.h       |  1 +
 include/uapi/linux/kvm.h       |  1 +
 virt/kvm/Kconfig               |  8 ++++++++
 virt/kvm/dirty_ring.c          |  5 +++++
 virt/kvm/kvm_main.c            | 31 ++++++++++++++++++++++---------
 7 files changed, 67 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index eee9f857a986..4d4eeb5c3c5a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
 needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
 vmexit ensures that all dirty GFNs are flushed to the dirty rings.
 
-NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
-ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
-KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
-KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
-machine will switch to ring-buffer dirty page tracking and further
-KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
-
 NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
 should be exposed by weakly ordered architecture, in order to indicate
 the additional memory ordering requirements imposed on userspace when
@@ -8018,6 +8011,30 @@ Architecture with TSO-like ordering (such as x86) are allowed to
 expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 to userspace.
 
+After using the dirty rings, the userspace needs to detect the capability
+of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
+need to be backed by per-slot bitmaps. With this capability advertised
+and supported, it means the architecture can dirty guest pages without
+vcpu/ring context, so that some of the dirty information will still be
+maintained in the bitmap structure.
+
+Note that the bitmap here is only a backup of the ring structure, and
+normally should only contain a very small amount of dirty pages, which
+needs to be transferred during VM downtime. Collecting the dirty bitmap
+should be the very last thing that the VMM does before transmitting state
+to the target VM. VMM needs to ensure that the dirty state is final and
+avoid missing dirty pages from another ioctl ordered after the bitmap
+collection.
+
+To collect dirty bits in the backup bitmap, the userspace can use the
+same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
+and its behavior is undefined since collecting the dirty bitmap always
+happens in the last phase of VM's migration.
+
+NOTE: One example of using the backup bitmap is saving arm64 vgic/its
+tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
+KVM device "kvm-arm-vgic-its" during VM's migration.
+
 8.30 KVM_CAP_XEN_HVM
 --------------------
 
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 04290eda0852..b08b9afd8bdb 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -37,6 +37,11 @@ static inline u32 kvm_dirty_ring_get_rsvd_entries(void)
 	return 0;
 }
 
+static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
+{
+	return true;
+}
+
 static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
 				       int index, u32 size)
 {
@@ -66,6 +71,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 int kvm_cpu_dirty_log_size(void);
+bool kvm_use_dirty_bitmap(struct kvm *kvm);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 648d663f32c4..db83f63f4e61 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -779,6 +779,7 @@ struct kvm {
 	pid_t userspace_pid;
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
+	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
 	bool vm_dead;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..c87b5882d7ae 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 800f9470e36b..228be1145cf3 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
        bool
        select HAVE_KVM_DIRTY_RING
 
+# Only architectures that need to dirty memory outside of a vCPU
+# context should select this, advertising to userspace the
+# requirement to use a dirty bitmap in addition to the vCPU dirty
+# ring.
+config HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	bool
+	depends on HAVE_KVM_DIRTY_RING
+
 config HAVE_KVM_EVENTFD
        bool
        select EVENTFD
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 6091e1403bc8..7ce6a5f81c98 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -21,6 +21,11 @@ u32 kvm_dirty_ring_get_rsvd_entries(void)
 	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
 }
 
+bool kvm_use_dirty_bitmap(struct kvm *kvm)
+{
+	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
+}
+
 static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
 {
 	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 91cf51a25394..0351c8fb41b9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 			new->dirty_bitmap = NULL;
 		else if (old && old->dirty_bitmap)
 			new->dirty_bitmap = old->dirty_bitmap;
-		else if (!kvm->dirty_ring_size) {
+		else if (kvm_use_dirty_bitmap(kvm)) {
 			r = kvm_alloc_dirty_bitmap(new);
 			if (r)
 				return r;
@@ -2060,8 +2060,8 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
 	unsigned long n;
 	unsigned long any = 0;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	*memslot = NULL;
@@ -2125,8 +2125,8 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -2237,8 +2237,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -3305,7 +3305,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
 #ifdef CONFIG_HAVE_KVM_DIRTY_RING
-	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
+		return;
+
+	if (WARN_ON_ONCE(!kvm->dirty_ring_with_bitmap && !vcpu))
 		return;
 #endif
 
@@ -3313,7 +3316,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
-		if (kvm->dirty_ring_size)
+		if (kvm->dirty_ring_size && vcpu)
 			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
 		else
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
@@ -4482,6 +4485,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
 #else
 		return 0;
+#endif
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
@@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 			return -EINVAL;
 
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
+		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
+		    !kvm->dirty_ring_size)
+			return -EINVAL;
+
+		kvm->dirty_ring_with_bitmap = true;
+		return 0;
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
-- 
2.23.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
@ 2022-10-31  0:36   ` Gavin Shan
  0 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2022-10-31  0:36 UTC (permalink / raw)
  To: kvmarm
  Cc: kvm, kvmarm, andrew.jones, ajones, maz, bgardon, catalin.marinas,
	dmatlack, will, pbonzini, peterx, oliver.upton, seanjc,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
enabled. It's conflicting with that ring-based dirty page tracking always
requires a running VCPU context.

Introduce a new flavor of dirty ring that requires the use of both VCPU
dirty rings and a dirty bitmap. The expectation is that for non-VCPU
sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
the dirty bitmap. Userspace should scan the dirty bitmap before migrating
the VM to the target.

Use an additional capability to advertise this behavior. The newly added
capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.

Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Peter Xu <peterx@redhat.com>
Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst | 31 ++++++++++++++++++++++++-------
 include/linux/kvm_dirty_ring.h |  6 ++++++
 include/linux/kvm_host.h       |  1 +
 include/uapi/linux/kvm.h       |  1 +
 virt/kvm/Kconfig               |  8 ++++++++
 virt/kvm/dirty_ring.c          |  5 +++++
 virt/kvm/kvm_main.c            | 31 ++++++++++++++++++++++---------
 7 files changed, 67 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index eee9f857a986..4d4eeb5c3c5a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
 needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
 vmexit ensures that all dirty GFNs are flushed to the dirty rings.
 
-NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
-ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
-KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
-KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
-machine will switch to ring-buffer dirty page tracking and further
-KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
-
 NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
 should be exposed by weakly ordered architecture, in order to indicate
 the additional memory ordering requirements imposed on userspace when
@@ -8018,6 +8011,30 @@ Architecture with TSO-like ordering (such as x86) are allowed to
 expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 to userspace.
 
+After using the dirty rings, the userspace needs to detect the capability
+of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
+need to be backed by per-slot bitmaps. With this capability advertised
+and supported, it means the architecture can dirty guest pages without
+vcpu/ring context, so that some of the dirty information will still be
+maintained in the bitmap structure.
+
+Note that the bitmap here is only a backup of the ring structure, and
+normally should only contain a very small amount of dirty pages, which
+needs to be transferred during VM downtime. Collecting the dirty bitmap
+should be the very last thing that the VMM does before transmitting state
+to the target VM. VMM needs to ensure that the dirty state is final and
+avoid missing dirty pages from another ioctl ordered after the bitmap
+collection.
+
+To collect dirty bits in the backup bitmap, the userspace can use the
+same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
+and its behavior is undefined since collecting the dirty bitmap always
+happens in the last phase of VM's migration.
+
+NOTE: One example of using the backup bitmap is saving arm64 vgic/its
+tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
+KVM device "kvm-arm-vgic-its" during VM's migration.
+
 8.30 KVM_CAP_XEN_HVM
 --------------------
 
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 04290eda0852..b08b9afd8bdb 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -37,6 +37,11 @@ static inline u32 kvm_dirty_ring_get_rsvd_entries(void)
 	return 0;
 }
 
+static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
+{
+	return true;
+}
+
 static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
 				       int index, u32 size)
 {
@@ -66,6 +71,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 int kvm_cpu_dirty_log_size(void);
+bool kvm_use_dirty_bitmap(struct kvm *kvm);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 648d663f32c4..db83f63f4e61 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -779,6 +779,7 @@ struct kvm {
 	pid_t userspace_pid;
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
+	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
 	bool vm_dead;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..c87b5882d7ae 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 800f9470e36b..228be1145cf3 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
        bool
        select HAVE_KVM_DIRTY_RING
 
+# Only architectures that need to dirty memory outside of a vCPU
+# context should select this, advertising to userspace the
+# requirement to use a dirty bitmap in addition to the vCPU dirty
+# ring.
+config HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	bool
+	depends on HAVE_KVM_DIRTY_RING
+
 config HAVE_KVM_EVENTFD
        bool
        select EVENTFD
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 6091e1403bc8..7ce6a5f81c98 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -21,6 +21,11 @@ u32 kvm_dirty_ring_get_rsvd_entries(void)
 	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
 }
 
+bool kvm_use_dirty_bitmap(struct kvm *kvm)
+{
+	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
+}
+
 static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
 {
 	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 91cf51a25394..0351c8fb41b9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 			new->dirty_bitmap = NULL;
 		else if (old && old->dirty_bitmap)
 			new->dirty_bitmap = old->dirty_bitmap;
-		else if (!kvm->dirty_ring_size) {
+		else if (kvm_use_dirty_bitmap(kvm)) {
 			r = kvm_alloc_dirty_bitmap(new);
 			if (r)
 				return r;
@@ -2060,8 +2060,8 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
 	unsigned long n;
 	unsigned long any = 0;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	*memslot = NULL;
@@ -2125,8 +2125,8 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -2237,8 +2237,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -3305,7 +3305,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
 #ifdef CONFIG_HAVE_KVM_DIRTY_RING
-	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
+		return;
+
+	if (WARN_ON_ONCE(!kvm->dirty_ring_with_bitmap && !vcpu))
 		return;
 #endif
 
@@ -3313,7 +3316,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
-		if (kvm->dirty_ring_size)
+		if (kvm->dirty_ring_size && vcpu)
 			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
 		else
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
@@ -4482,6 +4485,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
 #else
 		return 0;
+#endif
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
@@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 			return -EINVAL;
 
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
+		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
+		    !kvm->dirty_ring_size)
+			return -EINVAL;
+
+		kvm->dirty_ring_with_bitmap = true;
+		return 0;
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
-- 
2.23.0
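
For illustration, a minimal userspace sketch of the enabling order the
handler above implies: the ring has to be sized and enabled first, and
only then is KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP accepted (it returns
-EINVAL otherwise). The helper name and the absence of error handling
are simplifications, not part of the patch.

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static int enable_ring_with_bitmap(int vm_fd, __u32 ring_bytes)
	{
		struct kvm_enable_cap cap;

		/* 1) Size and enable the dirty ring itself. */
		memset(&cap, 0, sizeof(cap));
		cap.cap = KVM_CAP_DIRTY_LOG_RING_ACQ_REL;
		cap.args[0] = ring_bytes;
		if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap))
			return -1;

		/* 2) Only now is the bitmap companion accepted. */
		memset(&cap, 0, sizeof(cap));
		cap.cap = KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP;
		return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
	}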


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 5/9] KVM: arm64: Improve no-running-vcpu report for dirty ring
  2022-10-31  0:36 ` Gavin Shan
@ 2022-10-31  0:36   ` Gavin Shan
  -1 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2022-10-31  0:36 UTC (permalink / raw)
  To: kvmarm
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, will, kvmarm, ajones

KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP should be enabled only when the
KVM device "kvm-arm-vgic-its" is used by userspace. Currently, it's
the only case where guest memory is dirtied without a running VCPU
while the dirty ring is in use. However, other devices could introduce
a similar error in the future.

In order to report only those broken devices, the no-running-vcpu
warning is suppressed for the KVM device "kvm-arm-vgic-its". For this,
the function vgic_has_its() needs to be exposed under a more generic
name (kvm_vgic_has_its()).
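
As background, the suppression hook relies on weak symbol overriding: a
strong arch-specific definition replaces the generic __weak one at link
time. A tiny standalone illustration of the mechanism, with hypothetical
names rather than kernel code:

	#include <stdbool.h>
	#include <stdio.h>

	/* Generic default, analogous to the one in virt/kvm/dirty_ring.c. */
	bool __attribute__((weak)) allow_write_without_vcpu(void)
	{
		return false;
	}

	int main(void)
	{
		/* Prints 0 unless another object file linked into the
		 * program supplies a strong allow_write_without_vcpu(),
		 * in which case that definition wins, just as arm64's
		 * mmu.c overrides the generic helper in the diff below. */
		printf("%d\n", allow_write_without_vcpu());
		return 0;
	}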

Link: https://lore.kernel.org/kvmarm/Y1ghIKrAsRFwSFsO@google.com
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/kvm/mmu.c               | 14 ++++++++++++++
 arch/arm64/kvm/vgic/vgic-init.c    |  4 ++--
 arch/arm64/kvm/vgic/vgic-irqfd.c   |  4 ++--
 arch/arm64/kvm/vgic/vgic-its.c     |  2 +-
 arch/arm64/kvm/vgic/vgic-mmio-v3.c | 18 ++++--------------
 arch/arm64/kvm/vgic/vgic.c         | 10 ++++++++++
 arch/arm64/kvm/vgic/vgic.h         |  1 -
 include/kvm/arm_vgic.h             |  1 +
 include/linux/kvm_dirty_ring.h     |  1 +
 virt/kvm/dirty_ring.c              |  5 +++++
 virt/kvm/kvm_main.c                |  2 +-
 11 files changed, 41 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 60ee3d9f01f8..e0855b2b3d66 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -932,6 +932,20 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
+/*
+ * kvm_arch_allow_write_without_running_vcpu - allow writing guest memory
+ * without the running VCPU when dirty ring is enabled.
+ *
+ * The running VCPU is required to track dirty guest pages when dirty ring
+ * is enabled. Otherwise, the backup bitmap should be used to track the
+ * dirty guest pages. When vgic/its is enabled, we need to use the backup
+ * bitmap to track the dirty guest pages for it.
+ */
+bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
+{
+	return kvm->dirty_ring_with_bitmap && kvm_vgic_has_its(kvm);
+}
+
 static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
 {
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index f6d4f4052555..4c7f443c6d3d 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -296,7 +296,7 @@ int vgic_init(struct kvm *kvm)
 		}
 	}
 
-	if (vgic_has_its(kvm))
+	if (kvm_vgic_has_its(kvm))
 		vgic_lpi_translation_cache_init(kvm);
 
 	/*
@@ -352,7 +352,7 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
 		dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
 	}
 
-	if (vgic_has_its(kvm))
+	if (kvm_vgic_has_its(kvm))
 		vgic_lpi_translation_cache_destroy(kvm);
 
 	if (vgic_supports_direct_msis(kvm))
diff --git a/arch/arm64/kvm/vgic/vgic-irqfd.c b/arch/arm64/kvm/vgic/vgic-irqfd.c
index 475059bacedf..e33cc34bf8f5 100644
--- a/arch/arm64/kvm/vgic/vgic-irqfd.c
+++ b/arch/arm64/kvm/vgic/vgic-irqfd.c
@@ -88,7 +88,7 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 {
 	struct kvm_msi msi;
 
-	if (!vgic_has_its(kvm))
+	if (!kvm_vgic_has_its(kvm))
 		return -ENODEV;
 
 	if (!level)
@@ -112,7 +112,7 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
 	case KVM_IRQ_ROUTING_MSI: {
 		struct kvm_msi msi;
 
-		if (!vgic_has_its(kvm))
+		if (!kvm_vgic_has_its(kvm))
 			break;
 
 		kvm_populate_msi(e, &msi);
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 733b53055f97..40622da7348a 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -698,7 +698,7 @@ struct vgic_its *vgic_msi_to_its(struct kvm *kvm, struct kvm_msi *msi)
 	struct kvm_io_device *kvm_io_dev;
 	struct vgic_io_device *iodev;
 
-	if (!vgic_has_its(kvm))
+	if (!kvm_vgic_has_its(kvm))
 		return ERR_PTR(-ENODEV);
 
 	if (!(msi->flags & KVM_MSI_VALID_DEVID))
diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 91201f743033..10218057c176 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -38,20 +38,10 @@ u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
 	return reg | ((u64)val << lower);
 }
 
-bool vgic_has_its(struct kvm *kvm)
-{
-	struct vgic_dist *dist = &kvm->arch.vgic;
-
-	if (dist->vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3)
-		return false;
-
-	return dist->has_its;
-}
-
 bool vgic_supports_direct_msis(struct kvm *kvm)
 {
 	return (kvm_vgic_global_state.has_gicv4_1 ||
-		(kvm_vgic_global_state.has_gicv4 && vgic_has_its(kvm)));
+		(kvm_vgic_global_state.has_gicv4 && kvm_vgic_has_its(kvm)));
 }
 
 /*
@@ -78,7 +68,7 @@ static unsigned long vgic_mmio_read_v3_misc(struct kvm_vcpu *vcpu,
 	case GICD_TYPER:
 		value = vgic->nr_spis + VGIC_NR_PRIVATE_IRQS;
 		value = (value >> 5) - 1;
-		if (vgic_has_its(vcpu->kvm)) {
+		if (kvm_vgic_has_its(vcpu->kvm)) {
 			value |= (INTERRUPT_ID_BITS_ITS - 1) << 19;
 			value |= GICD_TYPER_LPIS;
 		} else {
@@ -262,7 +252,7 @@ static void vgic_mmio_write_v3r_ctlr(struct kvm_vcpu *vcpu,
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	u32 ctlr;
 
-	if (!vgic_has_its(vcpu->kvm))
+	if (!kvm_vgic_has_its(vcpu->kvm))
 		return;
 
 	if (!(val & GICR_CTLR_ENABLE_LPIS)) {
@@ -326,7 +316,7 @@ static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
 	value = (u64)(mpidr & GENMASK(23, 0)) << 32;
 	value |= ((target_vcpu_id & 0xffff) << 8);
 
-	if (vgic_has_its(vcpu->kvm))
+	if (kvm_vgic_has_its(vcpu->kvm))
 		value |= GICR_TYPER_PLPIS;
 
 	if (vgic_mmio_vcpu_rdist_is_last(vcpu))
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index d97e6080b421..9ef7488ed0c7 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -21,6 +21,16 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
 	.gicv3_cpuif = STATIC_KEY_FALSE_INIT,
 };
 
+bool kvm_vgic_has_its(struct kvm *kvm)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+
+	if (dist->vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3)
+		return false;
+
+	return dist->has_its;
+}
+
 /*
  * Locking order is always:
  * kvm->lock (mutex)
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 0c8da72953f0..f91114ee1cd5 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -235,7 +235,6 @@ void vgic_v3_load(struct kvm_vcpu *vcpu);
 void vgic_v3_put(struct kvm_vcpu *vcpu);
 void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu);
 
-bool vgic_has_its(struct kvm *kvm);
 int kvm_vgic_register_its_device(void);
 void vgic_enable_lpis(struct kvm_vcpu *vcpu);
 void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu);
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4df9e73a8bb5..72e9bc6c66a4 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -374,6 +374,7 @@ int kvm_vgic_map_resources(struct kvm *kvm);
 int kvm_vgic_hyp_init(void);
 void kvm_vgic_init_cpu_hardware(void);
 
+bool kvm_vgic_has_its(struct kvm *kvm);
 int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 			bool level, void *owner);
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index b08b9afd8bdb..bb0a72401b5a 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -72,6 +72,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 
 int kvm_cpu_dirty_log_size(void);
 bool kvm_use_dirty_bitmap(struct kvm *kvm);
+bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
 
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 7ce6a5f81c98..f27e038043f3 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -26,6 +26,11 @@ bool kvm_use_dirty_bitmap(struct kvm *kvm)
 	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
 }
 
+bool __weak kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
+{
+	return kvm->dirty_ring_with_bitmap;
+}
+
 static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
 {
 	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0351c8fb41b9..e1be4f89df3b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3308,7 +3308,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
 		return;
 
-	if (WARN_ON_ONCE(!kvm->dirty_ring_with_bitmap && !vcpu))
+	if (WARN_ON_ONCE(!kvm_arch_allow_write_without_running_vcpu(kvm) && !vcpu))
 		return;
 #endif
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 6/9] KVM: arm64: Enable ring-based dirty memory tracking
  2022-10-31  0:36 ` Gavin Shan
@ 2022-10-31  0:36   ` Gavin Shan
  -1 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2022-10-31  0:36 UTC (permalink / raw)
  To: kvmarm
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, will, kvmarm, ajones

Enable ring-based dirty memory tracking on arm64 by selecting
CONFIG_HAVE_KVM_DIRTY_{RING_ACQ_REL, RING_WITH_BITMAP} and providing
the ring buffer's physical page offset (KVM_DIRTY_LOG_PAGE_OFFSET).
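
For context, once the ring of a vcpu goes full, KVM_RUN returns to
userspace with KVM_EXIT_DIRTY_RING_FULL; userspace harvests the ring
entries and resets the rings before resuming. A rough sketch of that
loop (error handling and the harvesting itself are elided):

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	static void run_vcpu(int vm_fd, int vcpu_fd, struct kvm_run *run)
	{
		for (;;) {
			ioctl(vcpu_fd, KVM_RUN, NULL);

			if (run->exit_reason == KVM_EXIT_DIRTY_RING_FULL) {
				/* Harvest kvm_dirty_gfn entries here. */
				ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, NULL);
				continue;
			}

			break;	/* other exit reasons handled elsewhere */
		}
	}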

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst    | 2 +-
 arch/arm64/include/uapi/asm/kvm.h | 1 +
 arch/arm64/kvm/Kconfig            | 2 ++
 arch/arm64/kvm/arm.c              | 3 +++
 4 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 4d4eeb5c3c5a..06d72bca12c9 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7921,7 +7921,7 @@ regardless of what has actually been exposed through the CPUID leaf.
 8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 ----------------------------------------------------------
 
-:Architectures: x86
+:Architectures: x86, arm64
 :Parameters: args[0] - size of the dirty log ring
 
 KVM is capable of tracking dirty memory using ring buffers that are
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 316917b98707..a7a857f1784d 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -43,6 +43,7 @@
 #define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define KVM_REG_SIZE(id)						\
 	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 815cc118c675..066b053e9eb9 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -32,6 +32,8 @@ menuconfig KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
+	select HAVE_KVM_DIRTY_RING_ACQ_REL
+	select HAVE_KVM_DIRTY_RING_WITH_BITMAP
 	select HAVE_KVM_MSI
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQ_ROUTING
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 94d33e296e10..6b097605e38c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -746,6 +746,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
 			return kvm_vcpu_suspend(vcpu);
+
+		if (kvm_dirty_ring_check_request(vcpu))
+			return 0;
 	}
 
 	return 1;
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 7/9] KVM: selftests: Use host page size to map ring buffer in dirty_log_test
  2022-10-31  0:36 ` Gavin Shan
@ 2022-10-31  0:36   ` Gavin Shan
  -1 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2022-10-31  0:36 UTC (permalink / raw)
  To: kvmarm
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, will, kvmarm, ajones

In vcpu_map_dirty_ring(), the guest's page size is used to figure out
the offset in the virtual area. That works fine when the host and
guest page sizes are the same, but it fails on arm64 when they differ,
as the error messages below indicate.

  # ./dirty_log_test -M dirty-ring -m 7
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
  guest physical test memory offset: 0xffbffc0000
  vcpu stops because vcpu is kicked out...
  Notifying vcpu to continue
  vcpu continues now.
  ==== Test Assertion Failure ====
  lib/kvm_util.c:1477: addr == MAP_FAILED
  pid=9000 tid=9000 errno=0 - Success
  1  0x0000000000405f5b: vcpu_map_dirty_ring at kvm_util.c:1477
  2  0x0000000000402ebb: dirty_ring_collect_dirty_pages at dirty_log_test.c:349
  3  0x00000000004029b3: log_mode_collect_dirty_pages at dirty_log_test.c:478
  4  (inlined by) run_test at dirty_log_test.c:778
  5  (inlined by) run_test at dirty_log_test.c:691
  6  0x0000000000403a57: for_each_guest_mode at guest_modes.c:105
  7  0x0000000000401ccf: main at dirty_log_test.c:921
  8  0x0000ffffb06ec79b: ?? ??:0
  9  0x0000ffffb06ec86b: ?? ??:0
  10 0x0000000000401def: _start at ??:?
  Dirty ring mapped private

Fix the issue by using the host's page size to map the ring buffer.
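
The underlying reason is that mmap() offsets on the vcpu fd are
expressed in host pages, so computing the ring offset with the guest's
page size lands on the wrong offset whenever the two differ. A sketch
of the corrected computation, mirroring the one-line hunk below:

	#include <unistd.h>
	#include <sys/mman.h>
	#include <linux/kvm.h>

	static void *map_ring(int vcpu_fd, size_t ring_bytes)
	{
		/* Host page size, not the guest's (vm->page_size). */
		size_t page_size = getpagesize();

		return mmap(NULL, ring_bytes, PROT_READ | PROT_WRITE,
			    MAP_SHARED, vcpu_fd,
			    page_size * KVM_DIRTY_LOG_PAGE_OFFSET);
	}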

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/lib/kvm_util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index f1cb1627161f..89a1a420ebd5 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1506,7 +1506,7 @@ struct kvm_reg_list *vcpu_get_reg_list(struct kvm_vcpu *vcpu)
 
 void *vcpu_map_dirty_ring(struct kvm_vcpu *vcpu)
 {
-	uint32_t page_size = vcpu->vm->page_size;
+	uint32_t page_size = getpagesize();
 	uint32_t size = vcpu->vm->dirty_ring_size;
 
 	TEST_ASSERT(size > 0, "Should enable dirty ring first");
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 8/9] KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
  2022-10-31  0:36 ` Gavin Shan
@ 2022-10-31  0:36   ` Gavin Shan
  -1 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2022-10-31  0:36 UTC (permalink / raw)
  To: kvmarm
  Cc: kvm, kvmarm, andrew.jones, ajones, maz, bgardon, catalin.marinas,
	dmatlack, will, pbonzini, peterx, oliver.upton, seanjc,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

There are two pieces of state which need to be cleared before the next
mode is executed. Otherwise, we hit the failure shown in the messages
below.

- The variable 'dirty_ring_vcpu_ring_full', shared by the main and
  vcpu threads, indicates whether the vcpu exited because of a full
  ring buffer. Its value can be carried over from the previous mode
  (VM_MODE_P40V48_4K) to the current one (VM_MODE_P40V48_64K) when
  VM_MODE_P40V48_16K isn't supported.

- The current ring buffer index needs to be reset before the next mode
  (VM_MODE_P40V48_64K) is executed. Otherwise, the stale value is
  carried over from the previous mode (VM_MODE_P40V48_4K).

  # ./dirty_log_test -M dirty-ring
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
  guest physical test memory offset: 0xffbfffc000
    :
  Dirtied 995328 pages
  Total bits checked: dirty (1012434), clear (7114123), track_next (966700)
  Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
  guest physical test memory offset: 0xffbffc0000
  vcpu stops because vcpu is kicked out...
  vcpu continues now.
  Notifying vcpu to continue
  Iteration 1 collected 0 pages
  vcpu stops because dirty ring is full...
  vcpu continues now.
  vcpu stops because dirty ring is full...
  vcpu continues now.
  vcpu stops because dirty ring is full...
  ==== Test Assertion Failure ====
  dirty_log_test.c:369: cleared == count
  pid=10541 tid=10541 errno=22 - Invalid argument
     1	0x0000000000403087: dirty_ring_collect_dirty_pages at dirty_log_test.c:369
     2	0x0000000000402a0b: log_mode_collect_dirty_pages at dirty_log_test.c:492
     3	 (inlined by) run_test at dirty_log_test.c:795
     4	 (inlined by) run_test at dirty_log_test.c:705
     5	0x0000000000403a37: for_each_guest_mode at guest_modes.c:100
     6	0x0000000000401ccf: main at dirty_log_test.c:938
     7	0x0000ffff9ecd279b: ?? ??:0
     8	0x0000ffff9ecd286b: ?? ??:0
     9	0x0000000000401def: _start at ??:?
  Reset dirty pages (0) mismatch with collected (35566)

Fix the issues by clearing 'dirty_ring_vcpu_ring_full' and the ring
buffer index before the next mode is executed.
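
The ring-index half of the bug comes from a function-local static that
outlives each test mode. A standalone illustration of the pitfall
(hypothetical code, not the selftest itself):

	#include <stdio.h>

	static int collect(void)
	{
		static int fetch_index;	/* persists across calls and modes */
		return fetch_index++;
	}

	int main(void)
	{
		for (int mode = 0; mode < 2; mode++)
			printf("mode %d starts at index %d\n",
			       mode, collect());
		return 0;
	}

	/* Prints index 0, then index 1: the second "mode" inherits the
	 * stale counter, which is why the patch moves the index into
	 * the caller and resets it per mode. */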

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c | 27 ++++++++++++--------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index b5234d6efbe1..8758c10ec850 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -226,13 +226,15 @@ static void clear_log_create_vm_done(struct kvm_vm *vm)
 }
 
 static void dirty_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					  void *bitmap, uint32_t num_pages)
+					  void *bitmap, uint32_t num_pages,
+					  uint32_t *unused)
 {
 	kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap);
 }
 
 static void clear_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					  void *bitmap, uint32_t num_pages)
+					  void *bitmap, uint32_t num_pages,
+					  uint32_t *unused)
 {
 	kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap);
 	kvm_vm_clear_dirty_log(vcpu->vm, slot, bitmap, 0, num_pages);
@@ -329,10 +331,9 @@ static void dirty_ring_continue_vcpu(void)
 }
 
 static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					   void *bitmap, uint32_t num_pages)
+					   void *bitmap, uint32_t num_pages,
+					   uint32_t *ring_buf_idx)
 {
-	/* We only have one vcpu */
-	static uint32_t fetch_index = 0;
 	uint32_t count = 0, cleared;
 	bool continued_vcpu = false;
 
@@ -349,7 +350,8 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
 
 	/* Only have one vcpu */
 	count = dirty_ring_collect_one(vcpu_map_dirty_ring(vcpu),
-				       slot, bitmap, num_pages, &fetch_index);
+				       slot, bitmap, num_pages,
+				       ring_buf_idx);
 
 	cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
 
@@ -406,7 +408,8 @@ struct log_mode {
 	void (*create_vm_done)(struct kvm_vm *vm);
 	/* Hook to collect the dirty pages into the bitmap provided */
 	void (*collect_dirty_pages) (struct kvm_vcpu *vcpu, int slot,
-				     void *bitmap, uint32_t num_pages);
+				     void *bitmap, uint32_t num_pages,
+				     uint32_t *ring_buf_idx);
 	/* Hook to call when after each vcpu run */
 	void (*after_vcpu_run)(struct kvm_vcpu *vcpu, int ret, int err);
 	void (*before_vcpu_join) (void);
@@ -471,13 +474,14 @@ static void log_mode_create_vm_done(struct kvm_vm *vm)
 }
 
 static void log_mode_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					 void *bitmap, uint32_t num_pages)
+					 void *bitmap, uint32_t num_pages,
+					 uint32_t *ring_buf_idx)
 {
 	struct log_mode *mode = &log_modes[host_log_mode];
 
 	TEST_ASSERT(mode->collect_dirty_pages != NULL,
 		    "collect_dirty_pages() is required for any log mode!");
-	mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages);
+	mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages, ring_buf_idx);
 }
 
 static void log_mode_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
@@ -696,6 +700,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	struct kvm_vcpu *vcpu;
 	struct kvm_vm *vm;
 	unsigned long *bmap;
+	uint32_t ring_buf_idx = 0;
 
 	if (!log_mode_supported()) {
 		print_skip("Log mode '%s' not supported",
@@ -771,6 +776,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	host_dirty_count = 0;
 	host_clear_count = 0;
 	host_track_next_count = 0;
+	WRITE_ONCE(dirty_ring_vcpu_ring_full, false);
 
 	pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
 
@@ -778,7 +784,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		/* Give the vcpu thread some time to dirty some pages */
 		usleep(p->interval * 1000);
 		log_mode_collect_dirty_pages(vcpu, TEST_MEM_SLOT_INDEX,
-					     bmap, host_num_pages);
+					     bmap, host_num_pages,
+					     &ring_buf_idx);
 
 		/*
 		 * See vcpu_sync_stop_requested definition for details on why
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v7 9/9] KVM: selftests: Automate choosing dirty ring size in dirty_log_test
  2022-10-31  0:36 ` Gavin Shan
@ 2022-10-31  0:36   ` Gavin Shan
  -1 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2022-10-31  0:36 UTC (permalink / raw)
  To: kvmarm
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, will, kvmarm, ajones

In the dirty ring case, we rely on the vcpu exiting because its dirty
ring is full. On an ARM64 system, there are only 4096 host pages when
the host page size is 64KB, and in this case the vcpu never exits with
a full dirty ring. A similar case is a 4KB page size on the host and a
64KB page size in the guest: the vcpu keeps dirtying the same set of
host pages, but the dirty page information isn't collected in the main
thread. This leads to an infinite loop, as the following log shows.

  # ./dirty_log_test -M dirty-ring -c 65536 -m 5
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
  guest physical test memory offset: 0xffbffe0000
  vcpu stops because vcpu is kicked out...
  Notifying vcpu to continue
  vcpu continues now.
  Iteration 1 collected 576 pages
  <No more output afterwards>

Fix the issue by automatically choosing the best dirty ring size, to
ensure the vcpu exits because the dirty ring is full. The option '-c'
becomes a hint for the dirty ring count, instead of its exact value.
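
The adjustment rounds both the page count and the requested ring size
down to powers of two using the 1 << (31 - __builtin_clz(x)) idiom,
since the ring size needs to be a power of two. A quick standalone
check of that arithmetic:

	#include <stdio.h>

	int main(void)
	{
		unsigned int pages = 16387;	/* e.g. (1G >> 16) + 3 */
		unsigned int limit = 1u << (31 - __builtin_clz(pages));

		/* Prints "16387 -> 16384". */
		printf("%u -> %u\n", pages, limit);
		return 0;
	}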

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c | 26 +++++++++++++++++---
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 8758c10ec850..a87e5f78ebf1 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -24,6 +24,9 @@
 #include "guest_modes.h"
 #include "processor.h"
 
+#define DIRTY_MEM_BITS 30 /* 1G */
+#define PAGE_SHIFT_4K  12
+
 /* The memory slot index to track dirty pages */
 #define TEST_MEM_SLOT_INDEX		1
 
@@ -273,6 +276,24 @@ static bool dirty_ring_supported(void)
 
 static void dirty_ring_create_vm_done(struct kvm_vm *vm)
 {
+	uint64_t pages;
+	uint32_t limit;
+
+	/*
+	 * We rely on vcpu exit due to full dirty ring state. Adjust
+	 * the ring buffer size to ensure we're able to reach the
+	 * full dirty ring state.
+	 */
+	pages = (1ul << (DIRTY_MEM_BITS - vm->page_shift)) + 3;
+	pages = vm_adjust_num_guest_pages(vm->mode, pages);
+	if (vm->page_size < getpagesize())
+		pages = vm_num_host_pages(vm->mode, pages);
+
+	limit = 1 << (31 - __builtin_clz(pages));
+	test_dirty_ring_count = 1 << (31 - __builtin_clz(test_dirty_ring_count));
+	test_dirty_ring_count = min(limit, test_dirty_ring_count);
+	pr_info("dirty ring count: 0x%x\n", test_dirty_ring_count);
+
 	/*
 	 * Switch to dirty ring mode after VM creation but before any
 	 * of the vcpu creation.
@@ -685,9 +706,6 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, struct kvm_vcpu **vcpu,
 	return vm;
 }
 
-#define DIRTY_MEM_BITS 30 /* 1G */
-#define PAGE_SHIFT_4K  12
-
 struct test_params {
 	unsigned long iterations;
 	unsigned long interval;
@@ -830,7 +848,7 @@ static void help(char *name)
 	printf("usage: %s [-h] [-i iterations] [-I interval] "
 	       "[-p offset] [-m mode]\n", name);
 	puts("");
-	printf(" -c: specify dirty ring size, in number of entries\n");
+	printf(" -c: hint to dirty ring size, in number of entries\n");
 	printf("     (only useful for dirty-ring test; default: %"PRIu32")\n",
 	       TEST_DIRTY_RING_COUNT);
 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 5/9] KVM: arm64: Improve no-running-vcpu report for dirty ring
  2022-10-31  0:36   ` Gavin Shan
@ 2022-10-31  9:08     ` Oliver Upton
  -1 siblings, 0 replies; 68+ messages in thread
From: Oliver Upton @ 2022-10-31  9:08 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvm, kvmarm, andrew.jones, ajones, maz, bgardon,
	catalin.marinas, dmatlack, will, pbonzini, peterx, seanjc,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

On Mon, Oct 31, 2022 at 08:36:17AM +0800, Gavin Shan wrote:
> KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP should be enabled only when the
> KVM device "kvm-arm-vgic-its" is used by userspace. Currently, it's
> the only case where guest memory is dirtied without a running VCPU
> while the dirty ring is in use. However, other devices could introduce
> a similar error in the future.
> 
> In order to report only those broken devices, the no-running-vcpu
> warning is suppressed for the KVM device "kvm-arm-vgic-its". For this,
> the function vgic_has_its() needs to be exposed under a more generic
> name (kvm_vgic_has_its()).
> 
> Link: https://lore.kernel.org/kvmarm/Y1ghIKrAsRFwSFsO@google.com
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>

I don't think this should be added as a separate patch.

The weak kvm_arch_allow_write_without_running_vcpu() (and adding its
caller) should be rolled into patch 4/9. The arm64 implementation of
that should be introduced in patch 6/9.

> ---
>  arch/arm64/kvm/mmu.c               | 14 ++++++++++++++
>  arch/arm64/kvm/vgic/vgic-init.c    |  4 ++--
>  arch/arm64/kvm/vgic/vgic-irqfd.c   |  4 ++--
>  arch/arm64/kvm/vgic/vgic-its.c     |  2 +-
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 18 ++++--------------
>  arch/arm64/kvm/vgic/vgic.c         | 10 ++++++++++
>  arch/arm64/kvm/vgic/vgic.h         |  1 -
>  include/kvm/arm_vgic.h             |  1 +
>  include/linux/kvm_dirty_ring.h     |  1 +
>  virt/kvm/dirty_ring.c              |  5 +++++
>  virt/kvm/kvm_main.c                |  2 +-
>  11 files changed, 41 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 60ee3d9f01f8..e0855b2b3d66 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -932,6 +932,20 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>  }
>  
> +/*
> + * kvm_arch_allow_write_without_running_vcpu - allow writing guest memory
> + * without the running VCPU when dirty ring is enabled.
> + *
> + * The running VCPU is required to track dirty guest pages when dirty ring
> + * is enabled. Otherwise, the backup bitmap should be used to track the
> + * dirty guest pages. When vgic/its is enabled, we need to use the backup
> + * bitmap to track the dirty guest pages for it.
> + */
> +bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
> +{
> +	return kvm->dirty_ring_with_bitmap && kvm_vgic_has_its(kvm);
> +}

It is trivial for userspace to cause a WARN to fire like this. Just set
up the VM with !RING_WITH_BITMAP && ITS.

The goal is to catch KVM bugs, not userspace bugs, so I'd suggest only
checking whether or not an ITS is present.
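
Something like this, as a sketch of the suggestion:

	bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
	{
		return kvm_vgic_has_its(kvm);
	}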

[...]

> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index 91201f743033..10218057c176 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -38,20 +38,10 @@ u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
>  	return reg | ((u64)val << lower);
>  }
>  
> -bool vgic_has_its(struct kvm *kvm)
> -{
> -	struct vgic_dist *dist = &kvm->arch.vgic;
> -
> -	if (dist->vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3)
> -		return false;
> -
> -	return dist->has_its;
> -}
> -

nit: renaming/exposing this helper should be done in a separate patch.
Also, I don't think you need to move it anywhere either.

[...]

> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 7ce6a5f81c98..f27e038043f3 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -26,6 +26,11 @@ bool kvm_use_dirty_bitmap(struct kvm *kvm)
>  	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
>  }
>  
> +bool __weak kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
> +{
> +	return kvm->dirty_ring_with_bitmap;
> +}
> +

Same comment on the arm64 implementation applies here. This should just
return false by default.
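
i.e. (sketch):

	bool __weak kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
	{
		return false;
	}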

--
Thanks,
Oliver


* Re: [PATCH v7 3/9] KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them
  2022-10-31  0:36   ` Gavin Shan
@ 2022-10-31  9:18     ` Oliver Upton
  -1 siblings, 0 replies; 68+ messages in thread
From: Oliver Upton @ 2022-10-31  9:18 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvm, kvmarm, andrew.jones, ajones, maz, bgardon,
	catalin.marinas, dmatlack, will, pbonzini, peterx, seanjc,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

On Mon, Oct 31, 2022 at 08:36:15AM +0800, Gavin Shan wrote:
> There are two capabilities related to ring-based dirty page tracking:
> KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Both are
> supported by x86. However, arm64 advertises only
> KVM_CAP_DIRTY_LOG_RING_ACQ_REL, and only when the feature is supported.
> Userspace isn't required to enable the advertised capability, meaning
> KVM_CAP_DIRTY_LOG_RING can wrongly be enabled on arm64 by userspace.
> 
> Fix it by double-checking that the capability has been advertised prior
> to enabling it, and by rejecting the enablement if it hasn't been.
> 
> Fixes: 17601bfed909 ("KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option")
> Reported-by: Sean Christopherson <seanjc@google.com>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
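
For reference, the check being added boils down to something like this
in kvm_vm_ioctl_enable_cap_generic() (a sketch rather than the exact
hunk):

	case KVM_CAP_DIRTY_LOG_RING:
	case KVM_CAP_DIRTY_LOG_RING_ACQ_REL:
		/* Reject the enablement if the capability isn't advertised. */
		if (!kvm_vm_ioctl_check_extension_generic(kvm, cap->cap))
			return -EINVAL;
		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);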

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>

This patch should be picked up separately from this series for 6.1. The
original patch went through kvmarm, and I think there are a few other
arm64 fixes to be sent out anyway.

Marc, can you grab this? :)

--
Thanks,
Oliver


* Re: (subset) [PATCH v7 0/9] KVM: arm64: Enable ring-based dirty memory tracking
  2022-10-31  0:36 ` Gavin Shan
@ 2022-10-31 17:23   ` Marc Zyngier
  -1 siblings, 0 replies; 68+ messages in thread
From: Marc Zyngier @ 2022-10-31 17:23 UTC (permalink / raw)
  To: Gavin Shan, kvmarm
  Cc: ajones, andrew.jones, suzuki.poulose, will, james.morse, kvm,
	oliver.upton, kvmarm, peterx, dmatlack, shuah, catalin.marinas,
	alexandru.elisei, pbonzini, seanjc, shan.gavin, bgardon,
	zhenyzha

On Mon, 31 Oct 2022 08:36:12 +0800, Gavin Shan wrote:
> This series enables the ring-based dirty memory tracking for ARM64.
> The feature has been available and enabled on x86 for a while. It
> is beneficial when the number of dirty pages is small in a checkpointing
> system or live migration scenario. More details can be found from
> fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking").
> 
> v6: https://lore.kernel.org/kvmarm/20221011061447.131531-1-gshan@redhat.com/
> v5: https://lore.kernel.org/all/20221005004154.83502-1-gshan@redhat.com/
> v4: https://lore.kernel.org/kvmarm/20220927005439.21130-1-gshan@redhat.com/
> v3: https://lore.kernel.org/r/20220922003214.276736-1-gshan@redhat.com
> v2: https://lore.kernel.org/lkml/YyiV%2Fl7O23aw5aaO@xz-m1.local/T/
> v1: https://lore.kernel.org/lkml/20220819005601.198436-1-gshan@redhat.com
> 
> [...]

Applied to fixes, thanks!

[3/9] KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them
      commit: 7a2726ec3290c52f52ce8d5f5af73ab8c7681bc1

Cheers,

	M.
-- 
Without deviation from the norm, progress is not possible.




* Re: [PATCH v7 5/9] KVM: arm64: Improve no-running-vcpu report for dirty ring
  2022-10-31  9:08     ` Oliver Upton
@ 2022-10-31 23:08       ` Gavin Shan
  -1 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2022-10-31 23:08 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, kvm, kvmarm, andrew.jones, ajones, maz, bgardon,
	catalin.marinas, dmatlack, will, pbonzini, peterx, seanjc,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

On 10/31/22 5:08 PM, Oliver Upton wrote:
> On Mon, Oct 31, 2022 at 08:36:17AM +0800, Gavin Shan wrote:
>> KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP should be enabled only when KVM
>> device "kvm-arm-vgic-its" is used by userspace. Currently, it's the
>> only case where the dirty ring misses a running VCPU. However, other
>> devices could potentially introduce a similar error in the future.
>>
>> In order to report only those broken devices, the no-running-vcpu
>> warning is suppressed for KVM device "kvm-arm-vgic-its". For this,
>> the function vgic_has_its() needs to be exposed under a more generic
>> name (kvm_vgic_has_its()).
>>
>> Link: https://lore.kernel.org/kvmarm/Y1ghIKrAsRFwSFsO@google.com
>> Suggested-by: Sean Christopherson <seanjc@google.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
> 
> I don't think this should be added as a separate patch.
> 
> The weak kvm_arch_allow_write_without_running_vcpu() (and adding its
> caller) should be rolled into patch 4/9. The arm64 implementation of
> that should be introduced in patch 6/9.
> 

Ok, the changes will be split between PATCH[4/9] and PATCH[6/9].

>> ---
>>   arch/arm64/kvm/mmu.c               | 14 ++++++++++++++
>>   arch/arm64/kvm/vgic/vgic-init.c    |  4 ++--
>>   arch/arm64/kvm/vgic/vgic-irqfd.c   |  4 ++--
>>   arch/arm64/kvm/vgic/vgic-its.c     |  2 +-
>>   arch/arm64/kvm/vgic/vgic-mmio-v3.c | 18 ++++--------------
>>   arch/arm64/kvm/vgic/vgic.c         | 10 ++++++++++
>>   arch/arm64/kvm/vgic/vgic.h         |  1 -
>>   include/kvm/arm_vgic.h             |  1 +
>>   include/linux/kvm_dirty_ring.h     |  1 +
>>   virt/kvm/dirty_ring.c              |  5 +++++
>>   virt/kvm/kvm_main.c                |  2 +-
>>   11 files changed, 41 insertions(+), 21 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 60ee3d9f01f8..e0855b2b3d66 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -932,6 +932,20 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>>   	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>>   }
>>   
>> +/*
>> + * kvm_arch_allow_write_without_running_vcpu - allow writing guest memory
>> + * without the running VCPU when dirty ring is enabled.
>> + *
>> + * The running VCPU is required to track dirty guest pages when dirty ring
>> + * is enabled. Otherwise, the backup bitmap should be used to track the
>> + * dirty guest pages. When vgic/its is enabled, we need to use the backup
>> + * bitmap to track the dirty guest pages for it.
>> + */
>> +bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
>> +{
>> +	return kvm->dirty_ring_with_bitmap && kvm_vgic_has_its(kvm);
>> +}
> 
> It is trivial for userspace to cause a WARN to fire like this. Just set
> up the VM with !RING_WITH_BITMAP && ITS.
> 
> The goal is to catch KVM bugs, not userspace bugs, so I'd suggest only
> checking whether or not an ITS is present.
> 
> [...]
> 

Ok. 'kvm->dirty_ring_with_bitmap' needn't be checked here if we don't
plan to catch userspace bugs. Marc suggested exempting the
no-running-vcpu check only while the vgic/its tables are being saved [1].

In order to cover Marc's concern, I would introduce a different helper
kvm_vgic_save_its_tables_in_progress(), which simply returns the new
field 'struct vgic_dist::save_its_tables_in_progress'. The field is set
and cleared in vgic_its_ctrl(). All these changes will be folded into
PATCH[v7 6/9]. Oliver and Marc, could you please let me know if the
changes sound good?

    static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
    {
        const struct vgic_its_abi *abi = vgic_its_get_abi(its);
        struct vgic_dist *dist = &kvm->arch.vgic;
        int ret = 0;
        :
        switch (attr) {
        case KVM_DEV_ARM_ITS_CTRL_RESET:
            vgic_its_reset(kvm, its);
            break;
        case KVM_DEV_ARM_ITS_SAVE_TABLES:
            dist->save_its_tables_in_progress = true;
            ret = abi->save_tables(its);
            dist->save_its_tables_in_progress = false;
            break;
        case KVM_DEV_ARM_ITS_RESTORE_TABLES:
            ret = abi->restore_tables(its);
            break;
        }
        :
    }
  
[1] https://lore.kernel.org/kvmarm/2ce535e9-f57a-0ab6-5c30-2b8afd4472e6@redhat.com/T/#mcf10e2d3ca0235ab1cac8793d894c1634666d280
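
The helper and the arm64 hook would then be little more than the
following (a sketch; the final naming is still open):

    bool kvm_vgic_save_its_tables_in_progress(struct kvm *kvm)
    {
        struct vgic_dist *dist = &kvm->arch.vgic;

        return dist->save_its_tables_in_progress;
    }

    bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
    {
        /* Writes without a running vcpu are fine only while saving ITS tables. */
        return kvm_vgic_save_its_tables_in_progress(kvm);
    }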

>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 91201f743033..10218057c176 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -38,20 +38,10 @@ u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
>>   	return reg | ((u64)val << lower);
>>   }
>>   
>> -bool vgic_has_its(struct kvm *kvm)
>> -{
>> -	struct vgic_dist *dist = &kvm->arch.vgic;
>> -
>> -	if (dist->vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3)
>> -		return false;
>> -
>> -	return dist->has_its;
>> -}
>> -
> 
> nit: renaming/exposing this helper should be done in a separate patch.
> Also, I don't think you need to move it anywhere either.
> 
> [...]
> 

As Marc suggested, we'd exempt the site where the vgic/its tables are
saved from the no-running-vcpu check. So we need the new helper
kvm_vgic_save_its_tables_in_progress() instead, meaning kvm_vgic_has_its()
isn't needed.

>> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
>> index 7ce6a5f81c98..f27e038043f3 100644
>> --- a/virt/kvm/dirty_ring.c
>> +++ b/virt/kvm/dirty_ring.c
>> @@ -26,6 +26,11 @@ bool kvm_use_dirty_bitmap(struct kvm *kvm)
>>   	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
>>   }
>>   
>> +bool __weak kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
>> +{
>> +	return kvm->dirty_ring_with_bitmap;
>> +}
>> +
> 
> Same comment on the arm64 implementation applies here. This should just
> return false by default.
> 

Ok. It will return 'false', and the addition of
kvm_arch_allow_write_without_running_vcpu() will be folded into
PATCH[4/9], as you suggested.

Thanks,
Gavin



* Re: [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-10-31  0:36   ` Gavin Shan
@ 2022-11-01 19:39     ` Sean Christopherson
  -1 siblings, 0 replies; 68+ messages in thread
From: Sean Christopherson @ 2022-11-01 19:39 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvm, kvmarm, andrew.jones, ajones, maz, bgardon,
	catalin.marinas, dmatlack, will, pbonzini, peterx, oliver.upton,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

On Mon, Oct 31, 2022, Gavin Shan wrote:
> The VCPU isn't expected to be runnable when the dirty ring becomes soft
> full, until the dirty pages are harvested and the dirty ring is reset
> from userspace. So there is a check on each guest entry to see if the
> dirty ring is soft full or not. The VCPU is stopped from running if
> its dirty ring has been soft full. A similar check will be needed when
> the feature is supported on ARM64. As Marc Zyngier suggested, a new
> event will avoid the pointless overhead of checking the size of the
> dirty ring ('vcpu->kvm->dirty_ring_size') on each guest entry.
> 
> Add KVM_REQ_DIRTY_RING_SOFT_FULL. The event is raised when the dirty ring
> becomes soft full in kvm_dirty_ring_push(). The event is cleared in the
> check, done in the newly added helper kvm_dirty_ring_check_request(), or
> when the dirty ring is reset by userspace. Since the VCPU is not runnable
> when the dirty ring becomes soft full, the KVM_REQ_DIRTY_RING_SOFT_FULL
> event is always set to prevent the VCPU from running until the dirty pages
> are harvested and the dirty ring is reset by userspace.
> 
> kvm_dirty_ring_soft_full() becomes a private function with the newly added
> helper kvm_dirty_ring_check_request(). The alignment for the various event
> definitions in kvm_host.h is changed to tab characters along the way. In order
> to avoid using 'container_of()', the argument @ring is replaced by @vcpu
> in kvm_dirty_ring_push() and kvm_dirty_ring_reset(). The argument @kvm to
> kvm_dirty_ring_reset() is dropped since it can be retrieved from the VCPU.
> 
> Link: https://lore.kernel.org/kvmarm/87lerkwtm5.wl-maz@kernel.org
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> ---

Reviewed-by: Sean Christopherson <seanjc@google.com>

> @@ -142,13 +144,17 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
>  
>  	kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
>  
> +	if (!kvm_dirty_ring_soft_full(ring))
> +		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
> +

Marc, Peter, and/or Paolo, can you confirm that clearing the request here won't
cause ordering problems?  Logically, this makes perfect sense (to me, since I
suggested it), but I'm mildly concerned I'm overlooking an edge case where KVM
could end up with a soft-full ring but no pending request.

>  	trace_kvm_dirty_ring_reset(ring);
>  
>  	return count;
>  }
>  


* Re: [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-11-01 19:39     ` Sean Christopherson
@ 2022-11-02 14:29       ` Peter Xu
  -1 siblings, 0 replies; 68+ messages in thread
From: Peter Xu @ 2022-11-02 14:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: shuah, kvm, maz, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, catalin.marinas, kvmarm,
	ajones

On Tue, Nov 01, 2022 at 07:39:25PM +0000, Sean Christopherson wrote:
> > @@ -142,13 +144,17 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
> >  
> >  	kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> >  
> > +	if (!kvm_dirty_ring_soft_full(ring))
> > +		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
> > +
> 
> Marc, Peter, and/or Paolo, can you confirm that clearing the request here won't
> cause ordering problems?  Logically, this makes perfect sense (to me, since I
> suggested it), but I'm mildly concerned I'm overlooking an edge case where KVM
> could end up with a soft-full ring but no pending request.

I don't see an ordering issue here, as long as kvm_clear_request() is
using the atomic version of bit clear; afaict that's a genuine RMW and
should always imply a full memory barrier (on any arch?) between the
soft-full check and the bit clear. At least on x86 the lock prefix is
applied.

However I don't see anything that stops a simple "race" from triggering like below:

          recycle thread                   vcpu thread
          --------------                   -----------
      if (!dirty_ring_soft_full)                                   <--- not full
                                        dirty_ring_push();
                                        if (dirty_ring_soft_full)  <--- full due to the push
                                            set_request(SOFT_FULL);
          clear_request(SOFT_FULL);                                <--- can wrongly clear the request?

But I don't think that's a huge matter, as it'll just let the vcpu have
one more chance to do another round of KVM_RUN.  Normally I think it means
there can be one more dirty GFN (perhaps there're cases that it can push >1
gfns for one KVM_RUN cycle?  I never figured out the details here, but
still..) pushed to the ring so closer to the hard limit, but we have had a
buffer zone of KVM_DIRTY_RING_RSVD_ENTRIES (64) entries.  So I assume
that's still fine, but maybe worth a short comment here?
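
Something like this above the clear, perhaps (just a sketch of the
suggested comment):

	/*
	 * The request may be wrongly cleared here if it races with a
	 * concurrent push that fills the ring up; that only costs the
	 * vcpu one more round of KVM_RUN, and the buffer zone of
	 * KVM_DIRTY_RING_RSVD_ENTRIES keeps us away from the hard limit.
	 */
	if (!kvm_dirty_ring_soft_full(ring))
		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);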

I never know what's the maximum possible GFNs being dirtied for a KVM_RUN
cycle.  It would be good if there's an answer to that from anyone.

-- 
Peter Xu


* Re: [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-11-01 19:39     ` Sean Christopherson
@ 2022-11-02 14:31       ` Marc Zyngier
  -1 siblings, 0 replies; 68+ messages in thread
From: Marc Zyngier @ 2022-11-02 14:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: shuah, kvm, catalin.marinas, andrew.jones, dmatlack, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm, ajones

On Tue, 01 Nov 2022 19:39:25 +0000,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Mon, Oct 31, 2022, Gavin Shan wrote:
> > The VCPU isn't expected to be runnable when the dirty ring becomes soft
> > full, until the dirty pages are harvested and the dirty ring is reset
> > from userspace. So there is a check on each guest entry to see if the
> > dirty ring is soft full or not. The VCPU is stopped from running if
> > its dirty ring has been soft full. A similar check will be needed when
> > the feature is supported on ARM64. As Marc Zyngier suggested, a new
> > event will avoid the pointless overhead of checking the size of the
> > dirty ring ('vcpu->kvm->dirty_ring_size') on each guest entry.
> > 
> > Add KVM_REQ_DIRTY_RING_SOFT_FULL. The event is raised when the dirty ring
> > becomes soft full in kvm_dirty_ring_push(). The event is cleared in the
> > check, done in the newly added helper kvm_dirty_ring_check_request(), or
> > when the dirty ring is reset by userspace. Since the VCPU is not runnable
> > when the dirty ring becomes soft full, the KVM_REQ_DIRTY_RING_SOFT_FULL
> > event is always set to prevent the VCPU from running until the dirty pages
> > are harvested and the dirty ring is reset by userspace.
> > 
> > kvm_dirty_ring_soft_full() becomes a private function with the newly added
> > helper kvm_dirty_ring_check_request(). The alignment for the various event
> > definitions in kvm_host.h is changed to tab characters along the way. In order
> > to avoid using 'container_of()', the argument @ring is replaced by @vcpu
> > in kvm_dirty_ring_push() and kvm_dirty_ring_reset(). The argument @kvm to
> > kvm_dirty_ring_reset() is dropped since it can be retrieved from the VCPU.
> > 
> > Link: https://lore.kernel.org/kvmarm/87lerkwtm5.wl-maz@kernel.org
> > Suggested-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > ---
> 
> Reviewed-by: Sean Christopherson <seanjc@google.com>
> 
> > @@ -142,13 +144,17 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
> >  
> >  	kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> >  
> > +	if (!kvm_dirty_ring_soft_full(ring))
> > +		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
> > +
> 
> Marc, Peter, and/or Paolo, can you confirm that clearing the request
> here won't cause ordering problems?  Logically, this makes perfect
> sense (to me, since I suggested it), but I'm mildly concerned I'm
> overlooking an edge case where KVM could end up with a soft-full
> ring but no pending request.

I don't think you'll end up with a soft-full-but-no-request situation,
as kvm_make_request() enforces ordering, and you're making the request
on the vcpu itself. Even on arm64, this is guaranteed to be ordered
(same CPU, same address).

However, resetting the ring and clearing the request are not ordered,
which can lead to a slightly odd situation where the two events are
out of sync. But kvm_dirty_ring_check_request() requires both the
request to be set and the ring to be full to take any action. This
works around the lack of ordering.
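
For reference, the check in question looks roughly like this
(paraphrasing the patch, not quoting it exactly):

	bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
	{
		/*
		 * Re-raise the request so that the vcpu stays out of the
		 * guest until userspace has harvested the dirty pages and
		 * reset the ring.
		 */
		if (kvm_check_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu) &&
		    kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
			kvm_make_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
			vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
			trace_kvm_dirty_ring_exit(vcpu);
			return true;
		}

		return false;
	}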

I'd be much happier if kvm_clear_request() was fully ordered, but I
otherwise don't think we have an issue here.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-11-02 14:29       ` Peter Xu
@ 2022-11-02 15:58         ` Marc Zyngier
  -1 siblings, 0 replies; 68+ messages in thread
From: Marc Zyngier @ 2022-11-02 15:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: shuah, kvm, andrew.jones, dmatlack, will, shan.gavin, bgardon,
	kvmarm, pbonzini, zhenyzha, catalin.marinas, kvmarm, ajones

On Wed, 02 Nov 2022 14:29:26 +0000,
Peter Xu <peterx@redhat.com> wrote:
> 
> On Tue, Nov 01, 2022 at 07:39:25PM +0000, Sean Christopherson wrote:
> > > @@ -142,13 +144,17 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
> > >  
> > >  	kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> > >  
> > > +	if (!kvm_dirty_ring_soft_full(ring))
> > > +		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
> > > +
> > 
> > Marc, Peter, and/or Paolo, can you confirm that clearing the
> > request here won't cause ordering problems?  Logically, this makes
> > perfect sense (to me, since I suggested it), but I'm mildly
> > concerned I'm overlooking an edge case where KVM could end up with
> > a soft-full ring but no pending request.
> 
> I don't see an ordering issue here, as long as kvm_clear_request() is
> using the atomic version of bit clear; afaict that's a genuine RMW and
> should always imply a full memory barrier (on any arch?) between the
> soft-full check and the bit clear. At least on x86 the lock prefix is
> applied.

No, clear_bit() is not a full barrier. It is only atomic, and thus
completely unordered (see Documentation/atomic_bitops.txt). If you
want a full barrier, you need to use test_and_clear_bit().

> 
> However I don't see anything that stops a simple "race" from triggering like below:
> 
>           recycle thread                   vcpu thread
>           --------------                   -----------
>       if (!dirty_ring_soft_full)                                   <--- not full
>                                         dirty_ring_push();
>                                         if (dirty_ring_soft_full)  <--- full due to the push
>                                             set_request(SOFT_FULL);
>           clear_request(SOFT_FULL);                                <--- can wrongly clear the request?
>

Hmmm, well spotted. That's another ugly effect of the recycle thread
playing with someone else's toys.

> But I don't think that's a huge matter, as it'll just let the vcpu have
> one more chance to do another round of KVM_RUN.  Normally I think it means
> there can be one more dirty GFN (perhaps there're cases that it can push >1
> gfns for one KVM_RUN cycle?  I never figured out the details here, but
> still..) pushed to the ring so closer to the hard limit, but we have had a
> buffer zone of KVM_DIRTY_RING_RSVD_ENTRIES (64) entries.  So I assume
> that's still fine, but maybe worth a short comment here?
> 
> I never know what's the maximum possible GFNs being dirtied for a KVM_RUN
> cycle.  It would be good if there's an answer to that from anyone.

This is dangerous, and I'd rather not go there.

It is starting to look like we need the recycle thread to get out of
the way. And to be honest:

+	if (!kvm_dirty_ring_soft_full(ring))
+		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);

seems rather superfluous. Only clearing the flag in the vcpu entry
path feels much saner, and I can't see anything that would break.

Thoughts?

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-11-02 15:58         ` Marc Zyngier
@ 2022-11-02 16:11           ` Sean Christopherson
  -1 siblings, 0 replies; 68+ messages in thread
From: Sean Christopherson @ 2022-11-02 16:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Peter Xu, Gavin Shan, kvmarm, kvm, kvmarm, andrew.jones, ajones,
	bgardon, catalin.marinas, dmatlack, will, pbonzini, oliver.upton,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

On Wed, Nov 02, 2022, Marc Zyngier wrote:
> On Wed, 02 Nov 2022 14:29:26 +0000, Peter Xu <peterx@redhat.com> wrote:
> > However I don't see anything that stops a simple "race" from triggering like below:
> > 
> >           recycle thread                   vcpu thread
> >           --------------                   -----------
> >       if (!dirty_ring_soft_full)                                   <--- not full
> >                                         dirty_ring_push();
> >                                         if (dirty_ring_soft_full)  <--- full due to the push
> >                                             set_request(SOFT_FULL);
> >           clear_request(SOFT_FULL);                                <--- can wrongly clear the request?
> >
> 
> Hmmm, well spotted. That's another ugly effect of the recycle thread
> playing with someone else's toys.
> 
> > But I don't think that's a huge matter, as it'll just let the vcpu have
> > one more chance to do another round of KVM_RUN.  Normally I think it means
> > there can be one more dirty GFN (perhaps there're cases that it can push >1
> > gfns for one KVM_RUN cycle?  I never figured out the details here, but
> > still..) pushed to the ring so closer to the hard limit, but we have had a
> > buffer zone of KVM_DIRTY_RING_RSVD_ENTRIES (64) entries.  So I assume
> > that's still fine, but maybe worth a short comment here?
> > 
> > I never know what's the maximum possible GFNs being dirtied for a KVM_RUN
> > cycle.  It would be good if there's an answer to that from anyone.
> 
> This is dangerous, and I'd rather not go there.
> 
> It is starting to look like we need the recycle thread to get out of
> the way. And to be honest:
> 
> +	if (!kvm_dirty_ring_soft_full(ring))
> +		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
> 
> seems rather superfluous. Only clearing the flag in the vcpu entry
> path feels much saner, and I can't see anything that would break.
> 
> Thoughts?

I've no objections to dropping the clear on reset; I suggested it
primarily so that it would be easier to understand what action causes
the dirty ring to become not-full. I agree that the explicit clear is
unnecessary from a functional perspective.


* Re: [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-11-02 15:58         ` Marc Zyngier
@ 2022-11-02 16:23           ` Peter Xu
  -1 siblings, 0 replies; 68+ messages in thread
From: Peter Xu @ 2022-11-02 16:23 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: shuah, kvm, andrew.jones, dmatlack, will, shan.gavin, bgardon,
	kvmarm, pbonzini, zhenyzha, catalin.marinas, kvmarm, ajones

On Wed, Nov 02, 2022 at 03:58:43PM +0000, Marc Zyngier wrote:
> On Wed, 02 Nov 2022 14:29:26 +0000,
> Peter Xu <peterx@redhat.com> wrote:
> > 
> > On Tue, Nov 01, 2022 at 07:39:25PM +0000, Sean Christopherson wrote:
> > > > @@ -142,13 +144,17 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
> > > >  
> > > >  	kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
> > > >  
> > > > +	if (!kvm_dirty_ring_soft_full(ring))
> > > > +		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
> > > > +
> > > 
> > > Marc, Peter, and/or Paolo, can you confirm that clearing the
> > > request here won't cause ordering problems?  Logically, this makes
> > > perfect sense (to me, since I suggested it), but I'm mildly
> > > concerned I'm overlooking an edge case where KVM could end up with
> > > a soft-full ring but no pending request.
> > 
I don't see an ordering issue here, as long as kvm_clear_request() is using
the atomic version of bit clear; afaict that's a genuine RMW and should always
imply a full memory barrier (on any arch?) between the soft full check and
the bit clear.  At least for x86 the lock prefix was applied.
> 
> No, clear_bit() is not a full barrier. It is only atomic, and thus
> completely unordered (see Documentation/atomic_bitops.txt). If you
> want a full barrier, you need to use test_and_clear_bit().

Right, I mixed it up again. :(  It's a genuine RMW indeed (unlike _set/_read),
but I forgot it needs to have a retval to imply the memory barriers.

Quoting atomic_t.rst:

---8<---
ORDERING  (go read memory-barriers.txt first)
--------

The rule of thumb:

 - non-RMW operations are unordered;

 - RMW operations that have no return value are unordered;

 - RMW operations that have a return value are fully ordered;

 - RMW operations that are conditional are unordered on FAILURE,
   otherwise the above rules apply.
---8<---

So bit clear is unordered.
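
To make the rule concrete, here is a minimal userspace analogue using C11
atomics (a sketch, not kernel code): the void RMW may legally be relaxed,
while the value-returning RMW is fully ordered, which is why
test_and_clear_bit() can act as a barrier and clear_bit() cannot.

	#include <stdatomic.h>
	#include <stdbool.h>

	static atomic_ulong requests;

	/* analogue of clear_bit(): void RMW, may be relaxed/unordered */
	static void clear_req(unsigned long bit)
	{
		atomic_fetch_and_explicit(&requests, ~(1UL << bit),
					  memory_order_relaxed);
	}

	/* analogue of test_and_clear_bit(): the retval makes it a
	 * fully ordered RMW (seq_cst by default in C11) */
	static bool test_and_clear_req(unsigned long bit)
	{
		return atomic_fetch_and(&requests, ~(1UL << bit)) & (1UL << bit);
	}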

> 
> > 
> > However, I don't see anything that stops a simple "race" like the one below:
> > 
> >           recycle thread                   vcpu thread
> >           --------------                   -----------
> >       if (!dirty_ring_soft_full)                                   <--- not full
> >                                         dirty_ring_push();
> >                                         if (dirty_ring_soft_full)  <--- full due to the push
> >                                             set_request(SOFT_FULL);
> >           clear_request(SOFT_FULL);                                <--- can wrongly clear the request?
> >
> 
> Hmmm, well spotted. That's another ugly effect of the recycle thread
> playing with someone else's toys.
> 
> > But I don't think that's a huge matter, as it'll just let the vcpu have
> > one more chance to do another round of KVM_RUN.  Normally I think it means
> > one more dirty GFN (perhaps there are cases where it can push >1 gfns in
> > one KVM_RUN cycle?  I never figured out the details here, but still..)
> > can be pushed to the ring, bringing it closer to the hard limit, but we
> > have a buffer zone of KVM_DIRTY_RING_RSVD_ENTRIES (64) entries.  So I
> > assume that's still fine, but maybe worth a short comment here?
> > 
> > I never knew the maximum possible number of GFNs dirtied in a KVM_RUN
> > cycle.  It would be good if anyone has an answer to that.
> 
> This is dangerous, and I'd rather not go there.
> 
> It is starting to look like we need the recycle thread to get out of
> the way. And to be honest:
> 
> +	if (!kvm_dirty_ring_soft_full(ring))
> +		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
> 
> seems rather superfluous. Only clearing the flag in the vcpu entry
> path feels much saner, and I can't see anything that would break.
> 
> Thoughts?

Sounds good here.

Might be slightly off-topic: I didn't quickly spot what guards against two
threads doing the KVM_RUN ioctl on the same vcpu fd concurrently.  I know
that's insane and could corrupt things, but I just want to make sure that
e.g. even a malicious guest app won't be able to trigger host warnings.

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-11-02 16:23           ` Peter Xu
@ 2022-11-02 16:33             ` Sean Christopherson
  -1 siblings, 0 replies; 68+ messages in thread
From: Sean Christopherson @ 2022-11-02 16:33 UTC (permalink / raw)
  To: Peter Xu
  Cc: shuah, kvm, Marc Zyngier, andrew.jones, dmatlack, will,
	shan.gavin, bgardon, kvmarm, pbonzini, zhenyzha, catalin.marinas,
	kvmarm, ajones

On Wed, Nov 02, 2022, Peter Xu wrote:
> Might be slightly off-topic: I didn't quickly spot what guards against two
> threads doing the KVM_RUN ioctl on the same vcpu fd concurrently.  I know
> that's insane and could corrupt things, but I just want to make sure that
> e.g. even a malicious guest app won't be able to trigger host warnings.

kvm_vcpu_ioctl() takes the vCPU's mutex:

static long kvm_vcpu_ioctl(struct file *filp,
			   unsigned int ioctl, unsigned long arg)
{
	...

	/*
	 * Some architectures have vcpu ioctls that are asynchronous to vcpu
	 * execution; mutex_lock() would break them.
	 */
	r = kvm_arch_vcpu_async_ioctl(filp, ioctl, arg);
	if (r != -ENOIOCTLCMD)
		return r;

	if (mutex_lock_killable(&vcpu->mutex))
		return -EINTR;
	switch (ioctl) {
	case KVM_RUN: {

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-11-02 16:33             ` Sean Christopherson
@ 2022-11-02 16:43               ` Peter Xu
  -1 siblings, 0 replies; 68+ messages in thread
From: Peter Xu @ 2022-11-02 16:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: shuah, kvm, Marc Zyngier, andrew.jones, dmatlack, will,
	shan.gavin, bgardon, kvmarm, pbonzini, zhenyzha, catalin.marinas,
	kvmarm, ajones

On Wed, Nov 02, 2022 at 04:33:15PM +0000, Sean Christopherson wrote:
> On Wed, Nov 02, 2022, Peter Xu wrote:
> > Might be slightly off-topic: I didn't quickly spot what guards against two
> > threads doing the KVM_RUN ioctl on the same vcpu fd concurrently.  I know
> > that's insane and could corrupt things, but I just want to make sure that
> > e.g. even a malicious guest app won't be able to trigger host warnings.
> 
> kvm_vcpu_ioctl() takes the vCPU's mutex:
> 
> static long kvm_vcpu_ioctl(struct file *filp,
> 			   unsigned int ioctl, unsigned long arg)
> {
> 	...
> 
> 	/*
> 	 * Some architectures have vcpu ioctls that are asynchronous to vcpu
> 	 * execution; mutex_lock() would break them.
> 	 */
> 	r = kvm_arch_vcpu_async_ioctl(filp, ioctl, arg);
> 	if (r != -ENOIOCTLCMD)
> 		return r;
> 
> 	if (mutex_lock_killable(&vcpu->mutex))
> 		return -EINTR;
> 	switch (ioctl) {
> 	case KVM_RUN: {

Ah, makes sense, thanks!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-11-02 16:11           ` Sean Christopherson
@ 2022-11-02 16:44             ` Marc Zyngier
  -1 siblings, 0 replies; 68+ messages in thread
From: Marc Zyngier @ 2022-11-02 16:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: shuah, kvm, catalin.marinas, andrew.jones, dmatlack, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm, ajones

On Wed, 02 Nov 2022 16:11:07 +0000,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Wed, Nov 02, 2022, Marc Zyngier wrote:
> > On Wed, 02 Nov 2022 14:29:26 +0000, Peter Xu <peterx@redhat.com> wrote:
> > > However, I don't see anything that stops a simple "race" like the one below:
> > > 
> > >           recycle thread                   vcpu thread
> > >           --------------                   -----------
> > >       if (!dirty_ring_soft_full)                                   <--- not full
> > >                                         dirty_ring_push();
> > >                                         if (dirty_ring_soft_full)  <--- full due to the push
> > >                                             set_request(SOFT_FULL);
> > >           clear_request(SOFT_FULL);                                <--- can wrongly clear the request?
> > >
> > 
> > Hmmm, well spotted. That's another ugly effect of the recycle thread
> > playing with someone else's toys.
> > 
> > > But I don't think that's a huge matter, as it'll just let the vcpu have
> > > one more chance to do another round of KVM_RUN.  Normally I think it means
> > > one more dirty GFN (perhaps there are cases where it can push >1 gfns in
> > > one KVM_RUN cycle?  I never figured out the details here, but still..)
> > > can be pushed to the ring, bringing it closer to the hard limit, but we
> > > have a buffer zone of KVM_DIRTY_RING_RSVD_ENTRIES (64) entries.  So I
> > > assume that's still fine, but maybe worth a short comment here?
> > > 
> > > I never knew the maximum possible number of GFNs dirtied in a KVM_RUN
> > > cycle.  It would be good if anyone has an answer to that.
> > 
> > This is dangerous, and I'd rather not go there.
> > 
> > It is starting to look like we need the recycle thread to get out of
> > the way. And to be honest:
> > 
> > +	if (!kvm_dirty_ring_soft_full(ring))
> > +		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
> > 
> > seems rather superfluous. Only clearing the flag in the vcpu entry
> > path feels much saner, and I can't see anything that would break.
> > 
> > Thoughts?
> 
> I've no objections to dropping the clear on reset; I suggested it
> primarily so that it would be easier to understand what action
> causes the dirty ring to become not-full.  I agree that the explicit
> clear is unnecessary from a functional perspective.

The core of the issue is that the whole request mechanism is a
producer/consumer model, where consuming a request is a CLEAR
action. The standard model is that the vcpu thread is the consumer,
and that any thread (including the vcpu itself) can be a producer.

With this flag clearing happening on a non-vcpu thread, you end up with
two consumers, and things can go subtly wrong.

I'd suggest replacing this hunk with a comment saying that the request
will be cleared by the vcpu thread next time it enters the guest.
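
Concretely, consuming it only in the vcpu entry path could look like the
sketch below (hedged: the helper boundaries in the actual patch may
differ; kvm_check_request() is the test-and-clear consumer):

	/* vcpu entry path: the only consumer of the request */
	if (kvm_check_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu) &&
	    kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
		/* still soft-full: re-arm the request and bail out */
		kvm_make_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
		return 0;	/* exit to userspace to harvest the ring */
	}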

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-11-02 16:23           ` Peter Xu
@ 2022-11-02 16:48             ` Marc Zyngier
  -1 siblings, 0 replies; 68+ messages in thread
From: Marc Zyngier @ 2022-11-02 16:48 UTC (permalink / raw)
  To: Peter Xu
  Cc: shuah, kvm, andrew.jones, dmatlack, will, shan.gavin, bgardon,
	kvmarm, pbonzini, zhenyzha, catalin.marinas, kvmarm, ajones

On Wed, 02 Nov 2022 16:23:16 +0000,
Peter Xu <peterx@redhat.com> wrote:
> 
> Might be slightly off-topic: I didn't quickly spot what guards against two
> threads doing the KVM_RUN ioctl on the same vcpu fd concurrently.  I know
> that's insane and could corrupt things, but I just want to make sure that
> e.g. even a malicious guest app won't be able to trigger host warnings.

In kvm_vcpu_ioctl():

	if (mutex_lock_killable(&vcpu->mutex)) <----- this
		return -EINTR;
	switch (ioctl) {
	case KVM_RUN: {
		struct pid *oldpid;
		r = -EINVAL;
		if (arg)

We simply don't allow two concurrent ioctls to the same vcpu, let
alone two KVM_RUN.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 5/9] KVM: arm64: Improve no-running-vcpu report for dirty ring
  2022-10-31 23:08       ` Gavin Shan
@ 2022-11-02 17:18         ` Marc Zyngier
  -1 siblings, 0 replies; 68+ messages in thread
From: Marc Zyngier @ 2022-11-02 17:18 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Oliver Upton, kvmarm, kvm, kvmarm, andrew.jones, ajones, bgardon,
	catalin.marinas, dmatlack, will, pbonzini, peterx, seanjc,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

On Mon, 31 Oct 2022 23:08:32 +0000,
Gavin Shan <gshan@redhat.com> wrote:
> 
>
> In order to cover Marc's concern, I would introduce a different helper
> kvm_vgic_save_its_tables_in_progress(), which simply returns the new
> field 'bool vgic_dist::save_its_tables_in_progress'. The newly added
> field is set and cleared in vgic_its_ctrl(). All these changes will be
> folded into PATCH[v7 6/9]. Oliver and Marc, could you please let me know
> if the changes sound good?
> 
>    static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
>    {
>        const struct vgic_its_abi *abi = vgic_its_get_abi(its);
>        struct vgic_dist *dist = &kvm->arch.vgic;
>        int ret = 0;
>          :
>        switch (attr) {
>        case KVM_DEV_ARM_ITS_CTRL_RESET:
>             vgic_its_reset(kvm, its);
>             break;
>        case KVM_DEV_ARM_ITS_SAVE_TABLES:
>             dist->save_its_tables_in_progress = true;
>             ret = abi->save_tables(its);
>             dist->save_its_tables_in_progress = false;
>             break;
>        case KVM_DEV_ARM_ITS_RESTORE_TABLES:
>             ret = abi->restore_tables(its);
>             break;
>        }
>        :
>     }

Yes, this is the sort of thing I had in mind. This should make the
whole patch rather trivial, and you could implement
kvm_arch_allow_write_without_running_vcpu() as returning this flag.
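
i.e. something like the following (a sketch, assuming the flag lives in
struct vgic_dist as proposed above):

	bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
	{
		struct vgic_dist *dist = &kvm->arch.vgic;

		return dist->save_its_tables_in_progress;
	}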

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-11-02 16:44             ` Marc Zyngier
@ 2022-11-03  0:44               ` Gavin Shan
  -1 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2022-11-03  0:44 UTC (permalink / raw)
  To: Marc Zyngier, Sean Christopherson
  Cc: shuah, kvm, catalin.marinas, andrew.jones, dmatlack, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm, ajones

On 11/3/22 12:44 AM, Marc Zyngier wrote:
> On Wed, 02 Nov 2022 16:11:07 +0000,
> Sean Christopherson <seanjc@google.com> wrote:
>>
>> On Wed, Nov 02, 2022, Marc Zyngier wrote:
>>> On Wed, 02 Nov 2022 14:29:26 +0000, Peter Xu <peterx@redhat.com> wrote:
>>>> However, I don't see anything that stops a simple "race" like the one below:
>>>>
>>>>            recycle thread                   vcpu thread
>>>>            --------------                   -----------
>>>>        if (!dirty_ring_soft_full)                                   <--- not full
>>>>                                          dirty_ring_push();
>>>>                                          if (dirty_ring_soft_full)  <--- full due to the push
>>>>                                              set_request(SOFT_FULL);
>>>>            clear_request(SOFT_FULL);                                <--- can wrongly clear the request?
>>>>
>>>
>>> Hmmm, well spotted. That's another ugly effect of the recycle thread
>>> playing with someone else's toys.
>>>
>>>> But I don't think that's a huge matter, as it'll just let the vcpu have
>>>> one more chance to do another round of KVM_RUN.  Normally I think it means
>>>> one more dirty GFN (perhaps there are cases where it can push >1 gfns in
>>>> one KVM_RUN cycle?  I never figured out the details here, but still..)
>>>> can be pushed to the ring, bringing it closer to the hard limit, but we
>>>> have a buffer zone of KVM_DIRTY_RING_RSVD_ENTRIES (64) entries.  So I
>>>> assume that's still fine, but maybe worth a short comment here?
>>>>
>>>> I never knew the maximum possible number of GFNs dirtied in a KVM_RUN
>>>> cycle.  It would be good if anyone has an answer to that.
>>>
>>> This is dangerous, and I'd rather not go there.
>>>
>>> It is starting to look like we need the recycle thread to get out of
>>> the way. And to be honest:
>>>
>>> +	if (!kvm_dirty_ring_soft_full(ring))
>>> +		kvm_clear_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
>>>
>>> seems rather superfluous. Only clearing the flag in the vcpu entry
>>> path feels much saner, and I can't see anything that would break.
>>>
>>> Thoughts?
>>
>> I've no objections to dropping the clear on reset; I suggested it
>> primarily so that it would be easier to understand what action
>> causes the dirty ring to become not-full.  I agree that the explicit
>> clear is unnecessary from a functional perspective.
> 
> The core of the issue is that the whole request mechanism is a
> producer/consumer model, where consuming a request is a CLEAR
> action. The standard model is that the vcpu thread is the consumer,
> and that any thread (including the vcpu itself) can be a producer.
> 
> With this flag clearing happening on a non-vcpu thread, you end up with
> two consumers, and things can go subtly wrong.
> 
> I'd suggest replacing this hunk with a comment saying that the request
> will be cleared by the vcpu thread next time it enters the guest.
> 

Thanks, Marc. I will replace the hunk of code with the following
comment, as you suggested, in the next respin.

     /*
      * The request KVM_REQ_DIRTY_RING_SOFT_FULL will be cleared
      * by the VCPU thread next time it enters the guest.
      */

I will post v8 after Peter/Sean/Oliver take a look at [PATCH v7 4/9].
I think we're settled on other patches.

Thanks,
Gavin

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
  2022-10-31  0:36   ` Gavin Shan
@ 2022-11-03 19:33     ` Peter Xu
  -1 siblings, 0 replies; 68+ messages in thread
From: Peter Xu @ 2022-11-03 19:33 UTC (permalink / raw)
  To: Gavin Shan
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, dmatlack,
	shan.gavin, bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm,
	ajones

On Mon, Oct 31, 2022 at 08:36:16AM +0800, Gavin Shan wrote:
> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> enabled. This conflicts with the fact that ring-based dirty page tracking
> always requires a running VCPU context.
> 
> Introduce a new flavor of dirty ring that requires the use of both VCPU
> dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> the VM to the target.
> 
> Use an additional capability to advertise this behavior. The newly added
> capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> 
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Suggested-by: Peter Xu <peterx@redhat.com>
> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> Signed-off-by: Gavin Shan <gshan@redhat.com>

Acked-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
  2022-10-31  0:36   ` Gavin Shan
@ 2022-11-03 23:32     ` Oliver Upton
  -1 siblings, 0 replies; 68+ messages in thread
From: Oliver Upton @ 2022-11-03 23:32 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvm, kvmarm, andrew.jones, ajones, maz, bgardon,
	catalin.marinas, dmatlack, will, pbonzini, peterx, seanjc,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

On Mon, Oct 31, 2022 at 08:36:16AM +0800, Gavin Shan wrote:
> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> enabled. This conflicts with the fact that ring-based dirty page tracking
> always requires a running VCPU context.
> 
> Introduce a new flavor of dirty ring that requires the use of both VCPU
> dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> the VM to the target.
> 
> Use an additional capability to advertise this behavior. The newly added
> capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.

Whatever ordering requirements we settle on between these capabilities
need to be documented as well.

[...]

> @@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>  			return -EINVAL;
>  
>  		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
> +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> +		    !kvm->dirty_ring_size)

I believe this ordering requirement is problematic, as it piles on top
of an existing problem w.r.t. KVM_CAP_DIRTY_LOG_RING v. memslot
creation.

Example:
 - Enable KVM_CAP_DIRTY_LOG_RING
 - Create some memslots w/ dirty logging enabled (note that the bitmap
   is _not_ allocated)
 - Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
 - Save ITS tables and get a NULL dereference in
   mark_page_dirty_in_slot():

                if (vcpu && kvm->dirty_ring_size)
                        kvm_dirty_ring_push(&vcpu->dirty_ring,
                                            slot, rel_gfn);
                else
------->		set_bit_le(rel_gfn, memslot->dirty_bitmap);

Similarly, KVM may unnecessarily allocate bitmaps if dirty logging is
enabled on memslots before KVM_CAP_DIRTY_LOG_RING is enabled.

You could paper over this issue by disallowing DIRTY_RING_WITH_BITMAP if
DIRTY_LOG_RING has already been enabled, but the better approach would
be to explicitly check kvm_memslots_empty() such that the real
dependency is obvious. Peter, hadn't you mentioned something about
checking against memslots in an earlier revision?
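
A sketch of what that check could look like in the enable-cap path
(hedged: a single address space is assumed, and the
'dirty_ring_with_bitmap' field name is illustrative):

	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
		int r = -EINVAL;

		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
		    !kvm->dirty_ring_size)
			return r;

		mutex_lock(&kvm->slots_lock);
		/* refuse the cap once memslots exist, so any dirty-logged
		 * slot created afterwards gets its bitmap allocated */
		if (kvm_memslots_empty(kvm_memslots(kvm))) {
			kvm->dirty_ring_with_bitmap = true;
			r = 0;
		}
		mutex_unlock(&kvm->slots_lock);
		return r;
	}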

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
  2022-11-03 23:32     ` Oliver Upton
@ 2022-11-04  0:12       ` Gavin Shan
  -1 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2022-11-04  0:12 UTC (permalink / raw)
  To: Oliver Upton
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, dmatlack,
	shan.gavin, bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm,
	ajones

Hi Oliver,

On 11/4/22 7:32 AM, Oliver Upton wrote:
> On Mon, Oct 31, 2022 at 08:36:16AM +0800, Gavin Shan wrote:
>> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
>> enabled. This conflicts with the fact that ring-based dirty page tracking
>> always requires a running VCPU context.
>>
>> Introduce a new flavor of dirty ring that requires the use of both VCPU
>> dirty rings and a dirty bitmap. The expectation is that for non-VCPU
>> sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
>> the dirty bitmap. Userspace should scan the dirty bitmap before migrating
>> the VM to the target.
>>
>> Use an additional capability to advertise this behavior. The newly added
>> capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
>> KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
>> capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> 
> Whatever ordering requirements we settle on between these capabilities
> need to be documented as well.
> 
> [...]
> 

It's mentioned in 'Documentation/virt/kvm/api.rst' as below.

   After using the dirty rings, the userspace needs to detect the capability
   of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
   need to be backed by per-slot bitmaps. With this capability advertised
   and supported, it means the architecture can dirty guest pages without
   vcpu/ring context, so that some of the dirty information will still be
   maintained in the bitmap structure.

The description may not make the ordering obvious. For this, I can
add the following sentence at the end of the section.

   The capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled
   until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL has been enabled.

>> @@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>>   			return -EINVAL;
>>   
>>   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>> +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
>> +		    !kvm->dirty_ring_size)
> 
> I believe this ordering requirement is problematic, as it piles on top
> of an existing problem w.r.t. KVM_CAP_DIRTY_LOG_RING v. memslot
> creation.
> 
> Example:
>   - Enable KVM_CAP_DIRTY_LOG_RING
>   - Create some memslots w/ dirty logging enabled (note that the bitmap
>     is _not_ allocated)
>   - Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>   - Save ITS tables and get a NULL dereference in
>     mark_page_dirty_in_slot():
> 
>                  if (vcpu && kvm->dirty_ring_size)
>                          kvm_dirty_ring_push(&vcpu->dirty_ring,
>                                              slot, rel_gfn);
>                  else
> ------->		set_bit_le(rel_gfn, memslot->dirty_bitmap);
> 
> Similarly, KVM may unnecessarily allocate bitmaps if dirty logging is
> enabled on memslots before KVM_CAP_DIRTY_LOG_RING is enabled.
> 
> You could paper over this issue by disallowing DIRTY_RING_WITH_BITMAP if
> DIRTY_LOG_RING has already been enabled, but the better approach would
> be to explicitly check kvm_memslots_empty() such that the real
> dependency is obvious. Peter, hadn't you mentioned something about
> checking against memslots in an earlier revision?
> 

The userspace (QEMU) needs to ensure that no dirty bitmap is created
before the capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP is enabled.
QEMU doesn't yet know whether vgic/its will be used at the point where
KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.

    kvm_initialization
      enable_KVM_CAP_DIRTY_LOG_RING_ACQ_REL        // Where KVM_CAP_DIRTY_LOG_RING is enabled
    board_initialization                           // Where QEMU knows if vgic/its is used
      add_memory_slots
    kvm_post_initialization
      enable_KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
    :
    start_migration
      enable_dirty_page_tracking
        create_dirty_bitmap                       // With KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP enabled
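
For the enable step above, the userspace side is the standard VM-scoped
enable-cap ioctl (a sketch; vm_fd is assumed to be the VM file
descriptor, error handling minimal):

	#include <err.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP,
	};

	/* issued at the kvm_post_initialization step in the flow above */
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		err(1, "KVM_ENABLE_CAP(DIRTY_LOG_RING_WITH_BITMAP)");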

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
  2022-11-04  0:12       ` Gavin Shan
@ 2022-11-04  1:06         ` Oliver Upton
  -1 siblings, 0 replies; 68+ messages in thread
From: Oliver Upton @ 2022-11-04  1:06 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvm, kvmarm, andrew.jones, ajones, maz, bgardon,
	catalin.marinas, dmatlack, will, pbonzini, peterx, seanjc,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

On Fri, Nov 04, 2022 at 08:12:21AM +0800, Gavin Shan wrote:
> Hi Oliver,
> 
> On 11/4/22 7:32 AM, Oliver Upton wrote:
> > On Mon, Oct 31, 2022 at 08:36:16AM +0800, Gavin Shan wrote:
> > > ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> > > enabled. This conflicts with the fact that ring-based dirty page tracking
> > > always requires a running VCPU context.
> > > 
> > > Introduce a new flavor of dirty ring that requires the use of both VCPU
> > > dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> > > sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> > > the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> > > the VM to the target.
> > > 
> > > Use an additional capability to advertise this behavior. The newly added
> > > capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> > > KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> > > capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> > 
> > Whatever ordering requirements we settle on between these capabilities
> > needs to be documented as well.
> > 
> > [...]
> > 
> 
> It's mentioned in 'Documentation/virt/kvm/api.rst' as below.
> 
>   After using the dirty rings, the userspace needs to detect the capability
>   of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
>   need to be backed by per-slot bitmaps. With this capability advertised
>   and supported, it means the architecture can dirty guest pages without
>   vcpu/ring context, so that some of the dirty information will still be
>   maintained in the bitmap structure.
> 
> The description may not be obvious about the ordering. For this, I can
> add the following sentence at the end of the section.
> 
>   The capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled
>   until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL has been enabled.
> 
> > > @@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> > >   			return -EINVAL;
> > >   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> > > +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
> > > +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> > > +		    !kvm->dirty_ring_size)
> > 
> > I believe this ordering requirement is problematic, as it piles on top
> > of an existing problem w.r.t. KVM_CAP_DIRTY_LOG_RING v. memslot
> > creation.
> > 
> > Example:
> >   - Enable KVM_CAP_DIRTY_LOG_RING
> >   - Create some memslots w/ dirty logging enabled (note that the bitmap
> >     is _not_ allocated)
> >   - Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> >   - Save ITS tables and get a NULL dereference in
> >     mark_page_dirty_in_slot():
> > 
> >                  if (vcpu && kvm->dirty_ring_size)
> >                          kvm_dirty_ring_push(&vcpu->dirty_ring,
> >                                              slot, rel_gfn);
> >                  else
> > ------->		set_bit_le(rel_gfn, memslot->dirty_bitmap);
> > 
> > Similarly, KVM may unnecessarily allocate bitmaps if dirty logging is
> > enabled on memslots before KVM_CAP_DIRTY_LOG_RING is enabled.
> > 
> > You could paper over this issue by disallowing DIRTY_RING_WITH_BITMAP if
> > DIRTY_LOG_RING has already been enabled, but the better approach would
> > be to explicitly check kvm_memslots_empty() such that the real
> > dependency is obvious. Peter, hadn't you mentioned something about
> > checking against memslots in an earlier revision?
> > 
> 
> Userspace (QEMU) needs to ensure that no dirty bitmap is created before
> the KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP capability is enabled. Note that
> QEMU doesn't yet know whether vgic/its will be used at the point where
> KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.

I'm not worried about what QEMU (or any particular VMM for that matter)
does with the UAPI. The problem is this patch provides a trivial way for
userspace to cause a NULL dereference in the kernel. Imposing ordering
between the cap and memslot creation avoids the problem altogether.

So, looking at your example:

>    kvm_initialization
>      enable_KVM_CAP_DIRTY_LOG_RING_ACQ_REL        // Where KVM_CAP_DIRTY_LOG_RING is enabled
>    board_initialization                           // Where QEMU knows if vgic/its is used

Is it possible that QEMU could hoist enabling RING_WITH_BITMAP here?
Based on your description QEMU has decided to use the vGIC ITS but
hasn't yet added any memslots.

>      add_memory_slots
>    kvm_post_initialization
>      enable_KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>    :
>    start_migration
>      enable_dirty_page_tracking
>        create_dirty_bitmap                       // With KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP enabled

Just to make sure we're on the same page, there's two issues:

 (1) If DIRTY_LOG_RING is enabled before memslot creation and
     RING_WITH_BITMAP is enabled after memslots have been created w/
     dirty logging enabled, memslot->dirty_bitmap == NULL and the
     kernel will fault when attempting to save the ITS tables.

 (2) Not your code, but a similar issue. If DIRTY_LOG_RING[_ACQ_REL] is
     enabled after memslots have been created w/ dirty logging enabled,
     memslot->dirty_bitmap != NULL and that memory is wasted until the
     memslot is freed.

I don't expect you to fix #2, though I've mentioned it because using the
same approach to #1 and #2 would be nice.
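
To make (1) concrete, the failing order looks roughly like this from
userspace (just a sketch: error handling is omitted, and the WITH_BITMAP
constant is the one this series adds):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/kvm.h>

    /* Sketch of the problematic ordering in (1); not a complete test. */
    static void repro(void)
    {
        int kvm_fd = open("/dev/kvm", O_RDWR);
        int vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);
        void *mem = mmap(NULL, 0x10000, PROT_READ | PROT_WRITE,
                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        struct kvm_enable_cap cap = {
            .cap = KVM_CAP_DIRTY_LOG_RING,
            .args = { 65536 },
        };
        struct kvm_userspace_memory_region region = {
            .flags = KVM_MEM_LOG_DIRTY_PAGES,   /* bitmap NOT allocated */
            .memory_size = 0x10000,
            .userspace_addr = (unsigned long)mem,
        };

        ioctl(vm_fd, KVM_ENABLE_CAP, &cap);                  /* ring on */
        ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);   /* no bitmap */

        memset(&cap, 0, sizeof(cap));
        cap.cap = KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP;
        ioctl(vm_fd, KVM_ENABLE_CAP, &cap);                  /* too late */

        /* Saving the ITS tables now dirties memory without a vcpu and
         * hits memslot->dirty_bitmap == NULL in mark_page_dirty_in_slot().
         */
    }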

--
Thanks,
Oliver


* Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
  2022-11-04  1:06         ` Oliver Upton
@ 2022-11-04  6:57           ` Gavin Shan
  -1 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2022-11-04  6:57 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, kvm, kvmarm, andrew.jones, ajones, maz, bgardon,
	catalin.marinas, dmatlack, will, pbonzini, peterx, seanjc,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

Hi Oliver,

On 11/4/22 9:06 AM, Oliver Upton wrote:
> On Fri, Nov 04, 2022 at 08:12:21AM +0800, Gavin Shan wrote:
>> On 11/4/22 7:32 AM, Oliver Upton wrote:
>>> On Mon, Oct 31, 2022 at 08:36:16AM +0800, Gavin Shan wrote:
>>>> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
>>>> enabled. This conflicts with the fact that ring-based dirty page tracking
>>>> always requires a running VCPU context.
>>>>
>>>> Introduce a new flavor of dirty ring that requires the use of both VCPU
>>>> dirty rings and a dirty bitmap. The expectation is that for non-VCPU
>>>> sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
>>>> the dirty bitmap. Userspace should scan the dirty bitmap before migrating
>>>> the VM to the target.
>>>>
>>>> Use an additional capability to advertise this behavior. The newly added
>>>> capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
>>>> KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
>>>> capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
>>>
>>> Whatever ordering requirements we settle on between these capabilities
>>> needs to be documented as well.
>>>
>>> [...]
>>>
>>
>> It's mentioned in 'Documentation/virt/kvm/api.rst' as below.
>>
>>    After using the dirty rings, the userspace needs to detect the capability
>>    of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
>>    need to be backed by per-slot bitmaps. With this capability advertised
>>    and supported, it means the architecture can dirty guest pages without
>>    vcpu/ring context, so that some of the dirty information will still be
>>    maintained in the bitmap structure.
>>
>> The description may not be obvious about the ordering. For this, I can
>> add the following sentence at the end of the section.
>>
>>    The capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled
>>    until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL has been enabled.
>>
>>>> @@ -4588,6 +4594,13 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>>>>    			return -EINVAL;
>>>>    		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
>>>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>>>> +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
>>>> +		    !kvm->dirty_ring_size)
>>>
>>> I believe this ordering requirement is problematic, as it piles on top
>>> of an existing problem w.r.t. KVM_CAP_DIRTY_LOG_RING v. memslot
>>> creation.
>>>
>>> Example:
>>>    - Enable KVM_CAP_DIRTY_LOG_RING
>>>    - Create some memslots w/ dirty logging enabled (note that the bitmap
>>>      is _not_ allocated)
>>>    - Enable KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>>>    - Save ITS tables and get a NULL dereference in
>>>      mark_page_dirty_in_slot():
>>>
>>>                   if (vcpu && kvm->dirty_ring_size)
>>>                           kvm_dirty_ring_push(&vcpu->dirty_ring,
>>>                                               slot, rel_gfn);
>>>                   else
>>> ------->		set_bit_le(rel_gfn, memslot->dirty_bitmap);
>>>
>>> Similarly, KVM may unnecessarily allocate bitmaps if dirty logging is
>>> enabled on memslots before KVM_CAP_DIRTY_LOG_RING is enabled.
>>>
>>> You could paper over this issue by disallowing DIRTY_RING_WITH_BITMAP if
>>> DIRTY_LOG_RING has already been enabled, but the better approach would
>>> be to explicitly check kvm_memslots_empty() such that the real
>>> dependency is obvious. Peter, hadn't you mentioned something about
>>> checking against memslots in an earlier revision?
>>>
>>
>> Userspace (QEMU) needs to ensure that no dirty bitmap is created before
>> the KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP capability is enabled. Note that
>> QEMU doesn't yet know whether vgic/its will be used at the point where
>> KVM_CAP_DIRTY_LOG_RING_ACQ_REL is enabled.
> 
> I'm not worried about what QEMU (or any particular VMM for that matter)
> does with the UAPI. The problem is this patch provides a trivial way for
> userspace to cause a NULL dereference in the kernel. Imposing ordering
> between the cap and memslot creation avoids the problem altogether.
> 
> So, looking at your example:
> 
>>     kvm_initialization
>>       enable_KVM_CAP_DIRTY_LOG_RING_ACQ_REL        // Where KVM_CAP_DIRTY_LOG_RING is enabled
>>     board_initialization                           // Where QEMU knows if vgic/its is used
> 
> Is it possible that QEMU could hoist enabling RING_WITH_BITMAP here?
> Based on your description QEMU has decided to use the vGIC ITS but
> hasn't yet added any memslots.
> 

It's possible to add an ARM-specific helper, kvm_arm_enable_dirty_ring_with_bitmap(),
in qemu/target/arm.c to enable RING_WITH_BITMAP if needed. The newly added
function can be called at this point when vgic/its is used.
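
A rough sketch of that (hypothetical) helper, assuming QEMU's existing
kvm_check_extension() and kvm_vm_enable_cap() wrappers; the function name
and location are just a proposal:

    /* Hypothetical helper; nothing like this exists in QEMU yet. */
    int kvm_arm_enable_dirty_ring_with_bitmap(KVMState *s)
    {
        if (!kvm_check_extension(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP)) {
            return -ENOSYS;
        }

        /* args[0] = 1: keep per-slot bitmaps alongside the vcpu rings */
        return kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, 0, 1);
    }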

>>       add_memory_slots
>>     kvm_post_initialization
>>       enable_KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>>     :
>>     start_migration
>>       enable_dirty_page_tracking
>>         create_dirty_bitmap                       // With KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP enabled
> 
> Just to make sure we're on the same page, there's two issues:
> 
>   (1) If DIRTY_LOG_RING is enabled before memslot creation and
>       RING_WITH_BITMAP is enabled after memslots have been created w/
>       dirty logging enabled, memslot->dirty_bitmap == NULL and the
>       kernel will fault when attempting to save the ITS tables.
> 
>   (2) Not your code, but a similar issue. If DIRTY_LOG_RING[_ACQ_REL] is
>       enabled after memslots have been created w/ dirty logging enabled,
>       memslot->dirty_bitmap != NULL and that memory is wasted until the
>       memslot is freed.
> 
> I don't expect you to fix #2, though I've mentioned it because using the
> same approach to #1 and #2 would be nice.
> 

Yes, I got your points. Case (2) can still happen with VMMs other than
QEMU. However, QEMU always enables DIRTY_LOG_RING[_ACQ_REL] before any
memory slot is created. I agree that we need to ensure there are no
memory slots when DIRTY_LOG_RING[_ACQ_REL] is enabled.

For case (1), we can ensure RING_WITH_BITMAP is enabled before any memory
slot is added, as below. QEMU needs a new helper (as above) to enable it
at the board level.

Let's fix both with a new helper in PATCH[v8 4/9], like below?

   static inline bool kvm_vm_has_memslot_pages(struct kvm *kvm)
   {
       bool has_memslot_pages;

       mutex_lock(&kvm->slots_lock);

       has_memslot_pages = !!kvm->nr_memslot_pages;

       mutex_unlock(&kvm->slots_lock);

       return has_memslot_pages;
   }

   static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
                                              struct kvm_enable_cap *cap)
   {
       :
       switch (cap->cap) {
       case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
           if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
               !kvm->dirty_ring_size || kvm_vm_has_memslot_pages(kvm))
               return -EINVAL;

           kvm->dirty_ring_with_bitmap = true;

           return 0;
       default:
           return kvm_vm_ioctl_enable_cap(kvm, cap);
       }
   }

   static int kvm_vm_ioctl_enable_dirty_log_ring(struct kvm *kvm, u32 size)
   {
       :
       /* We only allow it to set once */
       if (kvm->dirty_ring_size)
           return -EINVAL;

       if (kvm_vm_has_memslot_pages(kvm))
           return -EINVAL;
       :
   }

Thanks,
Gavin


* Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
  2022-11-04  6:57           ` Gavin Shan
@ 2022-11-04 20:12             ` Oliver Upton
  -1 siblings, 0 replies; 68+ messages in thread
From: Oliver Upton @ 2022-11-04 20:12 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvm, kvmarm, andrew.jones, ajones, maz, bgardon,
	catalin.marinas, dmatlack, will, pbonzini, peterx, seanjc,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

On Fri, Nov 04, 2022 at 02:57:15PM +0800, Gavin Shan wrote:
> On 11/4/22 9:06 AM, Oliver Upton wrote:

[...]

> > Just to make sure we're on the same page, there's two issues:
> > 
> >   (1) If DIRTY_LOG_RING is enabled before memslot creation and
> >       RING_WITH_BITMAP is enabled after memslots have been created w/
> >       dirty logging enabled, memslot->dirty_bitmap == NULL and the
> >       kernel will fault when attempting to save the ITS tables.
> > 
> >   (2) Not your code, but a similar issue. If DIRTY_LOG_RING[_ACQ_REL] is
> >       enabled after memslots have been created w/ dirty logging enabled,
> >       memslot->dirty_bitmap != NULL and that memory is wasted until the
> >       memslot is freed.
> > 
> > I don't expect you to fix #2, though I've mentioned it because using the
> > same approach to #1 and #2 would be nice.
> > 
> 
> Yes, I got your points. Case (2) can still happen with VMMs other than
> QEMU. However, QEMU always enables DIRTY_LOG_RING[_ACQ_REL] before any
> memory slot is created. I agree that we need to ensure there are no
> memory slots when DIRTY_LOG_RING[_ACQ_REL] is enabled.
> 
> For case (1), we can ensure RING_WITH_BITMAP is enabled before any memory
> slot is added, as below. QEMU needs a new helper (as above) to enable it
> at the board level.
> 
> Let's fix both with a new helper in PATCH[v8 4/9], like below?

I agree that we should address (1) like this, but in (2) requiring that
no memslots were created before enabling the existing capabilities would
be a change in ABI. If we can get away with that, great, but otherwise
we may need to delete the bitmaps associated with all memslots when the
cap is enabled.

>   static inline bool kvm_vm_has_memslot_pages(struct kvm *kvm)
>   {
>       bool has_memslot_pages;
> 
>       mutex_lock(&kvm->slots_lock);
> 
>       has_memslot_pages = !!kvm->nr_memslot_pages;
> 
>       mutex_unlock(&kvm->slots_lock);
> 
>       return has_memslot_pages;
>   }

Do we need to build another helper for this? kvm_memslots_empty() will
tell you whether or not a memslot has been created by checking the gfn
tree.
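
For reference, kvm_memslots_empty() is (roughly, quoting from memory) just
an emptiness check on the slots' gfn tree:

    /* include/linux/kvm_host.h, approximately: */
    static inline bool kvm_memslots_empty(struct kvm_memslots *slots)
    {
        return RB_EMPTY_ROOT(&slots->gfn_tree);
    }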

On top of that, the memslot check and setting
kvm->dirty_ring_with_bitmap must happen behind the slots_lock. Otherwise
you could still wind up creating memslots w/o bitmaps.


Something like:


diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 91cf51a25394..420cc101a16e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4588,6 +4588,32 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 			return -EINVAL;
 
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
+		struct kvm_memslots *slots;
+		int r = -EINVAL;
+
+		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
+		    !kvm->dirty_ring_size)
+			return r;
+
+		mutex_lock(&kvm->slots_lock);
+
+		slots = kvm_memslots(kvm);
+
+		/*
+		 * Avoid a race between memslot creation and enabling the ring +
+		 * bitmap capability to guarantee that no memslots have been
+		 * created without a bitmap.
+		 */
+		if (kvm_memslots_empty(slots)) {
+			kvm->dirty_ring_with_bitmap = cap->args[0];
+			r = 0;
+		}
+
+		mutex_unlock(&kvm->slots_lock);
+		return r;
+	}
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}

--
Thanks,
Oliver

* Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
  2022-11-04 20:12             ` Oliver Upton
@ 2022-11-04 21:57               ` Gavin Shan
  -1 siblings, 0 replies; 68+ messages in thread
From: Gavin Shan @ 2022-11-04 21:57 UTC (permalink / raw)
  To: Oliver Upton
  Cc: shuah, catalin.marinas, kvm, maz, andrew.jones, dmatlack,
	shan.gavin, bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm,
	ajones

Hi Oliver,

On 11/5/22 4:12 AM, Oliver Upton wrote:
> On Fri, Nov 04, 2022 at 02:57:15PM +0800, Gavin Shan wrote:
>> On 11/4/22 9:06 AM, Oliver Upton wrote:
> 
> [...]
> 
>>> Just to make sure we're on the same page, there's two issues:
>>>
>>>    (1) If DIRTY_LOG_RING is enabled before memslot creation and
>>>        RING_WITH_BITMAP is enabled after memslots have been created w/
>>>        dirty logging enabled, memslot->dirty_bitmap == NULL and the
>>>        kernel will fault when attempting to save the ITS tables.
>>>
>>>    (2) Not your code, but a similar issue. If DIRTY_LOG_RING[_ACQ_REL] is
>>>        enabled after memslots have been created w/ dirty logging enabled,
>>>        memslot->dirty_bitmap != NULL and that memory is wasted until the
>>>        memslot is freed.
>>>
>>> I don't expect you to fix #2, though I've mentioned it because using the
>>> same approach to #1 and #2 would be nice.
>>>
>>
>> Yes, I got your points. Case (2) can still happen with VMMs other than
>> QEMU. However, QEMU always enables DIRTY_LOG_RING[_ACQ_REL] before any
>> memory slot is created. I agree that we need to ensure there are no
>> memory slots when DIRTY_LOG_RING[_ACQ_REL] is enabled.
>>
>> For case (1), we can ensure RING_WITH_BITMAP is enabled before any memory
>> slot is added, as below. QEMU needs a new helper (as above) to enable it
>> at the board level.
>>
>> Let's fix both with a new helper in PATCH[v8 4/9], like below?
> 
> I agree that we should address (1) like this, but in (2) requiring that
> no memslots were created before enabling the existing capabilities would
> be a change in ABI. If we can get away with that, great, but otherwise
> we may need to delete the bitmaps associated with all memslots when the
> cap is enabled.
> 

I had assumed that QEMU and kvm/selftests are the only consumers of
DIRTY_RING. In that case, requiring that no memslots exist before
DIRTY_RING is enabled won't break userspace. Following your thoughts,
the dirty pages tracked in the bitmap would also need to be synchronized
to the per-vcpu rings before the bitmap can be destroyed. We don't have
per-vcpu rings at that stage.

>>    static inline bool kvm_vm_has_memslot_pages(struct kvm *kvm)
>>    {
>>        bool has_memslot_pages;
>>
>>        mutex_lock(&kvm->slots_lock);
>>
>>        has_memslot_pages = !!kvm->nr_memslot_pages;
>>
>>        mutex_unlock(&kvm->slots_lock);
>>
>>        return has_memslot_pages;
>>    }
> 
> Do we need to build another helper for this? kvm_memslots_empty() will
> tell you whether or not a memslot has been created by checking the gfn
> tree.
> 

The helper was introduced so it could be shared when DIRTY_RING[_ACQ_REL]
and DIRTY_RING_WITH_BITMAP are enabled. Since issue (2) isn't a concern
for us, let's put it aside; the helper isn't needed then. Checking
kvm_memslots_empty() has the same effect as checking 'kvm->nr_memslot_pages',
so it's fine to use kvm_memslots_empty(), which is more generic.

> On top of that, the memslot check and setting
> kvm->dirty_ring_with_bitmap must happen behind the slots_lock. Otherwise
> you could still wind up creating memslots w/o bitmaps.
> 

Agree.

> 
> Something like:
> 
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 91cf51a25394..420cc101a16e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4588,6 +4588,32 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>   			return -EINVAL;
>   
>   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> +
> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
> +		struct kvm_memslots *slots;
> +		int r = -EINVAL;
> +
> +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> +		    !kvm->dirty_ring_size)
> +			return r;
> +
> +		mutex_lock(&kvm->slots_lock);
> +
> +		slots = kvm_memslots(kvm);
> +
> +		/*
> +		 * Avoid a race between memslot creation and enabling the ring +
> +		 * bitmap capability to guarantee that no memslots have been
> +		 * created without a bitmap.
> +		 */
> +		if (kvm_memslots_empty(slots)) {
> +			kvm->dirty_ring_with_bitmap = cap->args[0];
> +			r = 0;
> +		}
> +
> +		mutex_unlock(&kvm->slots_lock);
> +		return r;
> +	}
>   	default:
>   		return kvm_vm_ioctl_enable_cap(kvm, cap);
>   	}
> 

The proposed changes look good to me. They will be integrated into
PATCH[v8 4/9]. By the way, v8 will be posted shortly.

Thanks,
Gavin

* Re: [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap
  2022-11-04 21:57               ` Gavin Shan
@ 2022-11-04 22:23                 ` Oliver Upton
  -1 siblings, 0 replies; 68+ messages in thread
From: Oliver Upton @ 2022-11-04 22:23 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvm, kvmarm, andrew.jones, ajones, maz, bgardon,
	catalin.marinas, dmatlack, will, pbonzini, peterx, seanjc,
	james.morse, shuah, suzuki.poulose, alexandru.elisei, zhenyzha,
	shan.gavin

On Sat, Nov 05, 2022 at 05:57:33AM +0800, Gavin Shan wrote:
> On 11/5/22 4:12 AM, Oliver Upton wrote:
> > I agree that we should address (1) like this, but in (2) requiring that
> > no memslots were created before enabling the existing capabilities would
> > be a change in ABI. If we can get away with that, great, but otherwise
> > we may need to delete the bitmaps associated with all memslots when the
> > cap is enabled.
> > 
> 
> I had assumed that QEMU and kvm/selftests are the only consumers of
> DIRTY_RING. In that case, requiring that no memslots exist before
> DIRTY_RING is enabled won't break userspace.
> Following your thoughts, the dirty pages tracked in the bitmap would
> also need to be synchronized to the per-vcpu rings before the bitmap
> can be destroyed.

Eh, I don't think we'd need to go that far. No matter what, any dirty
bits that were present in the bitmap could never be read again anyway,
as we reject KVM_GET_DIRTY_LOG if kvm->dirty_ring_size != 0.
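
That check sits at the top of the dirty-log handlers, roughly:

    /* kvm_get_dirty_log_protect() / kvm_get_dirty_log(), approximately: */
    if (kvm->dirty_ring_size)
        return -ENXIO;  /* dirty ring excludes KVM_GET_DIRTY_LOG */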

> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 91cf51a25394..420cc101a16e 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -4588,6 +4588,32 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> >   			return -EINVAL;
> >   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> > +
> > +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
> > +		struct kvm_memslots *slots;
> > +		int r = -EINVAL;
> > +
> > +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> > +		    !kvm->dirty_ring_size)
> > +			return r;
> > +
> > +		mutex_lock(&kvm->slots_lock);
> > +
> > +		slots = kvm_memslots(kvm);
> > +
> > +		/*
> > +		 * Avoid a race between memslot creation and enabling the ring +
> > +		 * bitmap capability to guarantee that no memslots have been
> > +		 * created without a bitmap.
> > +		 */
> > +		if (kvm_memslots_empty(slots)) {
> > +			kvm->dirty_ring_with_bitmap = cap->args[0];
> > +			r = 0;
> > +		}
> > +
> > +		mutex_unlock(&kvm->slots_lock);
> > +		return r;
> > +	}
> >   	default:
> >   		return kvm_vm_ioctl_enable_cap(kvm, cap);
> >   	}
> > 
> 
> The proposed changes look good to me. It will be integrated to PATCH[v8 4/9].
> By the way, v8 will be posted shortly.

Excellent, thanks!

--
Best,
Oliver

end of thread

Thread overview: 68+ messages
2022-10-31  0:36 [PATCH v7 0/9] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-31  0:36 ` [PATCH v7 1/9] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL Gavin Shan
2022-11-01 19:39   ` Sean Christopherson
2022-11-02 14:29     ` Peter Xu
2022-11-02 15:58       ` Marc Zyngier
2022-11-02 16:11         ` Sean Christopherson
2022-11-02 16:44           ` Marc Zyngier
2022-11-03  0:44             ` Gavin Shan
2022-11-02 16:23         ` Peter Xu
2022-11-02 16:33           ` Sean Christopherson
2022-11-02 16:43             ` Peter Xu
2022-11-02 16:48           ` Marc Zyngier
2022-11-02 14:31     ` Marc Zyngier
2022-10-31  0:36 ` [PATCH v7 2/9] KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-10-31  0:36 ` [PATCH v7 3/9] KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them Gavin Shan
2022-10-31  9:18   ` Oliver Upton
2022-10-31  0:36 ` [PATCH v7 4/9] KVM: Support dirty ring in conjunction with bitmap Gavin Shan
2022-11-03 19:33   ` Peter Xu
2022-11-03 23:32   ` Oliver Upton
2022-11-04  0:12     ` Gavin Shan
2022-11-04  1:06       ` Oliver Upton
2022-11-04  6:57         ` Gavin Shan
2022-11-04 20:12           ` Oliver Upton
2022-11-04 21:57             ` Gavin Shan
2022-11-04 22:23               ` Oliver Upton
2022-10-31  0:36 ` [PATCH v7 5/9] KVM: arm64: Improve no-running-vcpu report for dirty ring Gavin Shan
2022-10-31  9:08   ` Oliver Upton
2022-10-31 23:08     ` Gavin Shan
2022-11-02 17:18       ` Marc Zyngier
2022-10-31  0:36 ` [PATCH v7 6/9] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-10-31  0:36 ` [PATCH v7 7/9] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-10-31  0:36 ` [PATCH v7 8/9] KVM: selftests: Clear dirty ring states between two modes in dirty_log_test Gavin Shan
2022-10-31  0:36 ` [PATCH v7 9/9] KVM: selftests: Automate choosing dirty ring size in dirty_log_test Gavin Shan
2022-10-31 17:23 ` (subset) [PATCH v7 0/9] KVM: arm64: Enable ring-based dirty memory tracking Marc Zyngier
