* [PATCH v8 0/7] KVM: arm64: Enable ring-based dirty memory tracking
@ 2022-11-04 23:40 ` Gavin Shan
  0 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-04 23:40 UTC (permalink / raw)
  To: kvmarm
  Cc: maz, kvm, catalin.marinas, andrew.jones, will, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, shuah, kvmarm, ajones

This series enables ring-based dirty memory tracking for ARM64. The
feature has been available and enabled on x86 for a while. It is
beneficial when the number of dirty pages is small, as in a
checkpointing system or live migration scenario. More details can be
found in commit fb04a1eddb1a ("KVM: X86: Implement ring-based dirty
memory tracking").

This series applies on top of v6.1-rc3, plus commit c227590467cb ("KVM:
Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them").
That commit is currently in Marc's 'fixes' branch, targeting v6.1-rc4/5.

v7: https://lore.kernel.org/kvmarm/20221031003621.164306-1-gshan@redhat.com/
v6: https://lore.kernel.org/kvmarm/20221011061447.131531-1-gshan@redhat.com/
v5: https://lore.kernel.org/all/20221005004154.83502-1-gshan@redhat.com/
v4: https://lore.kernel.org/kvmarm/20220927005439.21130-1-gshan@redhat.com/
v3: https://lore.kernel.org/r/20220922003214.276736-1-gshan@redhat.com
v2: https://lore.kernel.org/lkml/YyiV%2Fl7O23aw5aaO@xz-m1.local/T/
v1: https://lore.kernel.org/lkml/20220819005601.198436-1-gshan@redhat.com

Testing
=======
(1) kvm/selftests/dirty_log_test
(2) Live migration by QEMU

Changelog
=========
v8:
  * Pick up Reviewed-by and Acked-by tags                       (Peter/Sean)
  * Drop the chunk of code clearing KVM_REQ_DIRTY_RING_SOFT_FULL
    in kvm_dirty_ring_reset(). Add a comment saying the event
    will be cleared by the VCPU thread the next time it enters
    the guest. All other changes related to kvm_dirty_ring_reset()
    are dropped in PATCH[v8 1/7].                               (Sean/Peter/Marc)
  * Drop PATCH[v7 3/7] since it has been merged                 (Marc/Oliver)
  * Document the order of DIRTY_RING_{ACQ_REL, WITH_BITMAP},
    add a check to ensure no memslots are created when
    DIRTY_RING_WITH_BITMAP is enabled, and add the weak function
    kvm_arch_allow_write_without_running_vcpu() in PATCH[v8 3/7] (Oliver)
  * Only keep ourselves off the non-running-vcpu radar while
    vgic/its tables are being saved in PATCH[v8 4/7]             (Marc/Sean)
v7:
  * Cut down #ifdef, avoid using 'container_of()', move the
    dirty-ring check after KVM_REQ_VM_DEAD, add comments
    for kvm_dirty_ring_check_request(), use tab character
    for KVM event definitions in kvm_host.h in PATCH[v7 01]    (Sean)
  * Add PATCH[v7 03] to recheck if the capability has
    been advertised prior to enabling RING/RING_ACQ_REL         (Sean)
  * Improve the description about capability RING_WITH_BITMAP,
    rename kvm_dirty_ring_exclusive() to kvm_use_dirty_bitmap()
    in PATCH[v7 04/09]                                         (Peter/Oliver/Sean)
  * Add PATCH[v7 05/09] to improve no-running-vcpu report      (Marc/Sean)
  * Improve commit messages                                    (Sean/Oliver)
v6:
  * Add CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP, for arm64
    to advertise KVM_CAP_DIRTY_RING_WITH_BITMAP in
    PATCH[v6 3/8]                                              (Oliver/Peter)
  * Add helper kvm_dirty_ring_exclusive() to check if
    traditional bitmap-based dirty log tracking is
    exclusive to dirty-ring in PATCH[v6 3/8]                   (Peter)
  * Enable KVM_CAP_DIRTY_RING_WITH_BITMAP in PATCH[v6 5/8]     (Gavin)
v5:
  * Drop empty stub kvm_dirty_ring_check_request()             (Marc/Peter)
  * Add PATCH[v5 3/7] to allow using bitmap, indicated by
    KVM_CAP_DIRTY_LOG_RING_ALLOW_BITMAP                        (Marc/Peter)
v4:
  * Commit log improvement                                     (Marc)
  * Add helper kvm_dirty_ring_check_request()                  (Marc)
  * Drop ifdef for kvm_cpu_dirty_log_size()                    (Marc)
v3:
  * Check KVM_REQ_RING_SOFT_FULL inside kvm_request_pending()  (Peter)
  * Move declaration of kvm_cpu_dirty_log_size()               (test-robot)
v2:
  * Introduce KVM_REQ_RING_SOFT_FULL                           (Marc)
  * Changelog improvement                                      (Marc)
  * Fix dirty_log_test without knowing host page size          (Drew)

Gavin Shan (7):
  KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
  KVM: Support dirty ring in conjunction with bitmap
  KVM: arm64: Enable ring-based dirty memory tracking
  KVM: selftests: Use host page size to map ring buffer in
    dirty_log_test
  KVM: selftests: Clear dirty ring states between two modes in
    dirty_log_test
  KVM: selftests: Automate choosing dirty ring size in dirty_log_test

 Documentation/virt/kvm/api.rst               | 35 ++++++++++---
 arch/arm64/include/uapi/asm/kvm.h            |  1 +
 arch/arm64/kvm/Kconfig                       |  2 +
 arch/arm64/kvm/arm.c                         |  3 ++
 arch/arm64/kvm/mmu.c                         | 15 ++++++
 arch/arm64/kvm/vgic/vgic-its.c               |  3 ++
 arch/arm64/kvm/vgic/vgic-mmio-v3.c           |  7 +++
 arch/x86/include/asm/kvm_host.h              |  2 -
 arch/x86/kvm/x86.c                           | 15 +++---
 include/kvm/arm_vgic.h                       |  2 +
 include/linux/kvm_dirty_ring.h               | 20 +++++---
 include/linux/kvm_host.h                     | 10 ++--
 include/uapi/linux/kvm.h                     |  1 +
 tools/testing/selftests/kvm/dirty_log_test.c | 53 ++++++++++++++------
 tools/testing/selftests/kvm/lib/kvm_util.c   |  2 +-
 virt/kvm/Kconfig                             |  8 +++
 virt/kvm/dirty_ring.c                        | 42 +++++++++++++++-
 virt/kvm/kvm_main.c                          | 52 +++++++++++++++----
 18 files changed, 214 insertions(+), 59 deletions(-)

-- 
2.23.0

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


* [PATCH v8 1/7] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
  2022-11-04 23:40 ` Gavin Shan
@ 2022-11-04 23:40   ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-04 23:40 UTC (permalink / raw)
  To: kvmarm
  Cc: maz, kvm, catalin.marinas, andrew.jones, will, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, shuah, kvmarm, ajones

The VCPU isn't expected to be runnable when the dirty ring becomes soft
full, until the dirty pages are harvested and the dirty ring is reset
by userspace. So there is a check on each guest entry to see if the
dirty ring is soft full or not. The VCPU is stopped from running if
its dirty ring has become soft full. A similar check will be needed
when the feature is supported on ARM64. As Marc Zyngier suggested, a
new event avoids the pointless overhead of checking the size of the
dirty ring ('vcpu->kvm->dirty_ring_size') on every guest entry.

Add KVM_REQ_DIRTY_RING_SOFT_FULL. The event is raised when the dirty ring
becomes soft full in kvm_dirty_ring_push(). The event is only cleared in
the check, done in the newly added helper kvm_dirty_ring_check_request().
Since the VCPU is not runnable when the dirty ring becomes soft full, the
KVM_REQ_DIRTY_RING_SOFT_FULL event is always set to prevent the VCPU from
running until the dirty pages are harvested and the dirty ring is reset by
userspace.

With the newly added helper kvm_dirty_ring_check_request(),
kvm_dirty_ring_soft_full() becomes a private function. While at it,
the alignment of the various event definitions in kvm_host.h is
changed to use tabs. In order to avoid using 'container_of()', the
argument @ring is replaced by @vcpu in kvm_dirty_ring_push().

Link: https://lore.kernel.org/kvmarm/87lerkwtm5.wl-maz@kernel.org
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/x86.c             | 15 ++++++---------
 include/linux/kvm_dirty_ring.h | 12 ++++--------
 include/linux/kvm_host.h       |  9 +++++----
 virt/kvm/dirty_ring.c          | 32 ++++++++++++++++++++++++++++++--
 virt/kvm/kvm_main.c            |  3 +--
 5 files changed, 46 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 521b433f978c..fd3347e31f5b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10512,20 +10512,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
-	/* Forbid vmenter if vcpu dirty ring is soft-full */
-	if (unlikely(vcpu->kvm->dirty_ring_size &&
-		     kvm_dirty_ring_soft_full(&vcpu->dirty_ring))) {
-		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
-		trace_kvm_dirty_ring_exit(vcpu);
-		r = 0;
-		goto out;
-	}
-
 	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_VM_DEAD, vcpu)) {
 			r = -EIO;
 			goto out;
 		}
+
+		if (kvm_dirty_ring_check_request(vcpu)) {
+			r = 0;
+			goto out;
+		}
+
 		if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
 			if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
 				r = 0;
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 906f899813dc..9c13c4c3d30c 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -49,7 +49,7 @@ static inline int kvm_dirty_ring_reset(struct kvm *kvm,
 	return 0;
 }
 
-static inline void kvm_dirty_ring_push(struct kvm_dirty_ring *ring,
+static inline void kvm_dirty_ring_push(struct kvm_vcpu *vcpu,
 				       u32 slot, u64 offset)
 {
 }
@@ -64,11 +64,6 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 {
 }
 
-static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
-{
-	return true;
-}
-
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 u32 kvm_dirty_ring_get_rsvd_entries(void);
@@ -84,13 +79,14 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring);
  * returns =0: successfully pushed
  *         <0: unable to push, need to wait
  */
-void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset);
+void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset);
+
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu);
 
 /* for use in vm_operations_struct */
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset);
 
 void kvm_dirty_ring_free(struct kvm_dirty_ring *ring);
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring);
 
 #endif /* CONFIG_HAVE_KVM_DIRTY_RING */
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 18592bdf4c1b..6fab55e58111 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -153,10 +153,11 @@ static inline bool is_error_page(struct page *page)
  * Architecture-independent vcpu->requests bit members
  * Bits 3-7 are reserved for more arch-independent bits.
  */
-#define KVM_REQ_TLB_FLUSH         (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
-#define KVM_REQ_UNBLOCK           2
-#define KVM_REQUEST_ARCH_BASE     8
+#define KVM_REQ_TLB_FLUSH		(0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VM_DEAD			(1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_UNBLOCK			2
+#define KVM_REQ_DIRTY_RING_SOFT_FULL	3
+#define KVM_REQUEST_ARCH_BASE		8
 
 /*
  * KVM_REQ_OUTSIDE_GUEST_MODE exists is purely as way to force the vCPU to
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index d6fabf238032..fecbb7d75ad2 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -26,7 +26,7 @@ static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
 	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
 }
 
-bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
+static bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
 {
 	return kvm_dirty_ring_used(ring) >= ring->soft_limit;
 }
@@ -142,13 +142,19 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
 
 	kvm_reset_dirty_gfn(kvm, cur_slot, cur_offset, mask);
 
+	/*
+	 * The request KVM_REQ_DIRTY_RING_SOFT_FULL will be cleared
+	 * by the VCPU thread next time when it enters the guest.
+	 */
+
 	trace_kvm_dirty_ring_reset(ring);
 
 	return count;
 }
 
-void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
+void kvm_dirty_ring_push(struct kvm_vcpu *vcpu, u32 slot, u64 offset)
 {
+	struct kvm_dirty_ring *ring = &vcpu->dirty_ring;
 	struct kvm_dirty_gfn *entry;
 
 	/* It should never get full */
@@ -166,6 +172,28 @@ void kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
 	kvm_dirty_gfn_set_dirtied(entry);
 	ring->dirty_index++;
 	trace_kvm_dirty_ring_push(ring, slot, offset);
+
+	if (kvm_dirty_ring_soft_full(ring))
+		kvm_make_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
+}
+
+bool kvm_dirty_ring_check_request(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * The VCPU isn't runnable when the dirty ring becomes soft full.
+	 * The KVM_REQ_DIRTY_RING_SOFT_FULL event is always set to prevent
+	 * the VCPU from running until the dirty pages are harvested and
+	 * the dirty ring is reset by userspace.
+	 */
+	if (kvm_check_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu) &&
+	    kvm_dirty_ring_soft_full(&vcpu->dirty_ring)) {
+		kvm_make_request(KVM_REQ_DIRTY_RING_SOFT_FULL, vcpu);
+		vcpu->run->exit_reason = KVM_EXIT_DIRTY_RING_FULL;
+		trace_kvm_dirty_ring_exit(vcpu);
+		return true;
+	}
+
+	return false;
 }
 
 struct page *kvm_dirty_ring_get_page(struct kvm_dirty_ring *ring, u32 offset)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 25d7872b29c1..c865d7d82685 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3314,8 +3314,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
 		if (kvm->dirty_ring_size)
-			kvm_dirty_ring_push(&vcpu->dirty_ring,
-					    slot, rel_gfn);
+			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
 		else
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
 	}
-- 
2.23.0


* [PATCH v8 2/7] KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
  2022-11-04 23:40 ` Gavin Shan
@ 2022-11-04 23:40   ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-04 23:40 UTC (permalink / raw)
  To: kvmarm
  Cc: maz, kvm, catalin.marinas, andrew.jones, will, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, shuah, kvmarm, ajones

Not all architectures (e.g. ARM64) need to override the function. Move
its declaration to kvm_dirty_ring.h to avoid the following compile
warning on ARM64 when the feature is enabled.

  arch/arm64/kvm/../../../virt/kvm/dirty_ring.c:14:12:        \
  warning: no previous prototype for 'kvm_cpu_dirty_log_size' \
  [-Wmissing-prototypes]                                      \
  int __weak kvm_cpu_dirty_log_size(void)

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 2 --
 include/linux/kvm_dirty_ring.h  | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7551b6f9c31c..b4dbde7d9eb1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2090,8 +2090,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
 #define GET_SMSTATE(type, buf, offset)		\
 	(*(type *)((buf) + (offset) - 0x7e00))
 
-int kvm_cpu_dirty_log_size(void);
-
 int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 
 #define KVM_CLOCK_VALID_FLAGS						\
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 9c13c4c3d30c..199ead37b104 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -66,6 +66,7 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
+int kvm_cpu_dirty_log_size(void);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
 
-- 
2.23.0


* [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-04 23:40 ` Gavin Shan
@ 2022-11-04 23:40   ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-04 23:40 UTC (permalink / raw)
  To: kvmarm
  Cc: maz, kvm, catalin.marinas, andrew.jones, will, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, shuah, kvmarm, ajones

ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
enabled. This conflicts with ring-based dirty page tracking, which
always requires a running VCPU context.

Introduce a new flavor of dirty ring that requires the use of both VCPU
dirty rings and a dirty bitmap. The expectation is that for non-VCPU
sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
the dirty bitmap. Userspace should scan the dirty bitmap before migrating
the VM to the target.

Use an additional capability to advertise this behavior. The newly added
capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64, making it effectively an
extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.

Suggested-by: Marc Zyngier <maz@kernel.org>
Suggested-by: Peter Xu <peterx@redhat.com>
Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
---
 Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++-----
 include/linux/kvm_dirty_ring.h |  7 +++++
 include/linux/kvm_host.h       |  1 +
 include/uapi/linux/kvm.h       |  1 +
 virt/kvm/Kconfig               |  8 ++++++
 virt/kvm/dirty_ring.c          | 10 +++++++
 virt/kvm/kvm_main.c            | 49 +++++++++++++++++++++++++++-------
 7 files changed, 93 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index eee9f857a986..2ec32bd41792 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
 needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
 vmexit ensures that all dirty GFNs are flushed to the dirty rings.
 
-NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
-ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
-KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
-KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
-machine will switch to ring-buffer dirty page tracking and further
-KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
-
 NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
 should be exposed by weakly ordered architecture, in order to indicate
 the additional memory ordering requirements imposed on userspace when
@@ -8018,6 +8011,32 @@ Architecture with TSO-like ordering (such as x86) are allowed to
 expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 to userspace.
 
+After using the dirty rings, the userspace needs to detect the capability
+of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
+need to be backed by per-slot bitmaps. With this capability advertised
+and supported, it means the architecture can dirty guest pages without
+vcpu/ring context, so that some of the dirty information will still be
+maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
+can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
+has been enabled.
+
+Note that the bitmap here is only a backup of the ring structure, and
+normally should only contain a very small amount of dirty pages, which
+needs to be transferred during VM downtime. Collecting the dirty bitmap
+should be the very last thing that the VMM does before transmitting state
+to the target VM. VMM needs to ensure that the dirty state is final and
+avoid missing dirty pages from another ioctl ordered after the bitmap
+collection.
+
+To collect dirty bits in the backup bitmap, the userspace can use the
+same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
+and its behavior is undefined since collecting the dirty bitmap always
+happens in the last phase of VM's migration.
+
+NOTE: One example of using the backup bitmap is saving arm64 vgic/its
+tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
+KVM device "kvm-arm-vgic-its" during VM's migration.
+
 8.30 KVM_CAP_XEN_HVM
 --------------------
 
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 199ead37b104..4862c98d80d3 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -37,6 +37,11 @@ static inline u32 kvm_dirty_ring_get_rsvd_entries(void)
 	return 0;
 }
 
+static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
+{
+	return true;
+}
+
 static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
 				       int index, u32 size)
 {
@@ -67,6 +72,8 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
 #else /* CONFIG_HAVE_KVM_DIRTY_RING */
 
 int kvm_cpu_dirty_log_size(void);
+bool kvm_use_dirty_bitmap(struct kvm *kvm);
+bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6fab55e58111..f51eb9419bfc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -779,6 +779,7 @@ struct kvm {
 	pid_t userspace_pid;
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
+	bool dirty_ring_with_bitmap;
 	bool vm_bugged;
 	bool vm_dead;
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0d5d4419139a..c87b5882d7ae 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 800f9470e36b..228be1145cf3 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
        bool
        select HAVE_KVM_DIRTY_RING
 
+# Only architectures that need to dirty memory outside of a vCPU
+# context should select this, advertising to userspace the
+# requirement to use a dirty bitmap in addition to the vCPU dirty
+# ring.
+config HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	bool
+	depends on HAVE_KVM_DIRTY_RING
+
 config HAVE_KVM_EVENTFD
        bool
        select EVENTFD
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index fecbb7d75ad2..758679724447 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -21,6 +21,16 @@ u32 kvm_dirty_ring_get_rsvd_entries(void)
 	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
 }
 
+bool kvm_use_dirty_bitmap(struct kvm *kvm)
+{
+	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
+}
+
+bool __weak kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
+{
+	return false;
+}
+
 static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
 {
 	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c865d7d82685..746133b23a66 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
 			new->dirty_bitmap = NULL;
 		else if (old && old->dirty_bitmap)
 			new->dirty_bitmap = old->dirty_bitmap;
-		else if (!kvm->dirty_ring_size) {
+		else if (kvm_use_dirty_bitmap(kvm)) {
 			r = kvm_alloc_dirty_bitmap(new);
 			if (r)
 				return r;
@@ -2060,8 +2060,8 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
 	unsigned long n;
 	unsigned long any = 0;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	*memslot = NULL;
@@ -2125,8 +2125,8 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -2237,8 +2237,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	unsigned long *dirty_bitmap_buffer;
 	bool flush;
 
-	/* Dirty ring tracking is exclusive to dirty log tracking */
-	if (kvm->dirty_ring_size)
+	/* Dirty ring tracking may be exclusive to dirty log tracking */
+	if (!kvm_use_dirty_bitmap(kvm))
 		return -ENXIO;
 
 	as_id = log->slot >> 16;
@@ -3305,7 +3305,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 
 #ifdef CONFIG_HAVE_KVM_DIRTY_RING
-	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
+		return;
+
+	if (WARN_ON_ONCE(!kvm_arch_allow_write_without_running_vcpu(kvm) && !vcpu))
 		return;
 #endif
 
@@ -3313,7 +3316,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
-		if (kvm->dirty_ring_size)
+		if (kvm->dirty_ring_size && vcpu)
 			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
 		else
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
@@ -4482,6 +4485,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
 #else
 		return 0;
+#endif
+#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
 #endif
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
@@ -4588,6 +4594,31 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 			return -EINVAL;
 
 		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
+	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
+		struct kvm_memslots *slots;
+		int r = -EINVAL;
+
+		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
+		    !kvm->dirty_ring_size)
+			return r;
+
+		mutex_lock(&kvm->slots_lock);
+
+		slots = kvm_memslots(kvm);
+
+		/*
+		 * Avoid a race between memslot creation and enabling the ring +
+		 * bitmap capability to guarantee that no memslots have been
+		 * created without a bitmap.
+		 */
+		if (kvm_memslots_empty(slots)) {
+			kvm->dirty_ring_with_bitmap = cap->args[0];
+			r = 0;
+		}
+
+		mutex_unlock(&kvm->slots_lock);
+		return r;
+	}
 	default:
 		return kvm_vm_ioctl_enable_cap(kvm, cap);
 	}
-- 
2.23.0

* [PATCH v8 4/7] KVM: arm64: Enable ring-based dirty memory tracking
  2022-11-04 23:40 ` Gavin Shan
@ 2022-11-04 23:40   ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-04 23:40 UTC (permalink / raw)
  To: kvmarm
  Cc: maz, kvm, catalin.marinas, andrew.jones, will, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, shuah, kvmarm, ajones

Enable ring-based dirty memory tracking on arm64 by selecting
CONFIG_HAVE_KVM_DIRTY_{RING_ACQ_REL, RING_WITH_BITMAP} and providing
the ring buffer's physical page offset (KVM_DIRTY_LOG_PAGE_OFFSET).

In addition, the helper kvm_vgic_save_its_tables_in_progress() is added
to indicate whether the vgic/its tables are currently being saved. The
helper is used by ARM64's kvm_arch_allow_write_without_running_vcpu() to
exempt the vgic/its table save path from the no-running-vcpu check.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 Documentation/virt/kvm/api.rst     |  2 +-
 arch/arm64/include/uapi/asm/kvm.h  |  1 +
 arch/arm64/kvm/Kconfig             |  2 ++
 arch/arm64/kvm/arm.c               |  3 +++
 arch/arm64/kvm/mmu.c               | 15 +++++++++++++++
 arch/arm64/kvm/vgic/vgic-its.c     |  3 +++
 arch/arm64/kvm/vgic/vgic-mmio-v3.c |  7 +++++++
 include/kvm/arm_vgic.h             |  2 ++
 8 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 2ec32bd41792..2fc68f684ad8 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7921,7 +7921,7 @@ regardless of what has actually been exposed through the CPUID leaf.
 8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
 ----------------------------------------------------------
 
-:Architectures: x86
+:Architectures: x86, arm64
 :Parameters: args[0] - size of the dirty log ring
 
 KVM is capable of tracking dirty memory using ring buffers that are
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 316917b98707..a7a857f1784d 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -43,6 +43,7 @@
 #define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define KVM_REG_SIZE(id)						\
 	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 815cc118c675..066b053e9eb9 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -32,6 +32,8 @@ menuconfig KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
+	select HAVE_KVM_DIRTY_RING_ACQ_REL
+	select HAVE_KVM_DIRTY_RING_WITH_BITMAP
 	select HAVE_KVM_MSI
 	select HAVE_KVM_IRQCHIP
 	select HAVE_KVM_IRQ_ROUTING
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 94d33e296e10..6b097605e38c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -746,6 +746,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 
 		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
 			return kvm_vcpu_suspend(vcpu);
+
+		if (kvm_dirty_ring_check_request(vcpu))
+			return 0;
 	}
 
 	return 1;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 60ee3d9f01f8..fbeb55e45f53 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -932,6 +932,21 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
 }
 
+/*
+ * kvm_arch_allow_write_without_running_vcpu - allow writing guest memory
+ * without the running VCPU when dirty ring is enabled.
+ *
+ * The running VCPU is required to track dirty guest pages when dirty ring
+ * is enabled. Otherwise, the backup bitmap should be used to track the
+ * dirty guest pages. When vgic/its tables are being saved, the backup
+ * bitmap is used to track the dirty guest pages due to the missed running
+ * VCPU in the period.
+ */
+bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
+{
+	return kvm_vgic_save_its_tables_in_progress(kvm);
+}
+
 static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
 {
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 733b53055f97..dd340bb812bd 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -2743,6 +2743,7 @@ static int vgic_its_has_attr(struct kvm_device *dev,
 static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 {
 	const struct vgic_its_abi *abi = vgic_its_get_abi(its);
+	struct vgic_dist *dist = &kvm->arch.vgic;
 	int ret = 0;
 
 	if (attr == KVM_DEV_ARM_VGIC_CTRL_INIT) /* Nothing to do */
@@ -2762,7 +2763,9 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 		vgic_its_reset(kvm, its);
 		break;
 	case KVM_DEV_ARM_ITS_SAVE_TABLES:
+		dist->save_its_tables_in_progress = true;
 		ret = abi->save_tables(its);
+		dist->save_its_tables_in_progress = false;
 		break;
 	case KVM_DEV_ARM_ITS_RESTORE_TABLES:
 		ret = abi->restore_tables(its);
diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 91201f743033..b63898f86e9e 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -54,6 +54,13 @@ bool vgic_supports_direct_msis(struct kvm *kvm)
 		(kvm_vgic_global_state.has_gicv4 && vgic_has_its(kvm)));
 }
 
+bool kvm_vgic_save_its_tables_in_progress(struct kvm *kvm)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+
+	return dist->save_its_tables_in_progress;
+}
+
 /*
  * The Revision field in the IIDR have the following meanings:
  *
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4df9e73a8bb5..db42fbd47bcf 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -263,6 +263,7 @@ struct vgic_dist {
 	struct vgic_io_device	dist_iodev;
 
 	bool			has_its;
+	bool			save_its_tables_in_progress;
 
 	/*
 	 * Contains the attributes and gpa of the LPI configuration table.
@@ -374,6 +375,7 @@ int kvm_vgic_map_resources(struct kvm *kvm);
 int kvm_vgic_hyp_init(void);
 void kvm_vgic_init_cpu_hardware(void);
 
+bool kvm_vgic_save_its_tables_in_progress(struct kvm *kvm);
 int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 			bool level, void *owner);
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
-- 
2.23.0

+bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
+{
+	return kvm_vgic_save_its_tables_in_progress(kvm);
+}
+
 static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
 {
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 733b53055f97..dd340bb812bd 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -2743,6 +2743,7 @@ static int vgic_its_has_attr(struct kvm_device *dev,
 static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 {
 	const struct vgic_its_abi *abi = vgic_its_get_abi(its);
+	struct vgic_dist *dist = &kvm->arch.vgic;
 	int ret = 0;
 
 	if (attr == KVM_DEV_ARM_VGIC_CTRL_INIT) /* Nothing to do */
@@ -2762,7 +2763,9 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
 		vgic_its_reset(kvm, its);
 		break;
 	case KVM_DEV_ARM_ITS_SAVE_TABLES:
+		dist->save_its_tables_in_progress = true;
 		ret = abi->save_tables(its);
+		dist->save_its_tables_in_progress = false;
 		break;
 	case KVM_DEV_ARM_ITS_RESTORE_TABLES:
 		ret = abi->restore_tables(its);
diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
index 91201f743033..b63898f86e9e 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
@@ -54,6 +54,13 @@ bool vgic_supports_direct_msis(struct kvm *kvm)
 		(kvm_vgic_global_state.has_gicv4 && vgic_has_its(kvm)));
 }
 
+bool kvm_vgic_save_its_tables_in_progress(struct kvm *kvm)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+
+	return dist->save_its_tables_in_progress;
+}
+
 /*
  * The Revision field in the IIDR have the following meanings:
  *
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4df9e73a8bb5..db42fbd47bcf 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -263,6 +263,7 @@ struct vgic_dist {
 	struct vgic_io_device	dist_iodev;
 
 	bool			has_its;
+	bool			save_its_tables_in_progress;
 
 	/*
 	 * Contains the attributes and gpa of the LPI configuration table.
@@ -374,6 +375,7 @@ int kvm_vgic_map_resources(struct kvm *kvm);
 int kvm_vgic_hyp_init(void);
 void kvm_vgic_init_cpu_hardware(void);
 
+bool kvm_vgic_save_its_tables_in_progress(struct kvm *kvm);
 int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 			bool level, void *owner);
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v8 5/7] KVM: selftests: Use host page size to map ring buffer in dirty_log_test
  2022-11-04 23:40 ` Gavin Shan
@ 2022-11-04 23:40   ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-04 23:40 UTC (permalink / raw)
  To: kvmarm
  Cc: maz, kvm, catalin.marinas, andrew.jones, will, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, shuah, kvmarm, ajones

In vcpu_map_dirty_ring(), the guest's page size is used to compute the
offset in the virtual area. This works fine when the host and guest
page sizes are the same, but it fails on arm64 when they differ, as
the error messages below indicate.

  # ./dirty_log_test -M dirty-ring -m 7
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
  guest physical test memory offset: 0xffbffc0000
  vcpu stops because vcpu is kicked out...
  Notifying vcpu to continue
  vcpu continues now.
  ==== Test Assertion Failure ====
  lib/kvm_util.c:1477: addr == MAP_FAILED
  pid=9000 tid=9000 errno=0 - Success
  1  0x0000000000405f5b: vcpu_map_dirty_ring at kvm_util.c:1477
  2  0x0000000000402ebb: dirty_ring_collect_dirty_pages at dirty_log_test.c:349
  3  0x00000000004029b3: log_mode_collect_dirty_pages at dirty_log_test.c:478
  4  (inlined by) run_test at dirty_log_test.c:778
  5  (inlined by) run_test at dirty_log_test.c:691
  6  0x0000000000403a57: for_each_guest_mode at guest_modes.c:105
  7  0x0000000000401ccf: main at dirty_log_test.c:921
  8  0x0000ffffb06ec79b: ?? ??:0
  9  0x0000ffffb06ec86b: ?? ??:0
  10 0x0000000000401def: _start at ??:?
  Dirty ring mapped private

Fix the issue by using the host's page size to map the ring buffer.
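The arithmetic behind the failure can be shown in a few lines. This is
an illustrative sketch, not the selftest source; it assumes the mmap()
offset for the dirty ring is KVM_DIRTY_LOG_PAGE_OFFSET scaled by a page
size, in units of the *host* page size, with a 4K host and the 64K
guest mode (-m 7) from the log above.

```python
# Why scaling the mmap() offset by the guest page size fails when the
# host uses 4K pages and the guest uses 64K pages.
KVM_DIRTY_LOG_PAGE_OFFSET = 64   # arch/arm64/include/uapi/asm/kvm.h

def ring_mmap_offset(page_size: int) -> int:
    """Byte offset handed to mmap() for the dirty ring pages."""
    return KVM_DIRTY_LOG_PAGE_OFFSET * page_size

host_page, guest_page = 4096, 64 * 1024

# Using the guest page size lands 16x past the offset the kernel set
# up, so mmap() returns MAP_FAILED and the assertion fires.
assert ring_mmap_offset(host_page) == 0x40000
assert ring_mmap_offset(guest_page) == 0x400000
assert ring_mmap_offset(guest_page) // ring_mmap_offset(host_page) == 16
```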

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/lib/kvm_util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index f1cb1627161f..89a1a420ebd5 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1506,7 +1506,7 @@ struct kvm_reg_list *vcpu_get_reg_list(struct kvm_vcpu *vcpu)
 
 void *vcpu_map_dirty_ring(struct kvm_vcpu *vcpu)
 {
-	uint32_t page_size = vcpu->vm->page_size;
+	uint32_t page_size = getpagesize();
 	uint32_t size = vcpu->vm->dirty_ring_size;
 
 	TEST_ASSERT(size > 0, "Should enable dirty ring first");
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v8 6/7] KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
  2022-11-04 23:40 ` Gavin Shan
@ 2022-11-04 23:40   ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-04 23:40 UTC (permalink / raw)
  To: kvmarm
  Cc: maz, kvm, catalin.marinas, andrew.jones, will, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, shuah, kvmarm, ajones

There are two pieces of state that need to be cleared before the next
mode is executed. Otherwise, we hit the failure shown in the following
messages.

- The variable 'dirty_ring_vcpu_ring_full' is shared by the main and
  vcpu threads. It indicates whether the vcpu exited due to a full
  ring buffer. Its value can be carried over from the previous mode
  (VM_MODE_P40V48_4K) to the current one (VM_MODE_P40V48_64K) when
  VM_MODE_P40V48_16K isn't supported.

- The current ring buffer index needs to be reset before the next mode
  (VM_MODE_P40V48_64K) is executed. Otherwise, a stale value is
  carried over from the previous mode (VM_MODE_P40V48_4K).

  # ./dirty_log_test -M dirty-ring
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
  guest physical test memory offset: 0xffbfffc000
    :
  Dirtied 995328 pages
  Total bits checked: dirty (1012434), clear (7114123), track_next (966700)
  Testing guest mode: PA-bits:40,  VA-bits:48, 64K pages
  guest physical test memory offset: 0xffbffc0000
  vcpu stops because vcpu is kicked out...
  vcpu continues now.
  Notifying vcpu to continue
  Iteration 1 collected 0 pages
  vcpu stops because dirty ring is full...
  vcpu continues now.
  vcpu stops because dirty ring is full...
  vcpu continues now.
  vcpu stops because dirty ring is full...
  ==== Test Assertion Failure ====
  dirty_log_test.c:369: cleared == count
  pid=10541 tid=10541 errno=22 - Invalid argument
     1	0x0000000000403087: dirty_ring_collect_dirty_pages at dirty_log_test.c:369
     2	0x0000000000402a0b: log_mode_collect_dirty_pages at dirty_log_test.c:492
     3	 (inlined by) run_test at dirty_log_test.c:795
     4	 (inlined by) run_test at dirty_log_test.c:705
     5	0x0000000000403a37: for_each_guest_mode at guest_modes.c:100
     6	0x0000000000401ccf: main at dirty_log_test.c:938
     7	0x0000ffff9ecd279b: ?? ??:0
     8	0x0000ffff9ecd286b: ?? ??:0
     9	0x0000000000401def: _start at ??:?
  Reset dirty pages (0) mismatch with collected (35566)

Fix the issues by clearing 'dirty_ring_vcpu_ring_full' and the ring
buffer index before the next mode is executed.
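Why a stale ring index collects nothing can be shown with a minimal
model. This is illustrative Python, not the selftest: the harvester
scans from the carried-over index until it sees a slot with no
harvestable entry, so a stale starting point skips the fresh entries
while the kernel still resets the whole ring, producing the
cleared/collected mismatch above.

```python
# Minimal model of dirty ring collection with a carried-over index.
def collect(ring, idx):
    """Harvest entries starting at idx; return (count, new idx)."""
    count, n = 0, len(ring)
    while ring[idx % n]:        # slot holds a harvestable entry
        ring[idx % n] = 0
        idx += 1
        count += 1
    return count, idx

ring = [1, 1, 1, 0]                # 3 fresh entries in a new run
assert collect(ring, 0) == (3, 3)  # fresh index: all 3 collected
ring = [1, 1, 1, 0]
assert collect(ring, 3) == (0, 3)  # stale index: nothing collected
```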

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c | 27 ++++++++++++--------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index b5234d6efbe1..8758c10ec850 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -226,13 +226,15 @@ static void clear_log_create_vm_done(struct kvm_vm *vm)
 }
 
 static void dirty_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					  void *bitmap, uint32_t num_pages)
+					  void *bitmap, uint32_t num_pages,
+					  uint32_t *unused)
 {
 	kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap);
 }
 
 static void clear_log_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					  void *bitmap, uint32_t num_pages)
+					  void *bitmap, uint32_t num_pages,
+					  uint32_t *unused)
 {
 	kvm_vm_get_dirty_log(vcpu->vm, slot, bitmap);
 	kvm_vm_clear_dirty_log(vcpu->vm, slot, bitmap, 0, num_pages);
@@ -329,10 +331,9 @@ static void dirty_ring_continue_vcpu(void)
 }
 
 static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					   void *bitmap, uint32_t num_pages)
+					   void *bitmap, uint32_t num_pages,
+					   uint32_t *ring_buf_idx)
 {
-	/* We only have one vcpu */
-	static uint32_t fetch_index = 0;
 	uint32_t count = 0, cleared;
 	bool continued_vcpu = false;
 
@@ -349,7 +350,8 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
 
 	/* Only have one vcpu */
 	count = dirty_ring_collect_one(vcpu_map_dirty_ring(vcpu),
-				       slot, bitmap, num_pages, &fetch_index);
+				       slot, bitmap, num_pages,
+				       ring_buf_idx);
 
 	cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
 
@@ -406,7 +408,8 @@ struct log_mode {
 	void (*create_vm_done)(struct kvm_vm *vm);
 	/* Hook to collect the dirty pages into the bitmap provided */
 	void (*collect_dirty_pages) (struct kvm_vcpu *vcpu, int slot,
-				     void *bitmap, uint32_t num_pages);
+				     void *bitmap, uint32_t num_pages,
+				     uint32_t *ring_buf_idx);
 	/* Hook to call when after each vcpu run */
 	void (*after_vcpu_run)(struct kvm_vcpu *vcpu, int ret, int err);
 	void (*before_vcpu_join) (void);
@@ -471,13 +474,14 @@ static void log_mode_create_vm_done(struct kvm_vm *vm)
 }
 
 static void log_mode_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
-					 void *bitmap, uint32_t num_pages)
+					 void *bitmap, uint32_t num_pages,
+					 uint32_t *ring_buf_idx)
 {
 	struct log_mode *mode = &log_modes[host_log_mode];
 
 	TEST_ASSERT(mode->collect_dirty_pages != NULL,
 		    "collect_dirty_pages() is required for any log mode!");
-	mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages);
+	mode->collect_dirty_pages(vcpu, slot, bitmap, num_pages, ring_buf_idx);
 }
 
 static void log_mode_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
@@ -696,6 +700,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	struct kvm_vcpu *vcpu;
 	struct kvm_vm *vm;
 	unsigned long *bmap;
+	uint32_t ring_buf_idx = 0;
 
 	if (!log_mode_supported()) {
 		print_skip("Log mode '%s' not supported",
@@ -771,6 +776,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 	host_dirty_count = 0;
 	host_clear_count = 0;
 	host_track_next_count = 0;
+	WRITE_ONCE(dirty_ring_vcpu_ring_full, false);
 
 	pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
 
@@ -778,7 +784,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
 		/* Give the vcpu thread some time to dirty some pages */
 		usleep(p->interval * 1000);
 		log_mode_collect_dirty_pages(vcpu, TEST_MEM_SLOT_INDEX,
-					     bmap, host_num_pages);
+					     bmap, host_num_pages,
+					     &ring_buf_idx);
 
 		/*
 		 * See vcpu_sync_stop_requested definition for details on why
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH v8 7/7] KVM: selftests: Automate choosing dirty ring size in dirty_log_test
  2022-11-04 23:40 ` Gavin Shan
@ 2022-11-04 23:40   ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-04 23:40 UTC (permalink / raw)
  To: kvmarm
  Cc: maz, kvm, catalin.marinas, andrew.jones, will, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, shuah, kvmarm, ajones

In the dirty ring case, we rely on the vcpu exiting due to a full
dirty ring. On an ARM64 system with a 64KB host page size, there are
4096 host pages, so the vcpu never exits due to a full dirty ring.
A similar case is a 4KB page size on the host and a 64KB page size on
the guest: the vcpu keeps dirtying the same set of host pages, but the
dirty page information isn't collected in the main thread. This leads
to an infinite loop, as the following log shows.

  # ./dirty_log_test -M dirty-ring -c 65536 -m 5
  Setting log mode to: 'dirty-ring'
  Test iterations: 32, interval: 10 (ms)
  Testing guest mode: PA-bits:40,  VA-bits:48,  4K pages
  guest physical test memory offset: 0xffbffe0000
  vcpu stops because vcpu is kicked out...
  Notifying vcpu to continue
  vcpu continues now.
  Iteration 1 collected 576 pages
  <No more output afterwards>

Fix the issue by automatically choosing the best dirty ring size to
ensure that the vcpu exits due to a full dirty ring. The '-c' option
becomes a hint for the dirty ring count instead of its exact value.
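The sizing logic added to dirty_ring_create_vm_done() can be sketched
as a Python model of the C arithmetic (1 << (31 - __builtin_clz(n)));
`host_pages` and `requested` stand in for the computed page count and
the '-c' value.

```python
# Round both the host page count and the requested '-c' value down to
# powers of two, then take the smaller, so the ring can actually fill.
def highest_pow2(n: int) -> int:
    """Largest power of two <= n, i.e. 1 << (31 - __builtin_clz(n))."""
    return 1 << (n.bit_length() - 1)

def choose_ring_count(host_pages: int, requested: int) -> int:
    limit = highest_pow2(host_pages)
    return min(limit, highest_pow2(requested))

# 1G of test memory in 64K host pages is 16384 (+3 extra) pages; a
# requested ring of 65536 entries is clamped down to 16384.
assert choose_ring_count(16384 + 3, 65536) == 16384
# A request below the limit is only rounded down, never raised.
assert choose_ring_count(16384 + 3, 1000) == 512
```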

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c | 26 +++++++++++++++++---
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 8758c10ec850..a87e5f78ebf1 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -24,6 +24,9 @@
 #include "guest_modes.h"
 #include "processor.h"
 
+#define DIRTY_MEM_BITS 30 /* 1G */
+#define PAGE_SHIFT_4K  12
+
 /* The memory slot index to track dirty pages */
 #define TEST_MEM_SLOT_INDEX		1
 
@@ -273,6 +276,24 @@ static bool dirty_ring_supported(void)
 
 static void dirty_ring_create_vm_done(struct kvm_vm *vm)
 {
+	uint64_t pages;
+	uint32_t limit;
+
+	/*
+	 * We rely on vcpu exit due to full dirty ring state. Adjust
+	 * the ring buffer size to ensure we're able to reach the
+	 * full dirty ring state.
+	 */
+	pages = (1ul << (DIRTY_MEM_BITS - vm->page_shift)) + 3;
+	pages = vm_adjust_num_guest_pages(vm->mode, pages);
+	if (vm->page_size < getpagesize())
+		pages = vm_num_host_pages(vm->mode, pages);
+
+	limit = 1 << (31 - __builtin_clz(pages));
+	test_dirty_ring_count = 1 << (31 - __builtin_clz(test_dirty_ring_count));
+	test_dirty_ring_count = min(limit, test_dirty_ring_count);
+	pr_info("dirty ring count: 0x%x\n", test_dirty_ring_count);
+
 	/*
 	 * Switch to dirty ring mode after VM creation but before any
 	 * of the vcpu creation.
@@ -685,9 +706,6 @@ static struct kvm_vm *create_vm(enum vm_guest_mode mode, struct kvm_vcpu **vcpu,
 	return vm;
 }
 
-#define DIRTY_MEM_BITS 30 /* 1G */
-#define PAGE_SHIFT_4K  12
-
 struct test_params {
 	unsigned long iterations;
 	unsigned long interval;
@@ -830,7 +848,7 @@ static void help(char *name)
 	printf("usage: %s [-h] [-i iterations] [-I interval] "
 	       "[-p offset] [-m mode]\n", name);
 	puts("");
-	printf(" -c: specify dirty ring size, in number of entries\n");
+	printf(" -c: hint to dirty ring size, in number of entries\n");
 	printf("     (only useful for dirty-ring test; default: %"PRIu32")\n",
 	       TEST_DIRTY_RING_COUNT);
 	printf(" -i: specify iteration counts (default: %"PRIu64")\n",
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-04 23:40   ` Gavin Shan
@ 2022-11-06 15:43     ` Marc Zyngier
  -1 siblings, 0 replies; 66+ messages in thread
From: Marc Zyngier @ 2022-11-06 15:43 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm, ajones

On Fri, 04 Nov 2022 23:40:45 +0000,
Gavin Shan <gshan@redhat.com> wrote:
> 
> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> enabled. This conflicts with the fact that ring-based dirty page
> tracking always requires a running VCPU context.
> 
> Introduce a new flavor of dirty ring that requires the use of both VCPU
> dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> the VM to the target.
> 
> Use an additional capability to advertise this behavior. The newly added
> capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> 
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Suggested-by: Peter Xu <peterx@redhat.com>
> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Acked-by: Peter Xu <peterx@redhat.com>
> ---
>  Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++-----
>  include/linux/kvm_dirty_ring.h |  7 +++++
>  include/linux/kvm_host.h       |  1 +
>  include/uapi/linux/kvm.h       |  1 +
>  virt/kvm/Kconfig               |  8 ++++++
>  virt/kvm/dirty_ring.c          | 10 +++++++
>  virt/kvm/kvm_main.c            | 49 +++++++++++++++++++++++++++-------
>  7 files changed, 93 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index eee9f857a986..2ec32bd41792 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
>  needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
>  vmexit ensures that all dirty GFNs are flushed to the dirty rings.
>  
> -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
> -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
> -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
> -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
> -machine will switch to ring-buffer dirty page tracking and further
> -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
> -
>  NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
>  should be exposed by weakly ordered architecture, in order to indicate
>  the additional memory ordering requirements imposed on userspace when
> @@ -8018,6 +8011,32 @@ Architecture with TSO-like ordering (such as x86) are allowed to
>  expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>  to userspace.
>  
> +After using the dirty rings, the userspace needs to detect the capability

using? or enabling? What comes after suggests the latter.

> +of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
> +need to be backed by per-slot bitmaps. With this capability advertised
> +and supported, it means the architecture can dirty guest pages without

If it is advertised, it is supported, right?

> +vcpu/ring context, so that some of the dirty information will still be
> +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> +can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> +has been enabled.
> +
> +Note that the bitmap here is only a backup of the ring structure, and
> +normally should only contain a very small amount of dirty pages, which

I don't think we can claim this. It is whatever amount of memory is
dirtied outside of a vcpu context, and we shouldn't make any claim
regarding the number of dirty pages.

> +needs to be transferred during VM downtime. Collecting the dirty bitmap
> +should be the very last thing that the VMM does before transmitting state
> +to the target VM. VMM needs to ensure that the dirty state is final and
> +avoid missing dirty pages from another ioctl ordered after the bitmap
> +collection.
> +
> +To collect dirty bits in the backup bitmap, the userspace can use the
> +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> +and its behavior is undefined since collecting the dirty bitmap always
> +happens in the last phase of VM's migration.

It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
you have multiple devices that dirty the memory, such as multiple
ITSs, why shouldn't userspace be allowed to snapshot the dirty state
multiple times? This doesn't seem like a reasonable restriction, and I
really dislike the idea of undefined behaviour here.
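[For reference, the reason KVM_GET_DIRTY_LOG can work at all in ring mode here is the patch's kvm_use_dirty_bitmap() predicate, which replaces the old "ring enabled => -ENXIO" check. A minimal toy model of that gating (illustrative names, not the kernel's actual structures):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the relevant fields of struct kvm (illustrative only). */
struct toy_kvm {
	unsigned int dirty_ring_size;	/* non-zero once the ring cap is enabled */
	bool dirty_ring_with_bitmap;	/* KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP */
};

/* Mirrors kvm_use_dirty_bitmap(): the bitmap is in use when rings are off
 * entirely, or when the ring+bitmap flavour has been enabled. */
static bool toy_use_dirty_bitmap(const struct toy_kvm *kvm)
{
	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
}

/* Both KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG gate on the predicate:
 * pure ring mode still returns -ENXIO, ring+bitmap mode proceeds. */
static int toy_get_dirty_log(const struct toy_kvm *kvm)
{
	if (!toy_use_dirty_bitmap(kvm))
		return -ENXIO;
	return 0;
}
```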

> +
> +NOTE: One example of using the backup bitmap is saving arm64 vgic/its
> +tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
> +KVM device "kvm-arm-vgic-its" during VM's migration.

It would be good to have something about this in the ITS
documentation. Something along these lines:

diff --git a/Documentation/virt/kvm/devices/arm-vgic-its.rst b/Documentation/virt/kvm/devices/arm-vgic-its.rst
index d257eddbae29..e053124f77c4 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-its.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-its.rst
@@ -52,7 +52,10 @@ KVM_DEV_ARM_VGIC_GRP_CTRL
 
     KVM_DEV_ARM_ITS_SAVE_TABLES
       save the ITS table data into guest RAM, at the location provisioned
-      by the guest in corresponding registers/table entries.
+      by the guest in corresponding registers/table entries. Should userspace
+      require a form of dirty tracking to identify which pages are modified
+      by the saving process, it should use a bitmap even if using another
+      mechanism to track the memory dirtied by the vCPUs.
 
       The layout of the tables in guest memory defines an ABI. The entries
       are laid out in little endian format as described in the last paragraph.


> +
>  8.30 KVM_CAP_XEN_HVM
>  --------------------
>  
> diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
> index 199ead37b104..4862c98d80d3 100644
> --- a/include/linux/kvm_dirty_ring.h
> +++ b/include/linux/kvm_dirty_ring.h
> @@ -37,6 +37,11 @@ static inline u32 kvm_dirty_ring_get_rsvd_entries(void)
>  	return 0;
>  }
>  
> +static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
> +{
> +	return true;
> +}
> +
>  static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
>  				       int index, u32 size)
>  {
> @@ -67,6 +72,8 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
>  #else /* CONFIG_HAVE_KVM_DIRTY_RING */
>  
>  int kvm_cpu_dirty_log_size(void);
> +bool kvm_use_dirty_bitmap(struct kvm *kvm);
> +bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
>  u32 kvm_dirty_ring_get_rsvd_entries(void);
>  int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 6fab55e58111..f51eb9419bfc 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -779,6 +779,7 @@ struct kvm {
>  	pid_t userspace_pid;
>  	unsigned int max_halt_poll_ns;
>  	u32 dirty_ring_size;
> +	bool dirty_ring_with_bitmap;
>  	bool vm_bugged;
>  	bool vm_dead;
>  
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 0d5d4419139a..c87b5882d7ae 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_S390_ZPCI_OP 221
>  #define KVM_CAP_S390_CPU_TOPOLOGY 222
>  #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
> +#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 800f9470e36b..228be1145cf3 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
>         bool
>         select HAVE_KVM_DIRTY_RING
>  
> +# Only architectures that need to dirty memory outside of a vCPU
> +# context should select this, advertising to userspace the
> +# requirement to use a dirty bitmap in addition to the vCPU dirty
> +# ring.
> +config HAVE_KVM_DIRTY_RING_WITH_BITMAP
> +	bool
> +	depends on HAVE_KVM_DIRTY_RING
> +
>  config HAVE_KVM_EVENTFD
>         bool
>         select EVENTFD
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index fecbb7d75ad2..758679724447 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -21,6 +21,16 @@ u32 kvm_dirty_ring_get_rsvd_entries(void)
>  	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
>  }
>  
> +bool kvm_use_dirty_bitmap(struct kvm *kvm)
> +{
> +	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
> +}
> +
> +bool __weak kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
> +{
> +	return false;
> +}
> +
>  static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
>  {
>  	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c865d7d82685..746133b23a66 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
>  			new->dirty_bitmap = NULL;
>  		else if (old && old->dirty_bitmap)
>  			new->dirty_bitmap = old->dirty_bitmap;
> -		else if (!kvm->dirty_ring_size) {
> +		else if (kvm_use_dirty_bitmap(kvm)) {
>  			r = kvm_alloc_dirty_bitmap(new);
>  			if (r)
>  				return r;
> @@ -2060,8 +2060,8 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>  	unsigned long n;
>  	unsigned long any = 0;
>  
> -	/* Dirty ring tracking is exclusive to dirty log tracking */
> -	if (kvm->dirty_ring_size)
> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
> +	if (!kvm_use_dirty_bitmap(kvm))
>  		return -ENXIO;
>  
>  	*memslot = NULL;
> @@ -2125,8 +2125,8 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
>  	unsigned long *dirty_bitmap_buffer;
>  	bool flush;
>  
> -	/* Dirty ring tracking is exclusive to dirty log tracking */
> -	if (kvm->dirty_ring_size)
> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
> +	if (!kvm_use_dirty_bitmap(kvm))
>  		return -ENXIO;
>  
>  	as_id = log->slot >> 16;
> @@ -2237,8 +2237,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
>  	unsigned long *dirty_bitmap_buffer;
>  	bool flush;
>  
> -	/* Dirty ring tracking is exclusive to dirty log tracking */
> -	if (kvm->dirty_ring_size)
> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
> +	if (!kvm_use_dirty_bitmap(kvm))
>  		return -ENXIO;
>  
>  	as_id = log->slot >> 16;
> @@ -3305,7 +3305,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>  	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  
>  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> -	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> +	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
> +		return;
> +
> +	if (WARN_ON_ONCE(!kvm_arch_allow_write_without_running_vcpu(kvm) && !vcpu))
>  		return;
>  #endif
>  
> @@ -3313,7 +3316,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>  		unsigned long rel_gfn = gfn - memslot->base_gfn;
>  		u32 slot = (memslot->as_id << 16) | memslot->id;
>  
> -		if (kvm->dirty_ring_size)
> +		if (kvm->dirty_ring_size && vcpu)
>  			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
>  		else
>  			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> @@ -4482,6 +4485,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
>  #else
>  		return 0;
> +#endif
> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>  #endif
>  	case KVM_CAP_BINARY_STATS_FD:
>  	case KVM_CAP_SYSTEM_EVENT_DATA:
> @@ -4588,6 +4594,31 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>  			return -EINVAL;
>  
>  		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
> +		struct kvm_memslots *slots;
> +		int r = -EINVAL;
> +
> +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> +		    !kvm->dirty_ring_size)
> +			return r;
> +
> +		mutex_lock(&kvm->slots_lock);
> +
> +		slots = kvm_memslots(kvm);
> +
> +		/*
> +		 * Avoid a race between memslot creation and enabling the ring +
> +		 * bitmap capability to guarantee that no memslots have been
> +		 * created without a bitmap.

It should be called out in the documentation that this capability must
be enabled before any memslot is created.
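[The ordering constraint being discussed can be sketched as a toy state machine (field and function names here are illustrative, not the kernel's): enabling the ring+bitmap capability fails unless the ring capability was enabled first, and unless no memslot exists yet, so that no slot can ever be created without a bitmap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy VM state (illustrative only). */
struct toy_vm {
	unsigned int dirty_ring_size;
	bool dirty_ring_with_bitmap;
	unsigned int nr_memslots;
};

/* Sketch of the KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP enable path. */
static int toy_enable_ring_with_bitmap(struct toy_vm *vm)
{
	if (!vm->dirty_ring_size)
		return -EINVAL;	/* ring capability must be enabled first */
	if (vm->nr_memslots)
		return -EINVAL;	/* must precede any memslot creation */
	vm->dirty_ring_with_bitmap = true;
	return 0;
}
```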

> +		 */
> +		if (kvm_memslots_empty(slots)) {
> +			kvm->dirty_ring_with_bitmap = cap->args[0];
> +			r = 0;
> +		}
> +
> +		mutex_unlock(&kvm->slots_lock);
> +		return r;
> +	}
>  	default:
>  		return kvm_vm_ioctl_enable_cap(kvm, cap);
>  	}

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 4/7] KVM: arm64: Enable ring-based dirty memory tracking
  2022-11-04 23:40   ` Gavin Shan
@ 2022-11-06 15:50     ` Marc Zyngier
  -1 siblings, 0 replies; 66+ messages in thread
From: Marc Zyngier @ 2022-11-06 15:50 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, shuah, catalin.marinas, andrew.jones,
	ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, peterx, seanjc, oliver.upton,
	zhenyzha, shan.gavin

On Fri, 04 Nov 2022 23:40:46 +0000,
Gavin Shan <gshan@redhat.com> wrote:
> 
> Enable ring-based dirty memory tracking on arm64 by selecting
> CONFIG_HAVE_KVM_DIRTY_{RING_ACQ_REL, RING_WITH_BITMAP} and providing
> the ring buffer's physical page offset (KVM_DIRTY_LOG_PAGE_OFFSET).
> 
> Besides, helper kvm_vgic_save_its_tables_in_progress() is added to
> indicate if vgic/its tables are being saved or not. The helper is used
> in ARM64's kvm_arch_allow_write_without_running_vcpu() to keep the
> site of saving vgic/its tables out of no-running-vcpu radar.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  Documentation/virt/kvm/api.rst     |  2 +-
>  arch/arm64/include/uapi/asm/kvm.h  |  1 +
>  arch/arm64/kvm/Kconfig             |  2 ++
>  arch/arm64/kvm/arm.c               |  3 +++
>  arch/arm64/kvm/mmu.c               | 15 +++++++++++++++
>  arch/arm64/kvm/vgic/vgic-its.c     |  3 +++
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c |  7 +++++++
>  include/kvm/arm_vgic.h             |  2 ++
>  8 files changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 2ec32bd41792..2fc68f684ad8 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -7921,7 +7921,7 @@ regardless of what has actually been exposed through the CPUID leaf.
>  8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>  ----------------------------------------------------------
>  
> -:Architectures: x86
> +:Architectures: x86, arm64
>  :Parameters: args[0] - size of the dirty log ring
>  
>  KVM is capable of tracking dirty memory using ring buffers that are
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 316917b98707..a7a857f1784d 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -43,6 +43,7 @@
>  #define __KVM_HAVE_VCPU_EVENTS
>  
>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
> +#define KVM_DIRTY_LOG_PAGE_OFFSET 64
>  
>  #define KVM_REG_SIZE(id)						\
>  	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 815cc118c675..066b053e9eb9 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -32,6 +32,8 @@ menuconfig KVM
>  	select KVM_VFIO
>  	select HAVE_KVM_EVENTFD
>  	select HAVE_KVM_IRQFD
> +	select HAVE_KVM_DIRTY_RING_ACQ_REL
> +	select HAVE_KVM_DIRTY_RING_WITH_BITMAP
>  	select HAVE_KVM_MSI
>  	select HAVE_KVM_IRQCHIP
>  	select HAVE_KVM_IRQ_ROUTING
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 94d33e296e10..6b097605e38c 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -746,6 +746,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
>  
>  		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
>  			return kvm_vcpu_suspend(vcpu);
> +
> +		if (kvm_dirty_ring_check_request(vcpu))
> +			return 0;
>  	}
>  
>  	return 1;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 60ee3d9f01f8..fbeb55e45f53 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -932,6 +932,21 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>  	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>  }
>  
> +/*
> + * kvm_arch_allow_write_without_running_vcpu - allow writing guest memory
> + * without the running VCPU when dirty ring is enabled.
> + *
> + * The running VCPU is required to track dirty guest pages when dirty ring
> + * is enabled. Otherwise, the backup bitmap should be used to track the
> + * dirty guest pages. When vgic/its tables are being saved, the backup
> + * bitmap is used to track the dirty guest pages due to the missed running
> + * VCPU in the period.
> + */
> +bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
> +{
> +	return kvm_vgic_save_its_tables_in_progress(kvm);

I don't think we need the extra level of abstraction here. Just return
kvm->arch.vgic.save_its_tables_in_progress and be done with it.

You can also move the helper to the vgic-its code since they are
closely related for now.
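[The routing this patch relies on can be summarized with a toy model (illustrative names, simplified from the quoted mark_page_dirty_in_slot() hunk): a write from a vCPU context goes to that vCPU's ring; a write without one is only tolerated while the arch allows it — on arm64, while an ITS table save is in progress — and then lands in the backup bitmap.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy VM state (illustrative only). */
struct toy_vm {
	unsigned int dirty_ring_size;		/* non-zero: ring mode enabled */
	bool save_its_tables_in_progress;	/* arm64's "allow without vCPU" window */
};

enum toy_dest { TOY_REJECTED, TOY_RING, TOY_BITMAP };

/* Simplified sketch of the dirty-write routing under ring mode. */
static enum toy_dest toy_mark_page_dirty(const struct toy_vm *vm, bool has_vcpu)
{
	if (!has_vcpu && !vm->save_its_tables_in_progress)
		return TOY_REJECTED;	/* would trip the WARN_ON_ONCE */
	if (vm->dirty_ring_size && has_vcpu)
		return TOY_RING;
	return TOY_BITMAP;		/* non-vCPU writes fall back to the bitmap */
}
```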

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 0/7] KVM: arm64: Enable ring-based dirty memory tracking
  2022-11-04 23:40 ` Gavin Shan
@ 2022-11-06 16:08   ` Marc Zyngier
  -1 siblings, 0 replies; 66+ messages in thread
From: Marc Zyngier @ 2022-11-06 16:08 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, shuah, catalin.marinas, andrew.jones,
	ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, peterx, seanjc, oliver.upton,
	zhenyzha, shan.gavin

On Fri, 04 Nov 2022 23:40:42 +0000,
Gavin Shan <gshan@redhat.com> wrote:
> 
> This series enables the ring-based dirty memory tracking for ARM64.
> The feature has been available and enabled on x86 for a while. It
> is beneficial when the number of dirty pages is small in a checkpointing
> system or live migration scenario. More details can be found from
> fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking").
> 
> This series is applied to v6.1.rc3, plus commit c227590467cb ("KVM:
> Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them").
> The commit is currently in Marc's 'fixes' branch, targeting v6.1.rc4/5.

This is starting to look good to me, and my only concerns are around
the documentation and a bit of nitpicking on patch 4. If we can
converge quickly on those, I'd like to queue this and leave it
to simmer in -next.

> v7: https://lore.kernel.org/kvmarm/20221031003621.164306-1-gshan@redhat.com/
> v6: https://lore.kernel.org/kvmarm/20221011061447.131531-1-gshan@redhat.com/
> v5: https://lore.kernel.org/all/20221005004154.83502-1-gshan@redhat.com/
> v4: https://lore.kernel.org/kvmarm/20220927005439.21130-1-gshan@redhat.com/
> v3: https://lore.kernel.org/r/20220922003214.276736-1-gshan@redhat.com
> v2: https://lore.kernel.org/lkml/YyiV%2Fl7O23aw5aaO@xz-m1.local/T/
> v1: https://lore.kernel.org/lkml/20220819005601.198436-1-gshan@redhat.com
> 
> Testing
> =======
> (1) kvm/selftests/dirty_log_test
> (2) Live migration by QEMU

Could you point to a branch that has the required QEMU changes?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-06 15:43     ` Marc Zyngier
@ 2022-11-06 16:22       ` Peter Xu
  -1 siblings, 0 replies; 66+ messages in thread
From: Peter Xu @ 2022-11-06 16:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm, ajones

Hi, Marc,

On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote:
> > +Note that the bitmap here is only a backup of the ring structure, and
> > +normally should only contain a very small amount of dirty pages, which
> 
> I don't think we can claim this. It is whatever amount of memory is
> dirtied outside of a vcpu context, and we shouldn't make any claim
> regarding the number of dirty pages.

The thing is, the current with-bitmap design assumes that the two logs are
collected in different windows of the migration, while the dirty bitmap is
only collected after the VM is stopped.  So collecting the dirty bitmap and
sending the dirty pages within it will be part of the VM downtime.

It will stop making sense if the dirty bitmap can contain a large portion
of the guest memory, because then it would be simpler to just stop the VM,
transfer the pages, and restart on the destination node without any
tracking mechanism.

[1]

> 
> > +needs to be transferred during VM downtime. Collecting the dirty bitmap
> > +should be the very last thing that the VMM does before transmitting state
> > +to the target VM. VMM needs to ensure that the dirty state is final and
> > +avoid missing dirty pages from another ioctl ordered after the bitmap
> > +collection.
> > +
> > +To collect dirty bits in the backup bitmap, the userspace can use the
> > +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> > +and its behavior is undefined since collecting the dirty bitmap always
> > +happens in the last phase of VM's migration.
> 
> It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
> you have multiple devices that dirty the memory, such as multiple
> ITSs, why shouldn't userspace be allowed to snapshot the dirty state
> multiple times? This doesn't seem like a reasonable restriction, and I
> really dislike the idea of undefined behaviour here.

I suggested the paragraph because it's very natural to ask whether we'd
need CLEAR_LOG for this special GET_LOG phase, so I thought this could
be a helpful reference for answering that.

I wanted to make it clear that we don't need CLEAR_LOG at all in this case,
as fundamentally the clear log is about re-protecting the guest pages. But
if we keep the restriction above (collecting the dirty bitmap last, once
and for all), then it makes no sense to protect the guest pages at all at
this stage: the source host shouldn't run after the GET_LOG, so the
CLEAR_LOG would be a vain effort.

I used "undefined" here just to keep the interface loose, and also as a
hint that we should never do that for this specific GET_LOG.  If we want,
we can ignore CLEAR_LOG in the future with ALLOW_BITMAP, and "undefined"
also provides that flexibility, but that's not really that important.

The wording could definitely be improved, and maybe even avoiding any
mention of CLEAR_LOG would help, but IIUC the major thing to reach
consensus on is not CLEAR_LOG itself but whether we can make that
assumption [1] and whether such a design of using the dirty bitmap is
acceptable in general.

Thanks,

-- 
Peter Xu



* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-06 16:22       ` Peter Xu
@ 2022-11-06 20:12         ` Marc Zyngier
  -1 siblings, 0 replies; 66+ messages in thread
From: Marc Zyngier @ 2022-11-06 20:12 UTC (permalink / raw)
  To: Peter Xu
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, shuah, catalin.marinas,
	andrew.jones, ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, seanjc, oliver.upton, zhenyzha,
	shan.gavin

Hi Peter,

On Sun, 06 Nov 2022 16:22:29 +0000,
Peter Xu <peterx@redhat.com> wrote:
> 
> Hi, Marc,
> 
> On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote:
> > > +Note that the bitmap here is only a backup of the ring structure, and
> > > +normally should only contain a very small amount of dirty pages, which
> > 
> > I don't think we can claim this. It is whatever amount of memory is
> > dirtied outside of a vcpu context, and we shouldn't make any claim
> > regarding the number of dirty pages.
> 
> The thing is the current with-bitmap design assumes that the two logs are
> collected in different windows of migration, while the dirty log is only
> collected after the VM is stopped.  So collecting dirty bitmap and sending
> the dirty pages within the bitmap will be part of the VM downtime.
> 
> It will stop to make sense if the dirty bitmap can contain a large portion
> of the guest memory, because then it'll be simpler to just stop the VM,
> transfer pages, and restart on dest node without any tracking mechanism.

Oh, I absolutely agree that the whole vcpu dirty ring makes zero sense
in general. It only makes sense if the source of the dirty pages is
limited to the vcpus, which is literally a corner case. Look at any
real machine, and you'll quickly realise that this isn't the case, and
that DMA *is* a huge source of dirty pages.

Here, we're just lucky enough not to have much DMA tracking yet. Once
that happens (and I have it from people doing the actual work that it
*is* happening), you'll realise that the dirty ring story is of very
limited use. So I'd rather drop anything quantitative here, as this is
likely to be wrong.

>
> [1]
> 
> > 
> > > +needs to be transferred during VM downtime. Collecting the dirty bitmap
> > > +should be the very last thing that the VMM does before transmitting state
> > > +to the target VM. VMM needs to ensure that the dirty state is final and
> > > +avoid missing dirty pages from another ioctl ordered after the bitmap
> > > +collection.
> > > +
> > > +To collect dirty bits in the backup bitmap, the userspace can use the
> > > +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> > > +and its behavior is undefined since collecting the dirty bitmap always
> > > +happens in the last phase of VM's migration.
> > 
> > It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
> > you have multiple devices that dirty the memory, such as multiple
> > ITSs, why shouldn't userspace be allowed to snapshot the dirty state
> > multiple times? This doesn't seem like a reasonable restriction, and I
> > really dislike the idea of undefined behaviour here.
> 
> I suggested the paragraph because it's very natural to ask whether we'd
> need to CLEAR_LOG for this special GET_LOG phase, so I thought this could
> be helpful as a reference to answer that.
> 
> I wanted to make it clear that we don't need CLEAR_LOG at all in this case,
> as fundamentally clear log is about re-protect the guest pages, but if
> we're with the restriction of above (having the dirty bmap the last to
> collect and once and for all) then it'll make no sense to protect the guest
> page at all at this stage since src host shouldn't run after the GET_LOG
> then the CLEAR_LOG will be a vain effort.

That's not for you to decide, but for userspace. I can perfectly expect
userspace saving an ITS, getting the bitmap, saving the pages and then
*clearing the log* before processing the next ITS. Or anything else.

And by the way, userspace is perfectly entitled to *restart* the VM on
the spot if it wants to. After all, there is absolutely nothing that
says "migration" here. You are reading it all over the place, but
that's not what the API is about.

Frankly, I don't see why we should put random limitations on this
API. We're not in the business of setting policies on what userspace
does.

> 
> I used "undefined" here just to be loose on the interface, also a hint that
> we should never do that for this specific GET_LOG.  If we want, we can
> ignore CLEAR_LOG in the future with ALLOW_BITMAP, and the undefined also
> provides the flexibility, but that's not really that important.
> 
> The wording could definitely be improved, or maybe even avoid mentioning
> the CLEAR_LOG might help, but IIUC the major thing to reach the consensus
> is not CLEAR_LOG itself but on whether we can have that assumption [1] and
> whether such a design of using dirty bmap is acceptable in general.

I don't know what [1] is, but the bitmap should behave correctly, no
matter what userspace does, and provide consistent results. We already
depend on this. If the current API cannot support this correctly, then
we should fix it before I take this series.

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-06 20:12         ` Marc Zyngier
@ 2022-11-06 21:06           ` Peter Xu
  -1 siblings, 0 replies; 66+ messages in thread
From: Peter Xu @ 2022-11-06 21:06 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm, ajones

On Sun, Nov 06, 2022 at 08:12:22PM +0000, Marc Zyngier wrote:
> Hi Peter,
> 
> On Sun, 06 Nov 2022 16:22:29 +0000,
> Peter Xu <peterx@redhat.com> wrote:
> > 
> > Hi, Marc,
> > 
> > On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote:
> > > > +Note that the bitmap here is only a backup of the ring structure, and
> > > > +normally should only contain a very small amount of dirty pages, which
> > > 
> > > I don't think we can claim this. It is whatever amount of memory is
> > > dirtied outside of a vcpu context, and we shouldn't make any claim
> > > regarding the number of dirty pages.
> > 
> > The thing is the current with-bitmap design assumes that the two logs are
> > collected in different windows of migration, while the dirty log is only
> > collected after the VM is stopped.  So collecting dirty bitmap and sending
> > the dirty pages within the bitmap will be part of the VM downtime.
> > 
> > It will stop to make sense if the dirty bitmap can contain a large portion
> > of the guest memory, because then it'll be simpler to just stop the VM,
> > transfer pages, and restart on dest node without any tracking mechanism.
> 
> Oh, I absolutely agree that the whole vcpu dirty ring makes zero sense
> in general. It only makes sense if the source of the dirty pages is
> limited to the vcpus, which is literally a corner case. Look at any
> real machine, and you'll quickly realise that this isn't the case, and
> that DMA *is* a huge source of dirty pages.
> 
> Here, we're just lucky enough not to have much DMA tracking yet. Once
> that happens (and I have it from people doing the actual work that it
> *is* happening), you'll realise that the dirty ring story is of very
> limited use. So I'd rather drop anything quantitative here, as this is
> likely to be wrong.

Does arm64 really need to track device DMA using the same dirty tracking
interface, rather than VFIO or some other interface?  That's definitely
not the case for x86, but if it is true for arm64, could the DMA be
spread across all the guest pages?  If so, I really don't know how this
will work.

We're only syncing the dirty bitmap once right now with the protocol.  If
that can cover most of the guest memory, it's the same as non-live.  If we
sync it periodically, then it's the same as enabling dirty-log alone and
the rings are useless.

> 
> >
> > [1]
> > 
> > > 
> > > > +needs to be transferred during VM downtime. Collecting the dirty bitmap
> > > > +should be the very last thing that the VMM does before transmitting state
> > > > +to the target VM. VMM needs to ensure that the dirty state is final and
> > > > +avoid missing dirty pages from another ioctl ordered after the bitmap
> > > > +collection.
> > > > +
> > > > +To collect dirty bits in the backup bitmap, the userspace can use the
> > > > +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> > > > +and its behavior is undefined since collecting the dirty bitmap always
> > > > +happens in the last phase of VM's migration.
> > > 
> > > It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
> > > you have multiple devices that dirty the memory, such as multiple
> > > ITSs, why shouldn't userspace be allowed to snapshot the dirty state
> > > multiple times? This doesn't seem like a reasonable restriction, and I
> > > really dislike the idea of undefined behaviour here.
> > 
> > I suggested the paragraph because it's very natural to ask whether we'd
> > need to CLEAR_LOG for this special GET_LOG phase, so I thought this could
> > be helpful as a reference to answer that.
> > 
> > I wanted to make it clear that we don't need CLEAR_LOG at all in this case,
> > as fundamentally clear log is about re-protect the guest pages, but if
> > we're with the restriction of above (having the dirty bmap the last to
> > collect and once and for all) then it'll make no sense to protect the guest
> > page at all at this stage since src host shouldn't run after the GET_LOG
> > then the CLEAR_LOG will be a vain effort.
> 
> That's not for you to decide, but userspace. I can perfectly expect
> userspace saving an ITS, getting the bitmap, saving the pages and then
> *clearing the log* before processing the next ITS. Or anything else.

I think I get your point on why you're not happy with the document, but
IMHO how we document it is one thing, and how it'll work is another.  I
preferred explicit documentation because it'll help app developers
support the interface, and it gives more docs to reference in the future;
no strong opinion, though.

However, if a fundamental statement there is literally wrong, that's
another matter, and we may need to rethink.

Thanks,

-- 
Peter Xu



* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
@ 2022-11-06 21:06           ` Peter Xu
  0 siblings, 0 replies; 66+ messages in thread
From: Peter Xu @ 2022-11-06 21:06 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, shuah, catalin.marinas,
	andrew.jones, ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, seanjc, oliver.upton, zhenyzha,
	shan.gavin

On Sun, Nov 06, 2022 at 08:12:22PM +0000, Marc Zyngier wrote:
> Hi Peter,
> 
> On Sun, 06 Nov 2022 16:22:29 +0000,
> Peter Xu <peterx@redhat.com> wrote:
> > 
> > Hi, Marc,
> > 
> > On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote:
> > > > +Note that the bitmap here is only a backup of the ring structure, and
> > > > +normally should only contain a very small amount of dirty pages, which
> > > 
> > > I don't think we can claim this. It is whatever amount of memory is
> > > dirtied outside of a vcpu context, and we shouldn't make any claim
> > > regarding the number of dirty pages.
> > 
> > The thing is the current with-bitmap design assumes that the two logs are
> > collected in different windows of migration, while the dirty log is only
> > collected after the VM is stopped.  So collecting dirty bitmap and sending
> > the dirty pages within the bitmap will be part of the VM downtime.
> > 
> > It will stop to make sense if the dirty bitmap can contain a large portion
> > of the guest memory, because then it'll be simpler to just stop the VM,
> > transfer pages, and restart on dest node without any tracking mechanism.
> 
> Oh, I absolutely agree that the whole vcpu dirty ring makes zero sense
> in general. It only makes sense if the source of the dirty pages is
> limited to the vcpus, which is literally a corner case. Look at any
> real machine, and you'll quickly realise that this isn't the case, and
> that DMA *is* a huge source of dirty pages.
> 
> Here, we're just lucky enough not to have much DMA tracking yet. Once
> that happens (and I have it from people doing the actual work that it
> *is* happening), you'll realise that the dirty ring story is of very
> limited use. So I'd rather drop anything quantitative here, as this is
> likely to be wrong.

Is it a must that arm64 track device DMAs using the same dirty
tracking interface rather than VFIO or any other interface?  It's
definitely not the case for x86, but if it's true for arm64, could the
DMA be spread across all the guest pages?  If that's also true, I really
don't know how this will work.

We're only syncing the dirty bitmap once right now with the protocol.  If
that can cover most of the guest mem, it's the same as non-live migration.
If we sync it periodically, then it's the same as enabling dirty-log alone
and the rings are useless.

> 
> >
> > [1]
> > 
> > > 
> > > > +needs to be transferred during VM downtime. Collecting the dirty bitmap
> > > > +should be the very last thing that the VMM does before transmitting state
> > > > +to the target VM. VMM needs to ensure that the dirty state is final and
> > > > +avoid missing dirty pages from another ioctl ordered after the bitmap
> > > > +collection.
> > > > +
> > > > +To collect dirty bits in the backup bitmap, the userspace can use the
> > > > +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> > > > +and its behavior is undefined since collecting the dirty bitmap always
> > > > +happens in the last phase of VM's migration.
> > > 
> > > It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
> > > you have multiple devices that dirty the memory, such as multiple
> > > ITSs, why shouldn't userspace be allowed to snapshot the dirty state
> > > multiple time? This doesn't seem like a reasonable restriction, and I
> > > really dislike the idea of undefined behaviour here.
> > 
> > I suggested the paragraph because it's very natural to ask whether we'd
> > need to CLEAR_LOG for this special GET_LOG phase, so I thought this could
> > be helpful as a reference to answer that.
> > 
> > I wanted to make it clear that we don't need CLEAR_LOG at all in this case,
> > as fundamentally clear log is about re-protect the guest pages, but if
> > we're with the restriction of above (having the dirty bmap the last to
> > collect and once and for all) then it'll make no sense to protect the guest
> > page at all at this stage since src host shouldn't run after the GET_LOG
> > then the CLEAR_LOG will be a vain effort.
> 
> That's not for you to decide, but userspace. I can perfectly expect
> userspace saving an ITS, getting the bitmap, saving the pages and then
> *clearing the log* before processing the next ITS. Or anything else.

I think I can see your point on why you're not happy with the document, but
IMHO how we document it is one thing, and how it'll work is another.  I
preferred explicit documentation because it'll help app developers support
the interface, and gives us more docs to reference in the future; no strong
opinion, though.

However, if a fundamental statement here is literally wrong, then that's
another matter, and we may need to rethink.

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-06 21:06           ` Peter Xu
@ 2022-11-06 21:23             ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-06 21:23 UTC (permalink / raw)
  To: Peter Xu, Marc Zyngier
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm, ajones

Hi Peter and Marc,

On 11/7/22 5:06 AM, Peter Xu wrote:
> On Sun, Nov 06, 2022 at 08:12:22PM +0000, Marc Zyngier wrote:
>> On Sun, 06 Nov 2022 16:22:29 +0000,
>> Peter Xu <peterx@redhat.com> wrote:
>>> On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote:
>>>>> +Note that the bitmap here is only a backup of the ring structure, and
>>>>> +normally should only contain a very small amount of dirty pages, which
>>>>
>>>> I don't think we can claim this. It is whatever amount of memory is
>>>> dirtied outside of a vcpu context, and we shouldn't make any claim
>>>> regarding the number of dirty pages.
>>>
>>> The thing is the current with-bitmap design assumes that the two logs are
>>> collected in different windows of migration, while the dirty log is only
>>> collected after the VM is stopped.  So collecting dirty bitmap and sending
>>> the dirty pages within the bitmap will be part of the VM downtime.
>>>
>>> It will stop to make sense if the dirty bitmap can contain a large portion
>>> of the guest memory, because then it'll be simpler to just stop the VM,
>>> transfer pages, and restart on dest node without any tracking mechanism.
>>
>> Oh, I absolutely agree that the whole vcpu dirty ring makes zero sense
>> in general. It only makes sense if the source of the dirty pages is
>> limited to the vcpus, which is literally a corner case. Look at any
>> real machine, and you'll quickly realise that this isn't the case, and
>> that DMA *is* a huge source of dirty pages.
>>
>> Here, we're just lucky enough not to have much DMA tracking yet. Once
>> that happens (and I have it from people doing the actual work that it
>> *is* happening), you'll realise that the dirty ring story is of very
>> limited use. So I'd rather drop anything quantitative here, as this is
>> likely to be wrong.
> 
> Is it a must that arm64 needs to track device DMAs using the same dirty
> tracking interface rather than VFIO or any other interface?  It's
> definitely not the case for x86, but if it's true for arm64, then could the
> DMA be spread across all the guest pages?  If it's also true, I really
> don't know how this will work..
> 
> We're only syncing the dirty bitmap once right now with the protocol.  If
> that can cover most of the guest mem, it's same as non-live.  If we sync it
> periodically, then it's the same as enabling dirty-log alone and the rings
> are useless.
> 

For vgic/its tables, the number of dirty pages can be huge in theory.
However, it's limited in practice. So I tend to agree with Peter that the
dirty ring should be avoided and the dirty log used instead once the DMA
case is supported in the future. As Peter said, the small amount of dirty
pages in the bitmap is the condition for using it here. I think it makes
sense to mention that in the document.
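FWIW, the routing proposed in this series can be summed up with a toy model
(illustration only, not kernel code; the struct and names below merely
mirror the patch, and this assumes an arch enables write-without-vcpu
exactly when it selects the with-bitmap flavour): vCPU-context writes go
to that vCPU's ring, while writes without a vCPU context, like saving
vgic/its tables, fall back to the per-slot bitmap.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the relevant struct kvm fields in the patch. */
struct toy_kvm {
	unsigned int dirty_ring_size;	/* 0 => dirty rings disabled */
	bool dirty_ring_with_bitmap;	/* ring backed by per-slot bitmap */
};

enum dirty_dest { DEST_RING, DEST_BITMAP, DEST_DROPPED };

/* Mirrors the dispatch in the proposed mark_page_dirty_in_slot(). */
static enum dirty_dest route_dirty_write(const struct toy_kvm *kvm, bool has_vcpu)
{
	if (!kvm->dirty_ring_size)
		return DEST_BITMAP;	/* legacy dirty-log only */
	if (has_vcpu)
		return DEST_RING;	/* vCPU context: push to the ring */
	if (kvm->dirty_ring_with_bitmap)
		return DEST_BITMAP;	/* e.g. ITS table save, no vCPU */
	return DEST_DROPPED;		/* would trip the WARN_ON_ONCE */
}
```

With that, the bitmap only ever collects the non-vCPU writes, which is
exactly why its expected size matters for downtime.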

>>
>>>
>>> [1]
>>>
>>>>
>>>>> +needs to be transferred during VM downtime. Collecting the dirty bitmap
>>>>> +should be the very last thing that the VMM does before transmitting state
>>>>> +to the target VM. VMM needs to ensure that the dirty state is final and
>>>>> +avoid missing dirty pages from another ioctl ordered after the bitmap
>>>>> +collection.
>>>>> +
>>>>> +To collect dirty bits in the backup bitmap, the userspace can use the
>>>>> +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
>>>>> +and its behavior is undefined since collecting the dirty bitmap always
>>>>> +happens in the last phase of VM's migration.
>>>>
>>>> It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
>>>> you have multiple devices that dirty the memory, such as multiple
>>>> ITSs, why shouldn't userspace be allowed to snapshot the dirty state
>>>> multiple time? This doesn't seem like a reasonable restriction, and I
>>>> really dislike the idea of undefined behaviour here.
>>>
>>> I suggested the paragraph because it's very natural to ask whether we'd
>>> need to CLEAR_LOG for this special GET_LOG phase, so I thought this could
>>> be helpful as a reference to answer that.
>>>
>>> I wanted to make it clear that we don't need CLEAR_LOG at all in this case,
>>> as fundamentally clear log is about re-protect the guest pages, but if
>>> we're with the restriction of above (having the dirty bmap the last to
>>> collect and once and for all) then it'll make no sense to protect the guest
>>> page at all at this stage since src host shouldn't run after the GET_LOG
>>> then the CLEAR_LOG will be a vain effort.
>>
>> That's not for you to decide, but userspace. I can perfectly expect
>> userspace saving an ITS, getting the bitmap, saving the pages and then
>> *clearing the log* before processing the next ITS. Or anything else.
> 
> I think I can get your point on why you're not happy with the document, but
> IMHO how we document is one thing, how it'll work is another.  I preferred
> explicit documentation because it'll help the app developer to support the
> interface, also more docs to reference in the future; no strong opinion,
> though.
> 
> However if there's fundamental statement that was literally wrong, then
> it's another thing, and we may need to rethink.
> 

How about avoiding any mention of KVM_CLEAR_DIRTY_LOG here? I don't expect
QEMU to clear the dirty bitmap after it's collected in this particular case.

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm



* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-06 15:43     ` Marc Zyngier
@ 2022-11-06 21:40       ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-06 21:40 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm, ajones

Hi Marc,

On 11/6/22 11:43 PM, Marc Zyngier wrote:
> On Fri, 04 Nov 2022 23:40:45 +0000,
> Gavin Shan <gshan@redhat.com> wrote:
>>
>> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
>> enabled. It's conflicting with that ring-based dirty page tracking always
>> requires a running VCPU context.
>>
>> Introduce a new flavor of dirty ring that requires the use of both VCPU
>> dirty rings and a dirty bitmap. The expectation is that for non-VCPU
>> sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
>> the dirty bitmap. Userspace should scan the dirty bitmap before migrating
>> the VM to the target.
>>
>> Use an additional capability to advertise this behavior. The newly added
>> capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
>> KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
>> capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
>>
>> Suggested-by: Marc Zyngier <maz@kernel.org>
>> Suggested-by: Peter Xu <peterx@redhat.com>
>> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
>> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Acked-by: Peter Xu <peterx@redhat.com>
>> ---
>>   Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++-----
>>   include/linux/kvm_dirty_ring.h |  7 +++++
>>   include/linux/kvm_host.h       |  1 +
>>   include/uapi/linux/kvm.h       |  1 +
>>   virt/kvm/Kconfig               |  8 ++++++
>>   virt/kvm/dirty_ring.c          | 10 +++++++
>>   virt/kvm/kvm_main.c            | 49 +++++++++++++++++++++++++++-------
>>   7 files changed, 93 insertions(+), 16 deletions(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index eee9f857a986..2ec32bd41792 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
>>   needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
>>   vmexit ensures that all dirty GFNs are flushed to the dirty rings.
>>   
>> -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
>> -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
>> -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
>> -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
>> -machine will switch to ring-buffer dirty page tracking and further
>> -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
>> -
>>   NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
>>   should be exposed by weakly ordered architecture, in order to indicate
>>   the additional memory ordering requirements imposed on userspace when
>> @@ -8018,6 +8011,32 @@ Architecture with TSO-like ordering (such as x86) are allowed to
>>   expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>>   to userspace.
>>   
>> +After using the dirty rings, the userspace needs to detect the capability
> 
> using? or enabling? What comes after suggest the latter.
> 

s/using/enabling in next revision :)

>> +of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
>> +need to be backed by per-slot bitmaps. With this capability advertised
>> +and supported, it means the architecture can dirty guest pages without
> 
> If it is advertised, it is supported, right?
> 

Yes, s/advertised and supported/advertised in next revision.

>> +vcpu/ring context, so that some of the dirty information will still be
>> +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>> +can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>> +has been enabled.
>> +
>> +Note that the bitmap here is only a backup of the ring structure, and
>> +normally should only contain a very small amount of dirty pages, which
> 
> I don't think we can claim this. It is whatever amount of memory is
> dirtied outside of a vcpu context, and we shouldn't make any claim
> regarding the number of dirty pages.
> 

It's the prerequisite for using the backup bitmap. Otherwise, the guest
will experience long downtime during migration, as mentioned by Peter
in another thread. So it's appropriate to mention the limit on dirty
pages here.

>> +needs to be transferred during VM downtime. Collecting the dirty bitmap
>> +should be the very last thing that the VMM does before transmitting state
>> +to the target VM. VMM needs to ensure that the dirty state is final and
>> +avoid missing dirty pages from another ioctl ordered after the bitmap
>> +collection.
>> +
>> +To collect dirty bits in the backup bitmap, the userspace can use the
>> +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
>> +and its behavior is undefined since collecting the dirty bitmap always
>> +happens in the last phase of VM's migration.
> 
> It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
> you have multiple devices that dirty the memory, such as multiple
> ITSs, why shouldn't userspace be allowed to snapshot the dirty state
> multiple time? This doesn't seem like a reasonable restriction, and I
> really dislike the idea of undefined behaviour here.
> 

It was actually documenting QEMU's expected usage. With QEMU excluded,
KVM_CLEAR_DIRTY_LOG can be used as usual. 'Undefined behaviour' seems
imprecise here. We can improve the text as below, to avoid talking
about undefined behaviour.

   To collect dirty bits in the backup bitmap, the userspace can use the
   same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
   since collecting the dirty bitmap always happens in the last phase of
   VM's migration.

>> +
>> +NOTE: One example of using the backup bitmap is saving arm64 vgic/its
>> +tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
>> +KVM device "kvm-arm-vgic-its" during VM's migration.
> 
> It would be good to have something about this in the ITS
> documentation. Something along these lines:
> 
> diff --git a/Documentation/virt/kvm/devices/arm-vgic-its.rst b/Documentation/virt/kvm/devices/arm-vgic-its.rst
> index d257eddbae29..e053124f77c4 100644
> --- a/Documentation/virt/kvm/devices/arm-vgic-its.rst
> +++ b/Documentation/virt/kvm/devices/arm-vgic-its.rst
> @@ -52,7 +52,10 @@ KVM_DEV_ARM_VGIC_GRP_CTRL
>   
>       KVM_DEV_ARM_ITS_SAVE_TABLES
>         save the ITS table data into guest RAM, at the location provisioned
> -      by the guest in corresponding registers/table entries.
> +      by the guest in corresponding registers/table entries. Should userspace
> +      require a form of dirty tracking to identify which pages are modified
> +      by the saving process, it should use a bitmap even if using another
> +      mechanism to track the memory dirtied by the vCPUs.
>   
>         The layout of the tables in guest memory defines an ABI. The entries
>         are laid out in little endian format as described in the last paragraph.
> 

Sure, I will have it in next revision.

> 
>> +
>>   8.30 KVM_CAP_XEN_HVM
>>   --------------------
>>   
>> diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
>> index 199ead37b104..4862c98d80d3 100644
>> --- a/include/linux/kvm_dirty_ring.h
>> +++ b/include/linux/kvm_dirty_ring.h
>> @@ -37,6 +37,11 @@ static inline u32 kvm_dirty_ring_get_rsvd_entries(void)
>>   	return 0;
>>   }
>>   
>> +static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
>> +{
>> +	return true;
>> +}
>> +
>>   static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
>>   				       int index, u32 size)
>>   {
>> @@ -67,6 +72,8 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
>>   #else /* CONFIG_HAVE_KVM_DIRTY_RING */
>>   
>>   int kvm_cpu_dirty_log_size(void);
>> +bool kvm_use_dirty_bitmap(struct kvm *kvm);
>> +bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
>>   u32 kvm_dirty_ring_get_rsvd_entries(void);
>>   int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
>>   
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 6fab55e58111..f51eb9419bfc 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -779,6 +779,7 @@ struct kvm {
>>   	pid_t userspace_pid;
>>   	unsigned int max_halt_poll_ns;
>>   	u32 dirty_ring_size;
>> +	bool dirty_ring_with_bitmap;
>>   	bool vm_bugged;
>>   	bool vm_dead;
>>   
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 0d5d4419139a..c87b5882d7ae 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
>>   #define KVM_CAP_S390_ZPCI_OP 221
>>   #define KVM_CAP_S390_CPU_TOPOLOGY 222
>>   #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
>> +#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
>>   
>>   #ifdef KVM_CAP_IRQ_ROUTING
>>   
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index 800f9470e36b..228be1145cf3 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
>>          bool
>>          select HAVE_KVM_DIRTY_RING
>>   
>> +# Only architectures that need to dirty memory outside of a vCPU
>> +# context should select this, advertising to userspace the
>> +# requirement to use a dirty bitmap in addition to the vCPU dirty
>> +# ring.
>> +config HAVE_KVM_DIRTY_RING_WITH_BITMAP
>> +	bool
>> +	depends on HAVE_KVM_DIRTY_RING
>> +
>>   config HAVE_KVM_EVENTFD
>>          bool
>>          select EVENTFD
>> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
>> index fecbb7d75ad2..758679724447 100644
>> --- a/virt/kvm/dirty_ring.c
>> +++ b/virt/kvm/dirty_ring.c
>> @@ -21,6 +21,16 @@ u32 kvm_dirty_ring_get_rsvd_entries(void)
>>   	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
>>   }
>>   
>> +bool kvm_use_dirty_bitmap(struct kvm *kvm)
>> +{
>> +	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
>> +}
>> +
>> +bool __weak kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
>> +{
>> +	return false;
>> +}
>> +
>>   static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
>>   {
>>   	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index c865d7d82685..746133b23a66 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
>>   			new->dirty_bitmap = NULL;
>>   		else if (old && old->dirty_bitmap)
>>   			new->dirty_bitmap = old->dirty_bitmap;
>> -		else if (!kvm->dirty_ring_size) {
>> +		else if (kvm_use_dirty_bitmap(kvm)) {
>>   			r = kvm_alloc_dirty_bitmap(new);
>>   			if (r)
>>   				return r;
>> @@ -2060,8 +2060,8 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>>   	unsigned long n;
>>   	unsigned long any = 0;
>>   
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
>> +	if (!kvm_use_dirty_bitmap(kvm))
>>   		return -ENXIO;
>>   
>>   	*memslot = NULL;
>> @@ -2125,8 +2125,8 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
>>   	unsigned long *dirty_bitmap_buffer;
>>   	bool flush;
>>   
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
>> +	if (!kvm_use_dirty_bitmap(kvm))
>>   		return -ENXIO;
>>   
>>   	as_id = log->slot >> 16;
>> @@ -2237,8 +2237,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
>>   	unsigned long *dirty_bitmap_buffer;
>>   	bool flush;
>>   
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
>> +	if (!kvm_use_dirty_bitmap(kvm))
>>   		return -ENXIO;
>>   
>>   	as_id = log->slot >> 16;
>> @@ -3305,7 +3305,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>   	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>>   
>>   #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>> -	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>> +	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>> +		return;
>> +
>> +	if (WARN_ON_ONCE(!kvm_arch_allow_write_without_running_vcpu(kvm) && !vcpu))
>>   		return;
>>   #endif
>>   
>> @@ -3313,7 +3316,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>   		unsigned long rel_gfn = gfn - memslot->base_gfn;
>>   		u32 slot = (memslot->as_id << 16) | memslot->id;
>>   
>> -		if (kvm->dirty_ring_size)
>> +		if (kvm->dirty_ring_size && vcpu)
>>   			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
>>   		else
>>   			set_bit_le(rel_gfn, memslot->dirty_bitmap);
>> @@ -4482,6 +4485,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>>   		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
>>   #else
>>   		return 0;
>> +#endif
>> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>>   #endif
>>   	case KVM_CAP_BINARY_STATS_FD:
>>   	case KVM_CAP_SYSTEM_EVENT_DATA:
>> @@ -4588,6 +4594,31 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>>   			return -EINVAL;
>>   
>>   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
>> +		struct kvm_memslots *slots;
>> +		int r = -EINVAL;
>> +
>> +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
>> +		    !kvm->dirty_ring_size)
>> +			return r;
>> +
>> +		mutex_lock(&kvm->slots_lock);
>> +
>> +		slots = kvm_memslots(kvm);
>> +
>> +		/*
>> +		 * Avoid a race between memslot creation and enabling the ring +
>> +		 * bitmap capability to guarantee that no memslots have been
>> +		 * created without a bitmap.
> 
> It should be called out in the documentation that this capability must
> be enabled before any memslot is created.
> 

Right, will do in next revision.
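To make that ordering concrete, here is a toy model of the checks in the
hunk below (simplified userspace C for illustration, not the kernel
implementation; TOY_EINVAL just stands in for -EINVAL): the capability is
rejected unless the ring capability is already enabled and no memslot has
been created yet, so no memslot can exist without a bitmap.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EINVAL 22	/* stand-in for the kernel's EINVAL */

/* Toy stand-in for the VM state the patch consults. */
struct toy_vm {
	unsigned int dirty_ring_size;	/* non-zero once the ring cap is enabled */
	unsigned int nr_memslots;	/* memslots created so far */
	bool dirty_ring_with_bitmap;
};

/* Mirrors the KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP enable path. */
static int toy_enable_ring_with_bitmap(struct toy_vm *vm)
{
	if (!vm->dirty_ring_size)
		return -TOY_EINVAL;	/* ring cap must be enabled first */
	if (vm->nr_memslots)
		return -TOY_EINVAL;	/* a memslot already exists without a bitmap */
	vm->dirty_ring_with_bitmap = true;
	return 0;
}
```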

>> +		 */
>> +		if (kvm_memslots_empty(slots)) {
>> +			kvm->dirty_ring_with_bitmap = cap->args[0];
>> +			r = 0;
>> +		}
>> +
>> +		mutex_unlock(&kvm->slots_lock);
>> +		return r;
>> +	}
>>   	default:
>>   		return kvm_vm_ioctl_enable_cap(kvm, cap);
>>   	}

Thanks,
Gavin


>>   expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>>   to userspace.
>>   
>> +After using the dirty rings, the userspace needs to detect the capability
> 
> using? or enabling? What comes after suggest the latter.
> 

s/using/enabling in next revision :)

>> +of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
>> +need to be backed by per-slot bitmaps. With this capability advertised
>> +and supported, it means the architecture can dirty guest pages without
> 
> If it is advertised, it is supported, right?
> 

Yes, s/advertised and supported/advertised in next revision.

>> +vcpu/ring context, so that some of the dirty information will still be
>> +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>> +can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>> +has been enabled.
>> +
>> +Note that the bitmap here is only a backup of the ring structure, and
>> +normally should only contain a very small amount of dirty pages, which
> 
> I don't think we can claim this. It is whatever amount of memory is
> dirtied outside of a vcpu context, and we shouldn't make any claim
> regarding the number of dirty pages.
> 

It's a prerequisite for using the backup bitmap. Otherwise, the guest
will experience long downtime during migration, as Peter mentioned in
another thread. So it's appropriate to mention the limited number of
dirty pages here.

>> +needs to be transferred during VM downtime. Collecting the dirty bitmap
>> +should be the very last thing that the VMM does before transmitting state
>> +to the target VM. VMM needs to ensure that the dirty state is final and
>> +avoid missing dirty pages from another ioctl ordered after the bitmap
>> +collection.
>> +
>> +To collect dirty bits in the backup bitmap, the userspace can use the
>> +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
>> +and its behavior is undefined since collecting the dirty bitmap always
>> +happens in the last phase of VM's migration.
> 
> It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
> you have multiple devices that dirty the memory, such as multiple
> ITSs, why shouldn't userspace be allowed to snapshot the dirty state
> multiple time? This doesn't seem like a reasonable restriction, and I
> really dislike the idea of undefined behaviour here.
> 

It was actually documenting QEMU's expected usage. Outside of QEMU,
KVM_CLEAR_DIRTY_LOG can be used as usual. 'Undefined behavior' is not
precise here. We can improve the text as below, to avoid talking about
'undefined behaviour'.

   To collect dirty bits in the backup bitmap, userspace can use the
   same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
   since collecting the dirty bitmap always happens in the last phase of
   the VM's migration.
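
For illustration, here is a minimal userspace-style sketch of walking such a
bitmap once KVM_GET_DIRTY_LOG has filled it in. The helper name and callback
shape are inventions for this example; only the one-bit-per-page little-endian
layout (matching KVM's set_bit_le) is assumed from the kernel side:

```c
#include <stddef.h>

/*
 * Illustrative only: walk a dirty bitmap as returned by
 * KVM_GET_DIRTY_LOG, invoking a callback for every dirty page
 * offset in the memslot. The bit layout assumes KVM's
 * little-endian bitmap convention (set_bit_le): page N maps to
 * byte N / 8, bit N % 8.
 */
static void for_each_dirty_gfn(const unsigned char *bitmap, size_t npages,
			       void (*cb)(size_t rel_gfn, void *opaque),
			       void *opaque)
{
	size_t i;

	for (i = 0; i < npages; i++) {
		if (bitmap[i / 8] & (1u << (i % 8)))
			cb(i, opaque);
	}
}
```

In real userspace code the bitmap would be sized to cover the memslot and
handed to the ioctl via struct kvm_dirty_log before being scanned like this.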

>> +
>> +NOTE: One example of using the backup bitmap is saving arm64 vgic/its
>> +tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
>> +KVM device "kvm-arm-vgic-its" during VM's migration.
> 
> It would be good to have something about this in the ITS
> documentation. Something along these lines:
> 
> diff --git a/Documentation/virt/kvm/devices/arm-vgic-its.rst b/Documentation/virt/kvm/devices/arm-vgic-its.rst
> index d257eddbae29..e053124f77c4 100644
> --- a/Documentation/virt/kvm/devices/arm-vgic-its.rst
> +++ b/Documentation/virt/kvm/devices/arm-vgic-its.rst
> @@ -52,7 +52,10 @@ KVM_DEV_ARM_VGIC_GRP_CTRL
>   
>       KVM_DEV_ARM_ITS_SAVE_TABLES
>         save the ITS table data into guest RAM, at the location provisioned
> -      by the guest in corresponding registers/table entries.
> +      by the guest in corresponding registers/table entries. Should userspace
> +      require a form of dirty tracking to identify which pages are modified
> +      by the saving process, it should use a bitmap even if using another
> +      mechanism to track the memory dirtied by the vCPUs.
>   
>         The layout of the tables in guest memory defines an ABI. The entries
>         are laid out in little endian format as described in the last paragraph.
> 

Sure, I will add it in the next revision.

> 
>> +
>>   8.30 KVM_CAP_XEN_HVM
>>   --------------------
>>   
>> diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
>> index 199ead37b104..4862c98d80d3 100644
>> --- a/include/linux/kvm_dirty_ring.h
>> +++ b/include/linux/kvm_dirty_ring.h
>> @@ -37,6 +37,11 @@ static inline u32 kvm_dirty_ring_get_rsvd_entries(void)
>>   	return 0;
>>   }
>>   
>> +static inline bool kvm_use_dirty_bitmap(struct kvm *kvm)
>> +{
>> +	return true;
>> +}
>> +
>>   static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
>>   				       int index, u32 size)
>>   {
>> @@ -67,6 +72,8 @@ static inline void kvm_dirty_ring_free(struct kvm_dirty_ring *ring)
>>   #else /* CONFIG_HAVE_KVM_DIRTY_RING */
>>   
>>   int kvm_cpu_dirty_log_size(void);
>> +bool kvm_use_dirty_bitmap(struct kvm *kvm);
>> +bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm);
>>   u32 kvm_dirty_ring_get_rsvd_entries(void);
>>   int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
>>   
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 6fab55e58111..f51eb9419bfc 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -779,6 +779,7 @@ struct kvm {
>>   	pid_t userspace_pid;
>>   	unsigned int max_halt_poll_ns;
>>   	u32 dirty_ring_size;
>> +	bool dirty_ring_with_bitmap;
>>   	bool vm_bugged;
>>   	bool vm_dead;
>>   
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 0d5d4419139a..c87b5882d7ae 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1178,6 +1178,7 @@ struct kvm_ppc_resize_hpt {
>>   #define KVM_CAP_S390_ZPCI_OP 221
>>   #define KVM_CAP_S390_CPU_TOPOLOGY 222
>>   #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
>> +#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 224
>>   
>>   #ifdef KVM_CAP_IRQ_ROUTING
>>   
>> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
>> index 800f9470e36b..228be1145cf3 100644
>> --- a/virt/kvm/Kconfig
>> +++ b/virt/kvm/Kconfig
>> @@ -33,6 +33,14 @@ config HAVE_KVM_DIRTY_RING_ACQ_REL
>>          bool
>>          select HAVE_KVM_DIRTY_RING
>>   
>> +# Only architectures that need to dirty memory outside of a vCPU
>> +# context should select this, advertising to userspace the
>> +# requirement to use a dirty bitmap in addition to the vCPU dirty
>> +# ring.
>> +config HAVE_KVM_DIRTY_RING_WITH_BITMAP
>> +	bool
>> +	depends on HAVE_KVM_DIRTY_RING
>> +
>>   config HAVE_KVM_EVENTFD
>>          bool
>>          select EVENTFD
>> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
>> index fecbb7d75ad2..758679724447 100644
>> --- a/virt/kvm/dirty_ring.c
>> +++ b/virt/kvm/dirty_ring.c
>> @@ -21,6 +21,16 @@ u32 kvm_dirty_ring_get_rsvd_entries(void)
>>   	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
>>   }
>>   
>> +bool kvm_use_dirty_bitmap(struct kvm *kvm)
>> +{
>> +	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
>> +}
>> +
>> +bool __weak kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
>> +{
>> +	return false;
>> +}
>> +
>>   static u32 kvm_dirty_ring_used(struct kvm_dirty_ring *ring)
>>   {
>>   	return READ_ONCE(ring->dirty_index) - READ_ONCE(ring->reset_index);
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index c865d7d82685..746133b23a66 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1617,7 +1617,7 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
>>   			new->dirty_bitmap = NULL;
>>   		else if (old && old->dirty_bitmap)
>>   			new->dirty_bitmap = old->dirty_bitmap;
>> -		else if (!kvm->dirty_ring_size) {
>> +		else if (kvm_use_dirty_bitmap(kvm)) {
>>   			r = kvm_alloc_dirty_bitmap(new);
>>   			if (r)
>>   				return r;
>> @@ -2060,8 +2060,8 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
>>   	unsigned long n;
>>   	unsigned long any = 0;
>>   
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
>> +	if (!kvm_use_dirty_bitmap(kvm))
>>   		return -ENXIO;
>>   
>>   	*memslot = NULL;
>> @@ -2125,8 +2125,8 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
>>   	unsigned long *dirty_bitmap_buffer;
>>   	bool flush;
>>   
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
>> +	if (!kvm_use_dirty_bitmap(kvm))
>>   		return -ENXIO;
>>   
>>   	as_id = log->slot >> 16;
>> @@ -2237,8 +2237,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
>>   	unsigned long *dirty_bitmap_buffer;
>>   	bool flush;
>>   
>> -	/* Dirty ring tracking is exclusive to dirty log tracking */
>> -	if (kvm->dirty_ring_size)
>> +	/* Dirty ring tracking may be exclusive to dirty log tracking */
>> +	if (!kvm_use_dirty_bitmap(kvm))
>>   		return -ENXIO;
>>   
>>   	as_id = log->slot >> 16;
>> @@ -3305,7 +3305,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>   	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>>   
>>   #ifdef CONFIG_HAVE_KVM_DIRTY_RING
>> -	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>> +	if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>> +		return;
>> +
>> +	if (WARN_ON_ONCE(!kvm_arch_allow_write_without_running_vcpu(kvm) && !vcpu))
>>   		return;
>>   #endif
>>   
>> @@ -3313,7 +3316,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>>   		unsigned long rel_gfn = gfn - memslot->base_gfn;
>>   		u32 slot = (memslot->as_id << 16) | memslot->id;
>>   
>> -		if (kvm->dirty_ring_size)
>> +		if (kvm->dirty_ring_size && vcpu)
>>   			kvm_dirty_ring_push(vcpu, slot, rel_gfn);
>>   		else
>>   			set_bit_le(rel_gfn, memslot->dirty_bitmap);
>> @@ -4482,6 +4485,9 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>>   		return KVM_DIRTY_RING_MAX_ENTRIES * sizeof(struct kvm_dirty_gfn);
>>   #else
>>   		return 0;
>> +#endif
>> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP
>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
>>   #endif
>>   	case KVM_CAP_BINARY_STATS_FD:
>>   	case KVM_CAP_SYSTEM_EVENT_DATA:
>> @@ -4588,6 +4594,31 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>>   			return -EINVAL;
>>   
>>   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
>> +		struct kvm_memslots *slots;
>> +		int r = -EINVAL;
>> +
>> +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
>> +		    !kvm->dirty_ring_size)
>> +			return r;
>> +
>> +		mutex_lock(&kvm->slots_lock);
>> +
>> +		slots = kvm_memslots(kvm);
>> +
>> +		/*
>> +		 * Avoid a race between memslot creation and enabling the ring +
>> +		 * bitmap capability to guarantee that no memslots have been
>> +		 * created without a bitmap.
> 
> It should be called out in the documentation that this capability must
> be enabled before any memslot is created.
> 

Right, will do in the next revision.

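As an aside, the ordering rule to be documented ("ring capability first, no
memslots yet") can be modeled with a small stand-alone sketch. The struct and
function name below are stand-ins invented for this example; only the -EINVAL
conditions mirror the quoted hunk:

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for the relevant 'struct kvm' state. */
struct kvm_model {
	unsigned int dirty_ring_size;	/* non-zero once the ring cap is on */
	unsigned int nr_memslots;
	bool dirty_ring_with_bitmap;
};

/*
 * Models the quoted enable-cap logic: the dirty ring capability
 * must already be enabled, and no memslot may exist yet, so that
 * no slot can have been created without a backing bitmap.
 */
static int enable_ring_with_bitmap(struct kvm_model *kvm, bool enable)
{
	if (!kvm->dirty_ring_size)
		return -EINVAL;
	if (kvm->nr_memslots)
		return -EINVAL;
	kvm->dirty_ring_with_bitmap = enable;
	return 0;
}
```
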
>> +		 */
>> +		if (kvm_memslots_empty(slots)) {
>> +			kvm->dirty_ring_with_bitmap = cap->args[0];
>> +			r = 0;
>> +		}
>> +
>> +		mutex_unlock(&kvm->slots_lock);
>> +		return r;
>> +	}
>>   	default:
>>   		return kvm_vm_ioctl_enable_cap(kvm, cap);
>>   	}

Thanks,
Gavin



* Re: [PATCH v8 4/7] KVM: arm64: Enable ring-based dirty memory tracking
  2022-11-06 15:50     ` Marc Zyngier
@ 2022-11-06 21:46       ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-06 21:46 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm, ajones

Hi Marc,

On 11/6/22 11:50 PM, Marc Zyngier wrote:
> On Fri, 04 Nov 2022 23:40:46 +0000,
> Gavin Shan <gshan@redhat.com> wrote:
>>
>> Enable ring-based dirty memory tracking on arm64 by selecting
>> CONFIG_HAVE_KVM_DIRTY_{RING_ACQ_REL, RING_WITH_BITMAP} and providing
>> the ring buffer's physical page offset (KVM_DIRTY_LOG_PAGE_OFFSET).
>>
>> Besides, helper kvm_vgic_save_its_tables_in_progress() is added to
>> indicate if vgic/its tables are being saved or not. The helper is used
>> in ARM64's kvm_arch_allow_write_without_running_vcpu() to keep the
>> site of saving vgic/its tables out of no-running-vcpu radar.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   Documentation/virt/kvm/api.rst     |  2 +-
>>   arch/arm64/include/uapi/asm/kvm.h  |  1 +
>>   arch/arm64/kvm/Kconfig             |  2 ++
>>   arch/arm64/kvm/arm.c               |  3 +++
>>   arch/arm64/kvm/mmu.c               | 15 +++++++++++++++
>>   arch/arm64/kvm/vgic/vgic-its.c     |  3 +++
>>   arch/arm64/kvm/vgic/vgic-mmio-v3.c |  7 +++++++
>>   include/kvm/arm_vgic.h             |  2 ++
>>   8 files changed, 34 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>> index 2ec32bd41792..2fc68f684ad8 100644
>> --- a/Documentation/virt/kvm/api.rst
>> +++ b/Documentation/virt/kvm/api.rst
>> @@ -7921,7 +7921,7 @@ regardless of what has actually been exposed through the CPUID leaf.
>>   8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>>   ----------------------------------------------------------
>>   
>> -:Architectures: x86
>> +:Architectures: x86, arm64
>>   :Parameters: args[0] - size of the dirty log ring
>>   
>>   KVM is capable of tracking dirty memory using ring buffers that are
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>> index 316917b98707..a7a857f1784d 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -43,6 +43,7 @@
>>   #define __KVM_HAVE_VCPU_EVENTS
>>   
>>   #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>> +#define KVM_DIRTY_LOG_PAGE_OFFSET 64
>>   
>>   #define KVM_REG_SIZE(id)						\
>>   	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
>> index 815cc118c675..066b053e9eb9 100644
>> --- a/arch/arm64/kvm/Kconfig
>> +++ b/arch/arm64/kvm/Kconfig
>> @@ -32,6 +32,8 @@ menuconfig KVM
>>   	select KVM_VFIO
>>   	select HAVE_KVM_EVENTFD
>>   	select HAVE_KVM_IRQFD
>> +	select HAVE_KVM_DIRTY_RING_ACQ_REL
>> +	select HAVE_KVM_DIRTY_RING_WITH_BITMAP
>>   	select HAVE_KVM_MSI
>>   	select HAVE_KVM_IRQCHIP
>>   	select HAVE_KVM_IRQ_ROUTING
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 94d33e296e10..6b097605e38c 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -746,6 +746,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
>>   
>>   		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
>>   			return kvm_vcpu_suspend(vcpu);
>> +
>> +		if (kvm_dirty_ring_check_request(vcpu))
>> +			return 0;
>>   	}
>>   
>>   	return 1;
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 60ee3d9f01f8..fbeb55e45f53 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -932,6 +932,21 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>>   	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>>   }
>>   
>> +/*
>> + * kvm_arch_allow_write_without_running_vcpu - allow writing guest memory
>> + * without the running VCPU when dirty ring is enabled.
>> + *
>> + * The running VCPU is required to track dirty guest pages when dirty ring
>> + * is enabled. Otherwise, the backup bitmap should be used to track the
>> + * dirty guest pages. When vgic/its tables are being saved, the backup
>> + * bitmap is used to track the dirty guest pages due to the missed running
>> + * VCPU in the period.
>> + */
>> +bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
>> +{
>> +	return kvm_vgic_save_its_tables_in_progress(kvm);
> 
> I don't think we need the extra level of abstraction here. Just return
> kvm->arch.vgic.save_its_tables_in_progress and be done with it.
> 
> You can also move the helper to the vgic-its code since they are
> closely related for now.
> 

Ok. After kvm_arch_allow_write_without_running_vcpu() is moved to vgic-its.c,
do we need to replace 'struct vgic_dist::save_its_tables_in_progress' with
a file-scoped variable ('bool vgic_its_saving_tables')?
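
For discussion's sake, here is a stand-alone sketch of what the collapsed
helper could look like. The structs below merely model kvm->arch.vgic for this
example and are not the kernel definitions; in the kernel the function would
live in vgic-its.c as suggested:

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel structures involved. */
struct vgic_dist {
	bool save_its_tables_in_progress;
};

struct kvm_arch {
	struct vgic_dist vgic;
};

struct kvm {
	struct kvm_arch arch;
};

/*
 * Per Marc's suggestion, the arch hook returns the flag directly,
 * without an intermediate kvm_vgic_save_its_tables_in_progress()
 * wrapper.
 */
bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
{
	return kvm->arch.vgic.save_its_tables_in_progress;
}
```
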

Thanks,
Gavin



* Re: [PATCH v8 0/7] KVM: arm64: Enable ring-based dirty memory tracking
  2022-11-06 16:08   ` Marc Zyngier
@ 2022-11-06 21:50     ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-06 21:50 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvmarm, kvm, shuah, catalin.marinas, andrew.jones,
	ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, peterx, seanjc, oliver.upton,
	zhenyzha, shan.gavin

Hi Marc,

On 11/7/22 12:08 AM, Marc Zyngier wrote:
> On Fri, 04 Nov 2022 23:40:42 +0000,
> Gavin Shan <gshan@redhat.com> wrote:
>>
>> This series enables the ring-based dirty memory tracking for ARM64.
>> The feature has been available and enabled on x86 for a while. It
>> is beneficial when the number of dirty pages is small in a checkpointing
>> system or live migration scenario. More details can be found from
>> fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking").
>>
>> This series is applied to v6.1.rc3, plus commit c227590467cb ("KVM:
>> Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them").
>> The commit is currently in Marc's 'fixes' branch, targeting v6.1.rc4/5.
> 
> This is starting to look good to me, and my only concerns are around
> the documentation and the bit of nitpicking on patch 4. If we can
> converge quickly on that, I'd like to queue this quickly and leave it
> to simmer in -next.
> 

Ok, thanks.

>> v7: https://lore.kernel.org/kvmarm/20221031003621.164306-1-gshan@redhat.com/
>> v6: https://lore.kernel.org/kvmarm/20221011061447.131531-1-gshan@redhat.com/
>> v5: https://lore.kernel.org/all/20221005004154.83502-1-gshan@redhat.com/
>> v4: https://lore.kernel.org/kvmarm/20220927005439.21130-1-gshan@redhat.com/
>> v3: https://lore.kernel.org/r/20220922003214.276736-1-gshan@redhat.com
>> v2: https://lore.kernel.org/lkml/YyiV%2Fl7O23aw5aaO@xz-m1.local/T/
>> v1: https://lore.kernel.org/lkml/20220819005601.198436-1-gshan@redhat.com
>>
>> Testing
>> =======
>> (1) kvm/selftests/dirty_log_test
>> (2) Live migration by QEMU
> 
> Could you point to a branch that has the required QEMU changes?
> 

I'm still working out how to migrate the (extra) dirty pages, which are
tracked by the backup bitmap. So the branch is premature.

   git@github.com:gwshan/qemu.git ("kvm/arm64_dirtyring")

Thanks,
Gavin



* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-06 21:06           ` Peter Xu
@ 2022-11-07  9:21             ` Marc Zyngier
  -1 siblings, 0 replies; 66+ messages in thread
From: Marc Zyngier @ 2022-11-07  9:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, shuah, catalin.marinas,
	andrew.jones, ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, seanjc, oliver.upton, zhenyzha,
	shan.gavin

On Sun, 06 Nov 2022 21:06:43 +0000,
Peter Xu <peterx@redhat.com> wrote:
> 
> On Sun, Nov 06, 2022 at 08:12:22PM +0000, Marc Zyngier wrote:
> > Hi Peter,
> > 
> > On Sun, 06 Nov 2022 16:22:29 +0000,
> > Peter Xu <peterx@redhat.com> wrote:
> > > 
> > > Hi, Marc,
> > > 
> > > On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote:
> > > > > +Note that the bitmap here is only a backup of the ring structure, and
> > > > > +normally should only contain a very small amount of dirty pages, which
> > > > 
> > > > I don't think we can claim this. It is whatever amount of memory is
> > > > dirtied outside of a vcpu context, and we shouldn't make any claim
> > > > regarding the number of dirty pages.
> > > 
> > > The thing is the current with-bitmap design assumes that the two logs are
> > > collected in different windows of migration, while the dirty log is only
* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
@ 2022-11-07  9:21             ` Marc Zyngier
  0 siblings, 0 replies; 66+ messages in thread
From: Marc Zyngier @ 2022-11-07  9:21 UTC (permalink / raw)
  To: Peter Xu
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm, ajones

On Sun, 06 Nov 2022 21:06:43 +0000,
Peter Xu <peterx@redhat.com> wrote:
> 
> On Sun, Nov 06, 2022 at 08:12:22PM +0000, Marc Zyngier wrote:
> > Hi Peter,
> > 
> > On Sun, 06 Nov 2022 16:22:29 +0000,
> > Peter Xu <peterx@redhat.com> wrote:
> > > 
> > > Hi, Marc,
> > > 
> > > On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote:
> > > > > +Note that the bitmap here is only a backup of the ring structure, and
> > > > > +normally should only contain a very small amount of dirty pages, which
> > > > 
> > > > I don't think we can claim this. It is whatever amount of memory is
> > > > dirtied outside of a vcpu context, and we shouldn't make any claim
> > > > regarding the number of dirty pages.
> > > 
> > > The thing is the current with-bitmap design assumes that the two logs are
> > > collected in different windows of migration, while the dirty log is only
> > > collected after the VM is stopped.  So collecting dirty bitmap and sending
> > > the dirty pages within the bitmap will be part of the VM downtime.
> > > 
> > > It will stop making sense if the dirty bitmap can contain a large portion
> > > of the guest memory, because then it'll be simpler to just stop the VM,
> > > transfer pages, and restart on dest node without any tracking mechanism.
> > 
> > Oh, I absolutely agree that the whole vcpu dirty ring makes zero sense
> > in general. It only makes sense if the source of the dirty pages is
> > limited to the vcpus, which is literally a corner case. Look at any
> > real machine, and you'll quickly realise that this isn't the case, and
> > that DMA *is* a huge source of dirty pages.
> > 
> > Here, we're just lucky enough not to have much DMA tracking yet. Once
> > that happens (and I have it from people doing the actual work that it
> > *is* happening), you'll realise that the dirty ring story is of very
> > limited use. So I'd rather drop anything quantitative here, as this is
> > likely to be wrong.
> 
> Is it a must that arm64 needs to track device DMAs using the same dirty
> tracking interface rather than VFIO or any other interface?

What does it change? At the end of the day, you want a list of dirty
pages. How you obtain it is irrelevant.

> It's
> definitely not the case for x86, but if it's true for arm64, then could the
> DMA be spread across all the guest pages?  If it's also true, I really
> don't know how this will work..

Of course, all pages can be the target of DMA. It works the same way
it works for the ITS: you sync the state, you obtain the dirty bits,
you move on.

And mimicking what x86 does is really not my concern (if you still
think that arm64 is just another flavour of x86, stay tuned!  ;-).

> 
> We're only syncing the dirty bitmap once right now with the protocol.  If
> that can cover most of the guest mem, it's same as non-live.  If we sync it
> periodically, then it's the same as enabling dirty-log alone and the rings
> are useless.

I'm glad that you finally accept it: the rings *ARE* useless in the
general sense. Only limited, CPU-only workloads can make any use of
the current design. This probably covers a large proportion of what
the cloud vendors do, but this doesn't work for general situations
where you have a stream of dirty pages originating outside of the
CPUs.

[...]

> > > I wanted to make it clear that we don't need CLEAR_LOG at all in this
> > > case, as fundamentally clear log is about re-protecting the guest
> > > pages. But under the restriction above (collecting the dirty bmap last,
> > > once and for all), it makes no sense to protect the guest pages at this
> > > stage: the src host shouldn't run after the GET_LOG, so the CLEAR_LOG
> > > would be a vain effort.
> > 
> > That's not for you to decide, but userspace. I can perfectly expect
> > userspace saving an ITS, getting the bitmap, saving the pages and then
> > *clearing the log* before processing the next ITS. Or anything else.
> 
> I think I can get your point on why you're not happy with the document, but
> IMHO how we document is one thing, how it'll work is another.  I preferred
> explicit documentation because it'll help the app developer to support the
> interface, also more docs to reference in the future; no strong opinion,
> though.

Here's my beef with the current documentation: it sets quantitative
expectations. This is wrong. It also introduces undefined behaviours
where there should be none. This is even worse, because there
shouldn't be *any* undefined behaviour today, and I cannot see why the
dirty rings would influence this.

> However, if a fundamental statement is literally wrong, then
> that's another thing, and we may need to rethink.

See above. If the undefined behaviour was just a mistake, let's drop
it and move on. If you have spotted something that is indeed an
undefined behaviour in using CLEAR_LOG when the VM is stopped, then
live migration is already broken on arm64 *today*.

And if that's the case, we should fix it now instead of adding the
dirty ring stuff.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-06 21:23             ` Gavin Shan
@ 2022-11-07  9:38               ` Marc Zyngier
  -1 siblings, 0 replies; 66+ messages in thread
From: Marc Zyngier @ 2022-11-07  9:38 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Peter Xu, kvmarm, kvmarm, kvm, shuah, catalin.marinas,
	andrew.jones, ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, seanjc, oliver.upton, zhenyzha,
	shan.gavin

On Sun, 06 Nov 2022 21:23:13 +0000,
Gavin Shan <gshan@redhat.com> wrote:
> 
> Hi Peter and Marc,
> 
> On 11/7/22 5:06 AM, Peter Xu wrote:
> > On Sun, Nov 06, 2022 at 08:12:22PM +0000, Marc Zyngier wrote:
> >> On Sun, 06 Nov 2022 16:22:29 +0000,
> >> Peter Xu <peterx@redhat.com> wrote:
> >>> On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote:
> >>>>> +Note that the bitmap here is only a backup of the ring structure, and
> >>>>> +normally should only contain a very small amount of dirty pages, which
> >>>> 
> >>>> I don't think we can claim this. It is whatever amount of memory is
> >>>> dirtied outside of a vcpu context, and we shouldn't make any claim
> >>>> regarding the number of dirty pages.
> >>> 
> >>> The thing is the current with-bitmap design assumes that the two logs are
> >>> collected in different windows of migration, while the dirty log is only
> >>> collected after the VM is stopped.  So collecting dirty bitmap and sending
> >>> the dirty pages within the bitmap will be part of the VM downtime.
> >>> 
> >>> It will stop making sense if the dirty bitmap can contain a large portion
> >>> of the guest memory, because then it'll be simpler to just stop the VM,
> >>> transfer pages, and restart on dest node without any tracking mechanism.
> >> 
> >> Oh, I absolutely agree that the whole vcpu dirty ring makes zero sense
> >> in general. It only makes sense if the source of the dirty pages is
> >> limited to the vcpus, which is literally a corner case. Look at any
> >> real machine, and you'll quickly realise that this isn't the case, and
> >> that DMA *is* a huge source of dirty pages.
> >> 
> >> Here, we're just lucky enough not to have much DMA tracking yet. Once
> >> that happens (and I have it from people doing the actual work that it
> >> *is* happening), you'll realise that the dirty ring story is of very
> >> limited use. So I'd rather drop anything quantitative here, as this is
> >> likely to be wrong.
> > 
> > Is it a must that arm64 needs to track device DMAs using the same dirty
> > tracking interface rather than VFIO or any other interface?  It's
> > definitely not the case for x86, but if it's true for arm64, then could the
> > DMA be spread across all the guest pages?  If it's also true, I really
> > don't know how this will work..
> > 
> > We're only syncing the dirty bitmap once right now with the protocol.  If
> > that can cover most of the guest mem, it's same as non-live.  If we sync it
> > periodically, then it's the same as enabling dirty-log alone and the rings
> > are useless.
> > 
> 
> For vgic/its tables, the number of dirty pages can be huge in theory. However,
> it's limited in practice. So I tend to agree with Peter that the dirty ring
> should be avoided and the dirty log used instead once the DMA case is
> supported in the future. As Peter said, a small number of dirty pages in
> the bitmap is the condition for using it here. I think it makes sense to
> mention that in the document.

And again, I disagree. This API has *nothing* to do with the ITS. It
is completely general purpose and should work with anything because it
is designed for that.

The problem is that you're considering that RING+BITMAP is a different
thing from BITMAP alone when it comes to non-CPU traffic. It really
isn't.  We can't say "there will only be a few pages dirtied", because
we simply don't know.

If you really want a quantitative argument then say something like:

"The use of the ring+bitmap combination is only beneficial if there is
only very little memory that is dirtied by non-CPU agents. Consider
using the stand-alone bitmap API if this isn't the case."

which clearly puts the choice in the hands of the user.

[...]

> How about avoiding any mention of KVM_CLEAR_DIRTY_LOG here? I don't expect
> QEMU to clear the dirty bitmap after it's collected in this particular case.

Peter said there is an undefined behaviour. I want to understand
whether this is the case or not. QEMU is only one of the users of this
stuff, as all the vendors have their own custom VMM, and they do
things in funny ways.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-06 21:40       ` Gavin Shan
@ 2022-11-07  9:45         ` Marc Zyngier
  -1 siblings, 0 replies; 66+ messages in thread
From: Marc Zyngier @ 2022-11-07  9:45 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, shuah, catalin.marinas, andrew.jones,
	ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, peterx, seanjc, oliver.upton,
	zhenyzha, shan.gavin

On Sun, 06 Nov 2022 21:40:49 +0000,
Gavin Shan <gshan@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 11/6/22 11:43 PM, Marc Zyngier wrote:
> > On Fri, 04 Nov 2022 23:40:45 +0000,
> > Gavin Shan <gshan@redhat.com> wrote:
> >> 
> >> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> >> enabled. This conflicts with ring-based dirty page tracking, which always
> >> requires a running VCPU context.
> >> 
> >> Introduce a new flavor of dirty ring that requires the use of both VCPU
> >> dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> >> sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> >> the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> >> the VM to the target.
> >> 
> >> Use an additional capability to advertise this behavior. The newly added
> >> capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> >> KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> >> capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> >> 
> >> Suggested-by: Marc Zyngier <maz@kernel.org>
> >> Suggested-by: Peter Xu <peterx@redhat.com>
> >> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> >> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> >> Signed-off-by: Gavin Shan <gshan@redhat.com>
> >> Acked-by: Peter Xu <peterx@redhat.com>
> >> ---
> >>   Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++-----
> >>   include/linux/kvm_dirty_ring.h |  7 +++++
> >>   include/linux/kvm_host.h       |  1 +
> >>   include/uapi/linux/kvm.h       |  1 +
> >>   virt/kvm/Kconfig               |  8 ++++++
> >>   virt/kvm/dirty_ring.c          | 10 +++++++
> >>   virt/kvm/kvm_main.c            | 49 +++++++++++++++++++++++++++-------
> >>   7 files changed, 93 insertions(+), 16 deletions(-)
> >> 
> >> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >> index eee9f857a986..2ec32bd41792 100644
> >> --- a/Documentation/virt/kvm/api.rst
> >> +++ b/Documentation/virt/kvm/api.rst
> >> @@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
> >>   needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
> >>   vmexit ensures that all dirty GFNs are flushed to the dirty rings.
> >>   -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the
> >> corresponding
> >> -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
> >> -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
> >> -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
> >> -machine will switch to ring-buffer dirty page tracking and further
> >> -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
> >> -
> >>   NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
> >>   should be exposed by weakly ordered architecture, in order to indicate
> >>   the additional memory ordering requirements imposed on userspace when
> >> @@ -8018,6 +8011,32 @@ Architecture with TSO-like ordering (such as x86) are allowed to
> >>   expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> >>   to userspace.
> >>   +After using the dirty rings, the userspace needs to detect the
> >> capability
> > 
> > using? or enabling? What comes after suggest the latter.
> > 
> 
> s/using/enabling in next revision :)
> 
> >> +of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
> >> +need to be backed by per-slot bitmaps. With this capability advertised
> >> +and supported, it means the architecture can dirty guest pages without
> > 
> > If it is advertised, it is supported, right?
> > 
> 
> Yes, s/advertised and supported/advertised in next revision.
> 
> >> +vcpu/ring context, so that some of the dirty information will still be
> >> +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> >> +can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> >> +has been enabled.
> >> +
> >> +Note that the bitmap here is only a backup of the ring structure, and
> >> +normally should only contain a very small amount of dirty pages, which
> > 
> > I don't think we can claim this. It is whatever amount of memory is
> > dirtied outside of a vcpu context, and we shouldn't make any claim
> > regarding the number of dirty pages.
> > 
> 
> It's the prerequisite for using the backup bitmap. Otherwise, the guest
> will experience long downtime during migration, as mentioned by Peter
> in another thread. So it's appropriate to mention the limit on dirty
> pages here.

See my alternative wording for this in the other sub-thread.

> 
> >> +needs to be transferred during VM downtime. Collecting the dirty bitmap
> >> +should be the very last thing that the VMM does before transmitting state
> >> +to the target VM. VMM needs to ensure that the dirty state is final and
> >> +avoid missing dirty pages from another ioctl ordered after the bitmap
> >> +collection.
> >> +
> >> +To collect dirty bits in the backup bitmap, the userspace can use the
> >> +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> >> +and its behavior is undefined since collecting the dirty bitmap always
> >> +happens in the last phase of VM's migration.
> > 
> > It isn't clear to me why KVM_CLEAR_DIRTY_LOG should be called out. If
> > you have multiple devices that dirty the memory, such as multiple
> > ITSs, why shouldn't userspace be allowed to snapshot the dirty state
> > multiple time? This doesn't seem like a reasonable restriction, and I
> > really dislike the idea of undefined behaviour here.
> > 
> 
> It was actually documenting QEMU's expected usage. Outside of QEMU,
> KVM_CLEAR_DIRTY_LOG can be used as usual. 'Undefined behaviour' doesn't
> seem precise here. We can improve it as below, to avoid talking
> about 'undefined behaviour'.
> 
>   To collect dirty bits in the backup bitmap, the userspace can use the
>   same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
>   since collecting the dirty bitmap always happens in the last phase of
>   VM's migration.

That's better, but the "shouldn't be needed" wording makes things
ambiguous, and we shouldn't mention migration at all (this is not the
only purpose of this API). I'd suggest this:

   To collect dirty bits in the backup bitmap, userspace can use the
   same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as
   long as all the generation of the dirty bits is done in a single
   pass.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 4/7] KVM: arm64: Enable ring-based dirty memory tracking
  2022-11-06 21:46       ` Gavin Shan
@ 2022-11-07  9:47         ` Marc Zyngier
  -1 siblings, 0 replies; 66+ messages in thread
From: Marc Zyngier @ 2022-11-07  9:47 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvmarm, kvm, shuah, catalin.marinas, andrew.jones,
	ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, peterx, seanjc, oliver.upton,
	zhenyzha, shan.gavin

On Sun, 06 Nov 2022 21:46:19 +0000,
Gavin Shan <gshan@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 11/6/22 11:50 PM, Marc Zyngier wrote:
> > On Fri, 04 Nov 2022 23:40:46 +0000,
> > Gavin Shan <gshan@redhat.com> wrote:
> >> 
> >> Enable ring-based dirty memory tracking on arm64 by selecting
> >> CONFIG_HAVE_KVM_DIRTY_{RING_ACQ_REL, RING_WITH_BITMAP} and providing
> >> the ring buffer's physical page offset (KVM_DIRTY_LOG_PAGE_OFFSET).
> >> 
> >> Besides, helper kvm_vgic_save_its_tables_in_progress() is added to
> >> indicate if vgic/its tables are being saved or not. The helper is used
> >> in ARM64's kvm_arch_allow_write_without_running_vcpu() to keep the
> >> site of saving vgic/its tables out of no-running-vcpu radar.
> >> 
> >> Signed-off-by: Gavin Shan <gshan@redhat.com>
> >> ---
> >>   Documentation/virt/kvm/api.rst     |  2 +-
> >>   arch/arm64/include/uapi/asm/kvm.h  |  1 +
> >>   arch/arm64/kvm/Kconfig             |  2 ++
> >>   arch/arm64/kvm/arm.c               |  3 +++
> >>   arch/arm64/kvm/mmu.c               | 15 +++++++++++++++
> >>   arch/arm64/kvm/vgic/vgic-its.c     |  3 +++
> >>   arch/arm64/kvm/vgic/vgic-mmio-v3.c |  7 +++++++
> >>   include/kvm/arm_vgic.h             |  2 ++
> >>   8 files changed, 34 insertions(+), 1 deletion(-)
> >> 
> >> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >> index 2ec32bd41792..2fc68f684ad8 100644
> >> --- a/Documentation/virt/kvm/api.rst
> >> +++ b/Documentation/virt/kvm/api.rst
> >> @@ -7921,7 +7921,7 @@ regardless of what has actually been exposed through the CPUID leaf.
> >>   8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> >>   ----------------------------------------------------------
> >>   -:Architectures: x86
> >> +:Architectures: x86, arm64
> >>   :Parameters: args[0] - size of the dirty log ring
> >>     KVM is capable of tracking dirty memory using ring buffers that
> >> are
> >> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> >> index 316917b98707..a7a857f1784d 100644
> >> --- a/arch/arm64/include/uapi/asm/kvm.h
> >> +++ b/arch/arm64/include/uapi/asm/kvm.h
> >> @@ -43,6 +43,7 @@
> >>   #define __KVM_HAVE_VCPU_EVENTS
> >>     #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
> >> +#define KVM_DIRTY_LOG_PAGE_OFFSET 64
> >>     #define KVM_REG_SIZE(id)
> >> \
> >>   	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
> >> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> >> index 815cc118c675..066b053e9eb9 100644
> >> --- a/arch/arm64/kvm/Kconfig
> >> +++ b/arch/arm64/kvm/Kconfig
> >> @@ -32,6 +32,8 @@ menuconfig KVM
> >>   	select KVM_VFIO
> >>   	select HAVE_KVM_EVENTFD
> >>   	select HAVE_KVM_IRQFD
> >> +	select HAVE_KVM_DIRTY_RING_ACQ_REL
> >> +	select HAVE_KVM_DIRTY_RING_WITH_BITMAP
> >>   	select HAVE_KVM_MSI
> >>   	select HAVE_KVM_IRQCHIP
> >>   	select HAVE_KVM_IRQ_ROUTING
> >> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> >> index 94d33e296e10..6b097605e38c 100644
> >> --- a/arch/arm64/kvm/arm.c
> >> +++ b/arch/arm64/kvm/arm.c
> >> @@ -746,6 +746,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
> >>     		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
> >>   			return kvm_vcpu_suspend(vcpu);
> >> +
> >> +		if (kvm_dirty_ring_check_request(vcpu))
> >> +			return 0;
> >>   	}
> >>     	return 1;
> >> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> >> index 60ee3d9f01f8..fbeb55e45f53 100644
> >> --- a/arch/arm64/kvm/mmu.c
> >> +++ b/arch/arm64/kvm/mmu.c
> >> @@ -932,6 +932,21 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
> >>   	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
> >>   }
> >>   +/*
> >> + * kvm_arch_allow_write_without_running_vcpu - allow writing guest memory
> >> + * without the running VCPU when dirty ring is enabled.
> >> + *
> >> + * The running VCPU is required to track dirty guest pages when dirty ring
> >> + * is enabled. Otherwise, the backup bitmap should be used to track the
> >> + * dirty guest pages. When vgic/its tables are being saved, the backup
> >> + * bitmap is used to track the dirty guest pages due to the missed running
> >> + * VCPU in the period.
> >> + */
> >> +bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
> >> +{
> >> +	return kvm_vgic_save_its_tables_in_progress(kvm);
> > 
> > I don't think we need the extra level of abstraction here. Just return
> > kvm->arch.vgic.save_its_tables_in_progress and be done with it.
> > 
> > You can also move the helper to the vgic-its code since they are
> > closely related for now.
> > 
> 
> Ok. After kvm_arch_allow_write_without_running_vcpu() is moved to vgic-its.c,
> do we need to replace 'struct vgic_dist::save_its_tables_in_progress' with
> a file-scoped variant ('bool vgic_its_saving_tables') ?

No, this still needs to be per-VM.

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-04 23:40   ` Gavin Shan
@ 2022-11-07 10:45     ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-07 10:45 UTC (permalink / raw)
  To: kvmarm
  Cc: maz, kvm, catalin.marinas, andrew.jones, will, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, shuah, kvmarm, ajones

Hi Marc, Peter, Oliver and Sean,

On 11/5/22 7:40 AM, Gavin Shan wrote:
> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> enabled. It's conflicting with that ring-based dirty page tracking always
> requires a running VCPU context.
> 
> Introduce a new flavor of dirty ring that requires the use of both VCPU
> dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> the VM to the target.
> 
> Use an additional capability to advertise this behavior. The newly added
> capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> 
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Suggested-by: Peter Xu <peterx@redhat.com>
> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Acked-by: Peter Xu <peterx@redhat.com>
> ---
>   Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++-----
>   include/linux/kvm_dirty_ring.h |  7 +++++
>   include/linux/kvm_host.h       |  1 +
>   include/uapi/linux/kvm.h       |  1 +
>   virt/kvm/Kconfig               |  8 ++++++
>   virt/kvm/dirty_ring.c          | 10 +++++++
>   virt/kvm/kvm_main.c            | 49 +++++++++++++++++++++++++++-------
>   7 files changed, 93 insertions(+), 16 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index eee9f857a986..2ec32bd41792 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
>   needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
>   vmexit ensures that all dirty GFNs are flushed to the dirty rings.
>   
> -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
> -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
> -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
> -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
> -machine will switch to ring-buffer dirty page tracking and further
> -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
> -
>   NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
>   should be exposed by weakly ordered architecture, in order to indicate
>   the additional memory ordering requirements imposed on userspace when
> @@ -8018,6 +8011,32 @@ Architecture with TSO-like ordering (such as x86) are allowed to
>   expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>   to userspace.
>   
> +After using the dirty rings, the userspace needs to detect the capability
> +of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
> +need to be backed by per-slot bitmaps. With this capability advertised
> +and supported, it means the architecture can dirty guest pages without
> +vcpu/ring context, so that some of the dirty information will still be
> +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> +can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> +has been enabled.
> +
> +Note that the bitmap here is only a backup of the ring structure, and
> +normally should only contain a very small amount of dirty pages, which
> +needs to be transferred during VM downtime. Collecting the dirty bitmap
> +should be the very last thing that the VMM does before transmitting state
> +to the target VM. VMM needs to ensure that the dirty state is final and
> +avoid missing dirty pages from another ioctl ordered after the bitmap
> +collection.
> +
> +To collect dirty bits in the backup bitmap, the userspace can use the
> +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> +and its behavior is undefined since collecting the dirty bitmap always
> +happens in the last phase of VM's migration.
> +
> +NOTE: One example of using the backup bitmap is saving arm64 vgic/its
> +tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
> +KVM device "kvm-arm-vgic-its" during VM's migration.
> +

In order to speed up the review and reduce unnecessary respins, and after
collecting comments on PATCH[v8 3/7] from Marc and Peter, I would change the
above description as below. Could you please confirm it looks good to you?

In the 4th paragraph, the wording from "Collecting the dirty bitmap..." to
the end was previously suggested by Oliver, although Marc suggested avoiding
any mention of "migration".

   After enabling the dirty rings, the userspace needs to detect the
   capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring
   structures need to be backed by per-slot bitmaps. With this capability
   advertised, it means the architecture can dirty guest pages without
   vcpu/ring context, so that some of the dirty information will still be
   maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
   can't be enabled if the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
   hasn't been enabled, or any memslot has already been created.

   Note that the bitmap here is only a backup of the ring structure. The
   use of the ring and bitmap combination is only beneficial if there is
   only a very small amount of memory that is dirtied out of vcpu/ring
   context. Otherwise, the stand-alone per-slot bitmap mechanism needs to
   be considered.

   To collect dirty bits in the backup bitmap, userspace can use the same
   KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as long as all
   the generation of the dirty bits is done in a single pass. Collecting
   the dirty bitmap should be the very last thing that the VMM does before
   transmitting state to the target VM. VMM needs to ensure that the dirty
   state is final and avoid missing dirty pages from another ioctl ordered
   after the bitmap collection.

   NOTE: One example of using the backup bitmap is saving arm64 vgic/its
   tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
   KVM device "kvm-arm-vgic-its" during VM's migration.

Thanks,
Gavin


* Re: [PATCH v8 4/7] KVM: arm64: Enable ring-based dirty memory tracking
  2022-11-07  9:47         ` Marc Zyngier
@ 2022-11-07 10:47           ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-07 10:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm, ajones

Hi Marc,

On 11/7/22 5:47 PM, Marc Zyngier wrote:
> On Sun, 06 Nov 2022 21:46:19 +0000,
> Gavin Shan <gshan@redhat.com> wrote:
>> On 11/6/22 11:50 PM, Marc Zyngier wrote:
>>> On Fri, 04 Nov 2022 23:40:46 +0000,
>>> Gavin Shan <gshan@redhat.com> wrote:
>>>>
>>>> Enable ring-based dirty memory tracking on arm64 by selecting
>>>> CONFIG_HAVE_KVM_DIRTY_{RING_ACQ_REL, RING_WITH_BITMAP} and providing
>>>> the ring buffer's physical page offset (KVM_DIRTY_LOG_PAGE_OFFSET).
>>>>
>>>> Besides, helper kvm_vgic_save_its_tables_in_progress() is added to
>>>> indicate if vgic/its tables are being saved or not. The helper is used
>>>> in ARM64's kvm_arch_allow_write_without_running_vcpu() to keep the
>>>> site of saving vgic/its tables out of no-running-vcpu radar.
>>>>
>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>> ---
>>>>    Documentation/virt/kvm/api.rst     |  2 +-
>>>>    arch/arm64/include/uapi/asm/kvm.h  |  1 +
>>>>    arch/arm64/kvm/Kconfig             |  2 ++
>>>>    arch/arm64/kvm/arm.c               |  3 +++
>>>>    arch/arm64/kvm/mmu.c               | 15 +++++++++++++++
>>>>    arch/arm64/kvm/vgic/vgic-its.c     |  3 +++
>>>>    arch/arm64/kvm/vgic/vgic-mmio-v3.c |  7 +++++++
>>>>    include/kvm/arm_vgic.h             |  2 ++
>>>>    8 files changed, 34 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>>> index 2ec32bd41792..2fc68f684ad8 100644
>>>> --- a/Documentation/virt/kvm/api.rst
>>>> +++ b/Documentation/virt/kvm/api.rst
>>>> @@ -7921,7 +7921,7 @@ regardless of what has actually been exposed through the CPUID leaf.
>>>>    8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>>>>    ----------------------------------------------------------
>>>>    -:Architectures: x86
>>>> +:Architectures: x86, arm64
>>>>    :Parameters: args[0] - size of the dirty log ring
>>>>      KVM is capable of tracking dirty memory using ring buffers that
>>>> are
>>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>>>> index 316917b98707..a7a857f1784d 100644
>>>> --- a/arch/arm64/include/uapi/asm/kvm.h
>>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>>>> @@ -43,6 +43,7 @@
>>>>    #define __KVM_HAVE_VCPU_EVENTS
>>>>      #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
>>>> +#define KVM_DIRTY_LOG_PAGE_OFFSET 64
>>>>      #define KVM_REG_SIZE(id)
>>>> \
>>>>    	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
>>>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
>>>> index 815cc118c675..066b053e9eb9 100644
>>>> --- a/arch/arm64/kvm/Kconfig
>>>> +++ b/arch/arm64/kvm/Kconfig
>>>> @@ -32,6 +32,8 @@ menuconfig KVM
>>>>    	select KVM_VFIO
>>>>    	select HAVE_KVM_EVENTFD
>>>>    	select HAVE_KVM_IRQFD
>>>> +	select HAVE_KVM_DIRTY_RING_ACQ_REL
>>>> +	select HAVE_KVM_DIRTY_RING_WITH_BITMAP
>>>>    	select HAVE_KVM_MSI
>>>>    	select HAVE_KVM_IRQCHIP
>>>>    	select HAVE_KVM_IRQ_ROUTING
>>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>>> index 94d33e296e10..6b097605e38c 100644
>>>> --- a/arch/arm64/kvm/arm.c
>>>> +++ b/arch/arm64/kvm/arm.c
>>>> @@ -746,6 +746,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
>>>>      		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
>>>>    			return kvm_vcpu_suspend(vcpu);
>>>> +
>>>> +		if (kvm_dirty_ring_check_request(vcpu))
>>>> +			return 0;
>>>>    	}
>>>>      	return 1;
>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>> index 60ee3d9f01f8..fbeb55e45f53 100644
>>>> --- a/arch/arm64/kvm/mmu.c
>>>> +++ b/arch/arm64/kvm/mmu.c
>>>> @@ -932,6 +932,21 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
>>>>    	kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
>>>>    }
>>>>    +/*
>>>> + * kvm_arch_allow_write_without_running_vcpu - allow writing guest memory
>>>> + * without the running VCPU when dirty ring is enabled.
>>>> + *
>>>> + * The running VCPU is required to track dirty guest pages when dirty ring
>>>> + * is enabled. Otherwise, the backup bitmap should be used to track the
>>>> + * dirty guest pages. When vgic/its tables are being saved, the backup
>>>> + * bitmap is used to track the dirty guest pages due to the missed running
>>>> + * VCPU in the period.
>>>> + */
>>>> +bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
>>>> +{
>>>> +	return kvm_vgic_save_its_tables_in_progress(kvm);
>>>
>>> I don't think we need the extra level of abstraction here. Just return
>>> kvm->arch.vgic.save_its_tables_in_progress and be done with it.
>>>
>>> You can also move the helper to the vgic-its code since they are
>>> closely related for now.
>>>
>>
>> Ok. After kvm_arch_allow_write_without_running_vcpu() is moved to vgic-its.c,
>> do we need to replace 'struct vgic_dist::save_its_tables_in_progress' with
>> a file-scoped variant ('bool vgic_its_saving_tables') ?
> 
> No, this still needs to be per-VM.
> 

Yeah, it's still per-VM state. Sorry for my dumb question :)

Thanks,
Gavin

>>>>    }
>>>>    +/*
>>>> + * kvm_arch_allow_write_without_running_vcpu - allow writing guest memory
>>>> + * without the running VCPU when dirty ring is enabled.
>>>> + *
>>>> + * The running VCPU is required to track dirty guest pages when dirty ring
>>>> + * is enabled. Otherwise, the backup bitmap should be used to track the
>>>> + * dirty guest pages. When vgic/its tables are being saved, the backup
>>>> + * bitmap is used to track the dirty guest pages due to the missed running
>>>> + * VCPU in the period.
>>>> + */
>>>> +bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
>>>> +{
>>>> +	return kvm_vgic_save_its_tables_in_progress(kvm);
>>>
>>> I don't think we need the extra level of abstraction here. Just return
>>> kvm->arch.vgic.save_its_tables_in_progress and be done with it.
>>>
>>> You can also move the helper to the vgic-its code since they are
>>> closely related for now.
>>>
>>
>> Ok. After kvm_arch_allow_write_without_running_vcpu() is moved to vgic-its.c,
>> do we need to replace 'struct vgic_dist::save_its_tables_in_progress' with
>> a file-scoped variable ('bool vgic_its_saving_tables')?
> 
> No, this still needs to be per-VM.
> 

Yeah, it's still per-VM state. Sorry for my dumb question :)

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-07 10:45     ` Gavin Shan
@ 2022-11-07 11:33       ` Marc Zyngier
  -1 siblings, 0 replies; 66+ messages in thread
From: Marc Zyngier @ 2022-11-07 11:33 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kvmarm, kvm, catalin.marinas, andrew.jones, will, shan.gavin,
	bgardon, dmatlack, pbonzini, zhenyzha, shuah, kvmarm, ajones

On Mon, 07 Nov 2022 10:45:34 +0000,
Gavin Shan <gshan@redhat.com> wrote:
> 
> Hi Marc, Peter, Oliver and Sean,
> 
> On 11/5/22 7:40 AM, Gavin Shan wrote:
> > ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
> > enabled. This conflicts with the fact that ring-based dirty page tracking
> > always requires a running VCPU context.
> > 
> > Introduce a new flavor of dirty ring that requires the use of both VCPU
> > dirty rings and a dirty bitmap. The expectation is that for non-VCPU
> > sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
> > the dirty bitmap. Userspace should scan the dirty bitmap before migrating
> > the VM to the target.
> > 
> > Use an additional capability to advertise this behavior. The newly added
> > capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
> > KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
> > capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
> > 
> > Suggested-by: Marc Zyngier <maz@kernel.org>
> > Suggested-by: Peter Xu <peterx@redhat.com>
> > Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > Acked-by: Peter Xu <peterx@redhat.com>
> > ---
> >   Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++-----
> >   include/linux/kvm_dirty_ring.h |  7 +++++
> >   include/linux/kvm_host.h       |  1 +
> >   include/uapi/linux/kvm.h       |  1 +
> >   virt/kvm/Kconfig               |  8 ++++++
> >   virt/kvm/dirty_ring.c          | 10 +++++++
> >   virt/kvm/kvm_main.c            | 49 +++++++++++++++++++++++++++-------
> >   7 files changed, 93 insertions(+), 16 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index eee9f857a986..2ec32bd41792 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -8003,13 +8003,6 @@ flushing is done by the KVM_GET_DIRTY_LOG ioctl).  To achieve that, one
> >   needs to kick the vcpu out of KVM_RUN using a signal.  The resulting
> >   vmexit ensures that all dirty GFNs are flushed to the dirty rings.
> >   -NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding
> > -ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls
> > -KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG.  After enabling
> > -KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual
> > -machine will switch to ring-buffer dirty page tracking and further
> > -KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail.
> > -
> >   NOTE: KVM_CAP_DIRTY_LOG_RING_ACQ_REL is the only capability that
> >   should be exposed by weakly ordered architecture, in order to indicate
> >   the additional memory ordering requirements imposed on userspace when
> > @@ -8018,6 +8011,32 @@ Architecture with TSO-like ordering (such as x86) are allowed to
> >   expose both KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> >   to userspace.
> >   +After using the dirty rings, the userspace needs to detect the
> > capability
> > +of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring structures
> > +need to be backed by per-slot bitmaps. With this capability advertised
> > +and supported, it means the architecture can dirty guest pages without
> > +vcpu/ring context, so that some of the dirty information will still be
> > +maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
> > +can't be enabled until the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
> > +has been enabled.
> > +
> > +Note that the bitmap here is only a backup of the ring structure, and
> > +normally should only contain a very small amount of dirty pages, which
> > +needs to be transferred during VM downtime. Collecting the dirty bitmap
> > +should be the very last thing that the VMM does before transmitting state
> > +to the target VM. VMM needs to ensure that the dirty state is final and
> > +avoid missing dirty pages from another ioctl ordered after the bitmap
> > +collection.
> > +
> > +To collect dirty bits in the backup bitmap, the userspace can use the
> > +same KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG shouldn't be needed
> > +and its behavior is undefined since collecting the dirty bitmap always
> > +happens in the last phase of VM's migration.
> > +
> > +NOTE: One example of using the backup bitmap is saving arm64 vgic/its
> > +tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
> > +KVM device "kvm-arm-vgic-its" during VM's migration.
> > +
> 
> In order to speed up the review and reduce unnecessary respins, after
> collecting comments on PATCH[v8 3/7] from Marc and Peter, I would change the
> above description as below. Could you please confirm it looks good to you?
> 
> In the 4th paragraph, the words from "Collecting the dirty bitmap..." to the
> end were previously suggested by Oliver, even though Marc suggested avoiding
> any mention of "migration".
> 
>   After enabling the dirty rings, the userspace needs to detect the
>   capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring
>   structures need to be backed by per-slot bitmaps. With this capability

s/need/can/. If there was a *need*, it should happen automatically
without user intervention.

>   advertised, it means the architecture can dirty guest pages without
>   vcpu/ring context, so that some of the dirty information will still be
>   maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>   can't be enabled if the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>   hasn't been enabled, or any memslot already exists.
> 
>   Note that the bitmap here is only a backup of the ring structure. The
>   use of the ring and bitmap combination is only beneficial if there is
>   only a very small amount of memory that is dirtied out of vcpu/ring
>   context. Otherwise, the stand-alone per-slot bitmap mechanism needs to
>   be considered.
> 
>   To collect dirty bits in the backup bitmap, userspace can use the same
>   KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as long as all
>   the generation of the dirty bits is done in a single pass. Collecting
>   the dirty bitmap should be the very last thing that the VMM does before
>   transmitting state to the target VM. VMM needs to ensure that the dirty
>   state is final and avoid missing dirty pages from another ioctl ordered
>   after the bitmap collection.

I would replace "transmitting state to the target VM" with
"considering the state as complete", as I still object to casting this
API into the migration mold. People use this stuff for far more than
migration (checkpointing, for example).

> 
>   NOTE: One example of using the backup bitmap is saving arm64 vgic/its
>   tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
>   KVM device "kvm-arm-vgic-its" during VM's migration.

Same remark about migration.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-07  9:38               ` Marc Zyngier
@ 2022-11-07 14:29                 ` Peter Xu
  -1 siblings, 0 replies; 66+ messages in thread
From: Peter Xu @ 2022-11-07 14:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm, ajones

On Mon, Nov 07, 2022 at 09:38:24AM +0000, Marc Zyngier wrote:
> Peter said there is an undefined behaviour. I want to understand
> whether this is the case or not. QEMU is only one of the users of this
> stuff, as all the vendors have their own custom VMM, and they do
> things in funny ways.

It's not, as we don't special-case this in KVM_CLEAR_DIRTY_LOG.  If
that's confusing, we can drop it from the document.  Thanks.

-- 
Peter Xu

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-07  9:21             ` Marc Zyngier
@ 2022-11-07 14:59               ` Peter Xu
  -1 siblings, 0 replies; 66+ messages in thread
From: Peter Xu @ 2022-11-07 14:59 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, catalin.marinas, andrew.jones, dmatlack, will, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm, ajones

On Mon, Nov 07, 2022 at 09:21:35AM +0000, Marc Zyngier wrote:
> On Sun, 06 Nov 2022 21:06:43 +0000,
> Peter Xu <peterx@redhat.com> wrote:
> > 
> > On Sun, Nov 06, 2022 at 08:12:22PM +0000, Marc Zyngier wrote:
> > > Hi Peter,
> > > 
> > > On Sun, 06 Nov 2022 16:22:29 +0000,
> > > Peter Xu <peterx@redhat.com> wrote:
> > > > 
> > > > Hi, Marc,
> > > > 
> > > > On Sun, Nov 06, 2022 at 03:43:17PM +0000, Marc Zyngier wrote:
> > > > > > +Note that the bitmap here is only a backup of the ring structure, and
> > > > > > +normally should only contain a very small amount of dirty pages, which
> > > > > 
> > > > > I don't think we can claim this. It is whatever amount of memory is
> > > > > dirtied outside of a vcpu context, and we shouldn't make any claim
> > > > > regarding the number of dirty pages.
> > > > 
> > > > The thing is the current with-bitmap design assumes that the two logs are
> > > > collected in different windows of migration, while the dirty log is only
> > > > collected after the VM is stopped.  So collecting dirty bitmap and sending
> > > > the dirty pages within the bitmap will be part of the VM downtime.
> > > > 
> > > > It will stop to make sense if the dirty bitmap can contain a large portion
> > > > of the guest memory, because then it'll be simpler to just stop the VM,
> > > > transfer pages, and restart on dest node without any tracking mechanism.
> > > 
> > > Oh, I absolutely agree that the whole vcpu dirty ring makes zero sense
> > > in general. It only makes sense if the source of the dirty pages is
> > > limited to the vcpus, which is literally a corner case. Look at any
> > > real machine, and you'll quickly realise that this isn't the case, and
> > > that DMA *is* a huge source of dirty pages.
> > > 
> > > Here, we're just lucky enough not to have much DMA tracking yet. Once
> > > that happens (and I have it from people doing the actual work that it
> > > *is* happening), you'll realise that the dirty ring story is of very
> > > limited use. So I'd rather drop anything quantitative here, as this is
> > > likely to be wrong.
> > 
> > Is it a must that arm64 needs to track device DMAs using the same dirty
> > tracking interface rather than VFIO or any other interface?
> 
> What does it change? At the end of the day, you want a list of dirty
> pages. How you obtain it is irrelevant.
> 
> > It's
> > definitely not the case for x86, but if it's true for arm64, then could the
> > DMA be spread across all the guest pages?  If it's also true, I really
> > don't know how this will work..
> 
> Of course, all pages can be the target of DMA. It works the same way
> it works for the ITS: you sync the state, you obtain the dirty bits,
> you move on.
> 
> And mimicking what x86 does is really not my concern (if you still
> think that arm64 is just another flavour of x86, stay tuned!  ;-).

I didn't mean so, I should probably stop mentioning x86. :)

I had some sense of this already from the topics at the past few years of
KVM Forum. Yeah, I'll be looking forward to anything more coming.

> 
> > 
> > We're only syncing the dirty bitmap once right now with the protocol.  If
> > that can cover most of the guest mem, it's same as non-live.  If we sync it
> > periodically, then it's the same as enabling dirty-log alone and the rings
> > are useless.
> 
> I'm glad that you finally accept it: the ring *ARE* useless in the
> general sense. Only limited, CPU-only workloads can make any use of
> the current design. This probably covers a large proportion of what
> the cloud vendors do, but this doesn't work for general situations
> where you have a stream of dirty pages originating outside of the
> CPUs.

The ring itself is really not the thing to blame; IMHO it's a good attempt
at decoupling dirty tracking from the guest memory size in KVM.  It may not
be perfect, but it may still serve some of the goals, e.g., it at least
allows the user app to obtain per-vcpu information, and since there are
ring-full events we can do more than before, like the vcpu throttling that
China Telecom does with the ring structures.

But I agree it's not a generic enough solution.  Hopefully it'll still
cover some use cases, so it's not completely pointless.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-07 14:59               ` Peter Xu
@ 2022-11-07 15:30                 ` Marc Zyngier
  -1 siblings, 0 replies; 66+ messages in thread
From: Marc Zyngier @ 2022-11-07 15:30 UTC (permalink / raw)
  To: Peter Xu
  Cc: Gavin Shan, kvmarm, kvmarm, kvm, shuah, catalin.marinas,
	andrew.jones, ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, seanjc, oliver.upton, zhenyzha,
	shan.gavin

On Mon, 07 Nov 2022 14:59:41 +0000,
Peter Xu <peterx@redhat.com> wrote:
> 
> On Mon, Nov 07, 2022 at 09:21:35AM +0000, Marc Zyngier wrote:
> > On Sun, 06 Nov 2022 21:06:43 +0000,
> > Peter Xu <peterx@redhat.com> wrote:
> > > 
> > > It's definitely not the case for x86, but if it's true for
> > > arm64, then could the DMA be spread across all the guest pages?
> > > If it's also true, I really don't know how this will work..
> > 
> > Of course, all pages can be the target of DMA. It works the same way
> > it works for the ITS: you sync the state, you obtain the dirty bits,
> > you move on.
> > 
> > And mimicking what x86 does is really not my concern (if you still
> > think that arm64 is just another flavour of x86, stay tuned!  ;-).
> 
> I didn't mean so, I should probably stop mentioning x86. :)

Please! I turned off my last x86 development machine over the weekend,
and my x86 laptop is now a glorified window manager... ;-)

> I had some sense already from the topics in past few years of kvm forum.
> Yeah I'll be looking forward to anything more coming.

Yup. Hopefully we won't have to wait for too long to see this stuff (I
had good discussions on the subject at both KF and Plumbers in Dublin
earlier this year).

> > > We're only syncing the dirty bitmap once right now with the
> > > protocol.  If that can cover most of the guest mem, it's the same as
> > > non-live.  If we sync it periodically, then it's the same as
> > > enabling dirty-log alone and the rings are useless.
> > 
> > I'm glad that you finally accept it: the rings *ARE* useless in the
> > general sense. Only limited, CPU-only workloads can make any use of
> > the current design. This probably covers a large proportion of what
> > the cloud vendors do, but this doesn't work for general situations
> > where you have a stream of dirty pages originating outside of the
> > CPUs.
> 
> The ring itself is really not the thing to blame; IMHO it's a good attempt
> at decoupling dirty tracking from the guest size in KVM.  It may not be
> perfect, but it still serves some of the goals, e.g., it allows the user
> app to collect per-vcpu information, and since there are ring-full events
> we can do more than before, like the vcpu throttling that China Telecom
> implements on top of the ring structures.

I don't disagree with that: for vcpu-based workloads, the rings are
great and doing their job. It's just that there is another side to
this problem, and you'll have to deal with both eventually. We're just
ahead of the curve here...

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-04 23:40   ` Gavin Shan
@ 2022-11-07 16:05     ` Sean Christopherson
  -1 siblings, 0 replies; 66+ messages in thread
From: Sean Christopherson @ 2022-11-07 16:05 UTC (permalink / raw)
  To: Gavin Shan
  Cc: maz, kvm, catalin.marinas, andrew.jones, dmatlack, will,
	shan.gavin, bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm,
	ajones

On Sat, Nov 05, 2022, Gavin Shan wrote:
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index fecbb7d75ad2..758679724447 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -21,6 +21,16 @@ u32 kvm_dirty_ring_get_rsvd_entries(void)
>  	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
>  }
>  
> +bool kvm_use_dirty_bitmap(struct kvm *kvm)
> +{

	lockdep_assert_held(&kvm->slots_lock);

To guard against accessing kvm->dirty_ring_with_bitmap without holding slots_lock.

> +	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
> +}
> +
> @@ -4588,6 +4594,31 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>  			return -EINVAL;
>  
>  		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
> +		struct kvm_memslots *slots;
> +		int r = -EINVAL;
> +
> +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> +		    !kvm->dirty_ring_size)
> +			return r;
> +
> +		mutex_lock(&kvm->slots_lock);
> +
> +		slots = kvm_memslots(kvm);

Sadly, this needs to iterate over all possible memslots thanks to x86's SMM
address space.  Might be worth adding a separate helper (that's local to kvm_main.c
to discourage use), e.g. 

static bool kvm_are_all_memslots_empty(struct kvm *kvm)
{
	int i;

	lockdep_assert_held(&kvm->slots_lock);

	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
		if (!kvm_memslots_empty(__kvm_memslots(kvm, i)))
			return false;
	}

	return true;
}

> +
> +		/*
> +		 * Avoid a race between memslot creation and enabling the ring +
> +		 * bitmap capability to guarantee that no memslots have been
> +		 * created without a bitmap.

Nit, it's not just enabling, the below also allows disabling the bitmap.  The
enabling case is definitely the most interesting, but the above wording makes it
sound like the enabling case is the only thing being given protection.  That's
kinda true since KVM frees bitmaps without checking kvm_use_dirty_bitmap(), but
that's not a strict requirement.

And there's no race required, e.g. without this check userspace could simply create
a memslot and then toggle on the capability.  Acquiring slots_lock above is what
guards against races.

Might also be worth alluding to the alternative solution of allocating the bitmap
for all memslots here, e.g. something like

		/*
		 * For simplicity, allow toggling ring+bitmap if and only if
		 * there are no memslots, e.g. to ensure all memslots allocate a
		 * bitmap after the capability is enabled.
		 */

> +		 */
> +		if (kvm_memslots_empty(slots)) {
> +			kvm->dirty_ring_with_bitmap = cap->args[0];
> +			r = 0;
> +		}
> +
> +		mutex_unlock(&kvm->slots_lock);
> +		return r;
> +	}
>  	default:
>  		return kvm_vm_ioctl_enable_cap(kvm, cap);
>  	}
> -- 
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-07 11:33       ` Marc Zyngier
@ 2022-11-07 23:53         ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-07 23:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: shuah, kvm, catalin.marinas, andrew.jones, dmatlack, shan.gavin,
	bgardon, kvmarm, pbonzini, zhenyzha, will, kvmarm, ajones

Hi Marc,

On 11/7/22 7:33 PM, Marc Zyngier wrote:
> On Mon, 07 Nov 2022 10:45:34 +0000,
> Gavin Shan <gshan@redhat.com> wrote:
>> On 11/5/22 7:40 AM, Gavin Shan wrote:
>>> ARM64 needs to dirty memory outside of a VCPU context when VGIC/ITS is
>>> enabled. This conflicts with the fact that ring-based dirty page tracking
>>> always requires a running VCPU context.
>>>
>>> Introduce a new flavor of dirty ring that requires the use of both VCPU
>>> dirty rings and a dirty bitmap. The expectation is that for non-VCPU
>>> sources of dirty memory (such as the VGIC/ITS on arm64), KVM writes to
>>> the dirty bitmap. Userspace should scan the dirty bitmap before migrating
>>> the VM to the target.
>>>
>>> Use an additional capability to advertise this behavior. The newly added
>>> capability (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) can't be enabled before
>>> KVM_CAP_DIRTY_LOG_RING_ACQ_REL on ARM64. In this way, the newly added
>>> capability is treated as an extension of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
>>>
>>> Suggested-by: Marc Zyngier <maz@kernel.org>
>>> Suggested-by: Peter Xu <peterx@redhat.com>
>>> Co-developed-by: Oliver Upton <oliver.upton@linux.dev>
>>> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>> Acked-by: Peter Xu <peterx@redhat.com>
>>> ---
>>>    Documentation/virt/kvm/api.rst | 33 ++++++++++++++++++-----
>>>    include/linux/kvm_dirty_ring.h |  7 +++++
>>>    include/linux/kvm_host.h       |  1 +
>>>    include/uapi/linux/kvm.h       |  1 +
>>>    virt/kvm/Kconfig               |  8 ++++++
>>>    virt/kvm/dirty_ring.c          | 10 +++++++
>>>    virt/kvm/kvm_main.c            | 49 +++++++++++++++++++++++++++-------
>>>    7 files changed, 93 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst

[...]

>>
>> In order to speed up the review and reduce unnecessary respins, after
>> collecting comments on PATCH[v8 3/7] from Marc and Peter, I would change
>> the above description as below. Could you please confirm it looks good to you?
>>
>> In the 4th paragraph, the words from "Collecting the dirty bitmap..." to
>> the end were previously suggested by Oliver, though Marc suggested avoiding
>> any mention of "migration".
>>
>>    After enabling the dirty rings, the userspace needs to detect the
>>    capability of KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP to see whether the ring
>>    structures need to be backed by per-slot bitmaps. With this capability
> 
> s/need/can/. If there was a *need*, it should happen automatically
> without user intervention.
> 

Ok. s/need to/can in next revision :)

>>    advertised, it means the architecture can dirty guest pages without
>>    vcpu/ring context, so that some of the dirty information will still be
>>    maintained in the bitmap structure. KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
>>    can't be enabled if the capability of KVM_CAP_DIRTY_LOG_RING_ACQ_REL
>>    hasn't been enabled, or if any memslots already exist.
>>
>>    Note that the bitmap here is only a backup of the ring structure. The
>>    use of the ring and bitmap combination is only beneficial if there is
>>    only a very small amount of memory that is dirtied out of vcpu/ring
>>    context. Otherwise, the stand-alone per-slot bitmap mechanism needs to
>>    be considered.
>>
>>    To collect dirty bits in the backup bitmap, userspace can use the same
>>    KVM_GET_DIRTY_LOG ioctl. KVM_CLEAR_DIRTY_LOG isn't needed as long as all
>>    the generation of the dirty bits is done in a single pass. Collecting
>>    the dirty bitmap should be the very last thing that the VMM does before
>>    transmitting state to the target VM. The VMM needs to ensure that the
>>    dirty state is final, to avoid missing dirty pages from another ioctl ordered
>>    after the bitmap collection.
> 
> I would replace "transmitting state to the target VM" with
> "considering the state as complete", as I still object to casting this
> API into the migration mold. People use this stuff for far more than
> migration (checkpointing, for example).
> 

Fair enough. I will change accordingly in next revision.

>>
>>    NOTE: One example of using the backup bitmap is saving arm64 vgic/its
>>    tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
>>    KVM device "kvm-arm-vgic-its" during VM's migration.
> 
> Same remark about migration.
> 

Ok. I will change this paragraph as below in next revision, to avoid mentioning
"migration".

   NOTE: One example of using the backup bitmap is saving arm64 vgic/its
   tables through KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} command on
   KVM device "kvm-arm-vgic-its" when dirty ring is enabled.

Thanks,
Gavin


* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-07 16:05     ` Sean Christopherson
@ 2022-11-08  0:44       ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-08  0:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvmarm, kvmarm, kvm, shuah, catalin.marinas, andrew.jones,
	ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, maz, peterx, oliver.upton, zhenyzha,
	shan.gavin

Hi Sean,

On 11/8/22 12:05 AM, Sean Christopherson wrote:
> On Sat, Nov 05, 2022, Gavin Shan wrote:
>> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
>> index fecbb7d75ad2..758679724447 100644
>> --- a/virt/kvm/dirty_ring.c
>> +++ b/virt/kvm/dirty_ring.c
>> @@ -21,6 +21,16 @@ u32 kvm_dirty_ring_get_rsvd_entries(void)
>>   	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
>>   }
>>   
>> +bool kvm_use_dirty_bitmap(struct kvm *kvm)
>> +{
> 
> 	lockdep_assert_held(&kvm->slots_lock);
> 
> To guard against accessing kvm->dirty_ring_with_bitmap without holding slots_lock.
> 

Right, it makes sense to do so :)

>> +	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
>> +}
>> +
>> @@ -4588,6 +4594,31 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>>   			return -EINVAL;
>>   
>>   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
>> +		struct kvm_memslots *slots;
>> +		int r = -EINVAL;
>> +
>> +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
>> +		    !kvm->dirty_ring_size)
>> +			return r;
>> +
>> +		mutex_lock(&kvm->slots_lock);
>> +
>> +		slots = kvm_memslots(kvm);
> 
> Sadly, this needs to iterate over all possible memslots thanks to x86's SMM
> address space.  Might be worth adding a separate helper (that's local to kvm_main.c
> to discourage use), e.g.
> 
> static bool kvm_are_all_memslots_empty(struct kvm *kvm)
> {
> 	int i;
> 
> 	lockdep_assert_held(&kvm->slots_lock);
> 
> 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> 		if (!kvm_memslots_empty(__kvm_memslots(kvm, i)))
> 			return false;
> 	}
> 
> 	return true;
> }
> 

Strictly speaking, x86 doesn't reach this point because we bail
on !CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP. However, it's still nice
to make the check complete. Besides, the lockdep_assert_held() would
duplicate the one inside __kvm_memslots(). I would move the whole
hunk of code into kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap().
That way, kvm_vm_ioctl_enable_cap_generic() at least stays clean.

>> +
>> +		/*
>> +		 * Avoid a race between memslot creation and enabling the ring +
>> +		 * bitmap capability to guarantee that no memslots have been
>> +		 * created without a bitmap.
> 
> Nit, it's not just enabling, the below also allows disabling the bitmap.  The
> enabling case is definitely the most interesting, but the above wording makes it
> sound like the enabling case is the only thing that being given protection.  That's
> kinda true since KVM frees bitmaps without checking kvm_use_dirty_bitmap(), but
> that's not a strict requirement.
> 
> And there's no race required, e.g. without this check userspace could simply create
> a memslot and then toggle on the capability.  Acquiring slots_lock above is what
> guards against races.
> 
> Might also be worth alluding to the alternative solution of allocating the bitmap
> for all memslots here, e.g. something like
> 
> 		/*
> 		 * For simplicity, allow toggling ring+bitmap if and only if
> 		 * there are no memslots, e.g. to ensure all memslots allocate a
> 		 * bitmap after the capability is enabled.
> 		 */
> 

Frankly, I don't expect the capability to be disabled. Similar to KVM_CAP_DIRTY_LOG_RING
or KVM_CAP_DIRTY_LOG_RING_ACQ_REL, it would be a one-shot capability where only enablement
is allowed. The disablement was suggested by Oliver without a clarification, even though I
dropped it several times. I would like to know whether there is a particular reason why
Oliver wants to be able to disable the capability.

     kvm->dirty_ring_with_bitmap = cap->args[0];

If Oliver agrees that the capability needn't be disabled, the whole chunk of
code can be squeezed into kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap() to
make kvm_vm_ioctl_enable_cap_generic() a bit cleaner, as I said above.

Sean and Oliver, could you help to confirm if the changes look good to you? :)

     static int kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap(struct kvm *kvm)
     {
         int i, r = 0;

         if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
             !kvm->dirty_ring_size)
             return -EINVAL;

         mutex_lock(&kvm->slots_lock);

         /* We only allow it to be set once */
         if (kvm->dirty_ring_with_bitmap) {
             r = -EINVAL;
             goto out_unlock;
         }

         /*
          * Avoid a race between memslot creation and enabling the ring +
          * bitmap capability to guarantee that no memslots have been
          * created without a bitmap.
          */
         for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
             if (!kvm_memslots_empty(__kvm_memslots(kvm, i))) {
                 r = -EINVAL;
                 goto out_unlock;
             }
         }

         kvm->dirty_ring_with_bitmap = true;

     out_unlock:
         mutex_unlock(&kvm->slots_lock);

         return r;
     }

     static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
                                                struct kvm_enable_cap *cap)
     {
         switch (cap->cap) {
           :
         case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
             return kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap(kvm);
         default:
             return kvm_vm_ioctl_enable_cap(kvm, cap);
         }
     }


>> +		 */
>> +		if (kvm_memslots_empty(slots)) {
>> +			kvm->dirty_ring_with_bitmap = cap->args[0];
>> +			r = 0;
>> +		}
>> +
>> +		mutex_unlock(&kvm->slots_lock);
>> +		return r;
>> +	}
>>   	default:
>>   		return kvm_vm_ioctl_enable_cap(kvm, cap);
>>   	}
>> -- 
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
@ 2022-11-08  0:44       ` Gavin Shan
  0 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-08  0:44 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: maz, kvm, catalin.marinas, andrew.jones, dmatlack, will,
	shan.gavin, bgardon, kvmarm, pbonzini, zhenyzha, shuah, kvmarm,
	ajones

Hi Sean,

On 11/8/22 12:05 AM, Sean Christopherson wrote:
> On Sat, Nov 05, 2022, Gavin Shan wrote:
>> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
>> index fecbb7d75ad2..758679724447 100644
>> --- a/virt/kvm/dirty_ring.c
>> +++ b/virt/kvm/dirty_ring.c
>> @@ -21,6 +21,16 @@ u32 kvm_dirty_ring_get_rsvd_entries(void)
>>   	return KVM_DIRTY_RING_RSVD_ENTRIES + kvm_cpu_dirty_log_size();
>>   }
>>   
>> +bool kvm_use_dirty_bitmap(struct kvm *kvm)
>> +{
> 
> 	lockdep_assert_held(&kvm->slots_lock);
> 
> To guard against accessing kvm->dirty_ring_with_bitmap without holding slots_lock.
> 

Right, it makes sense to do so :)

>> +	return !kvm->dirty_ring_size || kvm->dirty_ring_with_bitmap;
>> +}
>> +
>> @@ -4588,6 +4594,31 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>>   			return -EINVAL;
>>   
>>   		return kvm_vm_ioctl_enable_dirty_log_ring(kvm, cap->args[0]);
>> +	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
>> +		struct kvm_memslots *slots;
>> +		int r = -EINVAL;
>> +
>> +		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
>> +		    !kvm->dirty_ring_size)
>> +			return r;
>> +
>> +		mutex_lock(&kvm->slots_lock);
>> +
>> +		slots = kvm_memslots(kvm);
> 
> Sadly, this needs to iterate over all possible memslots thanks to x86's SMM
> address space.  Might be worth adding a separate helper (that's local to kvm_main.c
> to discourage use), e.g.
> 
> static bool kvm_are_all_memslots_empty(struct kvm *kvm)
> {
> 	int i;
> 
> 	lockdep_assert_held(&kvm->slots_lock);
> 
> 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> 		if (!kvm_memslots_empty(__kvm_memslots(kvm, i)))
> 			return false;
> 	}
> 
> 	return true;
> }
> 

Strictly speaking, x86 doesn't reach to this point because we bail
on !CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP. However, it's still nice
to make the check complete. Besides, the lockdep_assert_held() is
duplicate to that one inside __kvm_memslots(). I would move the whole
hunk of code into kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap().
In this way, kvm_vm_ioctl_enable_cap_generic() looks clean at least.

>> +
>> +		/*
>> +		 * Avoid a race between memslot creation and enabling the ring +
>> +		 * bitmap capability to guarantee that no memslots have been
>> +		 * created without a bitmap.
> 
> Nit, it's not just enabling, the below also allows disabling the bitmap.  The
> enabling case is definitely the most interesting, but the above wording makes it
> sound like the enabling case is the only thing that being given protection.  That's
> kinda true since KVM frees bitmaps without checking kvm_use_dirty_bitmap(), but
> that's not a strict requirement.
> 
> And there's no race required, e.g. without this check userspace could simply create
> a memslot and then toggle on the capability.  Acquiring slots_lock above is what
> guards against races.
> 
> Might also be worth alluding to the alternative solution of allocating the bitmap
> for all memslots here, e.g. something like
> 
> 		/*
> 		 * For simplicity, allow toggling ring+bitmap if and only if
> 		 * there are no memslots, e.g. to ensure all memslots allocate a
> 		 * bitmap after the capability is enabled.
> 		 */
> 

Frankly, I don't expect the capability to be disabled. Similar to KVM_CAP_DIRTY_LOG_RING
or KVM_CAP_DIRTY_LOG_RING_ACQ_REL, it would a one-shot capability and only enablement is
allowed. The disablement was suggested by Oliver without providing a clarify, even I dropped
it for several times. I would like to see if there is particular reason why Oliver want
to disable the capability.

     kvm->dirty_ring_with_bitmap = cap->args[0];

If Oliver agrees that the capability needn't be disabled, the whole chunk of
code can be squeezed into kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap() to
make kvm_vm_ioctl_enable_cap_generic() a bit cleaner, as I said above.

Sean and Oliver, could you help to confirm if the changes look good to you? :)

     static int kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap(struct kvm *kvm)
     {
         int i, r = 0;

         if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
             !kvm->dirty_ring_size)
             return -EINVAL;

         mutex_lock(&kvm->slots_lock);

         /* We only allow it to be set once */
         if (kvm->dirty_ring_with_bitmap) {
             r = -EINVAL;
             goto out_unlock;
         }

         /*
          * Avoid a race between memslot creation and enabling the ring +
          * bitmap capability to guarantee that no memslots have been
          * created without a bitmap.
          */
         for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
             if (!kvm_memslots_empty(__kvm_memslots(kvm, i))) {
                 r = -EINVAL;
                 goto out_unlock;
             }
         }

         kvm->dirty_ring_with_bitmap = true;

     out_unlock:
         mutex_unlock(&kvm->slots_lock);

         return r;
     }

     static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
                                                struct kvm_enable_cap *cap)
     {
         switch (cap->cap) {
           :
         case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP:
             return kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap(kvm);
         default:
             return kvm_vm_ioctl_enable_cap(kvm, cap);
         }
     }


>> +		 */
>> +		if (kvm_memslots_empty(slots)) {
>> +			kvm->dirty_ring_with_bitmap = cap->args[0];
>> +			r = 0;
>> +		}
>> +
>> +		mutex_unlock(&kvm->slots_lock);
>> +		return r;
>> +	}
>>   	default:
>>   		return kvm_vm_ioctl_enable_cap(kvm, cap);
>>   	}
>> -- 
> 

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-08  0:44       ` Gavin Shan
@ 2022-11-08  1:13         ` Oliver Upton
  -1 siblings, 0 replies; 66+ messages in thread
From: Oliver Upton @ 2022-11-08  1:13 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Sean Christopherson, kvmarm, kvmarm, kvm, shuah, catalin.marinas,
	andrew.jones, ajones, bgardon, dmatlack, will, suzuki.poulose,
	alexandru.elisei, pbonzini, maz, peterx, zhenyzha, shan.gavin

On Tue, Nov 08, 2022 at 08:44:52AM +0800, Gavin Shan wrote:
> Frankly, I don't expect the capability to be disabled. Similar to KVM_CAP_DIRTY_LOG_RING
> or KVM_CAP_DIRTY_LOG_RING_ACQ_REL, it would be a one-shot capability where only enablement
> is allowed. The disablement was suggested by Oliver without a clarification, even though I
> dropped it several times. I would like to know if there is a particular reason why Oliver
> wants to disable the capability.
> 
>     kvm->dirty_ring_with_bitmap = cap->args[0];
> 
> If Oliver agrees that the capability needn't be disabled, the whole chunk of
> code can be squeezed into kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap() to
> make kvm_vm_ioctl_enable_cap_generic() a bit cleaner, as I said above.

Sorry, I don't believe there is much use in disabling the cap, and
really that hunk just came from lazily matching the neighboring caps
when sketching out some suggestions. Oops!

> Sean and Oliver, could you help to confirm if the changes look good to you? :)
> 
>     static int kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap(struct kvm *kvm)

This function name is ridiculously long...

>     {
>         int i, r = 0;
> 
>         if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
>             !kvm->dirty_ring_size)
>             return -EINVAL;
> 
>         mutex_lock(&kvm->slots_lock);
> 
>         /* We only allow it to be set once */
>         if (kvm->dirty_ring_with_bitmap) {
>             r = -EINVAL;
>             goto out_unlock;
>         }

I don't believe this check is strictly necessary. Something similar to
this makes sense with caps that take a numeric value (like
KVM_CAP_DIRTY_LOG_RING), but this one is a one-way boolean.

> 
>         /*
>          * Avoid a race between memslot creation and enabling the ring +
>          * bitmap capability to guarantee that no memslots have been
>          * created without a bitmap.

You'll want to pick up Sean's suggestion on the comment which, again, I
drafted this in haste :-)

>          */
>         for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>             if (!kvm_memslots_empty(__kvm_memslots(kvm, i))) {
>                 r = -EINVAL;
>                 goto out_unlock;
>             }
>         }

I'd much prefer you take Sean's suggestion and just create a helper to
test that all memslots are empty. You avoid the insanely long function
name and avoid the use of a goto statement. That is to say, leave the
rest of the implementation inline in kvm_vm_ioctl_enable_cap_generic()

static bool kvm_are_all_memslots_empty(struct kvm *kvm)
{
	int i;

	lockdep_assert_held(&kvm->slots_lock);

	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
		if (!kvm_memslots_empty(__kvm_memslots(kvm, i)))
			return false;
	}

	return true;
}

static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
					   struct kvm_enable_cap *cap)
{
	switch (cap->cap) {

[...]

	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
		int r = -EINVAL;

		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
		    !kvm->dirty_ring_size)
		    	return r;

		mutex_lock(&kvm->slots_lock);

		/*
		 * For simplicity, allow enabling ring+bitmap if and only if
		 * there are no memslots, e.g. to ensure all memslots allocate a
		 * bitmap after the capability is enabled.
		 */
		if (kvm_are_all_memslots_empty(kvm)) {
			kvm->dirty_ring_with_bitmap = true;
			r = 0;
		}

		mutex_unlock(&kvm->slots_lock);
		return r;
	}

Hmm?

--
Thanks,
Oliver


* Re: [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap
  2022-11-08  1:13         ` Oliver Upton
@ 2022-11-08  3:30           ` Gavin Shan
  -1 siblings, 0 replies; 66+ messages in thread
From: Gavin Shan @ 2022-11-08  3:30 UTC (permalink / raw)
  To: Oliver Upton
  Cc: maz, kvm, bgardon, andrew.jones, dmatlack, will, shan.gavin,
	catalin.marinas, kvmarm, pbonzini, zhenyzha, shuah, kvmarm,
	ajones

Hi Oliver,

On 11/8/22 9:13 AM, Oliver Upton wrote:
> On Tue, Nov 08, 2022 at 08:44:52AM +0800, Gavin Shan wrote:
>> Frankly, I don't expect the capability to be disabled. Similar to KVM_CAP_DIRTY_LOG_RING
>> or KVM_CAP_DIRTY_LOG_RING_ACQ_REL, it would be a one-shot capability where only enablement
>> is allowed. The disablement was suggested by Oliver without a clarification, even though I
>> dropped it several times. I would like to know if there is a particular reason why Oliver
>> wants to disable the capability.
>>
>>      kvm->dirty_ring_with_bitmap = cap->args[0];
>>
>> If Oliver agrees that the capability needn't be disabled, the whole chunk of
>> code can be squeezed into kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap() to
>> make kvm_vm_ioctl_enable_cap_generic() a bit cleaner, as I said above.
> 
> Sorry, I don't believe there is much use in disabling the cap, and
> really that hunk just came from lazily matching the neighboring caps
> when sketching out some suggestions. Oops!
> 

Ok. It doesn't really matter too much, except that the comments seem conflicting.
Thanks for confirming it's unnecessary to disable the capability.

>> Sean and Oliver, could you help to confirm if the changes look good to you? :)
>>
>>      static int kvm_vm_ioctl_enable_dirty_log_ring_with_bitmap(struct kvm *kvm)
> 
> This function name is ridiculously long...
> 

Yeah, it seems I was tempted to make the function name serve as a comment :)

>>      {
>>          int i, r = 0;
>>
>>          if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
>>              !kvm->dirty_ring_size)
>>              return -EINVAL;
>>
>>          mutex_lock(&kvm->slots_lock);
>>
>>          /* We only allow it to set once */
>>          /* We only allow it to be set once */
>>              r = -EINVAL;
>>              goto out_unlock;
>>          }
> 
> I don't believe this check is strictly necessary. Something similar to
> this makes sense with caps that take a numeric value (like
> KVM_CAP_DIRTY_LOG_RING), but this one is a one-way boolean.
> 

Yep, it's not strictly required since it can only represent two states.

>>
>>          /*
>>           * Avoid a race between memslot creation and enabling the ring +
>>           * bitmap capability to guarantee that no memslots have been
>>           * created without a bitmap.
> 
> You'll want to pick up Sean's suggestion on the comment which, again, I
> drafted this in haste :-)
> 

Ok, no worries :)

>>           */
>>          for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
>>              if (!kvm_memslots_empty(__kvm_memslots(kvm, i))) {
>>                  r = -EINVAL;
>>                  goto out_unlock;
>>              }
>>          }
> 
> I'd much prefer you take Sean's suggestion and just create a helper to
> test that all memslots are empty. You avoid the insanely long function
> name and avoid the use of a goto statement. That is to say, leave the
> rest of the implementation inline in kvm_vm_ioctl_enable_cap_generic()
> 
> static bool kvm_are_all_memslots_empty(struct kvm *kvm)
> {
> 	int i;
> 
> 	lockdep_assert_held(&kvm->slots_lock);
> 
> 	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
> 		if (!kvm_memslots_empty(__kvm_memslots(kvm, i)))
> 			return false;
> 	}
> 
> 	return true;
> }
> 
> static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
> 					   struct kvm_enable_cap *cap)
> {
> 	switch (cap->cap) {
> 
>       [...]
> 
> 	case KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP: {
> 		int r = -EINVAL;
> 
> 		if (!IS_ENABLED(CONFIG_HAVE_KVM_DIRTY_RING_WITH_BITMAP) ||
> 		    !kvm->dirty_ring_size)
> 		    	return r;
> 
> 		mutex_lock(&kvm->slots_lock);
> 
> 		/*
> 		 * For simplicity, allow enabling ring+bitmap if and only if
> 		 * there are no memslots, e.g. to ensure all memslots allocate a
> 		 * bitmap after the capability is enabled.
> 		 */
> 		if (kvm_are_all_memslots_empty(kvm)) {
> 			kvm->dirty_ring_with_bitmap = true;
> 			r = 0;
> 		}
> 
> 		mutex_unlock(&kvm->slots_lock);
> 		return r;
> 	}
> 
> }

Ok. Let's change the chunk as Sean suggested in v9, which should be posted soon.

Thanks,
Gavin



end of thread, other threads:[~2022-11-08  3:31 UTC | newest]

Thread overview: 66+ messages
2022-11-04 23:40 [PATCH v8 0/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-11-04 23:40 ` [PATCH v8 1/7] KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL Gavin Shan
2022-11-04 23:40 ` [PATCH v8 2/7] KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h Gavin Shan
2022-11-04 23:40 ` [PATCH v8 3/7] KVM: Support dirty ring in conjunction with bitmap Gavin Shan
2022-11-06 15:43   ` Marc Zyngier
2022-11-06 16:22     ` Peter Xu
2022-11-06 20:12       ` Marc Zyngier
2022-11-06 21:06         ` Peter Xu
2022-11-06 21:23           ` Gavin Shan
2022-11-07  9:38             ` Marc Zyngier
2022-11-07 14:29               ` Peter Xu
2022-11-07  9:21           ` Marc Zyngier
2022-11-07 14:59             ` Peter Xu
2022-11-07 15:30               ` Marc Zyngier
2022-11-06 21:40     ` Gavin Shan
2022-11-07  9:45       ` Marc Zyngier
2022-11-07 10:45   ` Gavin Shan
2022-11-07 11:33     ` Marc Zyngier
2022-11-07 23:53       ` Gavin Shan
2022-11-07 16:05   ` Sean Christopherson
2022-11-08  0:44     ` Gavin Shan
2022-11-08  1:13       ` Oliver Upton
2022-11-08  3:30         ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 4/7] KVM: arm64: Enable ring-based dirty memory tracking Gavin Shan
2022-11-06 15:50   ` Marc Zyngier
2022-11-06 21:46     ` Gavin Shan
2022-11-07  9:47       ` Marc Zyngier
2022-11-07 10:47         ` Gavin Shan
2022-11-04 23:40 ` [PATCH v8 5/7] KVM: selftests: Use host page size to map ring buffer in dirty_log_test Gavin Shan
2022-11-04 23:40 ` [PATCH v8 6/7] KVM: selftests: Clear dirty ring states between two modes " Gavin Shan
2022-11-04 23:40 ` [PATCH v8 7/7] KVM: selftests: Automate choosing dirty ring size " Gavin Shan
2022-11-06 16:08 ` [PATCH v8 0/7] KVM: arm64: Enable ring-based dirty memory tracking Marc Zyngier
2022-11-06 21:50   ` Gavin Shan
